* Low network throughput on i.MX28
@ 2016-10-12 23:09 Jörg Krause
  2016-10-13  6:48 ` Lothar Waßmann
  0 siblings, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-10-12 23:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I am using a custom i.MX28 board similar to the i.MX28-EVK. For Wi-Fi
the board carries a BCM43362 from Broadcom and for Ethernet a
LAN8720A from Microchip. The board is running mainline Linux 4.7.

While both the wireless and the wired network interfaces work, the TCP
throughput measured with iperf is low.

The bandwidth for Ethernet is between 20-30 MBits/sec and for WLAN it is
about 4-5 MBits/sec.

There is an Application Note, "i.MX28 Ethernet Performance on
Linux" [1], which shows a bandwidth of > 60 MBits/sec. A user on the NXP
forum [2] reported achieving 20 MBits/sec with some Qualcomm chip.

Note that these values were most probably measured with the legacy
Linux Kernel 2.6.35 from NXP.

Has anybody done throughput tests on i.MX28 with a mainline kernel?
If so, what are the results? What might be the bottleneck?

[1] http://cache.freescale.com/files/32bit/doc/app_note/AN4544.pdf
[2] https://community.nxp.com/thread/353921

Best regards
Jörg Krause


* Low network throughput on i.MX28
  2016-10-12 23:09 Low network throughput on i.MX28 Jörg Krause
@ 2016-10-13  6:48 ` Lothar Waßmann
  2016-10-13 19:43   ` Jörg Krause
  0 siblings, 1 reply; 31+ messages in thread
From: Lothar Waßmann @ 2016-10-13  6:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, 13 Oct 2016 01:09:13 +0200 Jörg Krause wrote:
> Hi,
> 
> I am using a custom i.MX28 board similar to the i.MX28-EVK. For Wi-Fi
> the board assembles a BCM43362 from Broadcom and for Ethernet a
> LAN8720A from Microchip. The board is running mainline Linux 4.7.
> 
> While both, wireless and wired network interfaces work, the TCP
> throughput measured with iperf is low.
> 
> The bandwith for Ethernet is between 20-30 MBits/sec and for WLAN is
> about 4-5 MBits/sec.
> 
> There exists an Application Note "i.MX28 Ethernet Performance on
> Linux" [1] which shows a bandwith of > 60 MBits/sec. A user an the NXP
> forum [2] told he achieved 20 MBits/sec with some Qualcom chip.
> 
> Note, that these values are most probably measured with the legacy
> Linux Kernel 2.6.35 from NXP.
> 
> Does anybody has done throughput tests on i.MX28 with mainline Kernel?
> If so, what are the results? What might be the bottleneck?
>

This is the iperf output on a TX28 with current mainline kernel
(4.8.0-rc5):
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  57.5 MBytes  48.2 Mbits/sec

You might check your kernel DEBUG configs (especially
CONFIG_DEBUG_PAGEALLOC).
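
As a cross-check of the iperf figures above, here is a minimal Python sketch of the unit conversion. It assumes iperf's convention that the Transfer column uses binary units (1 MByte = 1024 * 1024 bytes) while the Bandwidth column uses decimal units (1 Mbit = 10^6 bits); the function name is illustrative only and not part of any tool.

```python
def iperf_bandwidth_mbits(transfer_mbytes: float, seconds: float) -> float:
    """Convert an iperf transfer size (binary MBytes) into Mbits/sec (decimal)."""
    bits = transfer_mbytes * 1024 * 1024 * 8  # assumption: 1 MByte = 2**20 bytes
    return bits / seconds / 1e6               # assumption: 1 Mbit = 10**6 bits


# Values taken from the report above: 57.5 MBytes in 10.0 seconds.
print(f"{iperf_bandwidth_mbits(57.5, 10.0):.1f} Mbits/sec")
```

Under these assumptions the result agrees with the 48.2 Mbits/sec that iperf reported.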


Lothar Waßmann


* Low network throughput on i.MX28
  2016-10-13  6:48 ` Lothar Waßmann
@ 2016-10-13 19:43   ` Jörg Krause
  2016-10-13 20:42     ` Uwe Kleine-König
  2016-10-14  6:13     ` Lothar Waßmann
  0 siblings, 2 replies; 31+ messages in thread
From: Jörg Krause @ 2016-10-13 19:43 UTC (permalink / raw)
  To: linux-arm-kernel



Hi Lothar,

On 13 October 2016 08:48:07 CEST, "Lothar Waßmann" <LW@KARO-electronics.de> wrote:
>Hi,
>
>On Thu, 13 Oct 2016 01:09:13 +0200 Jörg Krause wrote:
>> Hi,
>> 
>> I am using a custom i.MX28 board similar to the i.MX28-EVK. For Wi-Fi
>> the board assembles a BCM43362 from Broadcom and for Ethernet a
>> LAN8720A from Microchip. The board is running mainline Linux 4.7.
>> 
>> While both, wireless and wired network interfaces work, the TCP
>> throughput measured with iperf is low.
>> 
>> The bandwith for Ethernet is between 20-30 MBits/sec and for WLAN is
>> about 4-5 MBits/sec.
>> 
>> There exists an Application Note "i.MX28 Ethernet Performance on
>> Linux" [1] which shows a bandwith of > 60 MBits/sec. A user an the
>NXP
>> forum [2] told he achieved 20 MBits/sec with some Qualcom chip.
>> 
>> Note, that these values are most probably measured with the legacy
>> Linux Kernel 2.6.35 from NXP.
>> 
>> Does anybody has done throughput tests on i.MX28 with mainline
>Kernel?
>> If so, what are the results? What might be the bottleneck?
>>
>
>This is the iperf output on a TX28 with current mainline kernel
>(4.8.0-rc5):
>------------------------------------------------------------
>Client connecting to 192.168.100.1, TCP port 5001
>TCP window size: 43.8 KByte (default)
>------------------------------------------------------------
>[  3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port
>5001
>[ ID] Interval       Transfer     Bandwidth
>[  3]  0.0-10.0 sec  57.5 MBytes  48.2 Mbits/sec
>
>You might check your kernel DEBUG configs (especially
>CONFIG_DEBUG_PAGEALLOC).

Thanks for sharing the iperf output. Which LAN transceiver is fitted on the TX28?

I checked the config: DEBUG_PAGEALLOC is not enabled, and neither are any network-related DEBUG options.
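
One way to run such a check is sketched below in Python. It assumes the kernel configuration is available as plain text (for example the .config from the build tree); the function name and the sample configuration are hypothetical and only illustrate the idea.

```python
import re
from typing import List


def enabled_debug_options(config_text: str) -> List[str]:
    """Return all CONFIG_*DEBUG* symbols that are built in (=y), sorted by name."""
    pattern = re.compile(r"^(CONFIG_\w*DEBUG\w*)=y$", re.MULTILINE)
    return sorted(pattern.findall(config_text))


# Hypothetical sample, not the configuration of the board discussed here.
sample = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_FEC=y
CONFIG_DEBUG_INFO=y
"""

print(enabled_debug_options(sample))
```

Options that are commented out as "is not set" are skipped, so a disabled DEBUG_PAGEALLOC does not appear in the result.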

Best regards
Jörg Krause


* Low network throughput on i.MX28
  2016-10-13 19:43   ` Jörg Krause
@ 2016-10-13 20:42     ` Uwe Kleine-König
  2016-10-14  6:13     ` Lothar Waßmann
  1 sibling, 0 replies; 31+ messages in thread
From: Uwe Kleine-König @ 2016-10-13 20:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On Thu, Oct 13, 2016 at 09:43:00PM +0200, Jörg Krause wrote:
> On 13 October 2016 08:48:07 CEST, "Lothar Waßmann" <LW@KARO-electronics.de> wrote:
> >This is the iperf output on a TX28 with current mainline kernel
> >(4.8.0-rc5):
> >------------------------------------------------------------
> >Client connecting to 192.168.100.1, TCP port 5001
> >TCP window size: 43.8 KByte (default)
> >------------------------------------------------------------
> >[  3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port
> >5001
> >[ ID] Interval       Transfer     Bandwidth
> >[  3]  0.0-10.0 sec  57.5 MBytes  48.2 Mbits/sec

Just for the record: I have another i.MX28 system here and got 43
Mbits/sec with PREEMPT_NONE and 37 Mbits/sec with PREEMPT_RT both using
a 3.14 kernel.

> >You might check your kernel DEBUG configs (especially
> >CONFIG_DEBUG_PAGEALLOC).
> 
> Thanks for sharing the iperf output. What LAN transceiver does the
> TX28 has assembled?

My system has a Marvell Switch (88e6083) as "transceiver".

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Low network throughput on i.MX28
  2016-10-13 19:43   ` Jörg Krause
  2016-10-13 20:42     ` Uwe Kleine-König
@ 2016-10-14  6:13     ` Lothar Waßmann
  2016-10-15  8:46       ` Jörg Krause
  1 sibling, 1 reply; 31+ messages in thread
From: Lothar Waßmann @ 2016-10-14  6:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Thu, 13 Oct 2016 21:43:00 +0200 Jörg Krause wrote:
> On 13 October 2016 08:48:07 CEST, "Lothar Waßmann" <LW@KARO-electronics.de> wrote:
> >On Thu, 13 Oct 2016 01:09:13 +0200 Jörg Krause wrote:
[...]
> >This is the iperf output on a TX28 with current mainline kernel
> >(4.8.0-rc5):
> >------------------------------------------------------------
> >Client connecting to 192.168.100.1, TCP port 5001
> >TCP window size: 43.8 KByte (default)
> >------------------------------------------------------------
> >[  3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port
> >5001
> >[ ID] Interval       Transfer     Bandwidth
> >[  3]  0.0-10.0 sec  57.5 MBytes  48.2 Mbits/sec
> >
> >You might check your kernel DEBUG configs (especially
> >CONFIG_DEBUG_PAGEALLOC).
> 
> Thanks for sharing the iperf output. What LAN transceiver does the TX28 has assembled?
> 
The ethernet PHY is an SMSC LAN8710A.


Lothar Waßmann


* Low network throughput on i.MX28
  2016-10-14  6:13     ` Lothar Waßmann
@ 2016-10-15  8:46       ` Jörg Krause
  2016-10-15  8:59         ` Stefan Wahren
  0 siblings, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-10-15  8:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2016-10-14 at 08:13 +0200, Lothar Waßmann wrote:
> Hi,
> 
> On Thu, 13 Oct 2016 21:43:00 +0200 Jörg Krause wrote:
> > On 13 October 2016 08:48:07 CEST, "Lothar Waßmann"
> > <LW@KARO-electronics.de> wrote:
> > > On Thu, 13 Oct 2016 01:09:13 +0200 Jörg Krause wrote:
> 
> [...]
> > > This is the iperf output on a TX28 with current mainline kernel
> > > (4.8.0-rc5):
> > > ------------------------------------------------------------
> > > Client connecting to 192.168.100.1, TCP port 5001
> > > TCP window size: 43.8 KByte (default)
> > > ------------------------------------------------------------
> > > [  3] local 192.168.100.56 port 60325 connected with 192.168.100.1 port 5001
> > > [ ID] Interval       Transfer     Bandwidth
> > > [  3]  0.0-10.0 sec  57.5 MBytes  48.2 Mbits/sec
> > > 
> > > You might check your kernel DEBUG configs (especially
> > > CONFIG_DEBUG_PAGEALLOC).
> > 
> > Thanks for sharing the iperf output. What LAN transceiver does the
> > TX28 has assembled?
> > 
> 
> The ethernet PHY is an SMSC LAN8710A.

Thanks!


For the record:

Note, this is the result for the wireless interface.

I got one of my custom boards running the legacy Linux Kernel 2.6.35
officially supported by Freescale (NXP) [1] and the bcmdhd driver from
the WICED project, where I get >20Mbps TCP throughput. The firmware
version reported is:

# wl ver
5.90 RC115.2
wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-bc2d0891


I also got it running with the Linux Kernel 4.1.15 from Freescale [2],
which is not officially supported for the i.MX28 target, with the
latest bcmdhd version, where I get <7Mbps TCP throughput (much the
same as I get with the brcmfmac driver). The firmware version reported
is:

# wl ver
1.107 RC5.0
wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0

So probably something is missing in the newer kernel version that is
present in the legacy Kernel 2.6.35.

[1] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=imx_2.6.35_1.1.0
[2] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=imx_4.1.15_1.0.0_ga


* Low network throughput on i.MX28
  2016-10-15  8:46       ` Jörg Krause
@ 2016-10-15  8:59         ` Stefan Wahren
  2016-10-15  9:41           ` Jörg Krause
  2016-10-15 11:18           ` Jörg Krause
  0 siblings, 2 replies; 31+ messages in thread
From: Stefan Wahren @ 2016-10-15  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> wrote on 15 October 2016 at 10:46:
> 
> 
> Thanks!
> 
> 
> For the record:
> 
> Note, this is the result for the wireless interface.
> 
> I got one of my custom boards running the legacy Linux Kernel 2.6.35
> officially supported from Freescale (NXP) and the bcmdhd driver from
> the Wiced project, where I get >20Mbps TCP throughput. The firmware
> version reported is:
> 
> # wl ver
> 5.90 RC115.2
> wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-bc2d0891
> 
> 
> I got it also running with the Linux Kernel 4.1.15 from Freescale [2],
> which is not officially supported for the i.MX28 target, with the
> latest bcmdhd version where I get <7Mbps TCP throughput (which is much
> the same I get with the brcmfmac driver). The firmware version reported
> is:
> 
> # wl ver
> 1.107 RC5.0
> wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0
> 
> So, probably something is missing in the newer Kernel version, which is
> present in the legacy Kernel 2.6.35.
> 
> [1] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
> imx_2.6.35_1.1.0
> [2] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
> imx_4.1.15_1.0.0_ga

While implementing DDR mode for the MMC driver [1], I noticed a performance
gap of about a factor of 2 between the vendor kernel and mainline. I expect
that your wireless interface is connected via SDIO.

Stefan

[1] -
http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-3-mmc-mxs-mmc-implement-ddr-support


* Low network throughput on i.MX28
  2016-10-15  8:59         ` Stefan Wahren
@ 2016-10-15  9:41           ` Jörg Krause
  2016-10-15 16:16             ` Stefan Wahren
  2016-10-15 11:18           ` Jörg Krause
  1 sibling, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-10-15  9:41 UTC (permalink / raw)
  To: linux-arm-kernel



Hi Stefan,

On 15 October 2016 10:59:41 CEST, Stefan Wahren <stefan.wahren@i2se.com> wrote:
>Hi Jörg,
>
>> Jörg Krause <joerg.krause@embedded.rocks> wrote on 15 October 2016 at 10:46:
>> 
>> 
>> Thanks!
>> 
>> 
>> For the record:
>> 
>> Note, this is the result for the wireless interface.
>> 
>> I got one of my custom boards running the legacy Linux Kernel 2.6.35
>> officially supported from Freescale (NXP) and the bcmdhd driver from
>> the Wiced project, where I get >20Mbps TCP throughput. The firmware
>> version reported is:
>> 
>> # wl ver
>> 5.90 RC115.2
>> wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-bc2d0891
>> 
>> 
>> I got it also running with the Linux Kernel 4.1.15 from Freescale
>[2],
>> which is not officially supported for the i.MX28 target, with the
>> latest bcmdhd version where I get <7Mbps TCP throughput (which is
>much
>> the same I get with the brcmfmac driver). The firmware version
>reported
>> is:
>> 
>> # wl ver
>> 1.107 RC5.0
>> wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0
>> 
>> So, probably something is missing in the newer Kernel version, which
>is
>> present in the legacy Kernel 2.6.35.
>> 
>> [1]
>http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
>> imx_2.6.35_1.1.0
>> [2]
>http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
>> imx_4.1.15_1.0.0_ga
>
>during implementation of DDR mode for the mmc driver [1] i noticed a
>performance
>gap between the vendor kernel and mainline by a factor of 2. I expect
>that your
>wireless interface is connected via SDIO.

Yes, it is. I also suspected that the MMC or the DMA driver is the bottleneck.

>
>[1] -
>http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-3-mmc-mxs-mmc-implement-ddr-support

Looks like the patches might help. Have you tried SDIO wifi so far?

Jörg


* Low network throughput on i.MX28
  2016-10-15  8:59         ` Stefan Wahren
  2016-10-15  9:41           ` Jörg Krause
@ 2016-10-15 11:18           ` Jörg Krause
  1 sibling, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-10-15 11:18 UTC (permalink / raw)
  To: linux-arm-kernel



On 15 October 2016 10:59:41 CEST, Stefan Wahren <stefan.wahren@i2se.com> wrote:
>Hi Jörg,
>
>> Jörg Krause <joerg.krause@embedded.rocks> wrote on 15 October 2016 at 10:46:
>> 
>> 
>> Thanks!
>> 
>> 
>> For the record:
>> 
>> Note, this is the result for the wireless interface.
>> 
>> I got one of my custom boards running the legacy Linux Kernel 2.6.35
>> officially supported from Freescale (NXP) and the bcmdhd driver from
>> the Wiced project, where I get >20Mbps TCP throughput. The firmware
>> version reported is:
>> 
>> # wl ver
>> 5.90 RC115.2
>> wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-bc2d0891
>> 
>> 
>> I got it also running with the Linux Kernel 4.1.15 from Freescale
>[2],
>> which is not officially supported for the i.MX28 target, with the
>> latest bcmdhd version where I get <7Mbps TCP throughput (which is
>much
>> the same I get with the brcmfmac driver). The firmware version
>reported
>> is:
>> 
>> # wl ver
>> 1.107 RC5.0
>> wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0
>> 
>> So, probably something is missing in the newer Kernel version, which
>is
>> present in the legacy Kernel 2.6.35.
>> 
>> [1]
>http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
>> imx_2.6.35_1.1.0
>> [2]
>http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
>> imx_4.1.15_1.0.0_ga
>
>during implementation of DDR mode for the mmc driver [1] i noticed a
>performance
>gap between the vendor kernel and mainline by a factor of 2. I expect
>that your
>wireless interface is connected via SDIO.

I wonder if this [2] might be related. As far as I can see it is not present in mainline.

>
>[1] -
>http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-3-mmc-mxs-mmc-implement-ddr-support

[2]
http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/commit/?h=imx_2.6.35_1.1.0&id=c105f3ef1d461aaeedbc6361941096b6684cc812


* Low network throughput on i.MX28
  2016-10-15  9:41           ` Jörg Krause
@ 2016-10-15 16:16             ` Stefan Wahren
  2016-10-28 23:07               ` Jörg Krause
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Wahren @ 2016-10-15 16:16 UTC (permalink / raw)
  To: linux-arm-kernel


> Jörg Krause <joerg.krause@embedded.rocks> wrote on 15 October 2016 at 11:41:
> 
> 
> 
> 
> Hi Stefan,
> 
> Am 15. Oktober 2016 10:59:41 MESZ, schrieb Stefan Wahren
> <stefan.wahren@i2se.com>:
> >Hi J?rg,
> >
> >> J?rg Krause <joerg.krause@embedded.rocks> hat am 15. Oktober 2016 um
> >10:46
> >> geschrieben:
> >> 
> >> 
> >> Thanks!
> >> 
> >> 
> >> For the record:
> >> 
> >> Note, this is the result for the wireless interface.
> >> 
> >> I got one of my custom boards running the legacy Linux Kernel 2.6.35
> >> officially supported from Freescale (NXP) and the bcmdhd driver from
> >> the Wiced project, where I get >20Mbps TCP throughput. The firmware
> >> version reported is:
> >> 
> >> # wl ver
> >> 5.90 RC115.2
> >> wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-bc2d0891
> >> 
> >> 
> >> I got it also running with the Linux Kernel 4.1.15 from Freescale
> >[2],
> >> which is not officially supported for the i.MX28 target, with the
> >> latest bcmdhd version where I get <7Mbps TCP throughput (which is
> >much
> >> the same I get with the brcmfmac driver). The firmware version
> >reported
> >> is:
> >> 
> >> # wl ver
> >> 1.107 RC5.0
> >> wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0
> >> 
> >> So, probably something is missing in the newer Kernel version, which
> >is
> >> present in the legacy Kernel 2.6.35.
> >> 
> >> [1]
> >http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
> >> imx_2.6.35_1.1.0
> >> [2]
> >http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?h=
> >> imx_4.1.15_1.0.0_ga
> >
> >during implementation of DDR mode for the mmc driver [1] i noticed a
> >performance
> >gap between the vendor kernel and mainline by a factor of 2. I expect
> >that your
> >wireless interface is connected via SDIO.
> 
> Yes, it is. I had the suspicion that the MMC or the DMA driver is the
> bootleneck, too.
> 
> >
> >[1] -
> >http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-3-mmc-mxs-mmc-implement-ddr-support
> 
> Looks like the patches might help.

Unfortunately not, the performance gain is smaller than expected.

> Have you tried SDIO wifi so far?

No.

> 
> J?rg
>


* Low network throughput on i.MX28
  2016-10-15 16:16             ` Stefan Wahren
@ 2016-10-28 23:07               ` Jörg Krause
  2016-10-29  9:08                 ` Stefan Wahren
  0 siblings, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-10-28 23:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-10-15 at 18:16 +0200, Stefan Wahren wrote:
> > Jörg Krause <joerg.krause@embedded.rocks> wrote on 15 October 2016
> > at 11:41:
> > 
> > 
> > 
> > 
> > Hi Stefan,
> > 
> > Am 15. Oktober 2016 10:59:41 MESZ, schrieb Stefan Wahren
> > <stefan.wahren@i2se.com>:
> > > Hi J?rg,
> > > 
> > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 15. Oktober
> > > > 2016 um
> > > 
> > > 10:46
> > > > geschrieben:
> > > > 
> > > > 
> > > > Thanks!
> > > > 
> > > > 
> > > > For the record:
> > > > 
> > > > Note, this is the result for the wireless interface.
> > > > 
> > > > I got one of my custom boards running the legacy Linux Kernel
> > > > 2.6.35
> > > > officially supported from Freescale (NXP) and the bcmdhd driver
> > > > from
> > > > the Wiced project, where I get >20Mbps TCP throughput. The
> > > > firmware
> > > > version reported is:
> > > > 
> > > > # wl ver
> > > > 5.90 RC115.2
> > > > wl0: Apr 24 2014 14:08:41 version 5.90.195.89.24 FWID 01-
> > > > bc2d0891
> > > > 
> > > > 
> > > > I got it also running with the Linux Kernel 4.1.15 from
> > > > Freescale
> > > 
> > > [2],
> > > > which is not officially supported for the i.MX28 target, with
> > > > the
> > > > latest bcmdhd version where I get <7Mbps TCP throughput (which
> > > > is
> > > 
> > > much
> > > > the same I get with the brcmfmac driver). The firmware version
> > > 
> > > reported
> > > > is:
> > > > 
> > > > # wl ver
> > > > 1.107 RC5.0
> > > > wl0: Aug  8 2016 02:17:48 version 5.90.232 FWID 01-0
> > > > 
> > > > So, probably something is missing in the newer Kernel version,
> > > > which
> > > 
> > > is
> > > > present in the legacy Kernel 2.6.35.
> > > > 
> > > > [1]
> > > 
> > > http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?
> > > h=
> > > > imx_2.6.35_1.1.0
> > > > [2]
> > > 
> > > http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/log/?
> > > h=
> > > > imx_4.1.15_1.0.0_ga
> > > 
> > > during implementation of DDR mode for the mmc driver [1] i
> > > noticed a
> > > performance
> > > gap between the vendor kernel and mainline by a factor of 2. I
> > > expect
> > > that your
> > > wireless interface is connected via SDIO.
> > 
> > Yes, it is. I had the suspicion that the MMC or the DMA driver is
> > the
> > bootleneck, too.
> > 
> > > 
> > > [1] -
> > > http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-
> > > 0-3-mmc-mxs-mmc-implement-ddr-support
> > 
> > Looks like the patches might help.
> 
> Unfortunately not, the performance gain is smaller than expected.
> 
> > Have you tried SDIO wifi so far?
> 
> No.

You mentioned [1] an optimization in the Freescale vendor Linux kernel
[2]. I would really like to see this optimization in the mainline
kernel.

Did you ever try to port this code from Freescale to mainline?

Is it even possible, given that the mainline driver uses the DMA engine?

[1] http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-3-mmc-mxs-mmc-implement-ddr-support#post8
[2] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/commit/?h=imx_2.6.35_maintain&id=b09358887fb4b67f6d497fac8cc48475c8bd292d

Best regards,
Jörg Krause


* Low network throughput on i.MX28
  2016-10-28 23:07               ` Jörg Krause
@ 2016-10-29  9:08                 ` Stefan Wahren
  2016-10-29 13:08                   ` Jörg Krause
  2016-11-02  8:14                   ` Jörg Krause
  0 siblings, 2 replies; 31+ messages in thread
From: Stefan Wahren @ 2016-10-29  9:08 UTC (permalink / raw)
  To: linux-arm-kernel


> Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October 2016 at 01:07:
> 
> 
> You mentioned [1] an optimization in the Freescale vendor Linux kernel
> [2]. I would really like to see this optimization in the mainline
> kernel.
> 
> Did you ever tried to port this code from Freescale to mainline?

Yes, I tried once, but I soon became frustrated by the many required
changes and the resulting issues.

> 
> Is it even possible, as the mainline driver uses the DMA engine?

I think the more important part would be to analyse why the mainline driver
is slower, that is, to identify the bottleneck exactly.

I don't have enough time and equipment for this. I had better concentrate on
standby support.

> 
> [1] http://linux-arm-kernel.infradead.narkive.com/GNkqjvo8/patch-rfc-0-
> 3-mmc-mxs-mmc-implement-ddr-support#post8
> [2] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/commit/
> ?h=imx_2.6.35_maintain&id=b09358887fb4b67f6d497fac8cc48475c8bd292d
> 
> Best regards,
> Jörg Krause


* Low network throughput on i.MX28
  2016-10-29  9:08                 ` Stefan Wahren
@ 2016-10-29 13:08                   ` Jörg Krause
  2016-11-02  8:14                   ` Jörg Krause
  1 sibling, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-10-29 13:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October 2016
> > at 01:07:
> > 
> > 
> > You mentioned [1] an optimization in the Freescale vendor Linux
> > kernel
> > [2]. I would really like to see this optimization in the mainline
> > kernel.
> > 
> > Did you ever tried to port this code from Freescale to mainline?
> 
> Yes, i tried once but i was frustrated soon because of the lot of
> required
> changes and resulting issues.

I can imagine.

> > 
> > Is it even possible, as the mainline driver uses the DMA engine?
> 
> I think the more important part would be analyse why the Mainline
> driver is
> slowlier. I mean to exactly identify the bottleneck.

I'll try to understand the driver implementation. However, I am not a
Linux kernel developer, so this will need some time for sure. Any help
will be appreciated!

> I don't have enough time and equipment for this. I better concentrate
> on standby
> support.

Many thanks for your work!

Jörg


* Low network throughput on i.MX28
  2016-10-29  9:08                 ` Stefan Wahren
  2016-10-29 13:08                   ` Jörg Krause
@ 2016-11-02  8:14                   ` Jörg Krause
  2016-11-02  8:24                     ` Stefan Wahren
  1 sibling, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-02  8:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October 2016
> > at 01:07:
> > 
> > 
> > You mentioned [1] an optimization in the Freescale vendor Linux
> > kernel
> > [2]. I would really like to see this optimization in the mainline
> > kernel.
> > 
> > Did you ever tried to port this code from Freescale to mainline?
> 
> Yes, i tried once but i was frustrated soon because of the lot of
> required
> changes and resulting issues.

I got the PIO mode working for the mxs-mmc driver. For this I ported
the PIO code from the vendor kernel and removed the usage of the DMA
engine entirely.

Testing network bandwidth with iperf, I get ~10Mbit/sec with PIO mode
compared to ~6.5Mbit/sec with DMA mode for UDP, and ~6.5Mbit/sec with
PIO mode compared to ~4.5Mbit/sec with DMA mode for TCP.
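
To put these numbers side by side, here is a small Python sketch that computes the relative gain of PIO over DMA from the approximate figures above. The function and variable names are illustrative only, and the inputs are the rounded values quoted in this mail rather than new measurements.

```python
def relative_gain_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# (PIO, DMA) throughput in Mbit/sec, as quoted above.
results = {"UDP": (10.0, 6.5), "TCP": (6.5, 4.5)}

for proto, (pio, dma) in results.items():
    gain = relative_gain_percent(pio, dma)
    print(f"{proto}: PIO is about {gain:.0f}% faster than DMA")
```

With the rounded inputs this gives roughly 54% for UDP and 44% for TCP.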

Note that the vendor kernel implements a switch between PIO and DMA
mode for the ADTC command type depending on the data size. For this
test, I removed this switch and used PIO mode exclusively.

> > 
> > Is it even possible, as the mainline driver uses the DMA engine?
> 
> I think the more important part would be analyse why the Mainline
> driver is slowlier. I mean to exactly identify the bottleneck.

I will further investigate this issue.

Best regards,
Jörg Krause


* Low network throughput on i.MX28
  2016-11-02  8:14                   ` Jörg Krause
@ 2016-11-02  8:24                     ` Stefan Wahren
  2016-11-02  8:30                       ` Jörg Krause
  2016-11-04 18:44                       ` Jörg Krause
  0 siblings, 2 replies; 31+ messages in thread
From: Stefan Wahren @ 2016-11-02  8:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 02.11.2016 at 09:14, Jörg Krause wrote:
> On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
>>> Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October 2016
>>> at 01:07:
>>>
>>>
>>> You mentioned [1] an optimization in the Freescale vendor Linux
>>> kernel
>>> [2]. I would really like to see this optimization in the mainline
>>> kernel.
>>>
>>> Did you ever tried to port this code from Freescale to mainline?
>> Yes, i tried once but i was frustrated soon because of the lot of
>> required
>> changes and resulting issues.
> I got the PIO mode working for the mxs-mmc driver. For this I ported
> the PIO code from the vendor kernel and removed the usage of the DMA
> engine entirely.

Good job

>
> Testing network bandwidth with iperf, I get about ~10Mbit/sec with PIO
> mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.

And how about MMC / SD card performance?


* Low network throughput on i.MX28
  2016-11-02  8:24                     ` Stefan Wahren
@ 2016-11-02  8:30                       ` Jörg Krause
  2016-11-04 18:44                       ` Jörg Krause
  1 sibling, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-11-02  8:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> On 02.11.2016 at 09:14, Jörg Krause wrote:
> > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October
> > > > 2016 at 01:07:
> > > > 
> > > > 
> > > > You mentioned [1] an optimization in the Freescale vendor Linux
> > > > kernel
> > > > [2]. I would really like to see this optimization in the
> > > > mainline
> > > > kernel.
> > > > 
> > > > Did you ever tried to port this code from Freescale to
> > > > mainline?
> > > 
> > > Yes, i tried once but i was frustrated soon because of the lot of
> > > required
> > > changes and resulting issues.
> > 
> > I got the PIO mode working for the mxs-mmc driver. For this I
> > ported
> > the PIO code from the vendor kernel and removed the usage of the
> > DMA
> > engine entirely.
> 
> Good job

Thanks!

> > 
> > Testing network bandwidth with iperf, I get about ~10Mbit/sec with
> > PIO
> > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> 
> And how about MMC / sd card performance?

Can't tell, as my custom i.MX28 board does not have an SD card interface.

I will share the code after doing some cleanups and further tests so
you might test it as well.

Jörg


* Low network throughput on i.MX28
  2016-11-02  8:24                     ` Stefan Wahren
  2016-11-02  8:30                       ` Jörg Krause
@ 2016-11-04 18:44                       ` Jörg Krause
  2016-11-04 19:30                         ` Stefan Wahren
  1 sibling, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-04 18:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Shawn,

On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> On 02.11.2016 at 09:14, Jörg Krause wrote:
> > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > Jörg Krause <joerg.krause@embedded.rocks> wrote on 29 October
> > > > 2016 at 01:07:
> > > > 
> > > > 
> > > > You mentioned [1] an optimization in the Freescale vendor Linux
> > > > kernel
> > > > [2]. I would really like to see this optimization in the
> > > > mainline
> > > > kernel.
> > > > 
> > > > Did you ever tried to port this code from Freescale to
> > > > mainline?
> > > 
> > > Yes, i tried once but i was frustrated soon because of the lot of
> > > required
> > > changes and resulting issues.
> > 
> > I got the PIO mode working for the mxs-mmc driver. For this I
> > ported
> > the PIO code from the vendor kernel and removed the usage of the
> > DMA
> > engine entirely.
> 
> Good job
> 
> > 
> > Testing network bandwidth with iperf, I get about ~10Mbit/sec with
> > PIO
> > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> 
> And how about MMC / sd card performance?

I noticed poor performance with the i.MX28 MMC and/or DMA driver using
the mainline kernel compared to the vendor Freescale kernel 2.6.35.
I've seen that you added the drivers to mainline some years ago.

My custom i.MX28 board has a wifi chip attached to the SSP2 interface.
Comparing the bandwidth with iperf, I get >20Mbits/sec on the vendor
kernel and <5Mbits/sec on the mainline kernel.

My best guess is that there is some kind of bottleneck in the drivers.
I have already started looking at the vendor drivers as well as at the
mainline drivers, but I need to investigate further to understand the
complexity.

Do you have any idea what the bottleneck might be?

Best regards,
Jörg Krause


* Low network throughput on i.MX28
  2016-11-04 18:44                       ` Jörg Krause
@ 2016-11-04 19:30                         ` Stefan Wahren
  2016-11-04 20:56                           ` Jörg Krause
  2016-11-04 22:42                           ` Jörg Krause
  0 siblings, 2 replies; 31+ messages in thread
From: Stefan Wahren @ 2016-11-04 19:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jörg,

> On 4 November 2016 at 19:44, Jörg Krause <joerg.krause@embedded.rocks>
> wrote:
> 
> 
> Hi Shawn,
> 
> On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > Am 02.11.2016 um 09:14 schrieb J?rg Krause:
> > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 29. Oktober
> > > > > 2016
> > > > > um 01:07
> > > > > geschrieben:
> > > > > 
> > > > > 
> > > > > You mentioned [1] an optimization in the Freescale vendor Linux
> > > > > kernel
> > > > > [2]. I would really like to see this optimization in the
> > > > > mainline
> > > > > kernel.
> > > > > 
> > > > > Did you ever tried to port this code from Freescale to
> > > > > mainline?
> > > > 
> > > > Yes, i tried once but i was frustrated soon because of the lot of
> > > > required
> > > > changes and resulting issues.
> > > 
> > > I got the PIO mode working for the mxs-mmc driver. For this I
> > > ported
> > > the PIO code from the vendor kernel and removed the usage of the
> > > DMA
> > > engine entirely.
> > 
> > Good job
> > 
> > > 
> > > Testing network bandwidth with iperf, I get about ~10Mbit/sec with
> > > PIO
> > > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> > 
> > And how about MMC / sd card performance?
> 
> I noticed poor performance with the i.MX28 MMC and/or DMA driver using
> the mainline kernel compared to the vendor Freescale kernel 2.6.35.
> I've seen that hou have added the drivers to mainline some years ago.
> 
> My custom i.MX28 board has a wifi chip attached to the SSP2 interface.
> Comparing the bandwith with iperf I get >20Mbits/sec on the vendor
> kernel and <5Mbits/sec on the mainline kernel.

There is one thing about the clock handling. I noticed that the vendor
kernel rounds the clock frequency up while the mainline kernel rounds it
down [1]. So don't trust the clock rates from DT / board code. Better
verify the register settings or check with an oscilloscope.

[1] - http://www.spinics.net/lists/linux-mmc/msg09132.html

> 
> My best guess is that there is some kind of bottleneck in the drivers.
> I already started looking at the vendor drivers as well as at the
> mainline drivers, but I need some more investigation to understand the
> complexity.
> 
> Do you have any idea what the bottleneck might be?
> 
> Best regards,
> J?rg Krause


* Low network throughput on i.MX28
  2016-11-04 19:30                         ` Stefan Wahren
@ 2016-11-04 20:56                           ` Jörg Krause
  2016-11-04 22:42                           ` Jörg Krause
  1 sibling, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-11-04 20:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2016-11-04 at 20:30 +0100, Stefan Wahren wrote:
> Hi J?rg,
> 
> > J?rg Krause <joerg.krause@embedded.rocks> hat am 4. November 2016
> > um 19:44
> > geschrieben:
> > 
> > 
> > Hi Shawn,
> > 
> > On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > > Am 02.11.2016 um 09:14 schrieb J?rg Krause:
> > > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 29.
> > > > > > Oktober
> > > > > > 2016
> > > > > > um 01:07
> > > > > > geschrieben:
> > > > > > 
> > > > > > 
> > > > > > You mentioned [1] an optimization in the Freescale vendor
> > > > > > Linux
> > > > > > kernel
> > > > > > [2]. I would really like to see this optimization in the
> > > > > > mainline
> > > > > > kernel.
> > > > > > 
> > > > > > Did you ever tried to port this code from Freescale to
> > > > > > mainline?
> > > > > 
> > > > > Yes, i tried once but i was frustrated soon because of the
> > > > > lot of
> > > > > required
> > > > > changes and resulting issues.
> > > > 
> > > > I got the PIO mode working for the mxs-mmc driver. For this I
> > > > ported
> > > > the PIO code from the vendor kernel and removed the usage of
> > > > the
> > > > DMA
> > > > engine entirely.
> > > 
> > > Good job
> > > 
> > > > 
> > > > Testing network bandwidth with iperf, I get about ~10Mbit/sec
> > > > with
> > > > PIO
> > > > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > > > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> > > 
> > > And how about MMC / sd card performance?
> > 
> > I noticed poor performance with the i.MX28 MMC and/or DMA driver
> > using
> > the mainline kernel compared to the vendor Freescale kernel 2.6.35.
> > I've seen that hou have added the drivers to mainline some years
> > ago.
> > 
> > My custom i.MX28 board has a wifi chip attached to the SSP2
> > interface.
> > Comparing the bandwith with iperf I get >20Mbits/sec on the vendor
> > kernel and <5Mbits/sec on the mainline kernel.
> 
> there is one thing about the clock handling. I noticed that the
> Vendor Kernel
> round up the clock frequency and the Mainline Kernel round down the
> clock
> frequency [1]. So don't trust the clock ratings from DT / board code.
> Better
> verify the register settings or check it with an osci.
> 
> [1] - http://www.spinics.net/lists/linux-mmc/msg09132.html

I checked the clock rate setting by reading the register 0x80014070
(HW_SSP2_TIMING). CLOCK_DIVIDE is 0x2 and CLOCK_RATE is 0x0. As SSP CLK
is 96MHz this makes a clock rate of 48MHz.
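
The arithmetic behind that figure can be sketched as follows. This is
only an illustration, assuming the HW_SSP_TIMING field layout
(CLOCK_DIVIDE in bits 15:8, CLOCK_RATE in bits 7:0) and the relation
SSP_SCK = SSPCLK / (CLOCK_DIVIDE * (1 + CLOCK_RATE)); the function name
ssp_sck_hz is made up for this example.

```python
# Sketch: decode HW_SSP_TIMING and derive the SSP bit clock.
# Assumed field layout: CLOCK_DIVIDE = bits 15:8, CLOCK_RATE = bits 7:0.
# Assumed relation: SSP_SCK = SSPCLK / (CLOCK_DIVIDE * (1 + CLOCK_RATE)).

def ssp_sck_hz(timing_reg: int, sspclk_hz: int) -> float:
    """Return the SSP serial clock in Hz for a raw HW_SSP_TIMING value."""
    clock_divide = (timing_reg >> 8) & 0xFF
    clock_rate = timing_reg & 0xFF
    if clock_divide == 0:
        raise ValueError("CLOCK_DIVIDE must be non-zero")
    return sspclk_hz / (clock_divide * (1 + clock_rate))


# Values reported above: CLOCK_DIVIDE = 0x2, CLOCK_RATE = 0x0, SSPCLK = 96 MHz.
timing = (0x2 << 8) | 0x0
sck = ssp_sck_hz(timing, 96_000_000)
print(f"SSP_SCK = {sck / 1e6:.0f} MHz")  # prints "SSP_SCK = 48 MHz"
```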

There was a discussion on the mailing list [1] suggesting that
tasklets might be slow.

Jörg


* Low network throughput on i.MX28
  2016-11-04 19:30                         ` Stefan Wahren
  2016-11-04 20:56                           ` Jörg Krause
@ 2016-11-04 22:42                           ` Jörg Krause
  2016-11-05 11:33                             ` Stefan Wahren
  1 sibling, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-04 22:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stefan,

Sorry, I forgot the link in the previous mail.

On Fri, 2016-11-04 at 20:30 +0100, Stefan Wahren wrote:
> Hi J?rg,
> 
> > J?rg Krause <joerg.krause@embedded.rocks> hat am 4. November 2016
> > um 19:44
> > geschrieben:
> > 
> > 
> > Hi Shawn,
> > 
> > On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > > Am 02.11.2016 um 09:14 schrieb J?rg Krause:
> > > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 29.
> > > > > > Oktober
> > > > > > 2016
> > > > > > um 01:07
> > > > > > geschrieben:
> > > > > > 
> > > > > > 
> > > > > > You mentioned [1] an optimization in the Freescale vendor
> > > > > > Linux
> > > > > > kernel
> > > > > > [2]. I would really like to see this optimization in the
> > > > > > mainline
> > > > > > kernel.
> > > > > > 
> > > > > > Did you ever tried to port this code from Freescale to
> > > > > > mainline?
> > > > > 
> > > > > Yes, i tried once but i was frustrated soon because of the
> > > > > lot of
> > > > > required
> > > > > changes and resulting issues.
> > > > 
> > > > I got the PIO mode working for the mxs-mmc driver. For this I
> > > > ported
> > > > the PIO code from the vendor kernel and removed the usage of
> > > > the
> > > > DMA
> > > > engine entirely.
> > > 
> > > Good job
> > > 
> > > > 
> > > > Testing network bandwidth with iperf, I get about ~10Mbit/sec
> > > > with
> > > > PIO
> > > > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > > > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> > > 
> > > And how about MMC / sd card performance?
> > 
> > I noticed poor performance with the i.MX28 MMC and/or DMA driver
> > using
> > the mainline kernel compared to the vendor Freescale kernel 2.6.35.
> > I've seen that hou have added the drivers to mainline some years
> > ago.
> > 
> > My custom i.MX28 board has a wifi chip attached to the SSP2
> > interface.
> > Comparing the bandwith with iperf I get >20Mbits/sec on the vendor
> > kernel and <5Mbits/sec on the mainline kernel.
> 
> there is one thing about the clock handling. I noticed that the
> Vendor Kernel
> round up the clock frequency and the Mainline Kernel round down the
> clock
> frequency [1]. So don't trust the clock ratings from DT / board code.
> Better
> verify the register settings or check it with an osci.
> 
> [1] - http://www.spinics.net/lists/linux-mmc/msg09132.html

I checked the clock rate setting by reading the register 0x80014070
(HW_SSP2_TIMING). CLOCK_DIVIDE is 0x2 and CLOCK_RATE is 0x0. As SSP CLK
is 96MHz this makes a clock rate of 48MHz.

There was a discussion on the mailing list [1] suggesting that
tasklets might be slow.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February/043395.html

Jörg


* Low network throughput on i.MX28
  2016-11-04 22:42                           ` Jörg Krause
@ 2016-11-05 11:33                             ` Stefan Wahren
  2016-11-05 12:06                               ` Jörg Krause
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Wahren @ 2016-11-05 11:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jörg,

> On 4 November 2016 at 23:42, Jörg Krause <joerg.krause@embedded.rocks>
> wrote:
> 
> 
> Hi Stefan,
> 
> sorry, I forget the link in the previous mail.
> 
> On Fri, 2016-11-04 at 20:30 +0100, Stefan Wahren wrote:
> > Hi J?rg,
> > 
> > > J?rg Krause <joerg.krause@embedded.rocks> hat am 4. November 2016
> > > um 19:44
> > > geschrieben:
> > > 
> > > 
> > > Hi Shawn,
> > > 
> > > On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > > > Am 02.11.2016 um 09:14 schrieb J?rg Krause:
> > > > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 29.
> > > > > > > Oktober
> > > > > > > 2016
> > > > > > > um 01:07
> > > > > > > geschrieben:
> > > > > > > 
> > > > > > > 
> > > > > > > You mentioned [1] an optimization in the Freescale vendor
> > > > > > > Linux
> > > > > > > kernel
> > > > > > > [2]. I would really like to see this optimization in the
> > > > > > > mainline
> > > > > > > kernel.
> > > > > > > 
> > > > > > > Did you ever tried to port this code from Freescale to
> > > > > > > mainline?
> > > > > > 
> > > > > > Yes, i tried once but i was frustrated soon because of the
> > > > > > lot of
> > > > > > required
> > > > > > changes and resulting issues.
> > > > > 
> > > > > I got the PIO mode working for the mxs-mmc driver. For this I
> > > > > ported
> > > > > the PIO code from the vendor kernel and removed the usage of
> > > > > the
> > > > > DMA
> > > > > engine entirely.
> > > > 
> > > > Good job
> > > > 
> > > > > 
> > > > > Testing network bandwidth with iperf, I get about ~10Mbit/sec
> > > > > with
> > > > > PIO
> > > > > mode compared to ~6.5Mbit/sec with DMA mode for UDP and about
> > > > > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for TCP.
> > > > 
> > > > And how about MMC / sd card performance?
> > > 
> > > I noticed poor performance with the i.MX28 MMC and/or DMA driver
> > > using
> > > the mainline kernel compared to the vendor Freescale kernel 2.6.35.
> > > I've seen that hou have added the drivers to mainline some years
> > > ago.
> > > 
> > > My custom i.MX28 board has a wifi chip attached to the SSP2
> > > interface.
> > > Comparing the bandwith with iperf I get >20Mbits/sec on the vendor
> > > kernel and <5Mbits/sec on the mainline kernel.
> > 
> > there is one thing about the clock handling. I noticed that the
> > Vendor Kernel
> > round up the clock frequency and the Mainline Kernel round down the
> > clock
> > frequency [1]. So don't trust the clock ratings from DT / board code.
> > Better
> > verify the register settings or check it with an osci.
> > 
> > [1] - http://www.spinics.net/lists/linux-mmc/msg09132.html
> 
> I checked the clock rate setting by reading the register 0x80014070
> (HW_SSP2_TIMING). CLOCK_DIVIDE is 0x2 and CLOCK_RATE is 0x0. As SSP CLK
> is 96MHz this makes a clock rate of 48MHz.
> 
> There was a discussion on the mailing list [1] about that tasklets
> might be slow.
> 
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-February
> /043395.html

If I understand it right, the tasklet is not the problem, but rather the
design of the MXS DMA driver. Please refer to the chapter "General Design
Notes" in the documentation for DMA providers [2].
I think the MXS DMA driver is affected. Maybe you should ask Vinod Koul
about this.

[2] - https://www.kernel.org/doc/Documentation/dmaengine/provider.txt

> 
> J?rg


* Low network throughput on i.MX28
  2016-11-05 11:33                             ` Stefan Wahren
@ 2016-11-05 12:06                               ` Jörg Krause
  2016-11-05 12:39                                 ` Koul, Vinod
  0 siblings, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-05 12:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Vinod,

As recommended by Stefan Wahren, I'm turning to you about this issue.
Please see below...

On Sat, 2016-11-05 at 12:33 +0100, Stefan Wahren wrote:
> Hi J?rg,
> 
> > J?rg Krause <joerg.krause@embedded.rocks> hat am 4. November 2016
> > um 23:42
> > geschrieben:
> > 
> > 
> > Hi Stefan,
> > 
> > sorry, I forget the link in the previous mail.
> > 
> > On Fri, 2016-11-04 at 20:30 +0100, Stefan Wahren wrote:
> > > Hi J?rg,
> > > 
> > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 4. November
> > > > 2016
> > > > um 19:44
> > > > geschrieben:
> > > > 
> > > > 
> > > > Hi Shawn,
> > > > 
> > > > On Wed, 2016-11-02 at 09:24 +0100, Stefan Wahren wrote:
> > > > > Am 02.11.2016 um 09:14 schrieb J?rg Krause:
> > > > > > On Sat, 2016-10-29 at 11:08 +0200, Stefan Wahren wrote:
> > > > > > > > J?rg Krause <joerg.krause@embedded.rocks> hat am 29.
> > > > > > > > Oktober
> > > > > > > > 2016
> > > > > > > > um 01:07
> > > > > > > > geschrieben:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > You mentioned [1] an optimization in the Freescale
> > > > > > > > vendor
> > > > > > > > Linux
> > > > > > > > kernel
> > > > > > > > [2]. I would really like to see this optimization in
> > > > > > > > the
> > > > > > > > mainline
> > > > > > > > kernel.
> > > > > > > > 
> > > > > > > > Did you ever tried to port this code from Freescale to
> > > > > > > > mainline?
> > > > > > > 
> > > > > > > Yes, i tried once but i was frustrated soon because of
> > > > > > > the
> > > > > > > lot of
> > > > > > > required
> > > > > > > changes and resulting issues.
> > > > > > 
> > > > > > I got the PIO mode working for the mxs-mmc driver. For this
> > > > > > I
> > > > > > ported
> > > > > > the PIO code from the vendor kernel and removed the usage
> > > > > > of
> > > > > > the
> > > > > > DMA
> > > > > > engine entirely.
> > > > > 
> > > > > Good job
> > > > > 
> > > > > > 
> > > > > > Testing network bandwidth with iperf, I get about
> > > > > > ~10Mbit/sec
> > > > > > with
> > > > > > PIO
> > > > > > mode compared to ~6.5Mbit/sec with DMA mode for UDP and
> > > > > > about
> > > > > > ~6.5Mbit/sec compared to ~4.5Mbit/sec with DMA mode for
> > > > > > TCP.
> > > > > 
> > > > > And how about MMC / sd card performance?
> > > > 
> > > > I noticed poor performance with the i.MX28 MMC and/or DMA
> > > > driver
> > > > using
> > > > the mainline kernel compared to the vendor Freescale kernel
> > > > 2.6.35.
> > > > I've seen that hou have added the drivers to mainline some
> > > > years
> > > > ago.
> > > > 
> > > > My custom i.MX28 board has a wifi chip attached to the SSP2
> > > > interface.
> > > > Comparing the bandwith with iperf I get >20Mbits/sec on the
> > > > vendor
> > > > kernel and <5Mbits/sec on the mainline kernel.
> > > 
> > > there is one thing about the clock handling. I noticed that the
> > > Vendor Kernel
> > > round up the clock frequency and the Mainline Kernel round down
> > > the
> > > clock
> > > frequency [1]. So don't trust the clock ratings from DT / board
> > > code.
> > > Better
> > > verify the register settings or check it with an osci.
> > > 
> > > [1] - http://www.spinics.net/lists/linux-mmc/msg09132.html
> > 
> > I checked the clock rate setting by reading the register 0x80014070
> > (HW_SSP2_TIMING). CLOCK_DIVIDE is 0x2 and CLOCK_RATE is 0x0. As SSP
> > CLK
> > is 96MHz this makes a clock rate of 48MHz.
> > 
> > There was a discussion on the mailing list [1] about that tasklets
> > might be slow.
> > 
> > [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2011-Febr
> > uary
> > /043395.html
> 
> if i unterstand it right the tasklet is not the problem, but the
> design of the
> MXS DMA driver. Please refer to the chapter "General Design Notes" to
> the
> documentation of the DMA provider [2].
> I think the MXS DMA driver is affected. Maybe you should ask Vinod
> Koul about
> this.
> 
> [2] - https://www.kernel.org/doc/Documentation/dmaengine/provider.txt

@ Vinod
In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
interface on a custom i.MX28 board with a wifi chip attached. Comparing
the bandwidth with iperf, I get >20Mbits/sec on the vendor kernel and
<5Mbits/sec on the mainline kernel. I am trying to investigate what the
bottleneck is.

@ Stefan, all
My understanding is that the tasklet in this case is responsible for
reading the response registers of the DMA controller and returning the
response to the MMC host driver.

The vendor kernel does this in the interrupt routine of mxs-mmc by
signalling a completion, whereas the mainline kernel does this in the
interrupt routine of mxs-dma by scheduling the tasklet.

To check if this makes any difference, I replaced the tasklet usage
with the complete() infrastructure. For this I hacked the DMA
engine and the MXS DMA driver. However, the performance stays the same.

So, if I understand correctly, this is not an issue here, right? So if
not the tasklet, what do you suspect?

Jörg


* Low network throughput on i.MX28
  2016-11-05 12:06                               ` Jörg Krause
@ 2016-11-05 12:39                                 ` Koul, Vinod
  2016-11-05 12:47                                   ` Jörg Krause
                                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Koul, Vinod @ 2016-11-05 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-11-05 at 13:06 +0100, Jörg Krause wrote:
> @ Vinod
> In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
> interface on a custom i.MX28 board with a wifi chip attached.
> Comparing
> the bandwith with iperf I get >20Mbits/sec on the vendor kernel and
> <5Mbits/sec on the mainline kernel. I am trying to investigate what
> the
> bottleneck is.

Is this imx-dma or imx-sdma?

> 
> @ Stefan, all
> My understanding is that the tasklet in this case is responsible for
> reading the response registers of the DMA controller and return the
> response to the MMC host driver.
> 
> The vendor kernel does this in the interrupt routine of mxs-mmc by
> issueing a complete whereas the mainline kernel does this in the
> interrupt routine in mxs-dma by scheduling the tasklet.

Is vendor kernel using dmaengine APIs or not?

Okay, if we talk about getting best performance, I always advise folks
to issue next transaction in the interrupt routine and then do
descriptor management and callback in tasklet.

Some drivers do that correctly but some don't..

Tasklet can be an issue but only if there is a huge scheduling delay for
the tasklet. You can check using tracing tools and confirm.

> 
> To check if this makes any difference I replaced the tasklet() usage
> with using the complete() infrastructure. For this I hacked the DMA
> engine and the MXS DMA driver. However, the performance stays the
> same.
> 
> So, if I understand correctly, this is not an issue here, right? So if
> not the tasklet, what do you suspect?

-- 
~Vinod


* Low network throughput on i.MX28
  2016-11-05 12:39                                 ` Koul, Vinod
@ 2016-11-05 12:47                                   ` Jörg Krause
  2016-11-05 12:48                                   ` Fabio Estevam
  2016-11-05 13:14                                   ` Jörg Krause
  2 siblings, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-11-05 12:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-11-05 at 12:39 +0000, Koul, Vinod wrote:
> On Sat, 2016-11-05 at 13:06 +0100, J?rg Krause wrote:
> > @ Vinod
> > In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
> > interface on a custom i.MX28 board with a wifi chip attached.
> > Comparing
> > the bandwith with iperf I get >20Mbits/sec on the vendor kernel and
> > <5Mbits/sec on the mainline kernel. I am trying to investigate what
> > the
> > bottleneck is.
> 
> is this imx-dma or imx-sdma..

It's mxs-dma.

> 
> > 
> > @ Stefan, all
> > My understanding is that the tasklet in this case is responsible
> > for
> > reading the response registers of the DMA controller and return the
> > response to the MMC host driver.
> > 
> > The vendor kernel does this in the interrupt routine of mxs-mmc by
> > issueing a complete whereas the mainline kernel does this in the
> > interrupt routine in mxs-dma by scheduling the tasklet.
> 
> Is vendor kernel using dmaengine APIs or not?

No. It's using a custom dmaengine.

> 
> Okay, if we talk about getting best performance, I always advise
> folks
> to issue next transaction in the interrupt routine and then do
> descriptor management and callback in tasklet.
> 
> Some drivers do that correctly but some don't..

Do you have an example of a driver doing it correctly?

> Tasklet can be an issue but only if there is a huge scheduling delay
> for
> the tasklet. You can check using tracing tools and confirm.

I don't think the tasklets are an issue here, as I replaced the tasklets
in the dmaengine API with completions (which the vendor kernel uses) and
there are no performance benefits. However, I am not a Linux kernel
developer...

Jörg


* Low network throughput on i.MX28
  2016-11-05 12:39                                 ` Koul, Vinod
  2016-11-05 12:47                                   ` Jörg Krause
@ 2016-11-05 12:48                                   ` Fabio Estevam
  2016-11-05 13:14                                   ` Jörg Krause
  2 siblings, 0 replies; 31+ messages in thread
From: Fabio Estevam @ 2016-11-05 12:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Vinod,

On Sat, Nov 5, 2016 at 10:39 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
> On Sat, 2016-11-05 at 13:06 +0100, J?rg Krause wrote:
>> @ Vinod
>> In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
>> interface on a custom i.MX28 board with a wifi chip attached.
>> Comparing
>> the bandwith with iperf I get >20Mbits/sec on the vendor kernel and
>> <5Mbits/sec on the mainline kernel. I am trying to investigate what
>> the
>> bottleneck is.
>
> is this imx-dma or imx-sdma..

This is drivers/dma/mxs-dma.c, thanks.


* Low network throughput on i.MX28
  2016-11-05 12:39                                 ` Koul, Vinod
  2016-11-05 12:47                                   ` Jörg Krause
  2016-11-05 12:48                                   ` Fabio Estevam
@ 2016-11-05 13:14                                   ` Jörg Krause
  2016-11-05 15:45                                     ` Koul, Vinod
  2 siblings, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-05 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-11-05 at 12:39 +0000, Koul, Vinod wrote:
> On Sat, 2016-11-05 at 13:06 +0100, J?rg Krause wrote:
> > @ Vinod
> > In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
> > interface on a custom i.MX28 board with a wifi chip attached.
> > Comparing
> > the bandwith with iperf I get >20Mbits/sec on the vendor kernel and
> > <5Mbits/sec on the mainline kernel. I am trying to investigate what
> > the
> > bottleneck is.
> 
> is this imx-dma or imx-sdma..
> 
> > 
> > @ Stefan, all
> > My understanding is that the tasklet in this case is responsible
> > for
> > reading the response registers of the DMA controller and return the
> > response to the MMC host driver.
> > 
> > The vendor kernel does this in the interrupt routine of mxs-mmc by
> > issueing a complete whereas the mainline kernel does this in the
> > interrupt routine in mxs-dma by scheduling the tasklet.
> 
> Is vendor kernel using dmaengine APIs or not?

It's this engine [1].

[1] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/arch/arm/plat-mxs/dmaengine.c?h=imx_2.6.35_1.1.0


* Low network throughput on i.MX28
  2016-11-05 13:14                                   ` Jörg Krause
@ 2016-11-05 15:45                                     ` Koul, Vinod
  2016-11-05 22:37                                       ` Jörg Krause
  2016-11-18 23:49                                       ` Jörg Krause
  0 siblings, 2 replies; 31+ messages in thread
From: Koul, Vinod @ 2016-11-05 15:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-11-05 at 14:14 +0100, Jörg Krause wrote:
> On Sat, 2016-11-05 at 12:39 +0000, Koul, Vinod wrote:
> > 
> > On Sat, 2016-11-05 at 13:06 +0100, J?rg Krause wrote:
> > > 
> > > @ Vinod
> > > In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
> > > interface on a custom i.MX28 board with a wifi chip attached.
> > > Comparing
> > > the bandwith with iperf I get >20Mbits/sec on the vendor kernel
> > > and
> > > <5Mbits/sec on the mainline kernel. I am trying to investigate
> > > what
> > > the
> > > bottleneck is.
> > is this imx-dma or imx-sdma..
> > 
> > > 
> > > 
> > > @ Stefan, all
> > > My understanding is that the tasklet in this case is responsible
> > > for
> > > reading the response registers of the DMA controller and return
> > > the
> > > response to the MMC host driver.
> > > 
> > > The vendor kernel does this in the interrupt routine of mxs-mmc by
> > > issueing a complete whereas the mainline kernel does this in the
> > > interrupt routine in mxs-dma by scheduling the tasklet.
> > Is vendor kernel using dmaengine APIs or not?
> It's this engine [1].
> 
> [1] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/a
> r
> ch/arm/plat-mxs/dmaengine.c?h=imx_2.6.35_1.1.0

Thanks for info, this looks okay.

First, can you confirm that the register configuration for the DMA
transaction is the same in both cases?

Second, looking at the driver I see that the interrupt handler is not
pushing the next descriptor. Also, the tasklet is doing the callback
action and not pushing any descriptors. Did I miss anything here?

For good DMA throughput, you should have multiple DMA transactions
queued up and submitted as fast as possible. Can you check if this is
being done?

We need to minimize/eliminate the delay between two transactions. This
can be done in SW or HW, depending on what the HW supports. If the HW
supports chaining of descriptors, then the next transaction given to the
dmaengine driver should be appended at the end. If not, submit the
descriptor to the HW immediately on interrupt.

For a good example of the latter, please look at drivers/dma/sa11x0-dma.c
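
The split described above (restart the hardware in the interrupt
handler, defer callbacks to the tasklet) can be sketched with a toy
model. This is only an illustration in Python, not the code of mxs-dma
or sa11x0-dma; the class DmaChannelModel and all of its methods are
hypothetical names made up for this example.

```python
from collections import deque


class DmaChannelModel:
    """Toy model of one DMA channel (hypothetical, not a real driver)."""

    def __init__(self):
        self.pending = deque()    # submitted, not yet on the hardware
        self.active = None        # descriptor currently on the hardware
        self.completed = deque()  # finished, callback not yet run
        self.log = []

    def submit(self, name, callback):
        """Queue a descriptor; start it at once if the hardware is idle."""
        self.pending.append((name, callback))
        if self.active is None:
            self._start_next()

    def _start_next(self):
        self.active = self.pending.popleft() if self.pending else None
        if self.active is not None:
            self.log.append(f"hw start {self.active[0]}")

    def irq(self):
        """Completion interrupt: restart the hardware first, defer the rest."""
        if self.active is None:
            return  # spurious interrupt, nothing was running
        self.completed.append(self.active)
        self._start_next()

    def tasklet(self):
        """Deferred work: run client callbacks outside the interrupt."""
        while self.completed:
            name, callback = self.completed.popleft()
            self.log.append(f"callback {name}")
            callback(name)


done = []
ch = DmaChannelModel()
ch.submit("A", done.append)
ch.submit("B", done.append)
ch.irq()      # A finished: B is started before A's callback runs
ch.tasklet()
ch.irq()      # B finished
ch.tasklet()
print(ch.log)
# prints ['hw start A', 'hw start B', 'callback A', 'callback B']
```

The point of the model is the ordering in the log: the second
descriptor reaches the hardware before the first callback runs, so the
channel does not sit idle while the client is notified.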

HTH
-- 
~Vinod


* Low network throughput on i.MX28
  2016-11-05 15:45                                     ` Koul, Vinod
@ 2016-11-05 22:37                                       ` Jörg Krause
  2016-11-18 23:49                                       ` Jörg Krause
  1 sibling, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-11-05 22:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 2016-11-05 at 15:45 +0000, Koul, Vinod wrote:
> On Sat, 2016-11-05 at 14:14 +0100, J?rg Krause wrote:
> > On Sat, 2016-11-05 at 12:39 +0000, Koul, Vinod wrote:
> > > 
> > > On Sat, 2016-11-05 at 13:06 +0100, J?rg Krause wrote:
> > > > 
> > > > @ Vinod
> > > > In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
> > > > interface on a custom i.MX28 board with a wifi chip attached.
> > > > Comparing
> > > > the bandwith with iperf I get >20Mbits/sec on the vendor kernel
> > > > and
> > > > <5Mbits/sec on the mainline kernel. I am trying to investigate
> > > > what
> > > > the
> > > > bottleneck is.
> > > 
> > > is this imx-dma or imx-sdma..
> > > 
> > > > 
> > > > 
> > > > @ Stefan, all
> > > > My understanding is that the tasklet in this case is
> > > > responsible
> > > > for
> > > > reading the response registers of the DMA controller and return
> > > > the
> > > > response to the MMC host driver.
> > > > 
> > > > The vendor kernel does this in the interrupt routine of mxs-mmc 
> > > > by
> > > > issueing a complete whereas the mainline kernel does this in
> > > > the
> > > > interrupt routine in mxs-dma by scheduling the tasklet.
> > > 
> > > Is vendor kernel using dmaengine APIs or not?
> > 
> > It's this engine [1].
> > 
> > [1] http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tre
> > e/a
> > r
> > ch/arm/plat-mxs/dmaengine.c?h=imx_2.6.35_1.1.0
> 
> Thanks for info, this looks okay.
> 
> First can you confirm that register configuration for DMA transaction
> is
> same in both cases.

They are almost identical. The difference is that the mainline MMC
driver has the SDIO IRQ enabled and the APB bus has burst mode enabled.
Neither has any influence.

> Second, looking at the driver I see that interrupt handler is not
> pushing next descriptor. Also the tasklet is doing callback action
> and
> not pushing any descriptors, did I miss anything in this?

Right. However, after observing the registers I noticed that the vendor
MMC kernel driver only issues one DMA command, whereas the mainline
driver issues two chained DMA commands. The relevant function in both
drivers is mxs_mmc_adtc().

The mainline function issues a DMA transaction with setting the PIO
words only and appends the data from the MMC host.

The vendor function copies the MMC host data from the scatterlist into
a driver-owned DMA buffer, sets the buffer address as the next command
address and issues the descriptor to the DMA engine.

> For good DMA throughput, you should have multiple DMA transactions
> queued up and submitted as fast as possible. Can you check whether this
> is being done?
> 
> We need to minimize or eliminate the delay between two transactions.
> This can be done in SW or HW, depending on what the HW supports. If the
> HW supports chaining of descriptors, then the next transaction given to
> the dmaengine driver should be appended at the end. If not, submit the
> descriptor to the HW immediately on interrupt.

I see! In this particular case, the vendor driver avoids chaining
descriptors, whereas the mainline driver chains two DMA commands.
Note that the i.MX28 hardware does support chaining. So, could this be
a cause of the poor performance?

Jörg


* Low network throughput on i.MX28
  2016-11-05 15:45                                     ` Koul, Vinod
  2016-11-05 22:37                                       ` Jörg Krause
@ 2016-11-18 23:49                                       ` Jörg Krause
  2016-11-19 11:36                                         ` Stefan Wahren
  1 sibling, 1 reply; 31+ messages in thread
From: Jörg Krause @ 2016-11-18 23:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

[snip]

I did some time measurements on the wifi, mmc and dma drivers to compare
the performance of the vendor and the mainline kernel. For this I
toggled some GPIOs and measured the time difference with an oscilloscope.
I started measuring before the call to sdio_readsb() in the wifi
driver [1] and stopped when the call returned. Note that the
time was only measured for a packet length of 1536 bytes.

The vendor kernel took about 250 us to return, whereas the mainline
kernel took about 325 us. To investigate where this additional time
comes from, I divided the whole procedure into separate parts and
compared the time each one consumed.
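
To put the two figures in perspective, here is a rough back-of-the-envelope
sketch. It converts the time of one sdio_readsb() call for a 1536-byte
packet into an upper bound on read throughput. It assumes that packets
are read strictly back to back with nothing in between, which is an
idealization; the function name is made up for illustration.

```python
PACKET_BYTES = 1536  # packet length used for the measurement


def max_throughput_mbit(call_time_us: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Upper bound in Mbit/s if one packet is read every call_time_us."""
    bits = packet_bytes * 8
    return bits / call_time_us  # bits per microsecond equals Mbit/s


vendor = max_throughput_mbit(250)
mainline = max_throughput_mbit(325)

print(f"vendor:   {vendor:.1f} Mbit/s")    # vendor:   49.2 Mbit/s
print(f"mainline: {mainline:.1f} Mbit/s")  # mainline: 37.8 Mbit/s
print(f"mainline is {100 * (1 - mainline / vendor):.0f}% lower")  # 23% lower
```

Both bounds are well above the less than 5 Mbit/s seen on the mainline
kernel, so the time spent between calls may matter as well.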

I noticed that the mainline kernel takes much longer to return
after the DMA request is done, which is signalled in this case by the call
to mxs_mmc_dma_irq_callback() [2] in the mxs-mmc driver. From there it
takes about 150 us to get back to sdio_readsb().

One example of where much more time is consumed is the mainline mmc driver,
which spends about 50 us in mmc_wait_done() [3] just calling
complete(), whereas the vendor mmc driver returns almost immediately
here.

I wonder why this call to complete consumes so much time? Any ideas?

[1] http://lxr.free-electrons.com/source/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c#L488

[2] http://lxr.free-electrons.com/source/drivers/mmc/host/mxs-mmc.c#L179

[3] http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L386

Best regards,
Jörg Krause


* Low network throughput on i.MX28
  2016-11-18 23:49                                       ` Jörg Krause
@ 2016-11-19 11:36                                         ` Stefan Wahren
  2016-11-20  9:14                                           ` Jörg Krause
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Wahren @ 2016-11-19 11:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> wrote on 19 November 2016 at
> 00:49:
> 
> 
> Hi all,
> 
> [snip]
> 
> I did some time measurements on the wifi, mmc and dma drivers to compare
> the performance of the vendor and the mainline kernel. For this I
> toggled some GPIOs and measured the time difference with an
> oscilloscope. I started measuring before the call to sdio_readsb() in
> the wifi driver [1] and stopped when the call returned. Note that the
> time was only measured for a packet length of 1536 bytes.
> 
> The vendor kernel took about 250 us to return, whereas the mainline
> kernel took about 325 us. To investigate where this additional time
> comes from, I divided the whole procedure into separate parts and
> compared the time each one consumed.
> 
> I noticed that the mainline kernel takes much longer to return
> after the DMA request is done, which is signalled in this case by the
> call to mxs_mmc_dma_irq_callback() [2] in the mxs-mmc driver. From there
> it takes about 150 us to get back to sdio_readsb().
> 
> One example of where much more time is consumed is the mainline mmc
> driver, which spends about 50 us in mmc_wait_done() [3] just calling
> complete(), whereas the vendor mmc driver returns almost immediately
> here.
> 
> I wonder why this call to complete consumes so much time? Any ideas?

I don't know why, but how about putting the SDIO clock signal on your
oscilloscope in parallel with the GPIOs? Then you could get a better view
of the runtime behavior.

Btw, you should also verify the time needed between two packets.

Stefan

> 
> [1] http://lxr.free-electrons.com/source/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c#L488
> 
> [2] http://lxr.free-electrons.com/source/drivers/mmc/host/mxs-mmc.c#L179
> 
> [3] http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L386
> 
> Best regards,
> Jörg Krause


* Low network throughput on i.MX28
  2016-11-19 11:36                                         ` Stefan Wahren
@ 2016-11-20  9:14                                           ` Jörg Krause
  0 siblings, 0 replies; 31+ messages in thread
From: Jörg Krause @ 2016-11-20  9:14 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stefan,

On Sat, 2016-11-19 at 12:36 +0100, Stefan Wahren wrote:
> Hi Jörg,
> 
> > Jörg Krause <joerg.krause@embedded.rocks> wrote on 19 November 2016
> > at 00:49:
> > 
> > 
> > Hi all,
> > 
> > [snip]
> > 
> > I did some time measurements on the wifi, mmc and dma drivers to
> > compare the performance of the vendor and the mainline kernel. For
> > this I toggled some GPIOs and measured the time difference with an
> > oscilloscope. I started measuring before the call to sdio_readsb() in
> > the wifi driver [1] and stopped when the call returned. Note that the
> > time was only measured for a packet length of 1536 bytes.
> > 
> > The vendor kernel took about 250 us to return, whereas the mainline
> > kernel took about 325 us. To investigate where this additional time
> > comes from, I divided the whole procedure into separate parts and
> > compared the time each one consumed.
> > 
> > I noticed that the mainline kernel takes much longer to return
> > after the DMA request is done, which is signalled in this case by the
> > call to mxs_mmc_dma_irq_callback() [2] in the mxs-mmc driver. From
> > there it takes about 150 us to get back to sdio_readsb().
> > 
> > One example of where much more time is consumed is the mainline mmc
> > driver, which spends about 50 us in mmc_wait_done() [3] just calling
> > complete(), whereas the vendor mmc driver returns almost immediately
> > here.
> > 
> > I wonder why this call to complete() consumes so much time? Any
> > ideas?
> 
> I don't know why, but how about putting the SDIO clock signal on your
> oscilloscope in parallel with the GPIOs? Then you could get a better
> view of the runtime behavior.

Unfortunately, the board layout does not allow me to access the SDIO
pins.

The main question for me is why the mmc core driver needs around 120
us from calling complete() in mmc_wait_done() [1] until
receiving the completion signal in mmc_wait_for_req_done() [2]. Why
does signaling the completion consume so much time?

For comparison, the time to do the mmc request (preparing the request,
preparing DMA, doing DMA, waiting, reading the response, starting to
signal completion) is about 215 us, whereas just signaling that the
request is complete takes 120 us. To me, this is the bottleneck.
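
The weight of the completion step can be made explicit with a small
calculation. This is only a sketch under the assumption that the two
measured phases run strictly one after the other and do not overlap; the
constant and function names are made up for illustration.

```python
REQUEST_US = 215     # mmc request up to the start of completion signaling
COMPLETION_US = 120  # complete() in mmc_wait_done() until the waiter resumes


def completion_share(request_us: float, completion_us: float) -> float:
    """Fraction of the combined time spent on completion signaling."""
    return completion_us / (request_us + completion_us)


share = completion_share(REQUEST_US, COMPLETION_US)
total = REQUEST_US + COMPLETION_US
print(f"completion signaling: {share:.0%} of {total} us")
# completion signaling: 36% of 335 us
```

The combined 335 us is close to the roughly 325 us measured earlier for
the whole sdio_readsb() call on the mainline kernel, so under this
assumption about a third of each call is spent on signaling alone.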

Does anyone have an idea what might be responsible for signaling the
completion being so slow?

[1] http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L386
[2] http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L492

> Btw, you should also verify the time needed between two packets.
> 
> Stefan
> 
> > 
> > [1] http://lxr.free-electrons.com/source/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c#L488
> > 
> > [2] http://lxr.free-electrons.com/source/drivers/mmc/host/mxs-mmc.c#L179
> > 
> > [3] http://lxr.free-electrons.com/source/drivers/mmc/core/core.c#L386
> > 
> > Best regards,
> > Jörg Krause


Thread overview: 31+ messages
2016-10-12 23:09 Low network throughput on i.MX28 Jörg Krause
2016-10-13  6:48 ` Lothar Waßmann
2016-10-13 19:43   ` Jörg Krause
2016-10-13 20:42     ` Uwe Kleine-König
2016-10-14  6:13     ` Lothar Waßmann
2016-10-15  8:46       ` Jörg Krause
2016-10-15  8:59         ` Stefan Wahren
2016-10-15  9:41           ` Jörg Krause
2016-10-15 16:16             ` Stefan Wahren
2016-10-28 23:07               ` Jörg Krause
2016-10-29  9:08                 ` Stefan Wahren
2016-10-29 13:08                   ` Jörg Krause
2016-11-02  8:14                   ` Jörg Krause
2016-11-02  8:24                     ` Stefan Wahren
2016-11-02  8:30                       ` Jörg Krause
2016-11-04 18:44                       ` Jörg Krause
2016-11-04 19:30                         ` Stefan Wahren
2016-11-04 20:56                           ` Jörg Krause
2016-11-04 22:42                           ` Jörg Krause
2016-11-05 11:33                             ` Stefan Wahren
2016-11-05 12:06                               ` Jörg Krause
2016-11-05 12:39                                 ` Koul, Vinod
2016-11-05 12:47                                   ` Jörg Krause
2016-11-05 12:48                                   ` Fabio Estevam
2016-11-05 13:14                                   ` Jörg Krause
2016-11-05 15:45                                     ` Koul, Vinod
2016-11-05 22:37                                       ` Jörg Krause
2016-11-18 23:49                                       ` Jörg Krause
2016-11-19 11:36                                         ` Stefan Wahren
2016-11-20  9:14                                           ` Jörg Krause
2016-10-15 11:18           ` Jörg Krause
