linux-amlogic.lists.infradead.org archive mirror
* [PATCH v2 0/2] meson: Fix IRQ trigger type
@ 2018-12-07 10:52 Carlo Caione
  2018-12-07 10:52 ` [PATCH v2 1/2] arm64: dts: meson: Fix IRQ trigger type for macirq Carlo Caione
  2018-12-07 10:52 ` [PATCH v2 2/2] arm: " Carlo Caione
  0 siblings, 2 replies; 10+ messages in thread
From: Carlo Caione @ 2018-12-07 10:52 UTC
  To: robh+dt, mark.rutland, khilman, devicetree, linux-arm-kernel,
	linux-amlogic, martin.blumenstingl, ingrassia
  Cc: Carlo Caione

The wrong IRQ trigger type for the macirq was causing the connection
speed to drop after a few hours when stress testing the DUT. The fix
also seems to resolve another long-standing issue with EEE.

The fixes were tested on an AXG board, but we think the same fix is
also valid for all the other Amlogic SoC families.

Changelog:

V2:
 - Merge arm64 patches in one single patch
 - Merge arm patches in a different patch
 - Added T/R/A to the arm64 patch

Carlo Caione (2):
  arm64: dts: meson: Fix IRQ trigger type for macirq
  arm: dts: meson: Fix IRQ trigger type for macirq

 arch/arm/boot/dts/meson.dtsi                        | 2 +-
 arch/arm/boot/dts/meson8b-odroidc1.dts              | 1 -
 arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
 arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
 arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
 7 files changed, 3 insertions(+), 7 deletions(-)

-- 
2.19.1


_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic


* [PATCH v2 1/2] arm64: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 [PATCH v2 0/2] meson: Fix IRQ trigger type Carlo Caione
@ 2018-12-07 10:52 ` Carlo Caione
  2018-12-08  0:20   ` Kevin Hilman
  2018-12-07 10:52 ` [PATCH v2 2/2] arm: " Carlo Caione
  1 sibling, 1 reply; 10+ messages in thread
From: Carlo Caione @ 2018-12-07 10:52 UTC
  To: robh+dt, mark.rutland, khilman, devicetree, linux-arm-kernel,
	linux-amlogic, martin.blumenstingl, ingrassia
  Cc: Carlo Caione

A long-running stress test on a custom board shipping an AXG SoC and a
Realtek RTL8211F PHY revealed that after a few hours the connection
speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
consequence, the GMAC IRQs were never ACKed.

After a painful investigation the problem seemed to be due to a wrongly
defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
EDGE_RISING.

The change in the macirq IRQ type also solved another long-standing
issue affecting this SoC/PHY where EEE was causing the network
connection to die after stressing it with iperf3 (though much
sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
well.

Fixes: feb3cbea0946 ("ARM64: dts: meson-gxbb-odroidc2: fix GbE tx link breakage")
Fixes: 6d28d577510f ("ARM64: dts: meson-axg: fix ethernet stability issue")
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
---
 arch/arm64/boot/dts/amlogic/meson-axg-s400.dts      | 1 -
 arch/arm64/boot/dts/amlogic/meson-axg.dtsi          | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gx.dtsi           | 2 +-
 arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 -
 arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi   | 1 -
 5 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts b/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
index 18778ada7bd3..4d57363ac536 100644
--- a/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-axg-s400.dts
@@ -357,7 +357,6 @@
 		eth_phy0: ethernet-phy@0 {
 			/* Realtek RTL8211F (0x001cc916) */
 			reg = <0>;
-			eee-broken-1000t;
 		};
 	};
 };
diff --git a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
index df017dbd2e57..b1a42e99cb67 100644
--- a/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-axg.dtsi
@@ -143,7 +143,7 @@
 			compatible = "amlogic,meson-axg-dwmac", "snps,dwmac";
 			reg = <0x0 0xff3f0000 0x0 0x10000
 			       0x0 0xff634540 0x0 0x8>;
-			interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>;
+			interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
 			interrupt-names = "macirq";
 			clocks = <&clkc CLKID_ETH>,
 				 <&clkc CLKID_FCLK_DIV2>,
diff --git a/arch/arm64/boot/dts/amlogic/meson-gx.dtsi b/arch/arm64/boot/dts/amlogic/meson-gx.dtsi
index f1e5cdbade5e..58e6bcaac1d8 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gx.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-gx.dtsi
@@ -462,7 +462,7 @@
 			compatible = "amlogic,meson-gx-dwmac", "amlogic,meson-gxbb-dwmac", "snps,dwmac";
 			reg = <0x0 0xc9410000 0x0 0x10000
 			       0x0 0xc8834540 0x0 0x4>;
-			interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>;
+			interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
 			interrupt-names = "macirq";
 			status = "disabled";
 		};
diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
index 54954b314a45..f8d1cedbe600 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
@@ -143,7 +143,6 @@
 			interrupt-parent = <&gpio_intc>;
 			/* MAC_INTR on GPIOZ_15 */
 			interrupts = <29 IRQ_TYPE_LEVEL_LOW>;
-			eee-broken-1000t;
 		};
 	};
 };
diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi b/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
index 70325b273bd2..ec09bb5792b7 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-wetek.dtsi
@@ -142,7 +142,6 @@
 		eth_phy0: ethernet-phy@0 {
 			/* Realtek RTL8211F (0x001cc916) */
 			reg = <0>;
-			eee-broken-1000t;
 		};
 	};
 };
-- 
2.19.1
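
To check for the stall described in the commit message, the interrupt
counter can be sampled from /proc/interrupts. The following is a minimal
sketch and not part of the patch: irq_count is a hypothetical helper name,
and it assumes the MAC interrupt is listed under the netdev name (eth0) in
the last column, as the commit message suggests. On kernels that print the
trigger type, the same line should read "Level" rather than "Edge" once the
patch is applied.

```shell
#!/bin/sh
# irq_count NAME
# Reads /proc/interrupts-formatted text on stdin and prints the sum of the
# per-CPU counters for the first line whose last field is NAME.
# Exits non-zero if no line matches.
irq_count() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }      # header row: one "CPUn" column per CPU
        $NF == name {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            print total
            found = 1
            exit
        }
        END { exit !found }
    '
}

# Example on the target (assumes the interface is eth0):
#   before=$(irq_count eth0 < /proc/interrupts)
#   sleep 10
#   after=$(irq_count eth0 < /proc/interrupts)
#   [ "$after" -gt "$before" ] || echo "eth0 IRQ count did not advance"
```

If the count does not advance while traffic is flowing, that matches the
"IRQ would stop being triggered" symptom.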



* [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 [PATCH v2 0/2] meson: Fix IRQ trigger type Carlo Caione
  2018-12-07 10:52 ` [PATCH v2 1/2] arm64: dts: meson: Fix IRQ trigger type for macirq Carlo Caione
@ 2018-12-07 10:52 ` Carlo Caione
  2018-12-07 18:51   ` Emiliano Ingrassia
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Carlo Caione @ 2018-12-07 10:52 UTC
  To: robh+dt, mark.rutland, khilman, devicetree, linux-arm-kernel,
	linux-amlogic, martin.blumenstingl, ingrassia
  Cc: Carlo Caione

A long-running stress test on a custom board shipping an AXG SoC and a
Realtek RTL8211F PHY revealed that after a few hours the connection
speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
consequence, the GMAC IRQs were never ACKed.

After a painful investigation the problem seemed to be due to a wrongly
defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
EDGE_RISING.

The change in the macirq IRQ type also solved another long-standing
issue affecting this SoC/PHY where EEE was causing the network
connection to die after stressing it with iperf3 (though much
sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
well.

Fixes: 9c15795a4f96 ("ARM: dts: meson8b-odroidc1: ethernet support")
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
---
 arch/arm/boot/dts/meson.dtsi           | 2 +-
 arch/arm/boot/dts/meson8b-odroidc1.dts | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/meson.dtsi b/arch/arm/boot/dts/meson.dtsi
index 0d9faf1a51ea..a86b89086334 100644
--- a/arch/arm/boot/dts/meson.dtsi
+++ b/arch/arm/boot/dts/meson.dtsi
@@ -263,7 +263,7 @@
 			compatible = "amlogic,meson6-dwmac", "snps,dwmac";
 			reg = <0xc9410000 0x10000
 			       0xc1108108 0x4>;
-			interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>;
+			interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
 			interrupt-names = "macirq";
 			status = "disabled";
 		};
diff --git a/arch/arm/boot/dts/meson8b-odroidc1.dts b/arch/arm/boot/dts/meson8b-odroidc1.dts
index 58669abda259..a951a6632d0c 100644
--- a/arch/arm/boot/dts/meson8b-odroidc1.dts
+++ b/arch/arm/boot/dts/meson8b-odroidc1.dts
@@ -221,7 +221,6 @@
 		/* Realtek RTL8211F (0x001cc916) */
 		eth_phy: ethernet-phy@0 {
 			reg = <0>;
-			eee-broken-1000t;
 			interrupt-parent = <&gpio_intc>;
 			/* GPIOH_3 */
 			interrupts = <17 IRQ_TYPE_LEVEL_LOW>;
-- 
2.19.1



* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 ` [PATCH v2 2/2] arm: " Carlo Caione
@ 2018-12-07 18:51   ` Emiliano Ingrassia
  2018-12-08 10:46     ` Carlo Caione
  2018-12-07 22:06   ` Martin Blumenstingl
  2018-12-29  0:17   ` Martin Blumenstingl
  2 siblings, 1 reply; 10+ messages in thread
From: Emiliano Ingrassia @ 2018-12-07 18:51 UTC
  To: Carlo Caione
  Cc: mark.rutland, devicetree, martin.blumenstingl, khilman, robh+dt,
	linux-amlogic, linux-arm-kernel

Hi Carlo,

Tests [0] conducted on an Odroid-C1+ board equipped with a Meson8b SoC
have shown a high packet loss (90% or more) during a simple ping
test from a laptop to the board.
Testing the two patches separately clearly showed that this is caused by the
removal of the "eee-broken-1000t" flag from the board PHY description
in the corresponding device tree.

As for the first patch (MAC IRQ type), no test has shown evidence
that it is needed. I suggest you run some tests on real hardware,
as I did, to confirm or disprove my results.

Thanks for your work,

Emiliano

[0] http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009397.html

On Fri, Dec 07, 2018 at 10:52:31AM +0000, Carlo Caione wrote:
> A long-running stress test on a custom board shipping an AXG SoC and a
> Realtek RTL8211F PHY revealed that after a few hours the connection
> speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
> the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
> consequence, the GMAC IRQs were never ACKed.
>
> After a painful investigation the problem seemed to be due to a wrongly
> defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
> EDGE_RISING.
>
> The change in the macirq IRQ type also solved another long-standing
> issue affecting this SoC/PHY where EEE was causing the network
> connection to die after stressing it with iperf3 (though much
> sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
> well.
>
> Fixes: 9c15795a4f96 ("ARM: dts: meson8b-odroidc1: ethernet support")
> Signed-off-by: Carlo Caione <ccaione@baylibre.com>
> ---
>  arch/arm/boot/dts/meson.dtsi           | 2 +-
>  arch/arm/boot/dts/meson8b-odroidc1.dts | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/arm/boot/dts/meson.dtsi b/arch/arm/boot/dts/meson.dtsi
> index 0d9faf1a51ea..a86b89086334 100644
> --- a/arch/arm/boot/dts/meson.dtsi
> +++ b/arch/arm/boot/dts/meson.dtsi
> @@ -263,7 +263,7 @@
>  			compatible = "amlogic,meson6-dwmac", "snps,dwmac";
>  			reg = <0xc9410000 0x10000
>  			       0xc1108108 0x4>;
> -			interrupts = <GIC_SPI 8 IRQ_TYPE_EDGE_RISING>;
> +			interrupts = <GIC_SPI 8 IRQ_TYPE_LEVEL_HIGH>;
>  			interrupt-names = "macirq";
>  			status = "disabled";
>  		};
> diff --git a/arch/arm/boot/dts/meson8b-odroidc1.dts b/arch/arm/boot/dts/meson8b-odroidc1.dts
> index 58669abda259..a951a6632d0c 100644
> --- a/arch/arm/boot/dts/meson8b-odroidc1.dts
> +++ b/arch/arm/boot/dts/meson8b-odroidc1.dts
> @@ -221,7 +221,6 @@
>  		/* Realtek RTL8211F (0x001cc916) */
>  		eth_phy: ethernet-phy@0 {
>  			reg = <0>;
> -			eee-broken-1000t;
>  			interrupt-parent = <&gpio_intc>;
>  			/* GPIOH_3 */
>  			interrupts = <17 IRQ_TYPE_LEVEL_LOW>;
> --
> 2.19.1
>


* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 ` [PATCH v2 2/2] arm: " Carlo Caione
  2018-12-07 18:51   ` Emiliano Ingrassia
@ 2018-12-07 22:06   ` Martin Blumenstingl
  2018-12-29  0:17   ` Martin Blumenstingl
  2 siblings, 0 replies; 10+ messages in thread
From: Martin Blumenstingl @ 2018-12-07 22:06 UTC
  To: ccaione
  Cc: mark.rutland, devicetree, khilman, robh+dt, ingrassia,
	linux-amlogic, linux-arm-kernel

Hi Carlo,

On Fri, Dec 7, 2018 at 11:52 AM Carlo Caione <ccaione@baylibre.com> wrote:
>
> A long-running stress test on a custom board shipping an AXG SoC and a
> Realtek RTL8211F PHY revealed that after a few hours the connection
> speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
> the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
> consequence, the GMAC IRQs were never ACKed.
>
> After a painful investigation the problem seemed to be due to a wrongly
> defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
> EDGE_RISING.
>
> The change in the macirq IRQ type also solved another long-standing
> issue affecting this SoC/PHY where EEE was causing the network
> connection to die after stressing it with iperf3 (though much
> sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
> well.
I tested this on my Odroid-C1. However, I must admit that I never had
issues *without* eee-broken-1000t on any of my boards.

without your changes:
[root@alarm ~]# iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  5] local 192.168.1.194 port 38870 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  80.6 MBytes   675 Mbits/sec    0   2.78 MBytes
[  5]   1.00-2.00   sec   108 MBytes   904 Mbits/sec    0   3.04 MBytes
[  5]   2.00-3.00   sec   106 MBytes   891 Mbits/sec    0   3.04 MBytes
[  5]   3.00-4.00   sec   105 MBytes   880 Mbits/sec    0   3.04 MBytes
[  5]   4.00-5.00   sec  65.0 MBytes   545 Mbits/sec    0   3.04 MBytes
[  5]   5.00-6.00   sec  92.5 MBytes   777 Mbits/sec    0   3.04 MBytes
[  5]   6.00-7.00   sec  72.5 MBytes   608 Mbits/sec    0   3.04 MBytes
[  5]   7.00-8.19   sec  76.2 MBytes   537 Mbits/sec    0   3.04 MBytes
[  5]   8.19-9.00   sec  48.8 MBytes   504 Mbits/sec    0   3.04 MBytes
[  5]   9.00-10.00  sec  87.5 MBytes   736 Mbits/sec    0   3.04 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   842 MBytes   706 Mbits/sec    0             sender
[  5]   0.00-10.05  sec   839 MBytes   701 Mbits/sec                  receiver

iperf Done.
[root@alarm ~]# iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  5] local 192.168.1.194 port 38874 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  21.0 MBytes   175 Mbits/sec
[  5]   1.00-2.00   sec  20.7 MBytes   174 Mbits/sec
[  5]   2.00-3.00   sec  22.4 MBytes   187 Mbits/sec
[  5]   3.00-4.69   sec  25.2 MBytes   125 Mbits/sec
[  5]   4.69-5.00   sec  7.56 MBytes   206 Mbits/sec
[  5]   5.00-6.00   sec  23.4 MBytes   196 Mbits/sec
[  5]   6.00-7.00   sec  14.6 MBytes   123 Mbits/sec
[  5]   7.00-8.00   sec  23.3 MBytes   196 Mbits/sec
[  5]   8.00-9.00   sec  27.8 MBytes   233 Mbits/sec
[  5]   9.00-10.03  sec  24.9 MBytes   203 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-9.36   sec   212 MBytes   190 Mbits/sec  1588             sender
[  5]   0.00-10.03  sec   211 MBytes   176 Mbits/sec                  receiver

iperf Done.
[root@alarm ~]#

with your changes:
[root@alarm ~]# iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  5] local 192.168.1.197 port 45020 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  74.4 MBytes   624 Mbits/sec    0   2.75 MBytes
[  5]   1.00-2.00   sec   105 MBytes   881 Mbits/sec    0   3.03 MBytes
[  5]   2.00-3.00   sec   106 MBytes   891 Mbits/sec    0   3.03 MBytes
[  5]   3.00-4.00   sec  78.8 MBytes   661 Mbits/sec    0   3.03 MBytes
[  5]   4.00-5.00   sec  73.8 MBytes   617 Mbits/sec    0   3.03 MBytes
[  5]   5.00-6.00   sec  87.5 MBytes   735 Mbits/sec    0   3.03 MBytes
[  5]   6.00-7.15   sec  81.2 MBytes   594 Mbits/sec    0   3.03 MBytes
[  5]   7.15-8.00   sec  61.2 MBytes   603 Mbits/sec    0   3.03 MBytes
[  5]   8.00-9.02   sec  76.2 MBytes   625 Mbits/sec    0   3.03 MBytes
[  5]   9.02-10.00  sec   102 MBytes   880 Mbits/sec    0   3.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   847 MBytes   710 Mbits/sec    0             sender
[  5]   0.00-10.05  sec   846 MBytes   706 Mbits/sec                  receiver

iperf Done.
[root@alarm ~]# iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  5] local 192.168.1.197 port 45024 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  22.6 MBytes   190 Mbits/sec
[  5]   1.00-2.00   sec  19.3 MBytes   162 Mbits/sec
[  5]   2.00-3.00   sec  22.1 MBytes   185 Mbits/sec
[  5]   3.00-4.00   sec  29.6 MBytes   248 Mbits/sec
[  5]   4.00-5.00   sec  30.1 MBytes   253 Mbits/sec
[  5]   5.00-6.00   sec  16.7 MBytes   140 Mbits/sec
[  5]   6.00-7.00   sec  21.5 MBytes   180 Mbits/sec
[  5]   7.00-8.00   sec  14.0 MBytes   118 Mbits/sec
[  5]   8.00-9.04   sec  20.4 MBytes   165 Mbits/sec
[  5]   9.04-10.00  sec  19.6 MBytes   171 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   217 MBytes   181 Mbits/sec  1795             sender
[  5]   0.00-10.00  sec   216 MBytes   181 Mbits/sec                  receiver

iperf Done.
[root@alarm ~]#

RX and TX speeds are within 10Mbit/s before and after the test, so I
would call the result "identical" (within a bit of measurement
tolerance)
I'll wait a few days and see what Emiliano finds out on his board,
then I'll send my Tested-by and Acked-by


Regards
Martin
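
The before/after comparison above can be done mechanically by pulling the
summary bitrates out of the iperf3 text output. This is a minimal sketch,
assuming the report is in Mbits/sec as in the runs above (iperf3's "-f m"
option forces that unit); summary_bitrate is a hypothetical helper name.

```shell
#!/bin/sh
# summary_bitrate ROLE
# Reads iperf3 text output on stdin and prints the bitrate, in Mbits/sec,
# from the summary line that ends in ROLE ("sender" or "receiver").
summary_bitrate() {
    awk -v role="$1" '
        $NF == role {
            for (i = 1; i < NF; i++)
                if ($(i + 1) == "Mbits/sec") { print $i; exit }
        }
    '
}

# Example: iperf3 -c 192.168.1.100 -f m | summary_bitrate receiver
```

Applied to the four runs above, this gives sender/receiver rates of 706/701
versus 710/706 Mbits/sec in the forward direction, and 190/176 versus
181/181 Mbits/sec in reverse mode.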


* Re: [PATCH v2 1/2] arm64: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 ` [PATCH v2 1/2] arm64: dts: meson: Fix IRQ trigger type for macirq Carlo Caione
@ 2018-12-08  0:20   ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2018-12-08  0:20 UTC
  To: Carlo Caione, robh+dt, mark.rutland, devicetree,
	linux-arm-kernel, linux-amlogic, martin.blumenstingl, ingrassia
  Cc: Carlo Caione

Carlo Caione <ccaione@baylibre.com> writes:

> A long-running stress test on a custom board shipping an AXG SoC and a
> Realtek RTL8211F PHY revealed that after a few hours the connection
> speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
> the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
> consequence, the GMAC IRQs were never ACKed.
>
> After a painful investigation the problem seemed to be due to a wrongly
> defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
> EDGE_RISING.
>
> The change in the macirq IRQ type also solved another long-standing
> issue affecting this SoC/PHY where EEE was causing the network
> connection to die after stressing it with iperf3 (though much
> sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
> well.
>
> Fixes: feb3cbea0946 ("ARM64: dts: meson-gxbb-odroidc2: fix GbE tx link breakage")
> Fixes: 6d28d577510f ("ARM64: dts: meson-axg: fix ethernet stability issue")
> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
> Tested-by: Jerome Brunet <jbrunet@baylibre.com>
> Acked-by: Neil Armstrong <narmstrong@baylibre.com>
> Signed-off-by: Carlo Caione <ccaione@baylibre.com>

Queuing this one for v4.21 (dt64 branch)

I'm going to wait for the dust to settle on 'PATCH v2 2/2' for the
32-bit SoCs (and will rely on Martin's recommendation for the final
decision there.)

Kevin


* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 18:51   ` Emiliano Ingrassia
@ 2018-12-08 10:46     ` Carlo Caione
  2018-12-12 10:49       ` Emiliano Ingrassia
  0 siblings, 1 reply; 10+ messages in thread
From: Carlo Caione @ 2018-12-08 10:46 UTC
  To: Emiliano Ingrassia
  Cc: mark.rutland, devicetree, martin.blumenstingl, khilman, robh+dt,
	linux-amlogic, linux-arm-kernel

On Fri, 2018-12-07 at 19:51 +0100, Emiliano Ingrassia wrote:
> Hi Carlo,

Hi Emiliano,

> Tests [0] conducted on an Odroid-C1+ board equipped with a Meson8b SoC
> have shown a high packet loss (90% or more) during a simple ping
> test from a laptop to the board.
> Testing the two patches separately clearly showed that this is caused
> by the removal of the "eee-broken-1000t" flag from the board PHY
> description in the corresponding device tree.
>
> As for the first patch (MAC IRQ type), no test has shown evidence
> that it is needed. I suggest you run some tests on real hardware,
> as I did, to confirm or disprove my results.

Let's try to step back a bit and see what we can do to clarify this
situation.

First of all, for arm64 we are pretty sure that both patches are needed
because we ran extensive and lengthy tests, especially regarding the
change in the IRQ trigger type. For arm things are not so clear, so for
now we decided to merge the arm64 patch and just wait on the arm one.

To begin with, we can focus on the patch that changes the IRQ
type.

The problem with the IRQ type is triggered on the arm64 boards we
tested using the script in [0]. If we run this stress test on the arm64
boards without the trigger-type patch, after a few hours (anywhere
from 2h to 6h, sometimes more) we can see the connection dropping from
~1Gbps to <30Mbps. Jerome gave a nice explanation of why, and after
changing the IRQ trigger type we couldn't see the issue anymore. This
was confirmed not just by BayLibre but also by other independent
sources, so we are pretty confident in this solution.

So my first two points for you to answer are:

1) Can you reproduce this problem on your board without the patches
when running this script?

2) If yes, does only the first patch solve the problem?
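
The script in [0] is not reproduced in this thread. As a rough stand-in, a
loop like the following keeps iperf3 running and stops once the measured
rate falls below a limit. It is a sketch under assumptions and not the
script referenced in [0]: stress_loop and is_degraded are hypothetical
names, the 30 Mbits/sec default limit is taken from the <30Mbps figure
above, and the 60-second run length is arbitrary.

```shell
#!/bin/sh
# is_degraded RATE LIMIT
# Succeeds (exit 0) when RATE is numerically below LIMIT; both in Mbits/sec.
is_degraded() {
    awk -v rate="$1" -v limit="$2" 'BEGIN { exit !(rate + 0 < limit + 0) }'
}

# stress_loop HOST [LIMIT]
# Runs 60-second iperf3 tests against HOST back to back, logging the
# receiver-side rate of each run, until the rate drops below LIMIT
# (default 30 Mbits/sec).
stress_loop() {
    host="$1"
    limit="${2:-30}"
    while :; do
        rate=$(iperf3 -c "$host" -t 60 -f m |
            awk '$NF == "receiver" {
                for (i = 1; i < NF; i++)
                    if ($(i + 1) == "Mbits/sec") { print $i; exit }
            }')
        echo "$(date '+%Y-%m-%d %H:%M:%S') ${rate:-n/a} Mbits/sec"
        if [ -n "$rate" ] && is_degraded "$rate" "$limit"; then
            echo "rate below $limit Mbits/sec, stopping"
            break
        fi
        sleep 1    # avoid a busy loop if iperf3 fails immediately
    done
}

# Example: stress_loop 192.168.1.100 30
```

Since the drop is reported to take anywhere from 2h to 6h or more, the loop
would need to run unattended with its output redirected to a log file.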

This brings us to the second issue, the one regarding the
'eee-broken-1000t' quirk. Since the two issues are closely related we are
confident that the change in the IRQ type solves this problem as well
(and this was also confirmed by Jerome on the arm64 boards).

For this case I cannot provide a real reproducer, so we can only
stress test the network with iperf3 and try to reproduce the issue. This
is also because we think that your approach of using UDP and your packet
generator is probably not the best way to test the patch, given that (1)
using UDP is not reliable according to our tests, (2) there is an
asymmetry in TX/RX, (3) the packet loss could be due to saturation
of the bandwidth, etc.

So AFAIK the best way to test this problem is using iperf3, the same
way it is done in the script in [0]. I was not involved with this issue
a year and a half ago, but AFAIK this is the way it was reproduced.

This brings me to more questions for you to answer:

3) Running iperf3 tests in TX / RX / TX+RX without the
'eee-broken-1000t' quirk applied, are you able to reproduce the EEE problem?

4) Any change when the 'eee-broken-1000t' quirk is applied?

When testing (3) and (4) please also check the status of EEE using
ethtool.
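
For the EEE check, "ethtool --show-eee eth0" reports the current state. The
helper below is a small sketch that extracts just the status field;
eee_status is a hypothetical name, and the "EEE status:" line format is an
assumption based on typical ethtool output that may differ between
versions.

```shell
#!/bin/sh
# eee_status
# Reads `ethtool --show-eee <iface>` output on stdin and prints the text
# after "EEE status:" (for example "disabled" or "enabled - active").
eee_status() {
    sed -n 's/^[[:space:]]*EEE status:[[:space:]]*//p'
}

# Example (assumes the interface is eth0):
#   ethtool --show-eee eth0 | eee_status
```

Recording this value next to each iperf3 run makes it clear whether EEE was
actually negotiated during the test.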

Hopefully this will bring a bit of clarity to the whole situation :)

Cheers,

[0] https://paste.fedoraproject.org/paste/GBFxjAQ0JULsYQlyYO2KOw

--
Carlo Caione



* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-08 10:46     ` Carlo Caione
@ 2018-12-12 10:49       ` Emiliano Ingrassia
  0 siblings, 0 replies; 10+ messages in thread
From: Emiliano Ingrassia @ 2018-12-12 10:49 UTC
  To: Carlo Caione
  Cc: mark.rutland, devicetree, martin.blumenstingl, khilman, robh+dt,
	linux-amlogic, linux-arm-kernel

Hi Carlo,

On Sat, Dec 08, 2018 at 10:46:17AM +0000, Carlo Caione wrote:
> On Fri, 2018-12-07 at 19:51 +0100, Emiliano Ingrassia wrote:
> > Hi Carlo,
>
> Hi Emiliano,
>
> > Tests [0] conducted on an Odroid-C1+ board equipped with a Meson8b SoC
> > have shown a high packet loss (90% or more) during a simple ping
> > test from a laptop to the board.
> > Testing the two patches separately clearly showed that this is caused
> > by the removal of the "eee-broken-1000t" flag from the board PHY
> > description in the corresponding device tree.
> >
> > As for the first patch (MAC IRQ type), no test has shown evidence
> > that it is needed. I suggest you run some tests on real hardware,
> > as I did, to confirm or disprove my results.
>
> Let's try to step back a bit and see what we can do to clarify this
> situation.
>

Ok, I'll be glad to help you :)

> First of all, for arm64 we are pretty sure that both patches are needed
> because we ran extensive and lengthy tests, especially regarding the
> change in the IRQ trigger type. For arm things are not so clear, so for
> now we decided to merge the arm64 patch and just wait on the arm one.
>
> To begin with, we can focus on the patch that changes the IRQ
> type.
>
> The problem with the IRQ type is triggered on the arm64 boards we
> tested using the script in [0]. If we run this stress test on the arm64
> boards without the trigger-type patch, after a few hours (anywhere
> from 2h to 6h, sometimes more) we can see the connection dropping from
> ~1Gbps to <30Mbps. Jerome gave a nice explanation of why, and after
> changing the IRQ trigger type we couldn't see the issue anymore. This
> was confirmed not just by BayLibre but also by other independent
> sources, so we are pretty confident in this solution.
>
> So my first two points for you to answer are:
>
> 1) Can you reproduce this problem on your board without the patches
> when running this script?
>
> 2) If yes, does only the first patch solve the problem?
>

I ran two tests executing the script you provided on an Odroid-C1+ board
(REV 0.4 20150930) for 6 hours, using my laptop as the server.
The kernel I used was compiled from the "v4.21/dt64-testing" branch provided by
Kevin Hilman (thank you Kevin!). The results are available in [0].

The first test (no-patch-iperf-20181211000039.log) was run
with none of your patches applied.
The second test (irq-patch-iperf-20181211130953.log) was run
with only the IRQ type patch applied.

As you can see, I did not experience exactly the problem you had,
but I see more stable behavior with the IRQ type patch applied.

> This brings us to the second issue, the one regarding the
> 'eee-broken-1000t' quirk. Since the two issues are closely related we are
> confident that the change in the IRQ type solves this problem as well
> (and this was also confirmed by Jerome on the arm64 boards).
>

The problem here is that, without the "eee-broken-1000t" flag, simple ping
tests from a host to the board showed a high packet loss (about 90%),
even with the IRQ type patch applied.
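
The loss figure can be read directly from the ping summary line. This is a
minimal sketch, assuming the iputils-style summary ("N packets transmitted,
M received, X% packet loss"); ping_loss is a hypothetical helper name and
BOARD_IP stands for the address of the board under test.

```shell
#!/bin/sh
# ping_loss
# Reads ping output on stdin and prints the packet-loss percentage
# (without the % sign) taken from the summary line.
ping_loss() {
    sed -n 's/^.*[ ,]\([0-9][0-9.]*\)% packet loss.*$/\1/p'
}

# Example: ping -c 100 BOARD_IP | ping_loss
```

Running it once with and once without the "eee-broken-1000t" flag in the
device tree gives two numbers that can be compared directly.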

> For this case I cannot provide a real reproducer, so we can only
> stress test the network with iperf3 and try to reproduce the issue. This
> is also because we think that your approach of using UDP and your packet
> generator is probably not the best way to test the patch, given that (1)
> using UDP is not reliable according to our tests, (2) there is an
> asymmetry in TX/RX, (3) the packet loss could be due to saturation
> of the bandwidth, etc.
>

The tests I ran with the kernel packet generator gave me some interesting
information. The board dropped all incoming traffic when
transmitting at full rate (~940 Mbps).
Although there is an asymmetry in the FIFO sizes
(the Rx FIFO is twice the size of the Tx FIFO), I would expect a result more
similar to the one I had in step 2 of TEST 0 [1], after a while.
However, this behavior could be due to the driver and so not very
relevant to this discussion ;)

> So AFAIK the best way to test this problem is using iperf3, the same
> way it is done in the script in [0]. I was not involved with this issue
> a year and a half ago, but AFAIK this is the way it was reproduced.
>
> This brings me to more questions for you to answer:
>
> 3) Running iperf3 tests in TX / RX / TX+RX without the
> 'eee-broken-1000t' quirk applied, are you able to reproduce the EEE problem?
>
> 4) Any change when the 'eee-broken-1000t' quirk is applied?
>
> When testing (3) and (4) please also check the status of EEE using
> ethtool.
>
> Hopefully this will bring a bit of clarity to the whole situation :)
>
> Cheers,
>
> [0] https://paste.fedoraproject.org/paste/GBFxjAQ0JULsYQlyYO2KOw
>
> --
> Carlo Caione
>

Best regards,

Emiliano

[0] https://drive.google.com/drive/folders/1BMe8vkm16KdgijlhFfZH_xph5eDNdkqO?usp=sharing
[1] http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009397.html


* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-07 10:52 ` [PATCH v2 2/2] arm: " Carlo Caione
  2018-12-07 18:51   ` Emiliano Ingrassia
  2018-12-07 22:06   ` Martin Blumenstingl
@ 2018-12-29  0:17   ` Martin Blumenstingl
  2019-01-11  0:21     ` Kevin Hilman
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Blumenstingl @ 2018-12-29  0:17 UTC
  To: Carlo Caione
  Cc: mark.rutland, devicetree, khilman, robh+dt, ingrassia,
	linux-amlogic, linux-arm-kernel

Hi Carlo,

On Fri, Dec 7, 2018 at 11:52 AM Carlo Caione <ccaione@baylibre.com> wrote:
>
> A long-running stress test on a custom board shipping an AXG SoC and a
> Realtek RTL8211F PHY revealed that after a few hours the connection
> speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
> the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
> consequence, the GMAC IRQs were never ACKed.
>
> After a painful investigation the problem seemed to be due to a wrongly
> defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
> EDGE_RISING.
>
> The change in the macirq IRQ type also solved another long-standing
> issue affecting this SoC/PHY where EEE was causing the network
> connection to die after stressing it with iperf3 (though much
> sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
> well.
(disclaimer: I was not able to reproduce this bug without your
patches, but I didn't run iperf3 for more than a couple of minutes)
I did test your patch with and without my "Meson8b RGMII Ethernet pin
cleanup" from [0] which shows that there's another performance related
problem:
1) before and after your patch receive speeds were fine (above
700Mbit/s and no transmit errors / retries in iperf3) but the transmit
speed was bad (<200Mbit/s and >1500 retries in iperf3)
2) transmit errors (when Odroid-C1 is sending) are not occurring
anymore after my patch from [0]

thus I believe your patch is fine, especially since we already have
IRQ_TYPE_LEVEL_HIGH for the dwc2 controllers

> Fixes: 9c15795a4f96 ("ARM: dts: meson8b-odroidc1: ethernet support")
> Signed-off-by: Carlo Caione <ccaione@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

I wonder if Kevin can send this as a fix for v4.20


Regards
Martin


[0] http://lists.infradead.org/pipermail/linux-amlogic/2018-December/009665.html


* Re: [PATCH v2 2/2] arm: dts: meson: Fix IRQ trigger type for macirq
  2018-12-29  0:17   ` Martin Blumenstingl
@ 2019-01-11  0:21     ` Kevin Hilman
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Hilman @ 2019-01-11  0:21 UTC (permalink / raw)
  To: Martin Blumenstingl, Carlo Caione
  Cc: mark.rutland, devicetree, robh+dt, linux-arm-kernel,
	linux-amlogic, ingrassia

Martin Blumenstingl <martin.blumenstingl@googlemail.com> writes:

> Hi Carlo,
>
> On Fri, Dec 7, 2018 at 11:52 AM Carlo Caione <ccaione@baylibre.com> wrote:
>>
>> A long running stress test on a custom board shipping an AXG SoC and a
>> Realtek RTL8211F PHY revealed that after a few hours the connection
>> speed would drop drastically, from ~1000Mbps to ~3Mbps. At the same time
>> the 'macirq' (eth0) IRQ would stop being triggered at all and, as a
>> consequence, the GMAC IRQs were never ACKed.
>>
>> After a painful investigation the problem seemed to be due to a wrongly
>> defined IRQ type for the GMAC IRQ, which should be LEVEL_HIGH instead of
>> EDGE_RISING.
>>
>> The change in the macirq IRQ type also solved another long standing
>> issue affecting this SoC/PHY where EEE was causing the network
>> connection to die after stressing it with iperf3 (though much
>> sooner). It's now possible to remove the 'eee-broken-1000t' quirk as
>> well.
> (disclaimer: I was not able to reproduce this bug without your
> patches, but I didn't run iperf3 for more than a couple of minutes)
> I did test your patch with and without my "Meson8b RGMII Ethernet pin
> cleanup" from [0] which shows that there's another performance related
> problem:
> 1) before and after your patch receive speeds were fine (above
> 700Mbit/s and no transmit errors / retries in iperf3) but the transmit
> speed was bad (<200Mbit/s and >1500 retries in iperf3)
> 2) transmit errors (when Odroid-C1 is sending) are not occurring
> anymore after my patch from [0]
>
> thus I believe your patch is fine, especially since we already have
> IRQ_TYPE_LEVEL_HIGH for the dwc2 controllers
>
>> Fixes: 9c15795a4f96 ("ARM: dts: meson8b-odroidc1: ethernet support")
>> Signed-off-by: Carlo Caione <ccaione@baylibre.com>
> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
>
> I wonder if Kevin can send this as a fix for v4.20

Queued as a fix for v5.0-rc

Kevin

