* [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
From: Grygorii Strashko @ 2016-08-11 16:15 UTC
  To: Sebastian Andrzej Siewior, Steven Rostedt
  Cc: linux-omap, Alison Chaiken, linux-rt-users, netdev, Grygorii Strashko

Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
where these IRQs are forced-threaded:
 rx-irq
  |- schedule threaded rx-irq handler
...
  |- threaded rx-irq handler -> cpsw_rx_interrupt()
     |- napi_schedule()
	|- __raise_softirq_irqoff()
	   |- wakeup_proper_softirq()
...
  napi

after:
 rx-irq
  |- cpsw_rx_interrupt()
     |- napi_schedule()
  |- irq_exit()
     |- invoke_softirq()
	   |- wakeup_softirqd()
...
  napi
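
In code terms (a minimal sketch with hypothetical my_* names, not the
actual cpsw code; the real diff is below): IRQF_NO_THREAD keeps the
handler in hard-IRQ context even on -RT, which is also what makes the
napi_schedule_irqoff() variant safe to use:

  #include <linux/device.h>
  #include <linux/interrupt.h>
  #include <linux/netdevice.h>

  struct my_priv {
          struct napi_struct napi_rx;
  };

  /* Runs in hard-IRQ context with interrupts disabled, because the
   * IRQ is requested with IRQF_NO_THREAD below. */
  static irqreturn_t my_rx_interrupt(int irq, void *dev_id)
  {
          struct my_priv *priv = dev_id;

          /* interrupts are already off here, so the _irqoff variant of
           * napi_schedule() (no local_irq_save/restore) is safe */
          napi_schedule_irqoff(&priv->napi_rx);
          return IRQ_HANDLED;
  }

  static int my_request(struct device *dev, int irq, struct my_priv *priv)
  {
          /* IRQF_NO_THREAD opts the IRQ out of forced threading on -RT */
          return devm_request_irq(dev, irq, my_rx_interrupt,
                                  IRQF_NO_THREAD, dev_name(dev), priv);
  }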

And, as a result, get the following improvements (tested
on am57xx-evm):

1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
   seen any more. Now these warnings can be seen once iperf is started.
   # iperf -c $IPERFHOST -w 128K  -d -t 60

2) latency reduction when cyclictest is run in parallel with network load
 where net_perf.sh is:
   iperf -c $IPERFHOST -w 8K    -d -t 60
   iperf -c $IPERFHOST -w 16K   -d -t 60
   iperf -c $IPERFHOST -w 32K   -d -t 60
   iperf -c $IPERFHOST -w 64K   -d -t 60
   iperf -c $IPERFHOST -w 128K  -d -t 60

before:
T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
after:
T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33

3) network performance increase

win, K	Mbits/s
	before	after	%
8K	354	350.3	0.0
16K	412	551	33.7
32K	423	659.5	55.9
64K	436	728.3	67.0
128K	537	845	57.4

This change does not affect non-RT.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
Hi All,

I'd appreciate any feedback or Tested-by tags.
Given positive feedback, I'll resend it upstream.

 drivers/net/ethernet/ti/cpsw.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 7b59283..fa4bb81 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -769,7 +769,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 		priv->tx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_tx);
+	napi_schedule_irqoff(&priv->napi_tx);
 	return IRQ_HANDLED;
 }
 
@@ -785,7 +785,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
 		priv->rx_irq_disabled = true;
 	}
 
-	napi_schedule(&priv->napi_rx);
+	napi_schedule_irqoff(&priv->napi_rx);
 	return IRQ_HANDLED;
 }
 
@@ -2827,7 +2827,7 @@ static int cpsw_probe(struct platform_device *pdev)
 
 	priv->irqs_table[0] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
@@ -2842,7 +2842,7 @@ static int cpsw_probe(struct platform_device *pdev)
 
 	priv->irqs_table[1] = irq;
 	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
-			       0, dev_name(&pdev->dev), priv);
+			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
 	if (ret < 0) {
 		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
 		goto clean_ale_ret;
-- 
2.9.2



* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
From: Steven Rostedt @ 2016-08-11 16:36 UTC
  To: Grygorii Strashko
  Cc: Sebastian Andrzej Siewior, linux-omap, Alison Chaiken,
	linux-rt-users, netdev

On Thu, 11 Aug 2016 19:15:40 +0300
Grygorii Strashko <grygorii.strashko@ti.com> wrote:

> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
> where these IRQs are forced-threaded:
>  rx-irq
>   |- schedule threaded rx-irq handler
> ...
>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>      |- napi_schedule()
> 	|- __raise_softirq_irqoff()
> 	   |- wakeup_proper_softirq()
> ...
>   napi
> 
> after:
>  rx-irq
>   |- cpsw_rx_interrupt()
>      |- napi_schedule()
>   |- irq_exit()
>      |- invoke_softirq()
> 	   |- wakeup_softirqd()
> ...
>   napi
> 
> And, as a result, get the following improvements (tested
> on am57xx-evm):
> 
> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. Now these warnings can be seen once iperf is started.
>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> 2) latency reduction when cyclictest is run in parallel with network load
>  where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K    -d -t 60
>    iperf -c $IPERFHOST -w 16K   -d -t 60
>    iperf -c $IPERFHOST -w 32K   -d -t 60
>    iperf -c $IPERFHOST -w 64K   -d -t 60
>    iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> before:
> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
> after:
> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33
> 
> 3) network performance increase
> 
> win, K	Mbits/s
> 	before	after	%
> 8K	354	350.3	0.0
> 16K	412	551	33.7
> 32K	423	659.5	55.9
> 64K	436	728.3	67.0
> 128K	537	845	57.4
> 
> This change does not affect non-RT.

This looks fine to me, but it should go into the development branch,
which is currently 4.6-rt. And I can then pull it from there.

-- Steve


> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
> Hi All,
> 
> I'd appreciate any feedback or Tested-by tags.
> Given positive feedback, I'll resend it upstream.
> 
>  drivers/net/ethernet/ti/cpsw.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 7b59283..fa4bb81 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -769,7 +769,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
>  		priv->tx_irq_disabled = true;
>  	}
>  
> -	napi_schedule(&priv->napi_tx);
> +	napi_schedule_irqoff(&priv->napi_tx);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -785,7 +785,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
>  		priv->rx_irq_disabled = true;
>  	}
>  
> -	napi_schedule(&priv->napi_rx);
> +	napi_schedule_irqoff(&priv->napi_rx);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -2827,7 +2827,7 @@ static int cpsw_probe(struct platform_device *pdev)
>  
>  	priv->irqs_table[0] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_rx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;
> @@ -2842,7 +2842,7 @@ static int cpsw_probe(struct platform_device *pdev)
>  
>  	priv->irqs_table[1] = irq;
>  	ret = devm_request_irq(&pdev->dev, irq, cpsw_tx_interrupt,
> -			       0, dev_name(&pdev->dev), priv);
> +			       IRQF_NO_THREAD, dev_name(&pdev->dev), priv);
>  	if (ret < 0) {
>  		dev_err(priv->dev, "error attaching irq (%d)\n", ret);
>  		goto clean_ale_ret;


* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
From: Sebastian Andrzej Siewior @ 2016-08-12  8:32 UTC
  To: Grygorii Strashko
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
> where these IRQs are forced-threaded:
>  rx-irq
>   |- schedule threaded rx-irq handler
> ...
>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>      |- napi_schedule()
> 	|- __raise_softirq_irqoff()
> 	   |- wakeup_proper_softirq()
> ...
>   napi

This should not be the default path. The default should be napi running
in the context of the threaded rx-irq handler once the handler is done.
The wakeup_proper_softirq() part is only done if napi thinks that the
callback function runs for too long. So in *that* case you continue
NAPI in the softirq thread, which runs at SCHED_OTHER.
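
That is because the forced-threaded handler runs the driver callback with
BHs disabled, and the local_bh_enable() afterwards runs whatever softirq
the handler raised in the irq thread itself. Roughly, from
kernel/irq/manage.c (modulo the -RT patches on top):

  static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
                                          struct irqaction *action)
  {
          irqreturn_t ret;

          local_bh_disable();
          ret = action->thread_fn(action->irq, action->dev_id);
          irq_finalize_oneshot(desc, action);
          /* pending softirqs (e.g. NET_RX raised via napi_schedule())
           * run here, in this irq thread and at its priority */
          local_bh_enable();
          return ret;
  }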

> after:
>  rx-irq
>   |- cpsw_rx_interrupt()
>      |- napi_schedule()
>   |- irq_exit()
>      |- invoke_softirq()
> 	   |- wakeup_softirqd()
> ...
>   napi

Since you schedule the softirq from an IRQ-off region / without a
process context you force the softirq to run in the thread at
SCHED_OTHER priority.

> And, as a result, get the following improvements (tested
> on am57xx-evm):
> 
> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>    seen any more. Now these warnings can be seen once iperf is started.
>    # iperf -c $IPERFHOST -w 128K  -d -t 60

Do you also see "sched: RT throttling activated"? Because otherwise I
don't see why this should pop up.

> 2) latency reduction when cyclictest is run in parallel with network load
>  where net_perf.sh is:
>    iperf -c $IPERFHOST -w 8K    -d -t 60
>    iperf -c $IPERFHOST -w 16K   -d -t 60
>    iperf -c $IPERFHOST -w 32K   -d -t 60
>    iperf -c $IPERFHOST -w 64K   -d -t 60
>    iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> before:
> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
> after:
> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33

-d 0 to have I: set to the same value.
What does -i 250 say?
And without network load, are we where we were at the "after" values?

What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
in net/core/dev.c and chrt the priority of your network interrupt
handlers to SCHED_OTHER priority?

> 3) network performance increase
> 
> win, K	Mbits/s
> 	before	after	%
> 8K	354	350.3	0.0
> 16K	412	551	33.7
> 32K	423	659.5	55.9
> 64K	436	728.3	67.0
> 128K	537	845	57.4

How close are the after numbers to !RT?

Sebastian


* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
From: Grygorii Strashko @ 2016-08-12 15:58 UTC
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

Hi Sebastian,

On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT,
>> where these IRQs are forced-threaded:
>>  rx-irq
>>   |- schedule threaded rx-irq handler
>> ...
>>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>      |- napi_schedule()
>> 	|- __raise_softirq_irqoff()
>> 	   |- wakeup_proper_softirq()
>> ...
>>   napi
> 
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback function runs for too long. So in *that* case you continue
> NAPI in the softirq thread, which runs at SCHED_OTHER.

Thanks for the comment. You're right:
irq_thread()->irq_forced_thread_fn()->local_bh_enable()

but wouldn't there still be two wake_up_process() calls either way,
plus the preempt_check_resched_rt() in napi_schedule()?

> 
>> after:
>>  rx-irq
>>   |- cpsw_rx_interrupt()
>>      |- napi_schedule()
>>   |- irq_exit()
>>      |- invoke_softirq()
>> 	   |- wakeup_softirqd()
>> ...
>>   napi
> 
> Since you schedule the softirq from an IRQ-off region / without a
> process context you force the softirq to run in the thread at
> SCHED_OTHER priority.

Thanks a lot for the comments.
> 
>> And, as a result, get the following improvements (tested
>> on am57xx-evm):
>>
>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. Now these warnings can be seen once iperf is started.
>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> Do you also see "sched: RT throttling activated"? Because otherwise I
> don't see why this should pop up.

I've reverted my patch and did the requested experiments (some additional info below).

I do not see "sched: RT throttling activated" :(

root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
[1] 1301
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux


------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 47500 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59955
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.31 GBytes   188 Mbits/sec
[  5]  0.0-60.0 sec   868 MBytes   121 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47502 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59957
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.92 GBytes   275 Mbits/sec
[  4]  0.0-60.0 sec  1.14 GBytes   163 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47504 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59965
[ 2285.928516] NOHZ: local_softirq_pending 80
[ 2285.928549] NOHZ: local_softirq_pending 80
[ 2285.928594] NOHZ: local_softirq_pending 80
[ 2285.928689] NOHZ: local_softirq_pending 80
[ 2286.348572] NOHZ: local_softirq_pending 80
[ 2290.358623] NOHZ: local_softirq_pending 80
[ 2291.588624] NOHZ: local_softirq_pending 80
[ 2291.698579] NOHZ: local_softirq_pending 80
[ 2292.608594] NOHZ: local_softirq_pending 80
[ 2293.178627] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.43 GBytes   205 Mbits/sec
[  4]  0.0-60.0 sec  1.58 GBytes   226 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47506 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59980
T: 0 ( 1309) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min:      8 Act:   14 Avg:   16 Max:      58
root@am57xx-evm:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.84 GBytes   264 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47508 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59982
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec   974 MBytes   136 Mbits/sec
[  4]  0.0-60.0 sec  2.84 GBytes   406 Mbits/sec

[1]+  Done                    ./net_perf.sh



> 
>> 2) latency reduction when cyclictest is run in parallel with network load
>>  where net_perf.sh is:
>>    iperf -c $IPERFHOST -w 8K    -d -t 60
>>    iperf -c $IPERFHOST -w 16K   -d -t 60
>>    iperf -c $IPERFHOST -w 32K   -d -t 60
>>    iperf -c $IPERFHOST -w 64K   -d -t 60
>>    iperf -c $IPERFHOST -w 128K  -d -t 60
>>
>> before:
>> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
>> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
>> after:
>> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
>> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33
> 
> -d 0 to have I: set to the same value.
> What does -i 250 say?
> And without network load, are we where we were at the "after" values?

=============================================== before, no net load:
cyclictest -m -Sp98 -q  -D4m
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1270) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      38
T: 1 ( 1271) P:98 I:1500 C: 159986 Min:      8 Act:   10 Avg:    9 Max:      23

cyclictest -m -Sp98 -q  -D4m -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1280) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      27
T: 1 ( 1281) P:98 I:1000 C: 239982 Min:      8 Act:   13 Avg:    9 Max:      26

cyclictest -m -Sp98 -q  -D4m -i250
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1284) P:98 I:250 C: 960000 Min:      8 Act:   10 Avg:    8 Max:      24
T: 1 ( 1285) P:98 I:750 C: 319977 Min:      8 Act:   11 Avg:    9 Max:      21

cyclictest -m -Sp98 -q  -D4m -i250 -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26

=============================================== before, with net load:
cyclictest -m -Sp98 -q  -D4m
T: 0 ( 1309) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min:      8 Act:   14 Avg:   16 Max:      58

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1339) P:98 I:1000 C: 240000 Min:      8 Act:   28 Avg:   18 Max:      60
T: 1 ( 1340) P:98 I:1000 C: 239969 Min:      8 Act:   25 Avg:   16 Max:      43

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1369) P:98 I:250 C: 960000 Min:      8 Act:   13 Avg:   17 Max:      64
T: 1 ( 1370) P:98 I:750 C: 319941 Min:      8 Act:   18 Avg:   16 Max:      56

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48

=============================================== after, no net load:

cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1287) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      30
T: 1 ( 1288) P:98 I:1500 C: 159987 Min:      8 Act:   13 Avg:   10 Max:      32

cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1295) P:98 I:1000 C: 240000 Min:      8 Act:    8 Avg:    9 Max:      39
T: 1 ( 1296) P:98 I:1000 C: 239981 Min:      7 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1298) P:98 I:250 C: 960000 Min:      7 Act:    8 Avg:    9 Max:      42
T: 1 ( 1299) P:98 I:750 C: 319970 Min:      8 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28

=============================================== after, with net load:
cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1315) P:98 I:1000 C: 240000 Min:      8 Act:   11 Avg:   14 Max:      40
T: 1 ( 1316) P:98 I:1500 C: 159962 Min:      8 Act:   19 Avg:   15 Max:      38


cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1365) P:98 I:1000 C: 240000 Min:      8 Act:   14 Avg:   15 Max:      51
T: 1 ( 1366) P:98 I:1000 C: 239943 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1309) P:98 I:250 C: 960000 Min:      8 Act:   15 Avg:   13 Max:      42
T: 1 ( 1310) P:98 I:750 C: 319986 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36

> 
> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> in net/core/dev.c and chrt the priority of your network interrupt
> handlers to SCHED_OTHER priority?

===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER

with net load:
cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37

and net performance is better:
root@am57xx-evm:~# ps -A | grep 4848
   82 ?        00:00:00 irq/354-4848400
   83 ?        00:00:00 irq/355-4848400
root@am57xx-evm:~# chrt -o -p 0 82
root@am57xx-evm:~# chrt -o -p 0 83
./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
[1] 1298
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux


------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 58672 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60672
[   78.718828] NOHZ: local_softirq_pending 80
[   78.908849] NOHZ: local_softirq_pending 80
[   78.908927] NOHZ: local_softirq_pending 80
[   78.918827] NOHZ: local_softirq_pending 80
[   79.058826] NOHZ: local_softirq_pending 80
[   81.488918] NOHZ: local_softirq_pending 80
[   81.489162] NOHZ: local_softirq_pending 80
[   81.489398] NOHZ: local_softirq_pending 80
[   81.489646] NOHZ: local_softirq_pending 80
[   81.489895] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.30 GBytes   187 Mbits/sec
[  5]  0.0-60.0 sec   637 MBytes  89.0 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58674 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60680
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.78 GBytes   255 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58676 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60693
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.62 GBytes   232 Mbits/sec
[  4]  0.0-60.0 sec  2.07 GBytes   296 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58678 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60702
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
root@am57xx-evm:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.77 GBytes   254 Mbits/sec
[  4]  0.0-60.0 sec  1.77 GBytes   253 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58680 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60711
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  2.18 GBytes   312 Mbits/sec
[  4]  0.0-60.0 sec  2.53 GBytes   362 Mbits/sec


> 
>> 3) network performance increase
>>
>> win, K	Mbits/s
>> 	before	after	%
>> 8K	354	350.3	0.0
>> 16K	412	551	33.7 
>> 32K	423	659.5	55.9 
>> 64K	436	728.3	67.0 
>> 128K	537	845	57.4 
> 
> How close are the after numbers to !RT?

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#AM57XX_TCP_Performance

At least a 20% drop even with this patch :(

-- 
regards,
-grygorii


========================= ps -l -A (after boot)
root@am57xx-evm:~# ps -l -A
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  1  80   0 -  1321 SyS_ep ?        00:00:05 systemd
1 S     0     2     0  0  80   0 -     0 kthrea ?        00:00:00 kthreadd
1 S     0     3     2  0  80   0 -     0 smpboo ?        00:00:00 ksoftirqd/0
5 S     0     4     2  0  58   - -     0 smpboo ?        00:00:00 ktimersoftd/0
1 S     0     5     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:0
1 S     0     6     2  0  60 -20 -     0 worker ?        00:00:00 kworker/0:0H
1 S     0     7     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:0
1 S     0     8     2  0  58   - -     0 rcu_gp ?        00:00:00 rcu_preempt
1 S     0     9     2  0  58   - -     0 rcu_gp ?        00:00:00 rcu_sched
1 S     0    10     2  0  58   - -     0 rcu_bo ?        00:00:00 rcub/0
1 S     0    11     2  0  58   - -     0 smpboo ?        00:00:00 rcuc/0
1 S     0    12     2  0  80   0 -     0 run_cl ?        00:00:00 kclksetdelayd
1 S     0    13     2  0 -40   - -     0 posix_ ?        00:00:00 posixcputmr/0
1 S     0    14     2  0  80   0 -     0 run_cm ?        00:00:00 kcmosdelayd
1 S     0    15     2  0 -40   - -     0 smpboo ?        00:00:00 migration/0
1 S     0    16     2  0 -40   - -     0 smpboo ?        00:00:00 migration/1
1 S     0    17     2  0  58   - -     0 smpboo ?        00:00:00 rcuc/1
5 S     0    18     2  0  58   - -     0 smpboo ?        00:00:00 ktimersoftd/1
1 S     0    19     2  0  80   0 -     0 smpboo ?        00:00:00 ksoftirqd/1
1 S     0    20     2  0 -40   - -     0 posix_ ?        00:00:00 posixcputmr/1
1 S     0    21     2  0  80   0 -     0 worker ?        00:00:00 kworker/1:0
1 S     0    22     2  0  60 -20 -     0 worker ?        00:00:00 kworker/1:0H
5 S     0    23     2  0  80   0 -     0 devtmp ?        00:00:00 kdevtmpfs
1 S     0    24     2  0  80   0 -     0 swork_ ?        00:00:00 kswork
1 S     0    25     2  0  60 -20 -     0 rescue ?        00:00:00 perf
1 S     0    26     2  0   9   - -     0 irq_th ?        00:00:00 irq/37-4ae10000
1 S     0    27     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:1
1 S     0    28     2  0   9   - -     0 irq_th ?        00:00:00 irq/70-48055000
1 S     0    29     2  0   9   - -     0 irq_th ?        00:00:00 irq/103-4805700
1 S     0    30     2  0   9   - -     0 irq_th ?        00:00:00 irq/136-4805900
1 S     0    31     2  0   9   - -     0 irq_th ?        00:00:00 irq/169-4805b00
1 S     0    32     2  0   9   - -     0 irq_th ?        00:00:00 irq/202-4805d00
1 S     0    33     2  0   9   - -     0 irq_th ?        00:00:00 irq/235-4805100
1 S     0    34     2  0   9   - -     0 irq_th ?        00:00:00 irq/268-4805300
1 S     0    35     2  0  60 -20 -     0 rescue ?        00:00:00 writeback
1 S     0    36     2  0  60 -20 -     0 rescue ?        00:00:00 crypto
1 S     0    37     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    38     2  0  60 -20 -     0 rescue ?        00:00:00 kblockd
1 S     0    39     2  0   9   - -     0 irq_th ?        00:00:00 irq/29-omap-dma
1 S     0    40     2  0   9   - -     0 irq_th ?        00:00:00 irq/32-43300000
1 S     0    41     2  0   9   - -     0 irq_th ?        00:00:00 irq/34-43300000
1 S     0    42     2  0   9   - -     0 irq_th ?        00:00:00 irq/343-40d0100
1 S     0    43     2  0   9   - -     0 irq_th ?        00:00:00 irq/344-40d0200
1 S     0    44     2  0   9   - -     0 irq_th ?        00:00:00 irq/345-5888200
1 S     0    45     2  0   9   - -     0 irq_th ?        00:00:00 irq/346-5508200
1 S     0    46     2  0   9   - -     0 irq_th ?        00:00:00 irq/363-4150100
1 S     0    47     2  0   9   - -     0 irq_th ?        00:00:00 irq/364-4150200
1 S     0    48     2  0   9   - -     0 irq_th ?        00:00:00 irq/337-4807000
1 S     0    49     2  0   9   - -     0 irq_th ?        00:00:00 irq/38-palmas
1 S     0    50     2  0   9   - -     0 irq_th ?        00:00:00 irq/338-4806000
1 S     0    51     2  0   9   - -     0 irq_th ?        00:00:00 irq/339-4807c00
1 S     0    52     2  0  60 -20 -     0 rescue ?        00:00:00 edac-poller
1 S     0    53     2  0  60 -20 -     0 rescue ?        00:00:00 rpciod
1 S     0    54     2  0  80   0 -     0 kswapd ?        00:00:00 kswapd0
1 S     0    55     2  0  60 -20 -     0 rescue ?        00:00:00 vmstat
1 S     0    56     2  0  80   0 -     0 fsnoti ?        00:00:00 fsnotify_mark
1 S     0    57     2  0  60 -20 -     0 rescue ?        00:00:00 nfsiod
1 S     0    66     2  0   9   - -     0 irq_th ?        00:00:00 irq/25-dra7xx-p
1 S     0    67     2  0   9   - -     0 irq_th ?        00:00:00 irq/437-PCIe PM
1 S     0    68     2  0   9   - -     0 irq_th ?        00:00:00 irq/437-aerdrv
1 S     0    69     2  0   9   - -     0 irq_th ?        00:00:00 irq/370-OMAP HD
1 S     0    70     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    71     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    72     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    73     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    74     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    75     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    76     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    77     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    78     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:1
1 S     0    81     2  0  80   0 -     0 worker ?        00:00:01 kworker/1:1
1 S     0    82     2  0   9   - -     0 irq_th ?        00:00:00 irq/354-4848400
1 S     0    83     2  0   9   - -     0 irq_th ?        00:00:00 irq/355-4848400
1 S     0    84     2  0  60 -20 -     0 rescue ?        00:00:00 kpsmoused
1 S     0    85     2  0   9   - -     0 irq_th ?        00:00:00 irq/27-talert
1 S     0    86     2  0   9   - -     0 irq_th ?        00:00:00 irq/340-mmc0
1 S     0    87     2  0   9   - -     0 irq_th ?        00:00:00 irq/230-4809c00
1 S     0    88     2  0   9   - -     0 irq_th ?        00:00:00 irq/341-mmc1
1 S     0    89     2  0   9   - -     0 irq_th ?        00:00:00 irq/342-mmc2
1 S     0    90     2  0   9   - -     0 irq_th ?        00:00:00 irq/352-4846800
1 S     0    91     2  0   9   - -     0 irq_th ?        00:00:00 irq/351-4846800
1 S     0    92     2  0  60 -20 -     0 rescue ?        00:00:00 ipv6_addrconf
1 S     0    93     2  0   9   - -     0 irq_th ?        00:00:00 irq/335-omap_dm
1 S     0    94     2  0   9   - -     0 irq_th ?        00:00:00 irq/369-OMAP DI
1 S     0    95     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    96     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/0
1 S     0    97     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    98     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1
1 S     0    99     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0   100     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1boot0
1 S     0   101     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0   102     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1boot1
1 S     0   103     2  0  60 -20 -     0 rescue ?        00:00:00 deferwq
1 S     0   104     2  0   9   - -     0 irq_th ?        00:00:00 irq/94-Up
1 S     0   105     2  0   9   - -     0 irq_th ?        00:00:00 irq/96-Down
1 S     0   106     2  0   9   - -     0 irq_th ?        00:00:00 irq/99-Left
1 S     0   107     2  0   9   - -     0 irq_th ?        00:00:00 irq/95-Right
1 S     0   110     2  0  60 -20 -     0 worker ?        00:00:00 kworker/1:1H
1 S     0   111     2  0  80   0 -     0 kjourn ?        00:00:00 jbd2/mmcblk0p2-
1 S     0   112     2  0  60 -20 -     0 rescue ?        00:00:00 ext4-rsv-conver
1 S     0   129     2  0  60 -20 -     0 worker ?        00:00:00 kworker/0:1H
1 S     0   165     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:2
1 S     0   168     2  0  80   0 -     0 worker ?        00:00:00 kworker/1:2
4 S     0   197     1  0  80   0 -  2566 SyS_ep ?        00:00:01 systemd-journal
1 S     0   249     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:2
1 S     0   255     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:3
4 S     0   289     1  0  80   0 -  2731 SyS_ep ?        00:00:02 systemd-udevd
1 S     0   330     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:4
1 S     0   339     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:5
1 S     0   387     2  0   9   - -     0 irq_th ?        00:00:00 irq/362-4809000
1 S     0   407     2  0   9   - -     0 irq_th ?        00:00:00 irq/470-2-006f
1 S     0   409     2  0   9   - -     0 irq_th ?        00:00:00 irq/469-mcp7941
1 S     0   413     2  0   9   - -     0 irq_th ?        00:00:00 irq/75-pixcir_t
4 S   998   418     1  0  80   0 -  3038 SyS_ep ?        00:00:00 systemd-timesyn
1 S     0   429     2  0   9   - -     0 irq_th ?        00:00:00 irq/158-palmas_
1 S     0   462     2  0   9   - -     0 irq_th ?        00:00:00 irq/348-rtc2
1 S     0   464     2  0   9   - -     0 irq_th ?        00:00:00 irq/349-dwc3-om
1 S     0   466     2  0   9   - -     0 irq_th ?        00:00:00 irq/350-dwc3-om
1 S     0   493     2  0   9   - -     0 irq_th ?        00:00:00 irq/357-vpe
1 S     0   503     2  0  60 -20 -     0 rescue ?        00:00:00 ata_sff
1 S     0   509     2  0   9   - -     0 irq_th ?        00:00:00 irq/365-vip2-s0
1 S     0   510     2  0   9   - -     0 irq_th ?        00:00:00 irq/366-vip2-s1
1 S     0   514     2  0   9   - -     0 irq_th ?        00:00:00 irq/347-4a14000
1 S     0   516     2  0  80   0 -     0 scsi_e ?        00:00:00 scsi_eh_0
1 S     0   517     2  0  60 -20 -     0 rescue ?        00:00:00 scsi_tmf_0
1 S     0   595     2  0   9   - -     0 irq_th ?        00:00:00 irq/361-4b10100
1 S     0   605     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:3
4 S     0   765     1  0  80   0 -   498 poll_s ?        00:00:00 starter
4 S     0   792     1  0  80   0 -   497 syslog ?        00:00:00 klogd
4 S     0   819     1  0  80   0 -  2210 poll_s ?        00:00:01 ofonod
1 S     0   825     2  0  80   0 -     0 worker ?        00:00:02 kworker/1:3
1 S     0   887     2  0   9   - -     0 irq_th ?        00:00:00 irq/313-mbox_ip
4 S     0   900   765  0  80   0 - 35826 sigtim ?        00:00:00 charon
4 S   995   901     1  0  80   0 -  1164 poll_s ?        00:00:00 avahi-daemon
1 S     0   902     2  0   9   - -     0 irq_th ?        00:00:00 irq/317-mbox_ip
1 S     0   903     2  0   9   - -     0 irq_th ?        00:00:00 irq/321-rproc-w
1 S     0   904     2  0   9   - -     0 irq_th ?        00:00:00 irq/326-rproc-w
1 S     0   909     2  0   9   - -     0 irq_th ?        00:00:00 irq/313-mbox_ds
1 S   995   910   901  0  80   0 -  1164 unix_s ?        00:00:00 avahi-daemon
4 S     0   911     1  0  80   0 -   501 skb_re ?        00:00:00 syslogd
1 S     0   912     2  0   9   - -     0 irq_th ?        00:00:00 irq/317-mbox_ds
1 S     0   929     2  0   9   - -     0 irq_th ?        00:00:00 irq/490-remotep
1 S     0   930     2  0   9   - -     0 irq_th ?        00:00:00 irq/492-remotep
1 S     0   931     2  0   9   - -     0 irq_th ?        00:00:00 irq/494-remotep
1 S     0   934     2  0   9   - -     0 irq_th ?        00:00:00 irq/496-remotep
4 S   999   938     1  0  80   0 -  1080 SyS_ep ?        00:00:00 dbus-daemon
4 S   997   948     1  0  80   0 -  2725 SyS_ep ?        00:00:00 systemd-network
4 S     0   970     1  0  80   0 -   656 SyS_ep ?        00:00:00 systemd-logind
1 S     0  1006     2  0   9   - -     0 irq_th ?        00:00:00 irq/472-xhci-hc
4 S     0  1035     1  0  80   0 -  1221 wait   ttyS2    00:00:00 login
4 S     0  1036     1  0  80   0 -   447 poll_s tty1     00:00:00 agetty
4 S   996  1040     1  0  80   0 -   754 SyS_ep ?        00:00:00 systemd-resolve
4 S     0  1048     1  0  80   0 -  1143 SyS_ep ?        00:00:00 lighttpd
5 S     0  1176     1  0  80   0 -   497 poll_s ?        00:00:00 telnetd
5 S     0  1177     1  0  80   0 -  2441 pipe_w ?        00:00:00 lad_dra7xx
5 S 65534  1190     1  0  80   0 -   614 poll_s ?        00:00:00 thttpd
5 S     0  1199     1  0  80   0 -   581 poll_s ?        00:00:00 rngd
1 S     0  1241     2  0   9   - -     0 irq_th ?        00:00:00 irq/91-48020000
1 S     0  1242     2  0   9   - -     0 irq_th ?        00:00:00 irq/301-4802000
4 S     0  1248     1  0  80   0 -  1354 SyS_ep ?        00:00:00 systemd
5 S     0  1252  1248  0  80   0 -  1854 sigtim ?        00:00:00 (sd-pam)
4 S     0  1259  1035  0  80   0 -   676 wait   ttyS2    00:00:00 sh
0 R     0  1263  1259  0  80   0 -   553 -      ttyS2    00:00:00 ps

========================= /proc/interrupts & /proc/softirqs (after boot)
root@am57xx-evm:~# cat /proc/interrupts 
           CPU0       CPU1       
 16:          0          0      CBAR  32 Level     gp_timer
 19:      30322      29398       GIC  27 Edge      arch_timer
 22:          0          0      CBAR   4 Level     l3-dbg-irq
 23:          0          0     WUGEN  10 Level     l3-app-irq
 25:          7          0      CBAR 232 Level     dra7xx-pcie-main
 26:          0          0      CBAR 233 Level     dra7-pcie-msi
 27:          1          0      CBAR 121 Level     talert
 29:         57          0      CBAR   8 Level     omap-dma-engine
 32:        583          0      CBAR 361 Level     43300000.edma_ccint
 34:          0          0      CBAR 359 Level     43300000.edma_ccerrint
 37:          0          0      CBAR  24 Level     4ae10000.gpio
 38:          0          0  4ae10000.gpio   0 Level     palmas
 70:          1          0      CBAR  25 Level     48055000.gpio
 75:          0          1  48055000.gpio   4 Edge      pixcir_tangoc
 91:          0          0  48055000.gpio  20 Edge      48020000.serial
 94:          0          0  48055000.gpio  23 Edge      Up
 95:          0          0  48055000.gpio  24 Edge      Right
 96:          0          0  48055000.gpio  25 Edge      Down
 99:          0          0  48055000.gpio  28 Edge      Left
103:          0          0      CBAR  26 Level     48057000.gpio
136:          0          0      CBAR  27 Level     48059000.gpio
158:          0          0  48059000.gpio  21 Edge      palmas_usb_vbus
169:          0          0      CBAR  28 Level     4805b000.gpio
202:          0          0      CBAR  29 Level     4805d000.gpio
230:          0          0  4805d000.gpio  27 Edge      4809c000.mmc cd
235:          0          0      CBAR  30 Level     48051000.gpio
268:          0          0      CBAR 116 Level     48053000.gpio
301:        552          0      CBAR  69 Level     48020000.serial
313:         50          0      CBAR 251 Level     mbox_ipu1_ipc3x, mbox_dsp1_ipc3x
317:         60          0      CBAR 255 Level     mbox_ipu2_ipc3x, mbox_dsp2_ipc3x
321:          0          0      CBAR  35 Level     rproc-wdt
326:          0          0      CBAR  40 Level     rproc-wdt
335:          8          0      CBAR 108 Level     omap_dmm_irq_handler
337:       1888          0      CBAR  51 Level     48070000.i2c
338:         40          0      CBAR  56 Level     48060000.i2c
339:         28          0      CBAR  55 Level     4807c000.i2c
340:      12831          0      CBAR  78 Level     mmc0
341:       1473          0      CBAR  81 Level     mmc1
342:         13          0      CBAR  89 Level     mmc2
343:          0          0      CBAR  23 Level     40d01000.mmu
344:          0          0      CBAR 145 Level     40d02000.mmu
345:          0          0      CBAR 395 Level     58882000.mmu
346:          0          0      CBAR 396 Level     55082000.mmu
347:          0          0      CBAR  49 Level     4a140000.sata
348:          0          0      CBAR 217 Level     rtc2
349:          5          0      CBAR  72 Level     dwc3-omap
350:          2          0      CBAR  87 Level     dwc3-omap
351:          0          0      CBAR 151 Level     48468000.mcasp_tx
352:          0          0      CBAR 150 Level     48468000.mcasp_rx
354:         24          0      CBAR 335 Level     48484000.ethernet
355:         67          0      CBAR 336 Level     48484000.ethernet
357:          0          0      CBAR 354 Level     vpe
361:         20          0      CBAR  46 Level     4b101000.sham
362:          0          0      CBAR  47 Level     48090000.rng
363:          0          0      CBAR 146 Level     41501000.mmu
364:          0          0      CBAR 147 Level     41502000.mmu
365:          0          0      CBAR 352 Level     vip2-s0
366:          0          0      CBAR 393 Level     vip2-s1
369:      16949          0      CBAR  20 Level     OMAP DISPC
370:          0          0      CBAR  96 Level     OMAP HDMI
435:          0          0    palmas   8 Edge      48070000.i2c:tps659038@58:tps659038_rtc
437:          0          0   PCI-MSI   0 Edge      PCIe PME, aerdrv
469:          0          0      CBAR   2 Edge      mcp7941x
470:          0          0   pinctrl 1060 Edge      2-006f
472:         66          0      CBAR  71 Level     xhci-hcd:usb1
490:          1          0  4b220000.intc  16 Edge      remoteproc4
492:          1          0  4b220000.intc  18 Edge      remoteproc5
494:          1          0  4b2a0000.intc  16 Edge      remoteproc6
496:          1          0  4b2a0000.intc  18 Edge      remoteproc7
IPI0:          0          1  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      48241      64393  Rescheduling interrupts
IPI3:          0          0  Function call interrupts
IPI4:        432        620  Single function call interrupts
IPI5:          0          0  CPU stop interrupts
IPI6:          0          0  IRQ work interrupts
IPI7:          0          0  completion interrupts
Err:          0
root@am57xx-evm:~# cat /proc/softirqs 
                    CPU0       CPU1       
          HI:          0          2
       TIMER:      29167      29156
      NET_TX:          0         57
      NET_RX:        123         86
       BLOCK:          0          0
BLOCK_IOPOLL:          0          0
     TASKLET:        406        386
       SCHED:      28322      28253
     HRTIMER:        609        384
         RCU:          0          0


Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 47508 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 59982
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec   974 MBytes   136 Mbits/sec
[  4]  0.0-60.0 sec  2.84 GBytes   406 Mbits/sec

[1]+  Done                    ./net_perf.sh



> 
>> 2) latency reduction when cyclictest is run in parallel with network load
>>  where net_perf.sh is:
>>    iperf -c $IPERFHOST -w 8K    -d -t 60
>>    iperf -c $IPERFHOST -w 16K   -d -t 60
>>    iperf -c $IPERFHOST -w 32K   -d -t 60
>>    iperf -c $IPERFHOST -w 64K   -d -t 60
>>    iperf -c $IPERFHOST -w 128K  -d -t 60
>>
>> before:
>> T: 0 ( 1326) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      70
>> T: 1 ( 1327) P:98 I:1500 C: 159981 Min:      9 Act:   15 Avg:   16 Max:      43
>> after:
>> T: 0 ( 1331) P:98 I:1000 C: 240000 Min:      8 Act:   15 Avg:   14 Max:      51
>> T: 1 ( 1332) P:98 I:1500 C: 159953 Min:      8 Act:   16 Avg:   15 Max:      33
> 
> -d 0 to have I: set to the same value.
> What does -i 250 say?
> And without network load we are where we were at "after" values?

=============================================== before, no net load:
cyclictest -m -Sp98 -q  -D4m
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1270) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      38
T: 1 ( 1271) P:98 I:1500 C: 159986 Min:      8 Act:   10 Avg:    9 Max:      23

cyclictest -m -Sp98 -q  -D4m -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1280) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      27
T: 1 ( 1281) P:98 I:1000 C: 239982 Min:      8 Act:   13 Avg:    9 Max:      26

cyclictest -m -Sp98 -q  -D4m -i250
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1284) P:98 I:250 C: 960000 Min:      8 Act:   10 Avg:    8 Max:      24
T: 1 ( 1285) P:98 I:750 C: 319977 Min:      8 Act:   11 Avg:    9 Max:      21

cyclictest -m -Sp98 -q  -D4m -i250 -d0
# /dev/cpu_dma_latency set to 0us
T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26

=============================================== before, with net load:
cyclictest -m -Sp98 -q  -D4m
T: 0 ( 1309) P:98 I:1000 C: 240000 Min:      8 Act:   13 Avg:   18 Max:      60
T: 1 ( 1310) P:98 I:1500 C: 159968 Min:      8 Act:   14 Avg:   16 Max:      58

cyclictest -m -Sp98 -q -D4m -d0
T: 0 ( 1339) P:98 I:1000 C: 240000 Min:      8 Act:   28 Avg:   18 Max:      60
T: 1 ( 1340) P:98 I:1000 C: 239969 Min:      8 Act:   25 Avg:   16 Max:      43

cyclictest -m -Sp98 -q -D4m -i250
T: 0 ( 1369) P:98 I:250 C: 960000 Min:      8 Act:   13 Avg:   17 Max:      64
T: 1 ( 1370) P:98 I:750 C: 319941 Min:      8 Act:   18 Avg:   16 Max:      56

cyclictest -m -Sp98 -q -D4m -i250 -d0
T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48

=============================================== after, no net load:

cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1287) P:98 I:1000 C: 240000 Min:      8 Act:   10 Avg:    9 Max:      30
T: 1 ( 1288) P:98 I:1500 C: 159987 Min:      8 Act:   13 Avg:   10 Max:      32

cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1295) P:98 I:1000 C: 240000 Min:      8 Act:    8 Avg:    9 Max:      39
T: 1 ( 1296) P:98 I:1000 C: 239981 Min:      7 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1298) P:98 I:250 C: 960000 Min:      7 Act:    8 Avg:    9 Max:      42
T: 1 ( 1299) P:98 I:750 C: 319970 Min:      8 Act:    9 Avg:    8 Max:      29

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28

=============================================== after, with net load:
cyclictest -m -Sp98 -q  -D4m 
T: 0 ( 1315) P:98 I:1000 C: 240000 Min:      8 Act:   11 Avg:   14 Max:      40
T: 1 ( 1316) P:98 I:1500 C: 159962 Min:      8 Act:   19 Avg:   15 Max:      38


cyclictest -m -Sp98 -q -D4m -d0  
T: 0 ( 1365) P:98 I:1000 C: 240000 Min:      8 Act:   14 Avg:   15 Max:      51
T: 1 ( 1366) P:98 I:1000 C: 239943 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250
T: 0 ( 1309) P:98 I:250 C: 960000 Min:      8 Act:   15 Avg:   13 Max:      42
T: 1 ( 1310) P:98 I:750 C: 319986 Min:      8 Act:   18 Avg:   15 Max:      36

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36

> 
> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> in net/core/dev.c and chrt the priority of you network interrupt
> handlers to SCHED_OTHER priority?
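
The substitution, as I applied it, is the one-liner in net_rx_action()
(a sketch from memory - the exact context in the 4.4-rt tree may
differ):

 --- a/net/core/dev.c
 +++ b/net/core/dev.c
 @@ static void net_rx_action(struct softirq_action *h)
 -		__raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);
 +		__raise_softirq_irqoff(NET_RX_SOFTIRQ);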

===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER

with net load:
cyclictest -m -Sp98 -q -D4m
T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39

cyclictest -m -Sp98 -q  -D4m -i250 -d0
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37

and net performance is better:
root@am57xx-evm:~# ps -A | grep 4848
   82 ?        00:00:00 irq/354-4848400
   83 ?        00:00:00 irq/355-4848400
root@am57xx-evm:~# chrt -o -p 0 82
root@am57xx-evm:~# chrt -o -p 0 83
./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
[1] 1298
# /dev/cpu_dma_latency set to 0us
Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux


------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 16.0 KByte (WARNING: requested 8.00 KByte)
------------------------------------------------------------
[  3] local 172.22.39.21 port 58672 connected with 172.22.39.17 port 5001
[  5] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60672
[   78.718828] NOHZ: local_softirq_pending 80
[   78.908849] NOHZ: local_softirq_pending 80
[   78.908927] NOHZ: local_softirq_pending 80
[   78.918827] NOHZ: local_softirq_pending 80
[   79.058826] NOHZ: local_softirq_pending 80
[   81.488918] NOHZ: local_softirq_pending 80
[   81.489162] NOHZ: local_softirq_pending 80
[   81.489398] NOHZ: local_softirq_pending 80
[   81.489646] NOHZ: local_softirq_pending 80
[   81.489895] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-60.0 sec  1.30 GBytes   187 Mbits/sec
[  5]  0.0-60.0 sec   637 MBytes  89.0 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 32.0 KByte (WARNING: requested 16.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58674 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60680
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.78 GBytes   255 Mbits/sec
[  4]  0.0-60.0 sec  1.47 GBytes   210 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size: 64.0 KByte (WARNING: requested 32.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58676 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60693
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.62 GBytes   232 Mbits/sec
[  4]  0.0-60.0 sec  2.07 GBytes   296 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  128 KByte (WARNING: requested 64.0 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58678 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60702
T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
root@am57xx-evm:~# [ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  1.77 GBytes   254 Mbits/sec
[  4]  0.0-60.0 sec  1.77 GBytes   253 Mbits/sec
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.22.39.17, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 172.22.39.21 port 58680 connected with 172.22.39.17 port 5001
[  4] local 172.22.39.21 port 5001 connected with 172.22.39.17 port 60711
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  2.18 GBytes   312 Mbits/sec
[  4]  0.0-60.0 sec  2.53 GBytes   362 Mbits/sec


> 
>> 3) network performance increase
>>
>> win, K	Mbits/s
>> 	before	after	%
>> 8K	354	350.3	0.0
>> 16K	412	551	33.7 
>> 32K	423	659.5	55.9 
>> 64K	436	728.3	67.0 
>> 128K	537	845	57.4 
> 
> How close are the after numbers to !RT?

http://processors.wiki.ti.com/index.php/Processor_SDK_Linux_Kernel_Performance_Guide#AM57XX_TCP_Performance

Still at least a 20% drop compared to !RT, even with this patch :(

-- 
regards,
-grygorii


========================= ps -l -A (after boot)
root@am57xx-evm:~# ps -l -A
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  1  80   0 -  1321 SyS_ep ?        00:00:05 systemd
1 S     0     2     0  0  80   0 -     0 kthrea ?        00:00:00 kthreadd
1 S     0     3     2  0  80   0 -     0 smpboo ?        00:00:00 ksoftirqd/0
5 S     0     4     2  0  58   - -     0 smpboo ?        00:00:00 ktimersoftd/0
1 S     0     5     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:0
1 S     0     6     2  0  60 -20 -     0 worker ?        00:00:00 kworker/0:0H
1 S     0     7     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:0
1 S     0     8     2  0  58   - -     0 rcu_gp ?        00:00:00 rcu_preempt
1 S     0     9     2  0  58   - -     0 rcu_gp ?        00:00:00 rcu_sched
1 S     0    10     2  0  58   - -     0 rcu_bo ?        00:00:00 rcub/0
1 S     0    11     2  0  58   - -     0 smpboo ?        00:00:00 rcuc/0
1 S     0    12     2  0  80   0 -     0 run_cl ?        00:00:00 kclksetdelayd
1 S     0    13     2  0 -40   - -     0 posix_ ?        00:00:00 posixcputmr/0
1 S     0    14     2  0  80   0 -     0 run_cm ?        00:00:00 kcmosdelayd
1 S     0    15     2  0 -40   - -     0 smpboo ?        00:00:00 migration/0
1 S     0    16     2  0 -40   - -     0 smpboo ?        00:00:00 migration/1
1 S     0    17     2  0  58   - -     0 smpboo ?        00:00:00 rcuc/1
5 S     0    18     2  0  58   - -     0 smpboo ?        00:00:00 ktimersoftd/1
1 S     0    19     2  0  80   0 -     0 smpboo ?        00:00:00 ksoftirqd/1
1 S     0    20     2  0 -40   - -     0 posix_ ?        00:00:00 posixcputmr/1
1 S     0    21     2  0  80   0 -     0 worker ?        00:00:00 kworker/1:0
1 S     0    22     2  0  60 -20 -     0 worker ?        00:00:00 kworker/1:0H
5 S     0    23     2  0  80   0 -     0 devtmp ?        00:00:00 kdevtmpfs
1 S     0    24     2  0  80   0 -     0 swork_ ?        00:00:00 kswork
1 S     0    25     2  0  60 -20 -     0 rescue ?        00:00:00 perf
1 S     0    26     2  0   9   - -     0 irq_th ?        00:00:00 irq/37-4ae10000
1 S     0    27     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:1
1 S     0    28     2  0   9   - -     0 irq_th ?        00:00:00 irq/70-48055000
1 S     0    29     2  0   9   - -     0 irq_th ?        00:00:00 irq/103-4805700
1 S     0    30     2  0   9   - -     0 irq_th ?        00:00:00 irq/136-4805900
1 S     0    31     2  0   9   - -     0 irq_th ?        00:00:00 irq/169-4805b00
1 S     0    32     2  0   9   - -     0 irq_th ?        00:00:00 irq/202-4805d00
1 S     0    33     2  0   9   - -     0 irq_th ?        00:00:00 irq/235-4805100
1 S     0    34     2  0   9   - -     0 irq_th ?        00:00:00 irq/268-4805300
1 S     0    35     2  0  60 -20 -     0 rescue ?        00:00:00 writeback
1 S     0    36     2  0  60 -20 -     0 rescue ?        00:00:00 crypto
1 S     0    37     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    38     2  0  60 -20 -     0 rescue ?        00:00:00 kblockd
1 S     0    39     2  0   9   - -     0 irq_th ?        00:00:00 irq/29-omap-dma
1 S     0    40     2  0   9   - -     0 irq_th ?        00:00:00 irq/32-43300000
1 S     0    41     2  0   9   - -     0 irq_th ?        00:00:00 irq/34-43300000
1 S     0    42     2  0   9   - -     0 irq_th ?        00:00:00 irq/343-40d0100
1 S     0    43     2  0   9   - -     0 irq_th ?        00:00:00 irq/344-40d0200
1 S     0    44     2  0   9   - -     0 irq_th ?        00:00:00 irq/345-5888200
1 S     0    45     2  0   9   - -     0 irq_th ?        00:00:00 irq/346-5508200
1 S     0    46     2  0   9   - -     0 irq_th ?        00:00:00 irq/363-4150100
1 S     0    47     2  0   9   - -     0 irq_th ?        00:00:00 irq/364-4150200
1 S     0    48     2  0   9   - -     0 irq_th ?        00:00:00 irq/337-4807000
1 S     0    49     2  0   9   - -     0 irq_th ?        00:00:00 irq/38-palmas
1 S     0    50     2  0   9   - -     0 irq_th ?        00:00:00 irq/338-4806000
1 S     0    51     2  0   9   - -     0 irq_th ?        00:00:00 irq/339-4807c00
1 S     0    52     2  0  60 -20 -     0 rescue ?        00:00:00 edac-poller
1 S     0    53     2  0  60 -20 -     0 rescue ?        00:00:00 rpciod
1 S     0    54     2  0  80   0 -     0 kswapd ?        00:00:00 kswapd0
1 S     0    55     2  0  60 -20 -     0 rescue ?        00:00:00 vmstat
1 S     0    56     2  0  80   0 -     0 fsnoti ?        00:00:00 fsnotify_mark
1 S     0    57     2  0  60 -20 -     0 rescue ?        00:00:00 nfsiod
1 S     0    66     2  0   9   - -     0 irq_th ?        00:00:00 irq/25-dra7xx-p
1 S     0    67     2  0   9   - -     0 irq_th ?        00:00:00 irq/437-PCIe PM
1 S     0    68     2  0   9   - -     0 irq_th ?        00:00:00 irq/437-aerdrv
1 S     0    69     2  0   9   - -     0 irq_th ?        00:00:00 irq/370-OMAP HD
1 S     0    70     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    71     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    72     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    73     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    74     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    75     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    76     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    77     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    78     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:1
1 S     0    81     2  0  80   0 -     0 worker ?        00:00:01 kworker/1:1
1 S     0    82     2  0   9   - -     0 irq_th ?        00:00:00 irq/354-4848400
1 S     0    83     2  0   9   - -     0 irq_th ?        00:00:00 irq/355-4848400
1 S     0    84     2  0  60 -20 -     0 rescue ?        00:00:00 kpsmoused
1 S     0    85     2  0   9   - -     0 irq_th ?        00:00:00 irq/27-talert
1 S     0    86     2  0   9   - -     0 irq_th ?        00:00:00 irq/340-mmc0
1 S     0    87     2  0   9   - -     0 irq_th ?        00:00:00 irq/230-4809c00
1 S     0    88     2  0   9   - -     0 irq_th ?        00:00:00 irq/341-mmc1
1 S     0    89     2  0   9   - -     0 irq_th ?        00:00:00 irq/342-mmc2
1 S     0    90     2  0   9   - -     0 irq_th ?        00:00:00 irq/352-4846800
1 S     0    91     2  0   9   - -     0 irq_th ?        00:00:00 irq/351-4846800
1 S     0    92     2  0  60 -20 -     0 rescue ?        00:00:00 ipv6_addrconf
1 S     0    93     2  0   9   - -     0 irq_th ?        00:00:00 irq/335-omap_dm
1 S     0    94     2  0   9   - -     0 irq_th ?        00:00:00 irq/369-OMAP DI
1 S     0    95     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    96     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/0
1 S     0    97     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0    98     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1
1 S     0    99     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0   100     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1boot0
1 S     0   101     2  0  60 -20 -     0 rescue ?        00:00:00 bioset
1 S     0   102     2  0  80   0 -     0 mmc_qu ?        00:00:00 mmcqd/1boot1
1 S     0   103     2  0  60 -20 -     0 rescue ?        00:00:00 deferwq
1 S     0   104     2  0   9   - -     0 irq_th ?        00:00:00 irq/94-Up
1 S     0   105     2  0   9   - -     0 irq_th ?        00:00:00 irq/96-Down
1 S     0   106     2  0   9   - -     0 irq_th ?        00:00:00 irq/99-Left
1 S     0   107     2  0   9   - -     0 irq_th ?        00:00:00 irq/95-Right
1 S     0   110     2  0  60 -20 -     0 worker ?        00:00:00 kworker/1:1H
1 S     0   111     2  0  80   0 -     0 kjourn ?        00:00:00 jbd2/mmcblk0p2-
1 S     0   112     2  0  60 -20 -     0 rescue ?        00:00:00 ext4-rsv-conver
1 S     0   129     2  0  60 -20 -     0 worker ?        00:00:00 kworker/0:1H
1 S     0   165     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:2
1 S     0   168     2  0  80   0 -     0 worker ?        00:00:00 kworker/1:2
4 S     0   197     1  0  80   0 -  2566 SyS_ep ?        00:00:01 systemd-journal
1 S     0   249     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:2
1 S     0   255     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:3
4 S     0   289     1  0  80   0 -  2731 SyS_ep ?        00:00:02 systemd-udevd
1 S     0   330     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:4
1 S     0   339     2  0  80   0 -     0 worker ?        00:00:00 kworker/0:5
1 S     0   387     2  0   9   - -     0 irq_th ?        00:00:00 irq/362-4809000
1 S     0   407     2  0   9   - -     0 irq_th ?        00:00:00 irq/470-2-006f
1 S     0   409     2  0   9   - -     0 irq_th ?        00:00:00 irq/469-mcp7941
1 S     0   413     2  0   9   - -     0 irq_th ?        00:00:00 irq/75-pixcir_t
4 S   998   418     1  0  80   0 -  3038 SyS_ep ?        00:00:00 systemd-timesyn
1 S     0   429     2  0   9   - -     0 irq_th ?        00:00:00 irq/158-palmas_
1 S     0   462     2  0   9   - -     0 irq_th ?        00:00:00 irq/348-rtc2
1 S     0   464     2  0   9   - -     0 irq_th ?        00:00:00 irq/349-dwc3-om
1 S     0   466     2  0   9   - -     0 irq_th ?        00:00:00 irq/350-dwc3-om
1 S     0   493     2  0   9   - -     0 irq_th ?        00:00:00 irq/357-vpe
1 S     0   503     2  0  60 -20 -     0 rescue ?        00:00:00 ata_sff
1 S     0   509     2  0   9   - -     0 irq_th ?        00:00:00 irq/365-vip2-s0
1 S     0   510     2  0   9   - -     0 irq_th ?        00:00:00 irq/366-vip2-s1
1 S     0   514     2  0   9   - -     0 irq_th ?        00:00:00 irq/347-4a14000
1 S     0   516     2  0  80   0 -     0 scsi_e ?        00:00:00 scsi_eh_0
1 S     0   517     2  0  60 -20 -     0 rescue ?        00:00:00 scsi_tmf_0
1 S     0   595     2  0   9   - -     0 irq_th ?        00:00:00 irq/361-4b10100
1 S     0   605     2  0  80   0 -     0 worker ?        00:00:00 kworker/u4:3
4 S     0   765     1  0  80   0 -   498 poll_s ?        00:00:00 starter
4 S     0   792     1  0  80   0 -   497 syslog ?        00:00:00 klogd
4 S     0   819     1  0  80   0 -  2210 poll_s ?        00:00:01 ofonod
1 S     0   825     2  0  80   0 -     0 worker ?        00:00:02 kworker/1:3
1 S     0   887     2  0   9   - -     0 irq_th ?        00:00:00 irq/313-mbox_ip
4 S     0   900   765  0  80   0 - 35826 sigtim ?        00:00:00 charon
4 S   995   901     1  0  80   0 -  1164 poll_s ?        00:00:00 avahi-daemon
1 S     0   902     2  0   9   - -     0 irq_th ?        00:00:00 irq/317-mbox_ip
1 S     0   903     2  0   9   - -     0 irq_th ?        00:00:00 irq/321-rproc-w
1 S     0   904     2  0   9   - -     0 irq_th ?        00:00:00 irq/326-rproc-w
1 S     0   909     2  0   9   - -     0 irq_th ?        00:00:00 irq/313-mbox_ds
1 S   995   910   901  0  80   0 -  1164 unix_s ?        00:00:00 avahi-daemon
4 S     0   911     1  0  80   0 -   501 skb_re ?        00:00:00 syslogd
1 S     0   912     2  0   9   - -     0 irq_th ?        00:00:00 irq/317-mbox_ds
1 S     0   929     2  0   9   - -     0 irq_th ?        00:00:00 irq/490-remotep
1 S     0   930     2  0   9   - -     0 irq_th ?        00:00:00 irq/492-remotep
1 S     0   931     2  0   9   - -     0 irq_th ?        00:00:00 irq/494-remotep
1 S     0   934     2  0   9   - -     0 irq_th ?        00:00:00 irq/496-remotep
4 S   999   938     1  0  80   0 -  1080 SyS_ep ?        00:00:00 dbus-daemon
4 S   997   948     1  0  80   0 -  2725 SyS_ep ?        00:00:00 systemd-network
4 S     0   970     1  0  80   0 -   656 SyS_ep ?        00:00:00 systemd-logind
1 S     0  1006     2  0   9   - -     0 irq_th ?        00:00:00 irq/472-xhci-hc
4 S     0  1035     1  0  80   0 -  1221 wait   ttyS2    00:00:00 login
4 S     0  1036     1  0  80   0 -   447 poll_s tty1     00:00:00 agetty
4 S   996  1040     1  0  80   0 -   754 SyS_ep ?        00:00:00 systemd-resolve
4 S     0  1048     1  0  80   0 -  1143 SyS_ep ?        00:00:00 lighttpd
5 S     0  1176     1  0  80   0 -   497 poll_s ?        00:00:00 telnetd
5 S     0  1177     1  0  80   0 -  2441 pipe_w ?        00:00:00 lad_dra7xx
5 S 65534  1190     1  0  80   0 -   614 poll_s ?        00:00:00 thttpd
5 S     0  1199     1  0  80   0 -   581 poll_s ?        00:00:00 rngd
1 S     0  1241     2  0   9   - -     0 irq_th ?        00:00:00 irq/91-48020000
1 S     0  1242     2  0   9   - -     0 irq_th ?        00:00:00 irq/301-4802000
4 S     0  1248     1  0  80   0 -  1354 SyS_ep ?        00:00:00 systemd
5 S     0  1252  1248  0  80   0 -  1854 sigtim ?        00:00:00 (sd-pam)
4 S     0  1259  1035  0  80   0 -   676 wait   ttyS2    00:00:00 sh
0 R     0  1263  1259  0  80   0 -   553 -      ttyS2    00:00:00 ps

========================= /proc/interrupts & /proc/softirqs (after boot)
root@am57xx-evm:~# cat /proc/interrupts 
           CPU0       CPU1       
 16:          0          0      CBAR  32 Level     gp_timer
 19:      30322      29398       GIC  27 Edge      arch_timer
 22:          0          0      CBAR   4 Level     l3-dbg-irq
 23:          0          0     WUGEN  10 Level     l3-app-irq
 25:          7          0      CBAR 232 Level     dra7xx-pcie-main
 26:          0          0      CBAR 233 Level     dra7-pcie-msi
 27:          1          0      CBAR 121 Level     talert
 29:         57          0      CBAR   8 Level     omap-dma-engine
 32:        583          0      CBAR 361 Level     43300000.edma_ccint
 34:          0          0      CBAR 359 Level     43300000.edma_ccerrint
 37:          0          0      CBAR  24 Level     4ae10000.gpio
 38:          0          0  4ae10000.gpio   0 Level     palmas
 70:          1          0      CBAR  25 Level     48055000.gpio
 75:          0          1  48055000.gpio   4 Edge      pixcir_tangoc
 91:          0          0  48055000.gpio  20 Edge      48020000.serial
 94:          0          0  48055000.gpio  23 Edge      Up
 95:          0          0  48055000.gpio  24 Edge      Right
 96:          0          0  48055000.gpio  25 Edge      Down
 99:          0          0  48055000.gpio  28 Edge      Left
103:          0          0      CBAR  26 Level     48057000.gpio
136:          0          0      CBAR  27 Level     48059000.gpio
158:          0          0  48059000.gpio  21 Edge      palmas_usb_vbus
169:          0          0      CBAR  28 Level     4805b000.gpio
202:          0          0      CBAR  29 Level     4805d000.gpio
230:          0          0  4805d000.gpio  27 Edge      4809c000.mmc cd
235:          0          0      CBAR  30 Level     48051000.gpio
268:          0          0      CBAR 116 Level     48053000.gpio
301:        552          0      CBAR  69 Level     48020000.serial
313:         50          0      CBAR 251 Level     mbox_ipu1_ipc3x, mbox_dsp1_ipc3x
317:         60          0      CBAR 255 Level     mbox_ipu2_ipc3x, mbox_dsp2_ipc3x
321:          0          0      CBAR  35 Level     rproc-wdt
326:          0          0      CBAR  40 Level     rproc-wdt
335:          8          0      CBAR 108 Level     omap_dmm_irq_handler
337:       1888          0      CBAR  51 Level     48070000.i2c
338:         40          0      CBAR  56 Level     48060000.i2c
339:         28          0      CBAR  55 Level     4807c000.i2c
340:      12831          0      CBAR  78 Level     mmc0
341:       1473          0      CBAR  81 Level     mmc1
342:         13          0      CBAR  89 Level     mmc2
343:          0          0      CBAR  23 Level     40d01000.mmu
344:          0          0      CBAR 145 Level     40d02000.mmu
345:          0          0      CBAR 395 Level     58882000.mmu
346:          0          0      CBAR 396 Level     55082000.mmu
347:          0          0      CBAR  49 Level     4a140000.sata
348:          0          0      CBAR 217 Level     rtc2
349:          5          0      CBAR  72 Level     dwc3-omap
350:          2          0      CBAR  87 Level     dwc3-omap
351:          0          0      CBAR 151 Level     48468000.mcasp_tx
352:          0          0      CBAR 150 Level     48468000.mcasp_rx
354:         24          0      CBAR 335 Level     48484000.ethernet
355:         67          0      CBAR 336 Level     48484000.ethernet
357:          0          0      CBAR 354 Level     vpe
361:         20          0      CBAR  46 Level     4b101000.sham
362:          0          0      CBAR  47 Level     48090000.rng
363:          0          0      CBAR 146 Level     41501000.mmu
364:          0          0      CBAR 147 Level     41502000.mmu
365:          0          0      CBAR 352 Level     vip2-s0
366:          0          0      CBAR 393 Level     vip2-s1
369:      16949          0      CBAR  20 Level     OMAP DISPC
370:          0          0      CBAR  96 Level     OMAP HDMI
435:          0          0    palmas   8 Edge      48070000.i2c:tps659038@58:tps659038_rtc
437:          0          0   PCI-MSI   0 Edge      PCIe PME, aerdrv
469:          0          0      CBAR   2 Edge      mcp7941x
470:          0          0   pinctrl 1060 Edge      2-006f
472:         66          0      CBAR  71 Level     xhci-hcd:usb1
490:          1          0  4b220000.intc  16 Edge      remoteproc4
492:          1          0  4b220000.intc  18 Edge      remoteproc5
494:          1          0  4b2a0000.intc  16 Edge      remoteproc6
496:          1          0  4b2a0000.intc  18 Edge      remoteproc7
IPI0:          0          1  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      48241      64393  Rescheduling interrupts
IPI3:          0          0  Function call interrupts
IPI4:        432        620  Single function call interrupts
IPI5:          0          0  CPU stop interrupts
IPI6:          0          0  IRQ work interrupts
IPI7:          0          0  completion interrupts
Err:          0
root@am57xx-evm:~# cat /proc/softirqs 
                    CPU0       CPU1       
          HI:          0          2
       TIMER:      29167      29156
      NET_TX:          0         57
      NET_RX:        123         86
       BLOCK:          0          0
BLOCK_IOPOLL:          0          0
     TASKLET:        406        386
       SCHED:      28322      28253
     HRTIMER:        609        384
         RCU:          0          0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-08-12  8:32 ` Sebastian Andrzej Siewior
@ 2016-08-19 14:29     ` Grygorii Strashko
  2016-08-19 14:29     ` Grygorii Strashko
  1 sibling, 0 replies; 18+ messages in thread
From: Grygorii Strashko @ 2016-08-19 14:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 08/12/2016 11:32 AM, Sebastian Andrzej Siewior wrote:
> On 2016-08-11 19:15:40 [+0300], Grygorii Strashko wrote:
>> Mark CPSW Rx/Tx IRQs as IRQF_NO_THREAD and avoid double scheduling on -RT
>> where this IRQs are forced threaded:
>>  rx-irq
>>   |- schedule threaded rx-irq handler
>> ...
>>   |- threaded rx-irq handler -> cpsw_rx_interrupt()
>>      |- napi_schedule()
>> 	|- __raise_softirq_irqoff()
>> 	   |- wakeup_proper_softirq()
>> ...
>>   napi
> 
> This should not be the default path. The default should be napi running
> in the context of the threaded rx-irq handler once the handler is done.
> The wakeup_proper_softirq() part is only done if napi thinks that the
> callback functions runs for too long. So in *that* case you continue
> NAPI in the softirq-thread which runs at SCHED_OTHER.
> 
>> after:
>>  rx-irq
>>   |- cpsw_rx_interrupt()
>>      |- napi_schedule()
>>   |- irq_exit()
>>      |- invoke_softirq()
>> 	   |- wakeup_softirqd()
>> ...
>>   napi
> 
> Since you schedule the softirq from an IRQ-off region / without a
> process context you force the softirq to run in the thread at
> SCHED_OTHER priority.
> 
>> And, as result, get benefits from the following improvements (tested
>> on am57xx-evm):
>>
>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>    seen any more. Now these warnings can be seen once iperf is started.
>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> 
> Do you also see "sched: RT throttling activated"? Because I don't see
> otherwise why this should pop up.
> 

I've collected a trace before the first occurrence of "NOHZ: local_softirq_pending 80" (pending mask 0x80 is the SCHED softirq).
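
(FWIW, the events were recorded with trace-cmd, roughly:

 # trace-cmd record -e irq -e sched

the exact option list is from memory.)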

 irq/354-4848400-85    [000]    90.639460: irq_handler_entry:    irq=19 name=arch_timer
           iperf-1284  [001]    90.639474: softirq_raise:        vec=1 [action=TIMER]
           iperf-1284  [001]    90.639486: irq_handler_exit:     irq=19 ret=handled
 irq/354-4848400-85    [000]    90.639488: softirq_raise:        vec=7 [action=SCHED]
           iperf-1284  [001]    90.639490: sched_waking:         comm=ksoftirqd/1 pid=21 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.639492: softirq_raise:        vec=1 [action=TIMER]
           iperf-1284  [001]    90.639499: sched_wakeup:         ksoftirqd/1:21 [120] success=1 CPU:001
           iperf-1284  [001]    90.639504: sched_waking:         comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001
 irq/354-4848400-85    [000]    90.639505: irq_handler_exit:     irq=19 ret=handled
           iperf-1284  [001]    90.639512: sched_wakeup:         ktimersoftd/1:20 [98] success=1 CPU:001
           iperf-1284  [001]    90.639526: sched_stat_runtime:   comm=iperf pid=1284 runtime=50752 [ns] vruntime=2105322850 [ns]
           iperf-1284  [001]    90.639537: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.639545: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639556: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639589: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639614: sched_switch:         irq/355-4848400:86 [49] S ==> ktimersoftd/1:20 [98]
   ktimersoftd/1-20    [001]    90.639625: softirq_entry:        vec=1 [action=TIMER]
   ktimersoftd/1-20    [001]    90.639637: sched_waking:         comm=rcu_preempt pid=8 prio=98 target_cpu=001
   ktimersoftd/1-20    [001]    90.639646: sched_wakeup:         rcu_preempt:8 [98] success=1 CPU:001
   ktimersoftd/1-20    [001]    90.639663: softirq_exit:         vec=1 [action=TIMER]
   ktimersoftd/1-20    [001]    90.639679: sched_switch:         ktimersoftd/1:20 [98] S ==> rcu_preempt:8 [98]
     rcu_preempt-8     [001]    90.639722: sched_switch:         rcu_preempt:8 [98] S ==> ksoftirqd/1:21 [120]
     ksoftirqd/1-21    [001]    90.639740: sched_stat_runtime:   comm=ksoftirqd/1 pid=21 runtime=25539 [ns] vruntime=29960463828 [ns]
     ksoftirqd/1-21    [001]    90.639750: sched_switch:         ksoftirqd/1:21 [120] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.639878: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.639880: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.639884: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.639896: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.639903: sched_stat_runtime:   comm=iperf pid=1284 runtime=150466 [ns] vruntime=2105473316 [ns]
           iperf-1284  [001]    90.639913: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.639921: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639932: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.639958: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
 irq/355-4848400-86    [001]    90.639963: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:000
 irq/355-4848400-86    [001]    90.639984: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640012: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640109: sched_waking:         comm=iperf pid=1284 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.640115: sched_wakeup:         iperf:1284 [120] success=1 CPU:001
 irq/354-4848400-85    [000]    90.640152: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640154: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640158: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640170: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640177: sched_stat_runtime:   comm=iperf pid=1284 runtime=163480 [ns] vruntime=2105636796 [ns]
           iperf-1284  [001]    90.640187: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640195: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640206: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640239: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640266: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640376: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640378: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640382: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640394: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640401: sched_stat_runtime:   comm=iperf pid=1284 runtime=133387 [ns] vruntime=2105770183 [ns]
           iperf-1284  [001]    90.640412: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640419: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640430: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640463: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640489: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640560: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640568: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640579: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640606: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640608: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640611: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640623: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640630: sched_stat_runtime:   comm=iperf pid=1284 runtime=139080 [ns] vruntime=2105909263 [ns]
           iperf-1284  [001]    90.640641: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.640648: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640659: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640692: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640720: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.640806: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640820: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640823: irq_handler_exit:     irq=354 ret=handled
 irq/354-4848400-85    [000]    90.640833: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640835: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.640838: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.640849: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.640857: sched_stat_runtime:   comm=iperf pid=1284 runtime=135501 [ns] vruntime=2106044764 [ns]
 irq/354-4848400-85    [000]    90.640861: softirq_raise:        vec=3 [action=NET_RX]
           iperf-1284  [001]    90.640867: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/354-4848400-85    [000]    90.640872: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.640874: irq_handler_exit:     irq=354 ret=handled
 irq/355-4848400-86    [001]    90.640875: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.640885: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640886: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640919: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.640945: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641052: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641054: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641058: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641070: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641077: sched_stat_runtime:   comm=iperf pid=1284 runtime=129483 [ns] vruntime=2106174247 [ns]
           iperf-1284  [001]    90.641087: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641095: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641106: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641132: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
 irq/355-4848400-86    [001]    90.641138: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:000
 irq/355-4848400-86    [001]    90.641159: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641185: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641312: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641314: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641317: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641329: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641336: sched_stat_runtime:   comm=iperf pid=1284 runtime=149003 [ns] vruntime=2106323250 [ns]
           iperf-1284  [001]    90.641347: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641355: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641366: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641398: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641425: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641526: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641526: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641526: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641526: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641526: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641557: sched_stat_runtime:   comm=iperf pid=1284 runtime=130946 [ns] vruntime=2106454196 [ns]
 irq/354-4848400-85    [000]    90.641565: softirq_raise:        vec=3 [action=NET_RX]
           iperf-1284  [001]    90.641567: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641575: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641576: irq_handler_entry:    irq=354 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641578: irq_handler_exit:     irq=354 ret=handled
 irq/355-4848400-86    [001]    90.641586: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641589: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641618: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641645: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641760: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.641762: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.641765: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.641777: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.641784: sched_stat_runtime:   comm=iperf pid=1284 runtime=137941 [ns] vruntime=2106592137 [ns]
           iperf-1284  [001]    90.641795: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.641802: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641814: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641845: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.641871: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.641916: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641924: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641935: softirq_entry:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.641964: sched_waking:         comm=iperf pid=1284 prio=120 target_cpu=001
 irq/354-4848400-85    [000]    90.641969: sched_wakeup:         iperf:1284 [120] success=1 CPU:001
 irq/354-4848400-85    [000]    90.642006: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.642008: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.642011: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.642024: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.642032: sched_stat_runtime:   comm=iperf pid=1284 runtime=156486 [ns] vruntime=2106748623 [ns]
           iperf-1284  [001]    90.642042: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.642049: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642060: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642093: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642120: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.642233: irq_handler_entry:    irq=355 name=48484000.ethernet
 irq/354-4848400-85    [000]    90.642235: irq_handler_exit:     irq=355 ret=handled
 irq/354-4848400-85    [000]    90.642238: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
 irq/354-4848400-85    [000]    90.642250: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
           iperf-1284  [001]    90.642257: sched_stat_runtime:   comm=iperf pid=1284 runtime=135990 [ns] vruntime=2106884613 [ns]
           iperf-1284  [001]    90.642268: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [001]    90.642275: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642286: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642319: softirq_exit:         vec=3 [action=NET_RX]
 irq/355-4848400-86    [001]    90.642345: sched_switch:         irq/355-4848400:86 [49] S ==> iperf:1284 [120]
 irq/354-4848400-85    [000]    90.642393: softirq_exit:         vec=3 [action=NET_RX]
 irq/354-4848400-85    [000]    90.642419: sched_switch:         irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
          rcuc/0-11    [000]    90.642430: irq_handler_entry:    irq=354 name=48484000.ethernet
          rcuc/0-11    [000]    90.642432: irq_handler_exit:     irq=354 ret=handled
          rcuc/0-11    [000]    90.642435: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
          rcuc/0-11    [000]    90.642442: sched_migrate_task:   comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
          rcuc/0-11    [000]    90.642453: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:001
           iperf-1284  [001]    90.642462: sched_stat_runtime:   comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
          rcuc/0-11    [000]    90.642464: irq_handler_entry:    irq=355 name=48484000.ethernet
          rcuc/0-11    [000]    90.642466: irq_handler_exit:     irq=355 ret=handled
          rcuc/0-11    [000]    90.642469: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
           iperf-1284  [001]    90.642473: sched_switch:         iperf:1284 [120] R ==> irq/354-4848400:85 [49]
 irq/354-4848400-85    [001]    90.642481: softirq_raise:        vec=3 [action=NET_RX]
          rcuc/0-11    [000]    90.642483: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
 irq/354-4848400-85    [001]    90.642493: softirq_entry:        vec=3 [action=NET_RX]
          rcuc/0-11    [000]    90.642497: sched_migrate_task:   comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
          rcuc/0-11    [000]    90.642515: irq_handler_entry:    irq=354 name=48484000.ethernet
          rcuc/0-11    [000]    90.642516: irq_handler_exit:     irq=354 ret=handled
          rcuc/0-11    [000]    90.642533: sched_switch:         rcuc/0:11 [98] R ==> irq/355-4848400:86 [49]
 irq/355-4848400-86    [000]    90.642541: softirq_raise:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [000]    90.642551: softirq_entry:        vec=3 [action=NET_RX]
 irq/355-4848400-86    [000]    90.642562: sched_pi_setprio:     comm=iperf pid=1284 oldprio=120 newprio=49
 irq/355-4848400-86    [000]    90.642593: sched_switch:         irq/355-4848400:86 [49] D ==> rcuc/0:11 [98]
          rcuc/0-11    [000]    90.642621: sched_switch:         rcuc/0:11 [98] S ==> ktimersoftd/0:4 [98]
   ktimersoftd/0-4     [000]    90.642631: softirq_entry:        vec=1 [action=TIMER]
   ktimersoftd/0-4     [000]    90.642647: softirq_exit:         vec=1 [action=TIMER]
   ktimersoftd/0-4     [000]    90.642665: sched_switch:         ktimersoftd/0:4 [98] S ==> ksoftirqd/0:3 [120]
     ksoftirqd/0-3     [000]    90.642692: sched_stat_runtime:   comm=ksoftirqd/0 pid=3 runtime=34486 [ns] vruntime=33313887612 [ns]
     ksoftirqd/0-3     [000]    90.642703: sched_switch:         ksoftirqd/0:3 [120] D ==> trace-cmd:1278 [120]
       trace-cmd-1278  [000]    90.642742: sched_waking:         comm=trace-cmd pid=1279 prio=120 target_cpu=001
       trace-cmd-1278  [000]    90.642755: sched_wakeup:         trace-cmd:1279 [120] success=1 CPU:001
 irq/354-4848400-85    [001]    90.643012: softirq_exit:         vec=3 [action=NET_RX]
       trace-cmd-1278  [000]    90.643020: sched_stat_runtime:   comm=trace-cmd pid=1278 runtime=328098 [ns] vruntime=554552351 [ns]
 irq/354-4848400-85    [001]    90.643021: softirq_raise:        vec=3 [action=NET_RX]
 irq/354-4848400-85    [001]    90.643031: softirq_entry:        vec=3 [action=NET_RX]
       trace-cmd-1278  [000]    90.643036: sched_switch:         trace-cmd:1278 [120] D ==> iperf:1282 [120]
           iperf-1282  [000]    90.643055: sched_stat_runtime:   comm=iperf pid=1282 runtime=34811 [ns] vruntime=560242796 [ns]
           iperf-1282  [000]    90.643068: sched_switch:         iperf:1282 [120] D ==> swapper/0:0 [120]
          <idle>-0     [000]    90.643079: bputs:                0xc003e7d4s: 58f67c094e488
          <idle>-0     [000]    90.643090: bprint:               softirq_check_pending_idle: NOHZ: 80



-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-08-12 15:58     ` Grygorii Strashko
  (?)
@ 2016-09-08 14:28     ` Sebastian Andrzej Siewior
  2016-09-08 16:24         ` Grygorii Strashko
  -1 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-08 14:28 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
> Hi Sebastian,
Hi Grygorii,

> Thanks for the comment. You're right:
> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
> 
> but wouldn't there be two wake_up_process() calls anyway,
> plus preempt_check_resched_rt() in napi_schedule()?

Usually you prefer BH handling in the IRQ-thread because it runs at
higher priority and is not interrupted by a SCHED_OTHER process. And you
can assign it a higher priority if it should be preferred over another
interrupt. However, if the processing of the interrupt is taking too
much time (like that ping flood, a lot of network traffic) then we push
it to the softirq thread. If you do this now unconditionally in the
SCHED_OTHER softirq thread then you take away all the `good' things we
had (like processing important packets at higher priority as long as
nobody floods us). Plus you share this thread with everything else that
runs in there.
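
For reference, this is roughly how a forced-threaded handler ends up running
the BH work itself: a simplified sketch of the mainline irq_forced_thread_fn()
from kernel/irq/manage.c (abridged, comments added):

static irqreturn_t
irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();
	/* the driver's handler, e.g. cpsw_rx_interrupt() -> napi_schedule() */
	ret = action->thread_fn(action->irq, action->dev_id);
	irq_finalize_oneshot(desc, action);
	/*
	 * On -RT this is normally where the just-raised NET_RX softirq
	 * runs: in the IRQ thread itself, at the IRQ thread's priority.
	 */
	local_bh_enable();
	return ret;
}

So NAPI usually polls in the context of the irq/35x thread; the SCHED_OTHER
softirq thread only takes over on overload.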

> >> And, as result, get benefits from the following improvements (tested
> >> on am57xx-evm):
> >>
> >> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
> >>    seen any more. Now these warnings can be seen once iperf is started.
> >>    # iperf -c $IPERFHOST -w 128K  -d -t 60
> > 
> > Do you also see "sched: RT throttling activated"? Because I don't see
> > otherwise why this should pop up.
> 
> I've reverted my patch an did requested experiments (some additional info below).
> 
> I do not see "sched: RT throttling activated" :(

That is okay. However, if you aim for throughput you might want to switch
away from NO_HZ (and deactivate the software watchdog, which runs at
prio 99 if enabled).

> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
> [1] 1301
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
> 
> [1]+  Done                    ./net_perf.sh

I can't parse this. But that local_softirq_pending() warning might
contribute to lower numbers.

> =============================================== before, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> # /dev/cpu_dma_latency set to 0us
> T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
> T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26

> =============================================== after, no net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
> T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28

I think those two should be equal more or less since the change should
have no impact on "no net load" or do I miss something?

> =============================================== before, with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
> T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48
> 
> 
> =============================================== after, with net load:
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
> T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36

So the max value dropped by ~50% with your patch. Interesting. What I
remember from testing is that once you had, say, one hour of hackbench
running, the extra network traffic after that didn't contribute
much (if at all) to the max value.
That said, it is hard to believe that one extra context switch
contributes about 40us to the max value on CPU0.

> > What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
> > in net/core/dev.c and chrt the priority of you network interrupt
> > handlers to SCHED_OTHER priority?
> 
> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
> 
> with net load:
> cyclictest -m -Sp98 -q -D4m -i250 -d0
> T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
> T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39
> 
> cyclictest -m -Sp98 -q  -D4m -i250 -d0
> T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
> T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
> 
> and net parformance is better:
> root@am57xx-evm:~# ps -A | grep 4848
>    82 ?        00:00:00 irq/354-4848400
>    83 ?        00:00:00 irq/355-4848400
> root@am57xx-evm:~# chrt -o -p 0 82
> root@am57xx-evm:~# chrt -o -p 0 83
> ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m -i250 -d0
> [1] 1298
> # /dev/cpu_dma_latency set to 0us
> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #95 SMP PREEMPT RT Fri Aug 12 16:20:42 EEST 2016 armv7l GNU/Linux

So that looks nice, doesn't it?
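
For reference, the substitution discussed above would look roughly like this:
a sketch assuming the 4.4-rt net_rx_action(), where the -RT tree delegates the
re-raise to ksoftirqd (context lines approximate, not a verbatim diff):

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ static void net_rx_action(struct softirq_action *h)
 	if (!list_empty(&sd->poll_list))
-		__raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);

The idea being tested: keep NET_RX in the regular softirq path instead of
delegating it to the SCHED_OTHER ksoftirqd, while the IRQ threads themselves
run at SCHED_OTHER.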

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-08-19 14:29     ` Grygorii Strashko
  (?)
@ 2016-09-08 16:00     ` Sebastian Andrzej Siewior
  2016-09-09 12:46         ` Grygorii Strashko
  -1 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-08 16:00 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote:
> I've collected trace before first occurrence of  "NOHZ: local_softirq_pending 80"
> 
>  irq/354-4848400-85    [000]    90.639460: irq_handler_entry:    irq=19 name=arch_timer
>            iperf-1284  [001]    90.639474: softirq_raise:        vec=1 [action=TIMER]
>            iperf-1284  [001]    90.639486: irq_handler_exit:     irq=19 ret=handled
>  irq/354-4848400-85    [000]    90.639488: softirq_raise:        vec=7 [action=SCHED]
here we raise the SCHED softirq

>            iperf-1284  [001]    90.639490: sched_waking:         comm=ksoftirqd/1 pid=21 prio=120 target_cpu=001
>  irq/354-4848400-85    [000]    90.639492: softirq_raise:        vec=1 [action=TIMER]
and a timer
>            iperf-1284  [001]    90.639499: sched_wakeup:         ksoftirqd/1:21 [120] success=1 CPU:001
>            iperf-1284  [001]    90.639504: sched_waking:         comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001
>  irq/354-4848400-85    [000]    90.639505: irq_handler_exit:     irq=19 ret=handled

okay. so softirq here I come

>            iperf-1284  [001]    90.639512: sched_wakeup:         ktimersoftd/1:20 [98] success=1 CPU:001
>            iperf-1284  [001]    90.639526: sched_stat_runtime:   comm=iperf pid=1284 runtime=50752 [ns] vruntime=2105322850 [ns]
>            iperf-1284  [001]    90.639537: sched_switch:         iperf:1284 [120] R ==> irq/355-4848400:86 [49]
>  irq/355-4848400-86    [001]    90.639545: softirq_raise:        vec=3 [action=NET_RX]
>  irq/355-4848400-86    [001]    90.639556: softirq_entry:        vec=3 [action=NET_RX]
>  irq/355-4848400-86    [001]    90.639589: softirq_exit:         vec=3 [action=NET_RX]
>  irq/355-4848400-86    [001]    90.639614: sched_switch:         irq/355-4848400:86 [49] S ==> ktimersoftd/1:20 [98]
>    ktimersoftd/1-20    [001]    90.639625: softirq_entry:        vec=1 [action=TIMER]
>    ktimersoftd/1-20    [001]    90.639637: sched_waking:         comm=rcu_preempt pid=8 prio=98 target_cpu=001
>    ktimersoftd/1-20    [001]    90.639646: sched_wakeup:         rcu_preempt:8 [98] success=1 CPU:001
>    ktimersoftd/1-20    [001]    90.639663: softirq_exit:         vec=1 [action=TIMER]
>    ktimersoftd/1-20    [001]    90.639679: sched_switch:         ktimersoftd/1:20 [98] S ==> rcu_preempt:8 [98]
>      rcu_preempt-8     [001]    90.639722: sched_switch:         rcu_preempt:8 [98] S ==> ksoftirqd/1:21 [120]
>      ksoftirqd/1-21    [001]    90.639740: sched_stat_runtime:   comm=ksoftirqd/1 pid=21 runtime=25539 [ns] vruntime=29960463828 [ns]
>      ksoftirqd/1-21    [001]    90.639750: sched_switch:         ksoftirqd/1:21 [120] S ==> iperf:1284 [120]
>  irq/354-4848400-85    [000]    90.639878: irq_handler_entry:    irq=355 name=48484000.ethernet
wait, no

[ lots NET_RX wake ups ]
>  irq/354-4848400-85    [000]    90.640560: softirq_exit:         vec=3 [action=NET_RX]
ach. NET_RX

>  irq/354-4848400-85    [000]    90.640568: softirq_raise:        vec=3 [action=NET_RX]
>  irq/354-4848400-85    [000]    90.640579: softirq_entry:        vec=3 [action=NET_RX]
>  irq/354-4848400-85    [000]    90.640606: irq_handler_entry:    irq=355 name=48484000.ethernet
>  irq/354-4848400-85    [000]    90.640608: irq_handler_exit:     irq=355 ret=handled

[ more of those ]

>  irq/354-4848400-85    [000]    90.642393: softirq_exit:         vec=3 [action=NET_RX]
>  irq/354-4848400-85    [000]    90.642419: sched_switch:         irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]

We don't serve TIMER & SCHED because those two are pushed to the
ksoftirq thread(s). So we keep mostly doing NET_RX and now we switch to
the next best thing which is RCU.

>           rcuc/0-11    [000]    90.642430: irq_handler_entry:    irq=354 name=48484000.ethernet
but not for long.

>           rcuc/0-11    [000]    90.642432: irq_handler_exit:     irq=354 ret=handled
>           rcuc/0-11    [000]    90.642435: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
>           rcuc/0-11    [000]    90.642442: sched_migrate_task:   comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
>           rcuc/0-11    [000]    90.642453: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:001
>            iperf-1284  [001]    90.642462: sched_stat_runtime:   comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
>           rcuc/0-11    [000]    90.642464: irq_handler_entry:    irq=355 name=48484000.ethernet
>           rcuc/0-11    [000]    90.642466: irq_handler_exit:     irq=355 ret=handled
>           rcuc/0-11    [000]    90.642469: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
>            iperf-1284  [001]    90.642473: sched_switch:         iperf:1284 [120] R ==> irq/354-4848400:85 [49]
>  irq/354-4848400-85    [001]    90.642481: softirq_raise:        vec=3 [action=NET_RX]
>           rcuc/0-11    [000]    90.642483: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
>  irq/354-4848400-85    [001]    90.642493: softirq_entry:        vec=3 [action=NET_RX]
>           rcuc/0-11    [000]    90.642497: sched_migrate_task:   comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
ach, that IRQ thread is not pinned. Good. We migrate.

>           rcuc/0-11    [000]    90.642515: irq_handler_entry:    irq=354 name=48484000.ethernet
>           rcuc/0-11    [000]    90.642516: irq_handler_exit:     irq=354 ret=handled
>           rcuc/0-11    [000]    90.642533: sched_switch:         rcuc/0:11 [98] R ==> irq/355-4848400:86 [49]
>  irq/355-4848400-86    [000]    90.642541: softirq_raise:        vec=3 [action=NET_RX]
>  irq/355-4848400-86    [000]    90.642551: softirq_entry:        vec=3 [action=NET_RX]
>  irq/355-4848400-86    [000]    90.642562: sched_pi_setprio:     comm=iperf pid=1284 oldprio=120 newprio=49
>  irq/355-4848400-86    [000]    90.642593: sched_switch:         irq/355-4848400:86 [49] D ==> rcuc/0:11 [98]
>           rcuc/0-11    [000]    90.642621: sched_switch:         rcuc/0:11 [98] S ==> ktimersoftd/0:4 [98]
>    ktimersoftd/0-4     [000]    90.642631: softirq_entry:        vec=1 [action=TIMER]
we did wait quite some time for that one.

>    ktimersoftd/0-4     [000]    90.642647: softirq_exit:         vec=1 [action=TIMER]
>    ktimersoftd/0-4     [000]    90.642665: sched_switch:         ktimersoftd/0:4 [98] S ==> ksoftirqd/0:3 [120]
switch back to ksoftirqd, where the SCHED softirq might run.

>      ksoftirqd/0-3     [000]    90.642692: sched_stat_runtime:   comm=ksoftirqd/0 pid=3 runtime=34486 [ns] vruntime=33313887612 [ns]
>      ksoftirqd/0-3     [000]    90.642703: sched_switch:         ksoftirqd/0:3 [120] D ==> trace-cmd:1278 [120]
but we block on something in the softirq and go back to trace-cmd

>        trace-cmd-1278  [000]    90.642742: sched_waking:         comm=trace-cmd pid=1279 prio=120 target_cpu=001
>        trace-cmd-1278  [000]    90.642755: sched_wakeup:         trace-cmd:1279 [120] success=1 CPU:001
>  irq/354-4848400-85    [001]    90.643012: softirq_exit:         vec=3 [action=NET_RX]
>        trace-cmd-1278  [000]    90.643020: sched_stat_runtime:   comm=trace-cmd pid=1278 runtime=328098 [ns] vruntime=554552351 [ns]
>  irq/354-4848400-85    [001]    90.643021: softirq_raise:        vec=3 [action=NET_RX]
>  irq/354-4848400-85    [001]    90.643031: softirq_entry:        vec=3 [action=NET_RX]
>        trace-cmd-1278  [000]    90.643036: sched_switch:         trace-cmd:1278 [120] D ==> iperf:1282 [120]
trace-cmd is gone; back to iperf.

>            iperf-1282  [000]    90.643055: sched_stat_runtime:   comm=iperf pid=1282 runtime=34811 [ns] vruntime=560242796 [ns]
>            iperf-1282  [000]    90.643068: sched_switch:         iperf:1282 [120] D ==> swapper/0:0 [120]
iperf is blocked on something, too.

>           <idle>-0     [000]    90.643079: bputs:                0xc003e7d4s: 58f67c094e488
>           <idle>-0     [000]    90.643090: bprint:               softirq_check_pending_idle: NOHZ: 80

As you see, ksoftirqd left the CPU in the D state, so I would assume it is
blocked on a lock and waits.
NET_RX is in progress but scheduled out due to RCUC, which is also
scheduled out.

I assume we get to the softirq thread because nothing else can run. It will
see that NET_RX is pending and try it, but it blocks on the lock
(lock_softirq()). It schedules out. Nothing left -> idle.
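
A minimal sketch of the per-vector serialization this refers to, assuming the
4.4-rt kernel/softirq.c (lock_softirq() is named above; the function below and
handle_softirq_vec() are illustrative stand-ins, not the verbatim code):

/*
 * On -RT each softirq vector is protected by a per-CPU "local lock",
 * which is a sleeping lock. A task that wants to run a vector takes
 * the lock first; if another task is already running that vector, the
 * second one blocks and is scheduled out in the D state.
 */
static void do_single_softirq_sketch(int vec)
{
	lock_softirq(vec);		/* may block -> uninterruptible sleep */
	handle_softirq_vec(vec);	/* run the vector's actions */
	unlock_softirq(vec);
}

This is why ksoftirqd above can sit in D while NET_RX is still "in progress"
in the irq/354 thread.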

The idle code checks whether a softirq is pending, and in fact there is
SCHED on the list; ksoftirqd was about to handle it, but due to the
ordering complication (NET_RX before SCHED) it can't. And we have the
warning.

This 

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -105,6 +105,7 @@ void softirq_check_pending_idle(void)
 {
 	static int rate_limit;
 	struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
+	struct task_struct *ksoft_tsk = __this_cpu_read(ksoftirqd);
 	u32 warnpending;
 	int i;
 
@@ -112,7 +113,7 @@ void softirq_check_pending_idle(void)
 		return;
 
 	warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
-	for (i = 0; i < NR_SOFTIRQS; i++) {
+	for (i = 0; (i < NR_SOFTIRQS) && warnpending; i++) {
 		struct task_struct *tsk = sr->runner[i];
 
 		/*
@@ -132,6 +133,15 @@ void softirq_check_pending_idle(void)
 		}
 	}
 
+	if (warnpending && ksoft_tsk) {
+		raw_spin_lock(&ksoft_tsk->pi_lock);
+		if (ksoft_tsk->pi_blocked_on || ksoft_tsk->state == TASK_RUNNING) {
+			/* Clear all bits pending in that task */
+			warnpending &= ~(ksoft_tsk->softirqs_raised);
+		}
+		raw_spin_unlock(&ksoft_tsk->pi_lock);
+	}
+
 	if (warnpending) {
 		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
 		       warnpending);


should avoid the warning if the softirq thread is blocked.

However:
- RCU boosting with RT prio might help here or there.
- having the hard-IRQ and IRQ-thread on the same CPU might help, too. It
  is not strictly required but saves a few cycles if you don't have to
  perform cross-CPU wake-ups and migrate the task back and forth. The latter
  happens at prio 99.
- I am not sure NAPI works as expected, though I would assume so. There are
  IRQs 354 and 355, which fire after each other; one would be enough, I
  guess. And they seem to be short-lived / fire often. If NAPI works, it
  should put an end to that and push the processing to the softirq thread
  (see the sketch below).
  If you have IRQ-pacing support, I suggest using something like 10ms or
  so. That means your ping response will go from <= 1ms to 10ms in the
  worst case, but since you process more packets at a time your
  throughput should increase.
  If I count this correctly, it took you almost 4ms from "raise SCHED" to
  "try to process SCHED", and most of the time was spent in the 35[45] hard
  IRQs, raising NET_RX, or cross-CPU wake-ups of the IRQ thread.
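
A minimal sketch of the NAPI flow the last point assumes. This is the generic
driver pattern, not the actual cpsw code; mask_device_irq(),
unmask_device_irq(), process_rx() and struct my_priv are hypothetical helpers:

static irqreturn_t rx_hardirq(int irq, void *dev_id)
{
	struct my_priv *priv = dev_id;

	mask_device_irq(priv);		/* no further IRQs until poll is done */
	napi_schedule(&priv->napi);
	return IRQ_HANDLED;
}

static int rx_poll(struct napi_struct *napi, int budget)
{
	struct my_priv *priv = container_of(napi, struct my_priv, napi);
	int done = process_rx(priv, budget);	/* consume up to budget pkts */

	if (done < budget) {
		napi_complete(napi);		/* traffic drained ...      */
		unmask_device_irq(priv);	/* ... re-arm the interrupt */
	}
	/* if done == budget, NAPI polls again with the IRQ still masked */
	return done;
}

Under load this should collapse the back-to-back IRQ 354/355 storms seen in
the trace into long polls with the interrupt masked.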

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-09-08 14:28     ` Sebastian Andrzej Siewior
@ 2016-09-08 16:24         ` Grygorii Strashko
  0 siblings, 0 replies; 18+ messages in thread
From: Grygorii Strashko @ 2016-09-08 16:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 09/08/2016 05:28 PM, Sebastian Andrzej Siewior wrote:
> On 2016-08-12 18:58:21 [+0300], Grygorii Strashko wrote:
>> Hi Sebastian,
> Hi Grygorii,
> 
>> Thanks for the comment. You're right:
>> irq_thread()->irq_forced_thread_fn()->local_bh_enable()
>>
>> but wouldn't there be two wake_up_process() calls anyway,
>> plus preempt_check_resched_rt() in napi_schedule()?
> 
> Usually you prefer BH handling in the IRQ-thread because it runs at
> higher priority and is not interrupted by a SCHED_OTHER process. And you
> can assign it a higher priority if it should be preferred over another
> interrupt. However, if the processing of the interrupt is taking too
> much time (like that ping flood, a lot of network traffic) then we push
> it to the softirq thread. If you do this now unconditionally in the
> SCHED_OTHER softirq thread then you take away all the `good' things we
> had (like processing important packets at higher priority as long as
> nobody floods us). Plus you share this thread with everything else that
> runs in there.

That I understand, but the effect of this patch on network throughput
is pretty amazing :)

> 
>>>> And, as result, get benefits from the following improvements (tested
>>>> on am57xx-evm):
>>>>
>>>> 1) "[ 78.348599] NOHZ: local_softirq_pending 80" message will not be
>>>>    seen any more. Now these warnings can be seen once iperf is started.
>>>>    # iperf -c $IPERFHOST -w 128K  -d -t 60
>>>
>>> Do you also see "sched: RT throttling activated"? Because I don't see
>>> otherwise why this should pop up.
>>
>> I've reverted my patch an did requested experiments (some additional info below).
>>
>> I do not see "sched: RT throttling activated" :(
> 
> That is okay. However, if you aim for throughput you might want to switch
> away from NO_HZ (and deactivate the software watchdog, which runs at
> prio 99 if enabled).
> 
>> root@am57xx-evm:~# ./net_perf.sh & cyclictest -m -Sp98 -q  -D4m
>> [1] 1301
>> # /dev/cpu_dma_latency set to 0us
>> Linux am57xx-evm 4.4.16-rt23-00321-ga195e6a-dirty #92 SMP PREEMPT RT Fri Aug 12 14:03:59 EEST 2016 armv7l GNU/Linux
>>
> …
>> [1]+  Done                    ./net_perf.sh
> 
> I can't parse this. But that local_softirq_pending() warning might
> contribute to lower numbers.
> 
>> =============================================== before, no net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> # /dev/cpu_dma_latency set to 0us
>> T: 0 ( 1288) P:98 I:250 C: 960000 Min:      8 Act:    9 Avg:    8 Max:      33
>> T: 1 ( 1289) P:98 I:250 C: 959929 Min:      7 Act:   11 Avg:    9 Max:      26
> 
>> =============================================== after, no net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1301) P:98 I:250 C: 960000 Min:      7 Act:    9 Avg:    8 Max:      22
>> T: 1 ( 1302) P:98 I:250 C: 959914 Min:      7 Act:   11 Avg:    8 Max:      28
> 
> I think those two should be equal more or less since the change should
> have no impact on "no net load" or do I miss something?

Correct, I see no difference in this case, as per above.

> 
>> =============================================== before, with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1400) P:98 I:250 C: 960000 Min:      8 Act:   25 Avg:   18 Max:      83
>> T: 1 ( 1401) P:98 I:250 C: 959801 Min:      7 Act:   27 Avg:   17 Max:      48
>>
>>
>> =============================================== after, with net load:
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1358) P:98 I:250 C: 960000 Min:      8 Act:   11 Avg:   14 Max:      42
>> T: 1 ( 1359) P:98 I:250 C: 959743 Min:      7 Act:   18 Avg:   15 Max:      36
> 
> So the max value dropped by ~50% with your patch. Interesting. What I
> remember from testing is that once you had, say, one hour of hackbench
> running, the extra network traffic after that didn't contribute
> much (if at all) to the max value.
> That said, it is hard to believe that one extra context switch
> contributes about 40us to the max value on CPU0.

Yup, but short-time testing provides very stable results.
This patch is going to be tested more intensively shortly.

> 
>>> What happens if s/__raise_softirq_irqoff_ksoft/__raise_softirq_irqoff/
>>> in net/core/dev.c and chrt the priority of you network interrupt
>>> handlers to SCHED_OTHER priority?
>>
>> ===== without this patch + __raise_softirq_irqoff + netIRQs->SCHED_OTHER
>>
>> with net load:
>> cyclictest -m -Sp98 -q -D4m -i250 -d0
>> T: 0 ( 1325) P:98 I:1000 C: 240000 Min:      8 Act:   22 Avg:   17 Max:      51
>> T: 1 ( 1326) P:98 I:1500 C: 159981 Min:      8 Act:   15 Avg:   15 Max:      39
>>
>> cyclictest -m -Sp98 -q  -D4m -i250 -d0
>> T: 0 ( 1307) P:98 I:250 C: 960000 Min:      7 Act:   13 Avg:   16 Max:      50
>> T: 1 ( 1308) P:98 I:250 C: 959819 Min:      8 Act:   12 Avg:   14 Max:      37
>>

> 
> So that looks nice, doesn't it?

Yah, this is an improvement, in general.
But the fact that such a significant net performance drop is observed out of
the box (without any tuning) and on an idle system triggers a lot of
questions ;(

I worry that the originally observed behavior may depend on the use of NAPI
polling for both RX and TX in the CPSW driver: CPSW requests two IRQs, RX and
TX, and both handlers just do napi_schedule() [NET_RX].


-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-09-08 16:00     ` Sebastian Andrzej Siewior
@ 2016-09-09 12:46         ` Grygorii Strashko
  0 siblings, 0 replies; 18+ messages in thread
From: Grygorii Strashko @ 2016-09-09 12:46 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 09/08/2016 07:00 PM, Sebastian Andrzej Siewior wrote:
> On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote:
>> I've collected trace before first occurrence of  "NOHZ: local_softirq_pending 80"
>>

> 
>>  irq/354-4848400-85    [000]    90.642393: softirq_exit:         vec=3 [action=NET_RX]
>>  irq/354-4848400-85    [000]    90.642419: sched_switch:         irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
> 
> We don't serve TIMER & SCHED because those two are pushed to the
> ksoftirq thread(s). So we keep mostly doing NET_RX and now we switch to
> the next best thing which is RCU.
> 
>>           rcuc/0-11    [000]    90.642430: irq_handler_entry:    irq=354 name=48484000.ethernet
> but not for long.
> 
>>           rcuc/0-11    [000]    90.642432: irq_handler_exit:     irq=354 ret=handled
>>           rcuc/0-11    [000]    90.642435: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
>>           rcuc/0-11    [000]    90.642442: sched_migrate_task:   comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
>>           rcuc/0-11    [000]    90.642453: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:001
>>            iperf-1284  [001]    90.642462: sched_stat_runtime:   comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
>>           rcuc/0-11    [000]    90.642464: irq_handler_entry:    irq=355 name=48484000.ethernet
>>           rcuc/0-11    [000]    90.642466: irq_handler_exit:     irq=355 ret=handled
>>           rcuc/0-11    [000]    90.642469: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
>>            iperf-1284  [001]    90.642473: sched_switch:         iperf:1284 [120] R ==> irq/354-4848400:85 [49]
>>  irq/354-4848400-85    [001]    90.642481: softirq_raise:        vec=3 [action=NET_RX]
>>           rcuc/0-11    [000]    90.642483: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
>>  irq/354-4848400-85    [001]    90.642493: softirq_entry:        vec=3 [action=NET_RX]
>>           rcuc/0-11    [000]    90.642497: sched_migrate_task:   comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
> ach, that IRQ thread is not pinned. Good. We migrate.
> 

It looks like the scheduler is playing ping-pong between CPUs with the
threaded IRQs irq/354-355. And it seems this might be the case: if I pin
both threaded IRQ handlers to CPU0, I see better latency and a netperf
improvement:
cyclictest -m -Sp98 -q  -D4m
T: 0 ( 1318) P:98 I:1000 C: 240000 Min:      9 Act:   14 Avg:   15 Max:      42
T: 1 ( 1319) P:98 I:1500 C: 159909 Min:      9 Act:   14 Avg:   16 Max:      39

If I arrange the hwirqs and pin both threaded IRQ handlers on CPU1,
I observe more or less the same results as with this patch.
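
For reference, a minimal sketch of one way to do that pinning from userspace,
assuming the thread PIDs seen in the traces above (85 for irq/354-4848400, 86
for irq/355-4848400); the hard IRQ lines are steered separately, e.g. via
/proc/irq/354/smp_affinity:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

static int pin_to_cpu0(pid_t pid)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);	/* CPU0 only */
	if (sched_setaffinity(pid, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}

int main(void)
{
	pin_to_cpu0(85);	/* irq/354-4848400 IRQ thread */
	pin_to_cpu0(86);	/* irq/355-4848400 IRQ thread */
	return 0;
}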


>>           rcuc/0-11    [000]    90.642515: irq_handler_entry:    irq=354 name=48484000.ethernet
>>           rcuc/0-11    [000]    90.642516: irq_handler_exit:     irq=354 ret=handled

> 
> As you see, ksoftirqd left the CPU in the D state, so I would assume it is
> blocked on a lock and waits.
> NET_RX is in progress but scheduled out due to RCUC, which is also
> scheduled out.
> 
> I assume we get to the softirq thread because nothing else can run. It will
> see that NET_RX is pending and try it, but it blocks on the lock
> (lock_softirq()). It schedules out. Nothing left -> idle.
> 
> The idle code checks whether a softirq is pending, and in fact there is
> SCHED on the list; ksoftirqd was about to handle it, but due to the
> ordering complication (NET_RX before SCHED) it can't. And we have the
> warning.
> 
> This 
> 
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -105,6 +105,7 @@ void softirq_check_pending_idle(void)
>  {
>  	static int rate_limit;
>  	struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
> +	struct task_struct *ksoft_tsk = __this_cpu_read(ksoftirqd);
>  	u32 warnpending;
>  	int i;
>  
> @@ -112,7 +113,7 @@ void softirq_check_pending_idle(void)
>  		return;
>  
>  	warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
> -	for (i = 0; i < NR_SOFTIRQS; i++) {
> +	for (i = 0; (i < NR_SOFTIRQS) && warnpending; i++) {
>  		struct task_struct *tsk = sr->runner[i];
>  
>  		/*
> @@ -132,6 +133,15 @@ void softirq_check_pending_idle(void)
>  		}
>  	}
>  
> +	if (warnpending && ksoft_tsk) {
> +		raw_spin_lock(&ksoft_tsk->pi_lock);
> +		if (ksoft_tsk->pi_blocked_on || ksoft_tsk->state == TASK_RUNNING) {
> +			/* Clear all bits pending in that task */
> +			warnpending &= ~(ksoft_tsk->softirqs_raised);
> +		}
> +		raw_spin_unlock(&ksoft_tsk->pi_lock);
> +	}
> +
>  	if (warnpending) {
>  		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
>  		       warnpending);
> 
> 
> should avoid the warning if the softirq thread is blocked.


With this change I do not see "NOHZ: local_softirq_pending 80" any more.
Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>

> 
> However:
> - RCU boosting with RT prio might help here or there.

That I'll try. Current settings:
CONFIG_RCU_BOOST=y
CONFIG_RCU_KTHREAD_PRIO=1
CONFIG_RCU_BOOST_DELAY=500


> - having the hard-IRQ and IRQ-thread on the same CPU might help, too. It
>   is not strictly required but saves a few cycles if you don't have to
>   perform cross-CPU wake-ups and migrate the task back and forth. The latter
>   happens at prio 99.

I've experimented with this and it improves netperf; I also followed the
instructions from [1]. But it seems I messed up pinning the threaded IRQs to
a CPU.
[1] https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html

> - I am not sure NAPI works as expected, though I would assume so. There are
>   IRQs 354 and 355, which fire after each other; one would be enough, I
>   guess. And they seem to be short-lived / fire often. If NAPI works, it
>   should put an end to that and push the processing to the softirq thread.
>   If you have IRQ-pacing support, I suggest using something like 10ms or
>   so. That means your ping response will go from <= 1ms to 10ms in the
>   worst case, but since you process more packets at a time your
>   throughput should increase.
>   If I count this correctly, it took you almost 4ms from "raise SCHED" to
>   "try to process SCHED", and most of the time was spent in the 35[45] hard
>   IRQs, raising NET_RX, or cross-CPU wake-ups of the IRQ thread.

The question I have to deal with is why switching to RT causes such a
significant netperf drop (without additional tuning) compared to vanilla:
~120% for 256K and ~200% for 128K windows.

It's of course expected to see a netperf drop, but I assumed not one this
significant :( And I can't find any reports or statistics related to this.
Does the same happen on x86?

Thanks a lot for your comments.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-09-09 12:46         ` Grygorii Strashko
  (?)
@ 2016-09-15 14:39         ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-15 14:39 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

On 2016-09-09 15:46:44 [+0300], Grygorii Strashko wrote:
> 
> It looks like the scheduler is playing ping-pong between CPUs with the threaded irqs irq/354-355.
> And seems this might be the case - if I pin both threaded IRQ handlers to CPU0
> I can see better latency and netperf improvement
> cyclictest -m -Sp98 -q  -D4m
> T: 0 ( 1318) P:98 I:1000 C: 240000 Min:      9 Act:   14 Avg:   15 Max:      42
> T: 1 ( 1319) P:98 I:1500 C: 159909 Min:      9 Act:   14 Avg:   16 Max:      39
> 
> if I arrange the hwirqs and pin both threaded IRQ handlers on CPU1,
> I can observe more or less similar results as with this patch.

so no patch then.

> With this change I do not see "NOHZ: local_softirq_pending 80" any more.
> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>

Okay. So I need to think about what to do here: either this, or trying to
run the "higher" softirq first, but that could break things.
Thanks for the confirmation.

> > - having the hard-IRQ and IRQ-thread on the same CPU might help, too. It
> >   is not strictly required but saves a few cycles if you don't have to
> >   perform cross CPU wake ups and migrate task forth and back. The latter
> >   happens at prio 99.
> 
> I've experimented with this: it improves netperf, and I also followed the instructions from [1].
> But it seems I messed up pinning the threaded IRQs to a CPU.
> [1] https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html

There is irq_thread() => irq_thread_check_affinity(). It might not work as
expected on ARM, but it makes sense for the thread to follow the affinity
mask of the HW irq.
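
A quick way to check it on your board (a sketch; irq 354 and the thread
name are taken from your traces):

  # move the hard IRQ to CPU1 ...
  echo 2 > /proc/irq/354/smp_affinity
  # ... and see whether the irq/354-* thread's affinity followed it
  taskset -p $(pgrep 'irq/354')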

> > - I am not sure NAPI works as expected. I would assume so. There is IRQ
> >   354 and 355 which fire after each other. One would be enough I guess.
> >   And they seem to be short living / fire often. If NAPI works then it
> >   should put an end to it and push it to the softirq thread.
> >   If you have IRQ-pacing support I suggest to use something like 10ms or
> >   so. That means your ping response will go from <= 1ms to 10ms in the
> >   worst case but since you process more packets at a time your
> >   throughput should increase.
> >   If I count this correctly, it took you almost 4ms from "raise SCHED" to
> >   "try process SCHED", and most of the time was spent in the 35[45] hard irqs,
> >   raising NET_RX or cross-waking the IRQ thread.
> 
> The question I have to deal with is why switching to RT causes such a significant
> netperf drop (without additional tuning) compared to vanilla - ~120% for 256K and ~200% for 128K windows.

You have a sched / thread ping-pong, that is one thing; !RT with threaded
irqs should show similar problems. The higher latency is caused by the
migration thread.
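
You should be able to see the ping-pong directly with something like this
while iperf runs in the background (a sketch; assumes trace-cmd is
available on the target):

  trace-cmd record -e sched:sched_switch -e sched:sched_migrate_task \
      -e irq:irq_handler_entry sleep 10
  trace-cmd report | grep -E 'irq/35[45]|sched_migrate_task'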

> Some netperf drop is of course expected, but I assumed it would not be so significant :(
> And I can't find any reports or statistics related to this. Does the same happen on x86?

It should, maybe at a lower level if it handles migration more effectively.
There is this watchdog thread (for instance) which tries to detect lockups
and runs at P99. It causes "worse" cyclictest numbers on both x86 and ARM,
but on ARM it is more visible than on x86.
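
To rule the watchdog out for a measurement run (a sketch):

  # see the per-CPU lockup-detector threads and their RT prio
  ps -eo pid,rtprio,comm | grep watchdog
  # temporarily disable the lockup detector
  echo 0 > /proc/sys/kernel/watchdog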

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD
  2016-09-09 12:46         ` Grygorii Strashko
@ 2017-04-07 23:35           ` Grygorii Strashko
  -1 siblings, 0 replies; 18+ messages in thread
From: Grygorii Strashko @ 2017-04-07 23:35 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, linux-omap, Alison Chaiken, linux-rt-users, netdev

Hi Sebastian,

On 09/09/2016 07:46 AM, Grygorii Strashko wrote:
> On 09/08/2016 07:00 PM, Sebastian Andrzej Siewior wrote:
>> On 2016-08-19 17:29:16 [+0300], Grygorii Strashko wrote:
>>> I've collected trace before first occurrence of  "NOHZ: local_softirq_pending 80"
>>>
> 
>>
>>>  irq/354-4848400-85    [000]    90.642393: softirq_exit:         vec=3 [action=NET_RX]
>>>  irq/354-4848400-85    [000]    90.642419: sched_switch:         irq/354-4848400:85 [49] S ==> rcuc/0:11 [98]
>>
>> We don't serve TIMER & SCHED because those two are pushed to the
>> ksoftirq thread(s). So we keep mostly doing NET_RX and now we switch to
>> the next best thing which is RCU.
>>
>>>           rcuc/0-11    [000]    90.642430: irq_handler_entry:    irq=354 name=48484000.ethernet
>> but not for long.
>>
>>>           rcuc/0-11    [000]    90.642432: irq_handler_exit:     irq=354 ret=handled
>>>           rcuc/0-11    [000]    90.642435: sched_waking:         comm=irq/354-4848400 pid=85 prio=49 target_cpu=000
>>>           rcuc/0-11    [000]    90.642442: sched_migrate_task:   comm=irq/354-4848400 pid=85 prio=49 orig_cpu=0 dest_cpu=1
>>>           rcuc/0-11    [000]    90.642453: sched_wakeup:         irq/354-4848400:85 [49] success=1 CPU:001
>>>            iperf-1284  [001]    90.642462: sched_stat_runtime:   comm=iperf pid=1284 runtime=113053 [ns] vruntime=2106997666 [ns]
>>>           rcuc/0-11    [000]    90.642464: irq_handler_entry:    irq=355 name=48484000.ethernet
>>>           rcuc/0-11    [000]    90.642466: irq_handler_exit:     irq=355 ret=handled
>>>           rcuc/0-11    [000]    90.642469: sched_waking:         comm=irq/355-4848400 pid=86 prio=49 target_cpu=001
>>>            iperf-1284  [001]    90.642473: sched_switch:         iperf:1284 [120] R ==> irq/354-4848400:85 [49]
>>>  irq/354-4848400-85    [001]    90.642481: softirq_raise:        vec=3 [action=NET_RX]
>>>           rcuc/0-11    [000]    90.642483: sched_wakeup:         irq/355-4848400:86 [49] success=1 CPU:001
>>>  irq/354-4848400-85    [001]    90.642493: softirq_entry:        vec=3 [action=NET_RX]
>>>           rcuc/0-11    [000]    90.642497: sched_migrate_task:   comm=irq/355-4848400 pid=86 prio=49 orig_cpu=1 dest_cpu=0
>> ach, that IRQ thread is not pinned. Good. We migrate.
>>
> 
> It looks like the scheduler is playing ping-pong between CPUs with the threaded irqs irq/354-355.
> And seems this might be the case - if I pin both threaded IRQ handlers to CPU0
> I can see better latency and netperf improvement
> cyclictest -m -Sp98 -q  -D4m
> T: 0 ( 1318) P:98 I:1000 C: 240000 Min:      9 Act:   14 Avg:   15 Max:      42
> T: 1 ( 1319) P:98 I:1500 C: 159909 Min:      9 Act:   14 Avg:   16 Max:      39
> 
> if I arrange the hwirqs and pin both threaded IRQ handlers on CPU1,
> I can observe more or less similar results as with this patch.
> 
> 
>>>           rcuc/0-11    [000]    90.642515: irq_handler_entry:    irq=354 name=48484000.ethernet
>>>           rcuc/0-11    [000]    90.642516: irq_handler_exit:     irq=354 ret=handled
> 
>>
>> As you see ksoftirqd left the CPU with a D so I would assume it is
>> blocked on a lock and waits. 
>> NET_RX is in progress but scheduled out due to RCUC which is also
>> scheduled out.
>>
>> I assume we got to softirq because nothing else can run. It will see
>> that NET_RX is pending and tries it but blocks on the lock
>> (lock_softirq()). It schedules out. Nothing left -> idle.
>>
>> The idle code checks to see if a softirq is pending and in fact there is
>> SCHED on the list and ksoftirq was about to handle it but due to
>> ordering complication (NET_RX before SCHED) it can't. And we have the
>> warning.
>>
>> This 
>>
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -105,6 +105,7 @@ void softirq_check_pending_idle(void)
>>  {
>>  	static int rate_limit;
>>  	struct softirq_runner *sr = this_cpu_ptr(&softirq_runners);
>> +	struct task_struct *ksoft_tsk = __this_cpu_read(ksoftirqd);
>>  	u32 warnpending;
>>  	int i;
>>  
>> @@ -112,7 +113,7 @@ void softirq_check_pending_idle(void)
>>  		return;
>>  
>>  	warnpending = local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK;
>> -	for (i = 0; i < NR_SOFTIRQS; i++) {
>> +	for (i = 0; (i < NR_SOFTIRQS) && warnpending; i++) {
>>  		struct task_struct *tsk = sr->runner[i];
>>  
>>  		/*
>> @@ -132,6 +133,15 @@ void softirq_check_pending_idle(void)
>>  		}
>>  	}
>>  
>> +	if (warnpending && ksoft_tsk) {
>> +		raw_spin_lock(&ksoft_tsk->pi_lock);
>> +		if (ksoft_tsk->pi_blocked_on || ksoft_tsk->state == TASK_RUNNING) {
>> +			/* Clear all bits pending in that task */
>> +			warnpending &= ~(ksoft_tsk->softirqs_raised);
>> +		}
>> +		raw_spin_unlock(&ksoft_tsk->pi_lock);
>> +	}
>> +
>>  	if (warnpending) {
>>  		printk(KERN_ERR "NOHZ: local_softirq_pending %02x\n",
>>  		       warnpending);
>>
>>
>> should avoid the warning if the softirq thread is blocked.
> 
> 
> With this change I do not see "NOHZ: local_softirq_pending 80" any more.
> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
> 

The above change is not present in kernel v4.9, and I can still see the NOHZ
messages if I run iperf on am57xx-evm. I'd like to know if you have any plans regarding this?

Copy-pasting your comments here from another e-mail:
--
Okay. So I need to think about what to do here: either this, or trying to
run the "higher" softirq first, but that could break things.
Thanks for the confirmation.
((c) Sebastian Andrzej Siewior)
--

root@am57xx-evm:~# iperf -c 192.168.1.1 -w128K -d -i5 -t120 & cyclictest -n -m -Sp97 -q -D2m 
[1] 1078
# /dev/cpu_dma_latency set to 0us
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.1.1, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  5] local 192.168.1.2 port 39346 connected with 192.168.1.1 port 5001
[  4] local 192.168.1.2 port 5001 connected with 192.168.1.1 port 50594
[   90.829952] NOHZ: local_softirq_pending 80
[   90.830001] NOHZ: local_softirq_pending 80
[   91.529926] NOHZ: local_softirq_pending 80
[   93.299956] NOHZ: local_softirq_pending 80
[   93.680078] NOHZ: local_softirq_pending 80
[   93.680128] NOHZ: local_softirq_pending 80
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0- 5.0 sec  73.2 MBytes   123 Mbits/sec
[  4]  0.0- 5.0 sec  47.5 MBytes  79.7 Mbits/sec
[   94.719951] NOHZ: local_softirq_pending 80
[   96.439923] NOHZ: local_softirq_pending 80
[   96.569922] NOHZ: local_softirq_pending 80
[   96.569951] NOHZ: local_softirq_pending 80
[  5]  5.0-10.0 sec  73.5 MBytes   123 Mbits/sec


PS: if I set the net irqs to run on CPU0 (or CPU1) only, it improves net
throughput and eliminates these NOHZ messages.

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-04-07 23:35 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-11 16:15 [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD Grygorii Strashko
2016-08-11 16:15 ` Grygorii Strashko
2016-08-11 16:36 ` Steven Rostedt
2016-08-11 16:36   ` Steven Rostedt
2016-08-12  8:32 ` Sebastian Andrzej Siewior
2016-08-12 15:58   ` Grygorii Strashko
2016-08-12 15:58     ` Grygorii Strashko
2016-09-08 14:28     ` Sebastian Andrzej Siewior
2016-09-08 16:24       ` Grygorii Strashko
2016-09-08 16:24         ` Grygorii Strashko
2016-08-19 14:29   ` Grygorii Strashko
2016-08-19 14:29     ` Grygorii Strashko
2016-09-08 16:00     ` Sebastian Andrzej Siewior
2016-09-09 12:46       ` Grygorii Strashko
2016-09-09 12:46         ` Grygorii Strashko
2016-09-15 14:39         ` Sebastian Andrzej Siewior
2017-04-07 23:35         ` Grygorii Strashko
2017-04-07 23:35           ` Grygorii Strashko
