* [PATCH][RT] netpoll: Always take poll_lock when doing polling
@ 2016-05-26 23:56 Steven Rostedt
  2016-06-02 16:12 ` Sebastian Andrzej Siewior
  2016-06-10 15:30 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2016-05-26 23:56 UTC (permalink / raw)
  To: LKML, linux-rt-users, netdev
  Cc: Thomas Gleixner, Peter Zijlstra, Clark Williams, Eric Dumazet,
	David Miller, alison

[ Alison, can you try this patch ]

This uses netpoll_poll_lock()/unlock() to synchronize netpoll and napi
poll operations. Without this lock, the synchronization is done by
spinning on the NAPI_STATE_SCHED bit. That method works fine on a non-rt
kernel, because a softirq can not be preempted, and the busy poll is
called with local_bh_disable(), which prevents softirqs from running and
preempting it. But on rt, this code can be preempted. Thus, a task may
be preempted while holding the NAPI_STATE_SCHED bit set, opening a
window for a livelock.

For example:

   <interrupt thread (as all interrupts on RT are threaded)>

   napi_schedule_prep()
        test_and_set_bit(NAPI_STATE_SCHED, &n->state)

   <preempted by higher prio task that runs softirqs in its context>

   sk_busy_loop()

      do {
           rc = busy_poll()
               ret = napi_schedule_prep()
                    return !test_and_set_bit(NAPI_STATE_SCHED, &n->state)
                     <returns zero because NAPI_STATE_SCHED is set>
               if (!ret) return 0
           <rc is zero>
      } while (...) /* for ever */

This isn't a problem on non-PREEMPT_RT kernels because
napi_schedule_prep() can not be preempted there. But because it can be
in PREEMPT_RT, we need to add some extra locking. The
netpoll_poll_lock() works well here, but it needs to be taken around any
call to busy_poll().

Using IS_ENABLED(CONFIG_PREEMPT_RT_FULL) will allow gcc to optimize out
the extra calls to poll_lock.

Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/netpoll.h |    2 +-
 include/net/busy_poll.h |   14 +++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

Index: linux-rt.git/include/linux/netpoll.h
===================================================================
--- linux-rt.git.orig/include/linux/netpoll.h	2016-05-26 18:31:09.183150389 -0400
+++ linux-rt.git/include/linux/netpoll.h	2016-05-26 18:52:02.657014280 -0400
@@ -77,7 +77,7 @@ static inline void *netpoll_poll_lock(st
 {
 	struct net_device *dev = napi->dev;
 
-	if (dev && dev->npinfo) {
+	if (dev && (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) || dev->npinfo)) {
 		spin_lock(&napi->poll_lock);
 		napi->poll_owner = smp_processor_id();
 		return napi;
Index: linux-rt.git/include/net/busy_poll.h
===================================================================
--- linux-rt.git.orig/include/net/busy_poll.h	2016-05-26 18:31:09.183150389 -0400
+++ linux-rt.git/include/net/busy_poll.h	2016-05-26 19:10:12.134266713 -0400
@@ -25,6 +25,7 @@
 #define _LINUX_NET_BUSY_POLL_H
 
 #include <linux/netdevice.h>
+#include <linux/netpoll.h>
 #include <net/ip.h>
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -97,7 +98,18 @@ static inline bool sk_busy_loop(struct s
 		goto out;
 
 	do {
-		rc = ops->ndo_busy_poll(napi);
+		/* When RT is enabled, napi_schedule_prep() can be preempted
+		 * with NAPI_STATE_SCHED set, causing the busy_poll() function
+		 * to always return zero, and this loop may never exit.
+		 * In that case, we must always take the netpoll_poll_lock.
+		 */
+		if (IS_ENABLED(CONFIG_PREEMPT_RT_FULL)) {
+			void *have = netpoll_poll_lock(napi);
+			rc = ops->ndo_busy_poll(napi);
+			netpoll_poll_unlock(have);
+		} else {
+			rc = ops->ndo_busy_poll(napi);
+		}
 
 		if (rc == LL_FLUSH_FAILED)
 			break; /* permanent failure */
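
For illustration (a sketch, not part of the patch): IS_ENABLED() expands
to a compile-time constant 1 or 0, so on a non-RT build the compiler
sees the RT branch above as dead code and emits only

	rc = ops->ndo_busy_poll(napi);

while an RT build always emits the locked sequence

	void *have = netpoll_poll_lock(napi);
	rc = ops->ndo_busy_poll(napi);
	netpoll_poll_unlock(have);

so no runtime conditional is left in either case.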


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-05-26 23:56 [PATCH][RT] netpoll: Always take poll_lock when doing polling Steven Rostedt
@ 2016-06-02 16:12 ` Sebastian Andrzej Siewior
  2016-06-04 11:11   ` Steven Rostedt
  2016-06-10 15:30 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-02 16:12 UTC (permalink / raw)
  To: Steven Rostedt, alison
  Cc: LKML, linux-rt-users, netdev, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Eric Dumazet, David Miller

* Steven Rostedt | 2016-05-26 19:56:41 [-0400]:

>[ Alison, can you try this patch ]

Alison, did you try it?

Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-02 16:12 ` Sebastian Andrzej Siewior
@ 2016-06-04 11:11   ` Steven Rostedt
  2016-06-05 15:16     ` Alison Chaiken
  2016-06-10 15:57     ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2016-06-04 11:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: alison, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

On Thu, 2 Jun 2016 18:12:35 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> * Steven Rostedt | 2016-05-26 19:56:41 [-0400]:
> 
> >[ Alison, can you try this patch ]
> 
> Alison, did you try it?
> 
> Sebastian

This patch may help too...

-- Steve

From 729e35706b8352d83692764adbeca429bb26ba7f Mon Sep 17 00:00:00 2001
From: Steven Rostedt <srostedt@redhat.com>
Date: Tue, 5 Jan 2016 14:53:09 -0500
Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
 amount of time

To prevent starvation of tasks like ksoftirqd, if the task that is
processing its softirqs takes more than 2 jiffies to do so, and the
softirqs are constantly being re-added, then defer the processing to
ksoftirqd.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
 kernel/softirq.c |   53 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 19 deletions(-)

Index: linux-rt.git/kernel/softirq.c
===================================================================
--- linux-rt.git.orig/kernel/softirq.c	2016-06-04 07:01:54.584296827 -0400
+++ linux-rt.git/kernel/softirq.c	2016-06-04 07:06:35.449913451 -0400
@@ -206,6 +206,22 @@ static void handle_softirq(unsigned int
 	}
 }
 
+/*
+ * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
+ * but break the loop if need_resched() is set or after 2 ms.
+ * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
+ * certain cases, such as stop_machine(), jiffies may cease to
+ * increment and so we need the MAX_SOFTIRQ_RESTART limit as
+ * well to make sure we eventually return from this method.
+ *
+ * These limits have been established via experimentation.
+ * The two things to balance is latency against fairness -
+ * we want to handle softirqs as soon as possible, but they
+ * should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_RESTART 10
+
 #ifndef CONFIG_PREEMPT_RT_FULL
 static inline int ksoftirqd_softirq_pending(void)
 {
@@ -349,22 +365,6 @@ void __local_bh_enable_ip(unsigned long
 }
 EXPORT_SYMBOL(__local_bh_enable_ip);
 
-/*
- * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
- *
- * These limits have been established via experimentation.
- * The two things to balance is latency against fairness -
- * we want to handle softirqs as soon as possible, but they
- * should not be able to lock up the box.
- */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
-#define MAX_SOFTIRQ_RESTART 10
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 /*
  * When we run softirqs from irq_exit() and thus on the hardirq stack we need
@@ -537,11 +537,17 @@ static void do_single_softirq(int which)
  */
 static void do_current_softirqs(void)
 {
-	while (current->softirqs_raised) {
-		int i = __ffs(current->softirqs_raised);
+	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+	unsigned int softirqs_raised = current->softirqs_raised;
+
+restart:
+	current->softirqs_raised &= ~softirqs_raised;
+
+	while (softirqs_raised) {
+		int i = __ffs(softirqs_raised);
 		unsigned int pending, mask = (1U << i);
 
-		current->softirqs_raised &= ~mask;
+		softirqs_raised &= ~mask;
 		local_irq_enable();
 
 		/*
@@ -568,6 +574,15 @@ static void do_current_softirqs(void)
 		unlock_softirq(i);
 		local_irq_disable();
 	}
+
+	softirqs_raised = current->softirqs_raised;
+	if (softirqs_raised) {
+		if (time_before(jiffies, end))
+			goto restart;
+
+		__this_cpu_read(ksoftirqd)->softirqs_raised |= softirqs_raised;
+		wakeup_softirqd();
+	}
 }
 
 void __local_bh_disable(void)
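
Condensed, the resulting flow of do_current_softirqs() is roughly this
(a sketch of the diff above, not the literal code):

	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;

	do {
		/* run every softirq currently raised on this task */
	} while (current->softirqs_raised && time_before(jiffies, end));

	if (current->softirqs_raised) {
		/* still re-raised after ~2 ms (MAX_SOFTIRQ_TIME):
		 * defer the rest to ksoftirqd
		 */
		__this_cpu_read(ksoftirqd)->softirqs_raised |=
						current->softirqs_raised;
		wakeup_softirqd();
	}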


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-04 11:11   ` Steven Rostedt
@ 2016-06-05 15:16     ` Alison Chaiken
  2016-06-06 12:03       ` Clark Williams
  2016-06-07  9:46       ` Sebastian Andrzej Siewior
  2016-06-10 15:57     ` Sebastian Andrzej Siewior
  1 sibling, 2 replies; 16+ messages in thread
From: Alison Chaiken @ 2016-06-05 15:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, LKML, linux-rt-users, netdev,
	Thomas Gleixner, Peter Zijlstra, Clark Williams, Eric Dumazet,
	David Miller

Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
take poll_lock when doing polling"
>> [ Alison, can you try this patch ]

Sebastian follows up:
>Alison, did you try it?

Sorry for not responding sooner.   I was hoping to come to a complete
understanding of the system before replying . . .

I did try that patch, but it hasn't made much difference.   Let me
back up and restate the problem I'm trying to solve, which is that a
DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
main event loop in user space that misses latency deadlines under the
test condition where I ping-flood it from another box.   While in
production, the system would not be expected to support high rates of
network traffic, but the instability with the ping-flood makes me
wonder if there aren't underlying configuration problems.

We've applied Sebastian's commit "softirq: split timer softirqs out of
ksoftirqd," which improved event loop stability substantially when we
left ksoftirqd running at userspace default but elevated ktimersoftd.
 That made me think that focusing on the softirqs was pertinent.

Subsequently, I've tried "[PATCH][RT] netpoll: Always take poll_lock
when doing polling" (which seems like a good idea in any event).
After reading the "net: threadable napi poll loop discussion"
(https://lkml.org/lkml/2016/5/10/472), and
https://lkml.org/lkml/2016/2/27/152, I tried reverting

commit c10d73671ad30f54692f7f69f0e09e75d3a8926a
   Author: Eric Dumazet <edumazet@google.com>
   Date:   Thu Jan 10 15:26:34 2013 -0800
       softirq: reduce latencies

but that didn't help.   When the userspace application (running at -3
priority) starts having problems, I see that the hard IRQ associated
with the ethernet device uses about 35% of one core, which seems
awfully high if the NAPI has triggered a switch to polling.  I vaguely
recall David Miller saying in the "threadable napi poll loop"
discussion that accounting was broken for net IRQs, so perhaps that
number is misleading.   mpstat shows that the NET_RX softirqs run on
the same core where we've pinned the ethernet IRQ, so one might hope
that userspace would be able to run happily on the other one.

What I see in ftrace while watching scheduler and IRQ events is that
the userspace application is yielding to ethernet or CAN IRQs, which
also raise NET_RX.    In the following,  ping-flood is running, and
irq/343 is the ethernet one:

 userspace_application-4767  [000] dn.h1..  4196.422318: irq_handler_entry: irq=347 name=can1
 userspace_application-4767  [000] dn.h1..  4196.422319: irq_handler_exit: irq=347 ret=handled
 userspace_application-4767  [000] dn.h2..  4196.422321: sched_waking: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
 irq/343-4848400-874   [001] ....112  4196.422323: softirq_entry: vec=3 [action=NET_RX]
 userspace_application-4767  [000] dn.h3..  4196.422325: sched_wakeup: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
 irq/343-4848400-874   [001] ....112  4196.422328: napi_poll: napi poll on napi struct edd5f560 for device eth0
 irq/343-4848400-874   [001] ....112  4196.422329: softirq_exit: vec=3 [action=NET_RX]
 userspace_application-4767  [000] dn..3..  4196.422332: sched_stat_runtime: comm=userspace_application pid=4767 runtime=22448 [ns] vruntime=338486919642 [ns]
 userspace_application-4767  [000] d...3..  4196.422336: sched_switch: prev_comm=userspace_application prev_pid=4767 prev_prio=120 prev_state=R ==> next_comm=irq/347-can1 next_pid=2053 next_prio=28
 irq/343-4848400-874   [001] d...3..  4196.422339: sched_switch: prev_comm=irq/343-4848400 prev_pid=874 prev_prio=47 prev_state=S ==> next_comm=irq/344-4848400 next_pid=875 next_prio=47

You can see why the application is having problems: it is constantly
interrupted by eth and CAN IRQs.   Given that CAN traffic is critical
for our application, perhaps we will simply have to reduce the eth
hard IRQ priority in order to make the system more robust?     It
would be great to offload the network traffic-handling to the Cortex-M
processor on the DRA7, but I fear that the development schedule will
not allow for that option.

I still am not sure how to tell if the NAPI switch from
interrupt-driven to polling is properly taking place.      Any
suggestion on how best to monitor that behavior without overly loading
the system would be appreciated.
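
(The cheapest check I know of is to diff /proc/softirqs once a second
and watch the NET_RX delta; a rough sketch follows, though it still does
not reveal whether the poll ran in the IRQ thread or in ksoftirqd.)

	/* Sketch: print the per-second NET_RX softirq rate, summed
	 * over the two cores of this SOC. Minimal error handling.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static unsigned long net_rx_total(void)
	{
		char line[256];
		unsigned long c0 = 0, c1 = 0;
		FILE *f = fopen("/proc/softirqs", "r");

		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f)) {
			char *p = strstr(line, "NET_RX:");
			if (p) {
				sscanf(p + 7, "%lu %lu", &c0, &c1);
				break;
			}
		}
		fclose(f);
		return c0 + c1;
	}

	int main(void)
	{
		unsigned long prev = net_rx_total(), cur;

		for (;;) {
			sleep(1);
			cur = net_rx_total();
			printf("NET_RX/s: %lu\n", cur - prev);
			prev = cur;
		}
	}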

Thanks again for the patches,
Alison Chaiken
Peloton Technology


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-05 15:16     ` Alison Chaiken
@ 2016-06-06 12:03       ` Clark Williams
  2016-06-06 23:25         ` Alison Chaiken
  2016-06-07  9:46       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 16+ messages in thread
From: Clark Williams @ 2016-06-06 12:03 UTC (permalink / raw)
  To: Alison Chaiken
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, LKML, linux-rt-users,
	netdev, Thomas Gleixner, Peter Zijlstra, Eric Dumazet,
	David Miller


On Sun, 5 Jun 2016 08:16:58 -0700
Alison Chaiken <alison@peloton-tech.com> wrote:

> Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
> take poll_lock when doing polling"
> >> [ Alison, can you try this patch ]  
> 
> Sebastian follows up:
> >Alison, did you try it?  
> 
> Sorry for not responding sooner.   I was hoping to come to a complete
> understanding of the system before replying . . .
> 
> I did try that patch, but it hasn't made much difference.   Let me
> back up and restate the problem I'm trying to solve, which is that a
> DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
> main event loop in user space that misses latency deadlines under the
> test condition where I ping-flood it from another box.   While in
> production, the system would not be expected to support high rates of
> network traffic, but the instability with the ping-flood makes me
> wonder if there aren't underlying configuration problems.

What sort of tunings have you applied, regarding thread and interrupt affinity? If you have not, you might try isolating one of your cores and just run the user-space application on that core, with interrupt threads running on the other core. You could use the 'tuna' application like this:

	$ sudo tuna --cpus=1 --isolate

This will move all the threads that *can* be moved off of cpu1 (probably to cpu0 since I believe the OMAP5 is a dual-core processor?). 

Also, what scheduler policy/priority are you using for the user-space application?
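
If you would rather do it from the application itself, the same pinning
and policy can be set programmatically. A minimal sketch (the cpu number
and priority below are examples, not recommendations):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>

	int main(void)
	{
		cpu_set_t set;
		struct sched_param sp = { .sched_priority = 30 };

		CPU_ZERO(&set);
		CPU_SET(1, &set);	/* the isolated core */
		if (sched_setaffinity(0, sizeof(set), &set))	/* 0 == self */
			perror("sched_setaffinity");

		if (sched_setscheduler(0, SCHED_FIFO, &sp))	/* needs CAP_SYS_NICE */
			perror("sched_setscheduler");

		/* ... critical event loop ... */
		return 0;
	}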

Clark



* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-06 12:03       ` Clark Williams
@ 2016-06-06 23:25         ` Alison Chaiken
  0 siblings, 0 replies; 16+ messages in thread
From: Alison Chaiken @ 2016-06-06 23:25 UTC (permalink / raw)
  To: Clark Williams
  Cc: Steven Rostedt, Sebastian Andrzej Siewior, LKML, linux-rt-users,
	netdev, Thomas Gleixner, Peter Zijlstra, Eric Dumazet,
	David Miller

Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
take poll_lock when doing polling"
>> >> [ Alison, can you try this patch ]
>>
 Sebastian follows up:
>> >Alison, did you try it?

I wrote:
>> I did try that patch, but it hasn't made much difference.   Let me
>> back up and restate the problem I'm trying to solve, which is that a
>> DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
>> main event loop in user space that misses latency deadlines under the
>> test condition where I ping-flood it from another box.   While in
>> production, the system would not be expected to support high rates of
>> network traffic, but the instability with the ping-flood makes me
>> wonder if there aren't underlying configuration problems.

Clark asked:
> What sort of tunings have you applied, regarding thread and interrupt affinity?
> Also, what scheduler policy/priority are you using for the user-space application?

We have the most critical hard IRQs (CAN, UART) pinned to one core,
scheduled with FIFO, and running at highest RT priority.  The less
critical IRQs (ethernet, MMC, DMA) are pinned to the other core and
are running at lower FIFO priority.   Next in FIFO priority we have
the ktimersoftd threads.   Then  we have our critical userspace
application running under RR with slightly lower priority and no
pinning.

When there is not much network traffic, the userspace event_loop makes
its deadlines, but when there is a lot of network traffic, the two
network hard IRQs shoot to the top of the process table, with one of
them using about 80% of one core.   This behavior persists whether the
kernel includes "net: provide a way to delegate processing a softirq
to ksoftirqd", "softirq: Perform softirqs in local_bh_enable() for a
limited amount of time", or reverts c10d73671 "softirq: reduce
latencies".

It's hard to see how a *hard* IRQ could take so much processor time.
I guess this gets back to
http://article.gmane.org/gmane.linux.kernel/2219110:
From: Rik van Riel <>
Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop
I need to get back to fixing irq & softirq time accounting,
which does not currently work correctly in all time keeping
modes...

So most likely the softirq budget is getting charged to the hard IRQ
that raises it.

>If you have not, you might try isolating one of your cores and just run the user-space application on that core, with interrupt threads running on the other core. You could use the 'tuna' application like this:
>         $ sudo tuna --cpus=1 --isolate
> This will move all the threads that *can* be moved off of cpu1 (probably to cpu0 since I believe the OMAP5 is a dual-core processor?).

Thanks, I installed tuna and gave that a try, but it actually makes
things worse.   I also tried lowering the priority of the ethernet
hard IRQ below that of the most critical userspace application, to no
avail.

Perhaps expecting an RT system to survive a ping-flood is just
unreasonable?   It would be nice to deliver a system that I didn't
know how to bring down.   At least in our real use case, the critical
system will be NAT'ed and packets will not be forwarded to it.

Thanks,
Alison


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-05 15:16     ` Alison Chaiken
  2016-06-06 12:03       ` Clark Williams
@ 2016-06-07  9:46       ` Sebastian Andrzej Siewior
  2016-06-08  0:19         ` Alison Chaiken
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-07  9:46 UTC (permalink / raw)
  To: Alison Chaiken
  Cc: Steven Rostedt, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

* Alison Chaiken | 2016-06-05 08:16:58 [-0700]:

>I did try that patch, but it hasn't made much difference.   Let me
>back up and restate the problem I'm trying to solve, which is that a
>DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
>main event loop in user space that misses latency deadlines under the
>test condition where I ping-flood it from another box.   While in
>production, the system would not be expected to support high rates of
>network traffic, but the instability with the ping-flood makes me
>wonder if there aren't underlying configuration problems.
>
>We've applied Sebastian's commit "softirq: split timer softirqs out of
>ksoftirqd," which improved event loop stability substantially when we

Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
that works but v4.1.15-rt18 had this patch included. Also "net: provide
a way to delegate processing a softirq to ksoftirqd" should be applied
(which is also part of v4.1.15-rt18).

>left ksoftirqd running at userspace default but elevated ktimersoftd.
> That made me think that focusing on the softirqs was pertinent.

Before that explicit "delegation" to ksoftirqd within NAPI, it was
likely that the NAPI callback was never interrupted and simply continued
on the "next" softirq.
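
Condensed to its core (a sketch of the idea, not the literal diff), that
delegation changes the out-of-budget path of net_rx_action() from

	__raise_softirq_irqoff(NET_RX_SOFTIRQ);

to

	/* hand the leftover work to ksoftirqd instead of re-raising
	 * it in the context of the current task
	 */
	__raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);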

>priority) starts having problems, I see that the hard IRQ associated
>with the ethernet device uses about 35% of one core, which seems
>awfully high if the NAPI has triggered a switch to polling.  I vaguely

Try the patch above, it is likely your NAPI was never interrupted.

>recall David Miller saying in the "threadable napi poll loop"
>discussion that accounting was broken for net IRQs, so perhaps that
>number is misleading.   mpstat shows that the NET_RX softirqs run on
>the same core where we've pinned the ethernet IRQ, so you might hope
>that userspace might be able to run happily on the other one.
>
>What I see in ftrace while watching scheduler and IRQ events is that
>the userspace application is yielding to ethernet or CAN IRQs, which
>also raise NET_RX.    In the following,  ping-flood is running, and
>irq/343 is the ethernet one:
>
> userspace_application-4767  [000] dn.h1..  4196.422318: irq_handler_entry: irq=347 name=can1
> userspace_application-4767  [000] dn.h1..  4196.422319: irq_handler_exit: irq=347 ret=handled
> userspace_application-4767  [000] dn.h2..  4196.422321: sched_waking: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
> irq/343-4848400-874   [001] ....112  4196.422323: softirq_entry: vec=3 [action=NET_RX]
> userspace_application-4767  [000] dn.h3..  4196.422325: sched_wakeup: comm=irq/347-can1 pid=2053 prio=28 target_cpu=000
> irq/343-4848400-874   [001] ....112  4196.422328: napi_poll: napi poll on napi struct edd5f560 for device eth0
> irq/343-4848400-874   [001] ....112  4196.422329: softirq_exit: vec=3 [action=NET_RX]
> userspace_application-4767  [000] dn..3..  4196.422332: sched_stat_runtime: comm=userspace_application pid=4767 runtime=22448 [ns] vruntime=338486919642 [ns]
> userspace_application-4767  [000] d...3..  4196.422336: sched_switch: prev_comm=userspace_application prev_pid=4767 prev_prio=120 prev_state=R ==> next_comm=irq/347-can1 next_pid=2053 next_prio=28
> irq/343-4848400-874   [001] d...3..  4196.422339: sched_switch: prev_comm=irq/343-4848400 prev_pid=874 prev_prio=47 prev_state=S ==> next_comm=irq/344-4848400 next_pid=875 next_prio=47

What I remember from testing the two patches on am335x was that before,
a ping flood on gbit froze the serial console, but with them the ping
flood was not noticed.

>Thanks again for the patches,
>Alison Chaiken
>Peloton Technology

Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-07  9:46       ` Sebastian Andrzej Siewior
@ 2016-06-08  0:19         ` Alison Chaiken
  2016-06-09 12:37           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Alison Chaiken @ 2016-06-08  0:19 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

I wrote:
>>We've applied Sebastian's commit "softirq: split timer softirqs out of
>>ksoftirqd," which improved event loop stability substantially when we

Sebastian Andrzej Siewior <bigeasy@linutronix.de>replied:
>Why did you apply that one? You have 4.1.18-ti-rt so I don't know how
>that works but v4.1.15-rt18 had this patch included. Also "net: provide
>a way to delegate processing a softirq to ksoftirqd" should be applied
>(which is also part of v4.1.15-rt18).

Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

> What I remember from testing the two patches on am335x was that before,
> a ping flood on gbit froze the serial console, but with them the ping
> flood was not noticed.

I compiled a kernel from upstream d060a36 "Merge branch
'ti-linux-4.1.y' of git.ti.com:ti-linux-kernel/ti-linux-kernel into
ti-rt-linux-4.1.y" which is unpatched except for using a
board-appropriate device-tree.    The serial console is responsive
with all our RT userspace applications running alongside a rapid
external ping.   However, our main event loop misses frequently as
soon as a ping faster than 'ping -i 0.0002' is run.    mpstat shows that
the sum of the hard IRQ rates in a second is equal precisely to the
NET_RX rate, which is ~3400/s.   Does the fact that 3400 < (1/0.0002)
already mean that some packets are dropped?   ftrace shows that
cpsw_rx_poll() is called even when there is essentially no network
traffic, so I'm not sure how to tell if NAPI is working as intended.

I tried running the wakeup_rt tracer, but it loads the system too
much.     With ftrace capturing IRQ, scheduler and net events, we're
writing out markers into the trace buffer when the event loop makes
its deadline and then when it misses so that we can compare the normal
and long-latency intervals, but there doesn't appear to be a smoking
gun in the difference between the two.

Thanks for all your help,
Alison


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-08  0:19         ` Alison Chaiken
@ 2016-06-09 12:37           ` Sebastian Andrzej Siewior
  2016-09-03 23:40             ` Alison Chaiken
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-09 12:37 UTC (permalink / raw)
  To: Alison Chaiken
  Cc: Steven Rostedt, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

* Alison Chaiken | 2016-06-07 17:19:43 [-0700]:

>Sorry to be obscure; I had applied that patch to v4.1.6-rt5.

Using the latest is often not a bad choice compared to the random tree
you have here.

>> What I remember from testing the two patches on am335x was that before,
>> a ping flood on gbit froze the serial console, but with them the ping
>> flood was not noticed.
>
>I compiled a kernel from upstream d060a36 "Merge branch
>'ti-linux-4.1.y' of git.ti.com:ti-linux-kernel/ti-linux-kernel into
>ti-rt-linux-4.1.y" which is unpatched except for using a
>board-appropriate device-tree.    The serial console is responsive
>with all our RT userspace applications running alongside a rapid
>external ping.   However, our main event loop misses frequently as
>soon as ping faster than 'ping -i 0.0002' is run.    mpstat shows that
>the sum of the hard IRQ rates in a second is equal precisely to the
>NET_RX rate, which is ~3400/s.   Does the fact that 3400 < (1/0.0002)
>already mean that some packets are dropped?   ftrace shows that

Not necessarily. The ping command reports how many packets were
received. It is possible that the sender was not able to send that many
packets _or_ the receiver was able to process more packets during a
single interrupt.

>cpsw_rx_poll() is called even when there is essentially no network
>traffic, so I'm not sure how to tell if NAPI is working as intended.

You should see an invocation of __raise_softirq_irqoff_ksoft() and then
cpsw's poll function should run in "ksoftirqd/" context instead of in
the context of the task it runs in now.

>I tried running the wakeup_rt tracer, but it loads the system too
>much.     With ftrace capturing IRQ, scheduler and net events, we're
>writing out markers into the trace buffer when the event loop makes
>its deadline and then when it misses so that we can compare the normal
>and long-latency intervals, but there doesn't appear to be a smoking
>gun in the difference between the two.

You would need to figure out what adds the latency. My understanding is
that your RT application is doing CAN traffic and is not meeting the
deadline. So you drop CAN packets in the end?

>Thanks for all your help,
>Alison

Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-05-26 23:56 [PATCH][RT] netpoll: Always take poll_lock when doing polling Steven Rostedt
  2016-06-02 16:12 ` Sebastian Andrzej Siewior
@ 2016-06-10 15:30 ` Sebastian Andrzej Siewior
  2016-06-10 16:28   ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-10 15:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, linux-rt-users, netdev, Thomas Gleixner, Peter Zijlstra,
	Clark Williams, Eric Dumazet, David Miller, alison

* Steven Rostedt | 2016-05-26 19:56:41 [-0400]:

>For example:
>
>   <interrupt thread (as all interrupts on RT are threaded)>
>
>   napi_schedule_prep()
>        test_and_set_bit(NAPI_STATE_SCHED, &n->state)
>
>   <preempted by higher prio task that runs softirqs in its context>
>
>   sk_busy_loop()
>
>      do {
>           rc = busy_poll()
>               ret = napi_schedule_prep()
>                    return !test_and_set_bit(NAPI_STATE_SCHED, &n->state)
>                     <returns zero because NAPI_STATE_SCHED is set>
>               if (!ret) return 0
>           <rc is zero>
>      } while (...) /* for ever */
>

No, I don't see the busy loop. The while() is here:
|    while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
|                  !need_resched() && !busy_loop_timeout(end_time));

and this seems to be the case since v3.11 where it was introduced (but
now it moved to dev.c). So even if there is no busy_poll() and
napi_schedule_prep() returns 0, our cycles here are limited by
busy_loop_timeout().
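
For reference, the whole loop looks roughly like this (condensed from
the 4.1-era sk_busy_loop(), details elided):

	do {
		rc = ops->ndo_busy_poll(napi);	/* 0 while SCHED bit is set */
		if (rc == LL_FLUSH_FAILED)
			break;			/* permanent failure */
		cpu_relax();
	} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
		 !need_resched() && !busy_loop_timeout(end_time));

So even with rc stuck at zero, busy_loop_timeout(end_time) ends the loop
once end_time passes.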

Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-04 11:11   ` Steven Rostedt
  2016-06-05 15:16     ` Alison Chaiken
@ 2016-06-10 15:57     ` Sebastian Andrzej Siewior
  2016-06-10 16:11       ` Steven Rostedt
  1 sibling, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-10 15:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: alison, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

* Steven Rostedt | 2016-06-04 07:11:31 [-0400]:

>From: Steven Rostedt <srostedt@redhat.com>
>Date: Tue, 5 Jan 2016 14:53:09 -0500
>Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
> amount of time
>
>To prevent starvation of tasks like ksoftirqd, if the task that is
>processing its softirqs takes more than 2 jiffies to do so, and the
>softirqs are constantly being re-added, then defer the processing to
>ksoftirqd.

I'm not sure about this. Alison didn't scream "yes it solves my problem"
and I am not sure what it is.
It is true that in RT we don't have such a limit like in !RT. You would
need to use __raise_softirq_irqoff_ksoft() instead of the normal one, or
add a wakeup(), since you may have timers pending and those need to go
to the "other" ksoftirqd.
But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
it will end up on the CPU right away unless there are other tasks that
need the CPU. So the scheduler will balance it the same way. The only
change will be that softirqs which are processed in context of any
application for more than two jiffies will be moved to ksoftirqd. This
could be a win.

>Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
>Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>

Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-10 15:57     ` Sebastian Andrzej Siewior
@ 2016-06-10 16:11       ` Steven Rostedt
  2016-06-10 16:20         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2016-06-10 16:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: alison, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

On Fri, 10 Jun 2016 17:57:17 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> * Steven Rostedt | 2016-06-04 07:11:31 [-0400]:
> 
> >From: Steven Rostedt <srostedt@redhat.com>
> >Date: Tue, 5 Jan 2016 14:53:09 -0500
> >Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
> > amount of time
> >
> >To prevent starvation of tasks like ksoftirqd, if the task that is
> >processing its softirqs takes more than 2 jiffies to do so, and the
> >softirqs are constantly being re-added, then defer the processing to
> >ksoftirqd.  
> 
> I'm not sure about this. Alison didn't scream "yes it solves my problem"
> and I am not sure what it is.
> It is true that in RT we don't have such a limit like in !RT. You would
> need to use __raise_softirq_irqoff_ksoft() instead of the normal one, or
> add a wakeup(), since you may have timers pending and those need to go
> to the "other" ksoftirqd.
> But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
> it will end up on the CPU right away unless there are other tasks that
> need the CPU. So the scheduler will balance it the same way. The only
> change will be that softirqs which are processed in context of any
> application for more than two jiffies will be moved to ksoftirqd. This
> could be a win.

We actually triggered a starvation due to this. I was just seeing if
Alison hit the same issue we did in our tests.

-- Steve


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-10 16:11       ` Steven Rostedt
@ 2016-06-10 16:20         ` Sebastian Andrzej Siewior
  2016-06-10 16:45           ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-06-10 16:20 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: alison, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

On 06/10/2016 06:11 PM, Steven Rostedt wrote:
>> It is true that in RT we don't have such a limit like in !RT. You would
>> need to use __raise_softirq_irqoff_ksoft() instead of the normal one, or
>> add a wakeup(), since you may have timers pending and those need to go
>> to the "other" ksoftirqd.
>> But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
>> it will end up on the CPU right away unless there are other tasks that
>> need the CPU. So the scheduler will balance it the same way. The only
>> change will be that softirqs which are processed in context of any
>> application for more than two jiffies will be moved to ksoftirqd. This
>> could be a win.
> 
> We actually triggered a starvation due to this. I was just seeing if
> Alison hit the same issue we did in our tests.

Okay. Didn't get this information from him. But this is only because
the softirqs are not running in ksoftirqd, right?

> -- Steve
> 
Sebastian


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-10 15:30 ` Sebastian Andrzej Siewior
@ 2016-06-10 16:28   ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2016-06-10 16:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, David Miller, alison

On Fri, 2016-06-10 at 17:30 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2016-05-26 19:56:41 [-0400]:
> 
> >For example:
> >
> >   <interrupt thread (as all interrupts on RT are threaded)>
> >
> >   napi_schedule_prep()
> >        test_and_set_bit(NAPI_STATE_SCHED, &n->state)
> >
> >   <preempted by higher prio task that runs softirqs in its context>
> >
> >   sk_busy_loop()
> >
> >      do {
> >           rc = busy_poll()
> >               ret = napi_schedule_prep()
> >                    return !test_and_set_bit(NAPI_STATE_SCHED, &n->state)
> >                     <returns zero because NAPI_STATE_SCHED is set>
> >               if (!ret) return 0
> >           <rc is zero>
> >      } while (...) /* for ever */
> >
> 
> No, I don't see the busyloop. while() is here:
> |    while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
> |                  !need_resched() && !busy_loop_timeout(end_time));
> 
> and this seems to be the case since v3.11 where it was introduced (but
> now it moved to dev.c). So even if there is no busy_poll() and
> napi_schedule_prep() returns 0 our cycles here are limited by
> busy_loop_timeout().

Well, before linux-4.5 and commit 2a028ecb76497d ("net: allow BH
servicing in sk_busy_loop()"), sk_busy_loop() was completely blocking
BH.

Not sure it matters in your case.


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-10 16:20         ` Sebastian Andrzej Siewior
@ 2016-06-10 16:45           ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2016-06-10 16:45 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: alison, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller

On Fri, 10 Jun 2016 18:20:22 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> > We actually triggered a starvation due to this. I was just seeing if
> > Alison hit the same issue we did in our tests.  
> 
> Okay. Didn't get this information from him. But this is only because
> the softirqs not running in ksoftirqd, right?

Right, I think we supplied this patch before we added the one that
would push repeated softirqs to ksoftirqd. We probably should see if
that patch alone fixes our issue.

-- Steve


* Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling
  2016-06-09 12:37           ` Sebastian Andrzej Siewior
@ 2016-09-03 23:40             ` Alison Chaiken
  0 siblings, 0 replies; 16+ messages in thread
From: Alison Chaiken @ 2016-09-03 23:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, LKML, linux-rt-users, netdev, Thomas Gleixner,
	Peter Zijlstra, Clark Williams, Eric Dumazet, David Miller


 I asked on 2016-06-07 17:19:43 [-0700]:
>>cpsw_rx_poll() is called even when there is essentially no network
>>traffic, so I'm not sure how to tell if NAPI is working as intended.

On Thu, Jun 9, 2016 at 5:37 AM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> You should see an invocation of __raise_softirq_irqoff_ksoft() and then
> cpsw's poll function should run in "ksoftirqd/" context instead of in
> the context of the task it runs in now.

The attached patch uses a kprobe to detect when Ethernet switches to
NAPI on a Freescale i.MX6 board.   Thanks to Sebastian for the
suggestion about the method.   As expected, there are no messages when
I ping-flood the board from another host.   However, if I also spawn
multiple scp's of large files at the same time, then the messages
appear.     I tested with 4.4.4-rt11, but the virtually identical
patch is against 4.1.18-rt17.   I'm posting it here in case it's
useful to someone else.   It seems to me that if the various IRQs that
can invoke net_rx_action() are pinned to different cores, then the use
of smp_processor_id() to identify the device that spawned the IRQ is
robust.

The RT scheduling problem we had (namely, system falls over under
ping-flood) was solved by my colleague Brian Silverman, who pinned our
userspace application that ran the critical event loop and adjusted
its priority.   Doing so prevented a ping-flood from causing the event
loop to miss cycles.

Thanks again to everyone for your advice, and I hope to meet some of
you in Berlin next month.

-- Alison

[-- Attachment #2: kprobes-detect-ethernet-and-CAN-NAPI.patch --]
[-- Type: text/x-patch, Size: 2295 bytes --]

From 1c83b4ee5d572bc1ede630fc72d01228ff2338e2 Mon Sep 17 00:00:00 2001
From: Alison Chaiken <alison@peloton-tech.com>
Date: Sat, 3 Sep 2016 15:51:41 -0700
Subject: [PATCH] kprobes: detect ethernet and CAN NAPI

Inserting this module will induce the core network driver to print a
message whenever handling of CAN or Ethernet packets is performed via
NAPI in ksoftirqd rather than in the context of the invoking hard IRQ
thread.  Tested on Boundary Devices Nitrogen board with i.MX6Q SOC
with 4.4.4-rt11 kernel.

Signed-off-by: Alison Chaiken <alison@she-devel.com>
---
 samples/kprobes/kprobe_example.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 366db1a..ef3b5ee 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -16,7 +16,8 @@
 
 /* For each probe you need to allocate a kprobe structure */
 static struct kprobe kp = {
-	.symbol_name	= "do_fork",
+/*	.symbol_name	= "do_fork", */
+	.symbol_name	= "__raise_softirq_irqoff_ksoft",
 };
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
@@ -51,6 +52,8 @@ static int handler_pre(struct kprobe *p, struct pt_regs *regs)
 static void handler_post(struct kprobe *p, struct pt_regs *regs,
 				unsigned long flags)
 {
+  unsigned id = smp_processor_id();
+
 #ifdef CONFIG_X86
 	printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
 		p->addr, regs->flags);
@@ -67,6 +70,25 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
 	printk(KERN_INFO "post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
 		p->addr, regs->ex1);
 #endif
+
+	/* change id to that where the eth IRQ is pinned */
+	if (id == 0) {
+		pr_info("Switched to ethernet NAPI.\n");
+#ifdef CONFIG_ARM
+		pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
+			" lr = 0x%lx, cpsr = 0x%lx\n",
+			p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
+#endif
+	}
+	/* change id to that where the CAN IRQ is pinned */
+	if (id == 1) {
+		pr_info("Switched to CAN NAPI.\n");
+#ifdef CONFIG_ARM
+		pr_info("post_handler: p->addr = 0x%p, pc = 0x%lx,"
+			" lr = 0x%lx, cpsr = 0x%lx\n",
+			p->addr, regs->ARM_pc, regs->ARM_lr, regs->ARM_cpsr);
+#endif
+	}
 }
 
 /*
-- 
2.1.4


