* [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
@ 2020-09-09  9:09 qianjun.kernel
  2020-09-11 15:55 ` peterz
  2020-09-11 16:46 ` Qais Yousef
  0 siblings, 2 replies; 9+ messages in thread
From: qianjun.kernel @ 2020-09-09  9:09 UTC (permalink / raw)
  To: tglx, peterz, will, luto, linux-kernel; +Cc: laoar.shao, urezki, jun qian

From: jun qian <qianjun.kernel@gmail.com>

When softirqs are pending, __do_softirq() processes all of them in one
while loop. If handling the pending softirqs takes more than 2 msec in
this loop, or if one of the softirqs runs for a long time, the original
code logic still processes all the pending softirqs without waking up
ksoftirqd, which causes a relatively large scheduling delay on the
corresponding CPU, which we do not wish to see. This patch checks the
total time spent processing pending softirqs and wakes up ksoftirqd if
that time exceeds 2 ms, to avoid a large scheduling delay.

Signed-off-by: jun qian <qianjun.kernel@gmail.com>
---
 kernel/softirq.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 69 insertions(+), 14 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index c4201b7f..1f696c8 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
 #include <linux/smpboot.h>
 #include <linux/tick.h>
 #include <linux/irq.h>
+#include <linux/sched/clock.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
@@ -199,18 +200,17 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 
 /*
  * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
+ * but break the loop if need_resched() is set or after MAX_SOFTIRQ_TIME_NS
+ * ns. Within the loop, if the softirq processing time has exceeded
+ * MAX_SOFTIRQ_TIME_NS ns, we also need to break the loop to wake up
+ * ksoftirqd.
  *
  * These limits have been established via experimentation.
  * The two things to balance is latency against fairness -
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME_NS 2000000
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -246,15 +246,20 @@ static inline void lockdep_softirq_end(bool in_hardirq)
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+DEFINE_PER_CPU(__u32, pending_new_flag);
+DEFINE_PER_CPU(__u32, pending_next_bit);
+#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1)
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
-	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+	u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
 	bool in_hardirq;
-	__u32 pending;
-	int softirq_bit;
+	__u32 pending, pending_left, pending_new;
+	int softirq_bit, next_bit;
+	unsigned long flags;
 
 	/*
 	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
@@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	h = softirq_vec;
 
-	while ((softirq_bit = ffs(pending))) {
-		unsigned int vec_nr;
+	next_bit = per_cpu(pending_next_bit, smp_processor_id());
+	per_cpu(pending_new_flag, smp_processor_id()) = 0;
+
+	pending_left = pending &
+		(SOFTIRQ_PENDING_MASK << next_bit);
+	pending_new = pending &
+		(SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit));
+
+	/*
+	 * In order to be fair, process the pending bits in the order left
+	 * over from the last run.
+	 */
+	while ((softirq_bit = ffs(pending_left)) ||
+		(softirq_bit = ffs(pending_new))) {
 		int prev_count;
+		unsigned int vec_nr = 0;
 
+		/*
+		 * Once the leftover pending bits have been handled, reset
+		 * h to softirq_vec.
+		 */
+		if (!ffs(pending_left)) {
+			if (per_cpu(pending_new_flag, smp_processor_id()) == 0) {
+				h = softirq_vec;
+				per_cpu(pending_new_flag, smp_processor_id()) = 1;
+			}
+		}
 		h += softirq_bit - 1;
 
 		vec_nr = h - softirq_vec;
@@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 			preempt_count_set(prev_count);
 		}
 		h++;
-		pending >>= softirq_bit;
+
+		if (ffs(pending_left))
+			pending_left >>= softirq_bit;
+		else
+			pending_new >>= softirq_bit;
+
+		/*
+		 * The softirq actions have been running for too long,
+		 * so we may need to wake up ksoftirqd.
+		 */
+		if (need_resched() && sched_clock() > end) {
+			/*
+			 * Ensure that the remaining pending bits will be
+			 * handled.
+			 */
+			local_irq_save(flags);
+			if (ffs(pending_left))
+				or_softirq_pending((pending_left << (vec_nr + 1)) |
+							pending_new);
+			else
+				or_softirq_pending(pending_new << (vec_nr + 1));
+			local_irq_restore(flags);
+			per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1;
+			break;
+		}
 	}
 
+	/* reset the pending_next_bit */
+	per_cpu(pending_next_bit, smp_processor_id()) = 0;
+
 	if (__this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
 	local_irq_disable();
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
-		    --max_restart)
+		if (!need_resched() && --max_restart &&
+		    sched_clock() <= end)
 			goto restart;
 
 		wakeup_softirqd();
-- 
1.8.3.1



* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-09  9:09 [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs qianjun.kernel
@ 2020-09-11 15:55 ` peterz
  2020-09-12  7:17   ` jun qian
  2020-09-11 16:46 ` Qais Yousef
  1 sibling, 1 reply; 9+ messages in thread
From: peterz @ 2020-09-11 15:55 UTC (permalink / raw)
  To: qianjun.kernel
  Cc: tglx, will, luto, linux-kernel, laoar.shao, urezki, frederic

[-- Attachment #1: Type: text/plain, Size: 4585 bytes --]

On Wed, Sep 09, 2020 at 05:09:31PM +0800, qianjun.kernel@gmail.com wrote:
> From: jun qian <qianjun.kernel@gmail.com>
> 
> When get the pending softirqs, it need to process all the pending
> softirqs in the while loop. If the processing time of each pending
> softirq is need more than 2 msec in this loop, or one of the softirq
> will running a long time, according to the original code logic, it
> will process all the pending softirqs without wakeuping ksoftirqd,
> which will cause a relatively large scheduling delay on the
> corresponding CPU, which we do not wish to see. The patch will check
> the total time to process pending softirq, if the time exceeds 2 ms
> we need to wakeup the ksofirqd to aviod large sched delay.

But what is all that unreadable gibberish with pending_new_{flag,bit}?

Random comments below..


> +#define MAX_SOFTIRQ_TIME_NS 2000000

	2*NSEC_PER_MSEC
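
i.e., spelled out, the define could read, for example:

	#define MAX_SOFTIRQ_TIME_NS	(2 * NSEC_PER_MSEC)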


> +DEFINE_PER_CPU(__u32, pending_new_flag);
> +DEFINE_PER_CPU(__u32, pending_next_bit);

__u32 is for userspace ABI, this is not it, use u32
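
With that, the declarations would look something like (untested):

	DEFINE_PER_CPU(u32, pending_new_flag);
	DEFINE_PER_CPU(u32, pending_next_bit);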

> +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1)
> +
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> +	u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
>  	unsigned long old_flags = current->flags;
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	struct softirq_action *h;
>  	bool in_hardirq;
> -	__u32 pending;
> -	int softirq_bit;
> +	__u32 pending, pending_left, pending_new;
> +	int softirq_bit, next_bit;
> +	unsigned long flags;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  
>  	h = softirq_vec;
>  
> -	while ((softirq_bit = ffs(pending))) {
> -		unsigned int vec_nr;
> +	next_bit = per_cpu(pending_next_bit, smp_processor_id());
> +	per_cpu(pending_new_flag, smp_processor_id()) = 0;

	__this_cpu_read() / __this_cpu_write()
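
e.g., the two per-CPU accesses above would then read something like (untested):

	next_bit = __this_cpu_read(pending_next_bit);
	__this_cpu_write(pending_new_flag, 0);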

> +
> +	pending_left = pending &
> +		(SOFTIRQ_PENDING_MASK << next_bit);
> +	pending_new = pending &
> +		(SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit));

The second mask is the inverse of the first.
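
Since pending never has bits set above NR_SOFTIRQS, the same split could be
written more directly, e.g. (untested):

	pending_left = pending & (SOFTIRQ_PENDING_MASK << next_bit);
	pending_new  = pending & ~(SOFTIRQ_PENDING_MASK << next_bit);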

> +	/*
> +	 * In order to be fair, we shold process the pengding bits by the
> +	 * last processing order.
> +	 */
> +	while ((softirq_bit = ffs(pending_left)) ||
> +		(softirq_bit = ffs(pending_new))) {
>  		int prev_count;
> +		unsigned int vec_nr = 0;
>  
> +		/*
> +		 * when the left pengding bits have been handled, we should
> +		 * to reset the h to softirq_vec.
> +		 */
> +		if (!ffs(pending_left)) {
> +			if (per_cpu(pending_new_flag, smp_processor_id()) == 0) {
> +				h = softirq_vec;
> +				per_cpu(pending_new_flag, smp_processor_id()) = 1;
> +			}
> +		}
>  		h += softirq_bit - 1;
>  
>  		vec_nr = h - softirq_vec;
> @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  			preempt_count_set(prev_count);
>  		}
>  		h++;
> -		pending >>= softirq_bit;
> +
> +		if (ffs(pending_left))

This is the _third_ ffs(pending_left), those things are _expensive_ (on
some archs, see include/asm-generic/bitops/__ffs.h).
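
Since ffs(x) is non-zero exactly when x is non-zero, testing the word directly
avoids the extra call here, e.g. (untested):

	if (pending_left)
		pending_left >>= softirq_bit;
	else
		pending_new >>= softirq_bit;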

> +			pending_left >>= softirq_bit;
> +		else
> +			pending_new >>= softirq_bit;
> +
> +		/*
> +		 * the softirq's action has been run too much time,
> +		 * so it may need to wakeup the ksoftirqd
> +		 */
> +		if (need_resched() && sched_clock() > end) {
> +			/*
> +			 * Ensure that the remaining pending bits will be
> +			 * handled.
> +			 */
> +			local_irq_save(flags);
> +			if (ffs(pending_left))

*fourth*...

> +				or_softirq_pending((pending_left << (vec_nr + 1)) |
> +							pending_new);
> +			else
> +				or_softirq_pending(pending_new << (vec_nr + 1));
> +			local_irq_restore(flags);
> +			per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1;
> +			break;
> +		}
>  	}
>  
> +	/* reset the pending_next_bit */
> +	per_cpu(pending_next_bit, smp_processor_id()) = 0;
> +
>  	if (__this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();
>  	local_irq_disable();
>  
>  	pending = local_softirq_pending();
>  	if (pending) {
> -		if (time_before(jiffies, end) && !need_resched() &&
> -		    --max_restart)
> +		if (!need_resched() && --max_restart &&
> +		    sched_clock() <= end)
>  			goto restart;
>  
>  		wakeup_softirqd();

This really wants to be a number of separate patches; and I quickly lost
the plot in your code. Instead of cleaning things up, you're making an
even bigger mess of things.

That said, I _think_ I've managed to decode what you want. See the
completely untested patches attached.



[-- Attachment #2: peterz-softirq-fix-loop.patch --]
[-- Type: text/x-diff, Size: 1456 bytes --]

Subject: softirq: Rewrite softirq processing loop
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:00:03 CEST 2020

Simplify the softirq processing loop by using the bitmap APIs

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -258,9 +258,9 @@ asmlinkage __visible void __softirq_entr
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
+	unsigned long pending;
+	unsigned int vec_nr;
 	bool in_hardirq;
-	__u32 pending;
-	int softirq_bit;
 
 	/*
 	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
@@ -281,15 +281,13 @@ asmlinkage __visible void __softirq_entr
 
 	local_irq_enable();
 
-	h = softirq_vec;
+	for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) {
+		unsigned int prev_count;
 
-	while ((softirq_bit = ffs(pending))) {
-		unsigned int vec_nr;
-		int prev_count;
+		__clear_bit(vec_nr, &pending);
 
-		h += softirq_bit - 1;
+		h = softirq_vec + vec_nr;
 
-		vec_nr = h - softirq_vec;
 		prev_count = preempt_count();
 
 		kstat_incr_softirqs_this_cpu(vec_nr);
@@ -303,8 +301,6 @@ asmlinkage __visible void __softirq_entr
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
-		h++;
-		pending >>= softirq_bit;
 	}
 
 	if (__this_cpu_read(ksoftirqd) == current)

[-- Attachment #3: peterz-softirq-timo.patch --]
[-- Type: text/x-diff, Size: 1552 bytes --]

Subject: softirq: Use sched_clock() based timeout
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:30:01 CEST 2020

Replace the jiffies based timeout with a sched_clock() based one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
 #include <linux/smpboot.h>
 #include <linux/tick.h>
 #include <linux/irq.h>
+#include <linux/sched/clock.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
@@ -216,7 +217,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -254,9 +255,9 @@ static inline void lockdep_softirq_end(b
 
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
-	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
+	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
 	unsigned int vec_nr;
@@ -309,7 +310,7 @@ asmlinkage __visible void __softirq_entr
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
+		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
 		    --max_restart)
 			goto restart;
 

[-- Attachment #4: peterz-softirq-needs-break.patch --]
[-- Type: text/x-diff, Size: 2703 bytes --]

Subject: softirq: Factor loop termination condition
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:17:20 CEST 2020


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -204,22 +204,6 @@ void __local_bh_enable_ip(unsigned long
 }
 EXPORT_SYMBOL(__local_bh_enable_ip);
 
-/*
- * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
- *
- * These limits have been established via experimentation.
- * The two things to balance is latency against fairness -
- * we want to handle softirqs as soon as possible, but they
- * should not be able to lock up the box.
- */
-#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
-#define MAX_SOFTIRQ_RESTART 10
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 /*
  * When we run softirqs from irq_exit() and thus on the hardirq stack we need
@@ -253,10 +237,33 @@ static inline bool lockdep_softirq_start
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+/*
+ * We restart softirq processing but break the loop if need_resched() is set or
+ * after 2 ms. The MAX_SOFTIRQ_RESTART guarantees a loop termination if
+ * sched_clock() were ever to stall.
+ *
+ * These limits have been established via experimentation.  The two things to
+ * balance is latency against fairness - we want to handle softirqs as soon as
+ * possible, but they should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
+#define MAX_SOFTIRQ_RESTART	10
+
+static inline bool __softirq_needs_break(u64 start)
+{
+	if (need_resched())
+		return true;
+
+	if (sched_clock() - start >= MAX_SOFTIRQ_TIME)
+		return true;
+
+	return false;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
+	unsigned int max_restart = MAX_SOFTIRQ_RESTART;
 	unsigned long old_flags = current->flags;
-	int max_restart = MAX_SOFTIRQ_RESTART;
 	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
@@ -310,8 +317,7 @@ asmlinkage __visible void __softirq_entr
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
-		    --max_restart)
+		if (!__softirq_needs_break(start) && --max_restart)
 			goto restart;
 
 		wakeup_softirqd();

[-- Attachment #5: peterz-softirq-break-more.patch --]
[-- Type: text/x-diff, Size: 1118 bytes --]

Subject: softirq: Allow early break
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:50:17 CEST 2020

Allow terminating the softirq processing loop without finishing the
vectors.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -309,19 +309,23 @@ asmlinkage __visible void __softirq_entr
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
+
+		if (pending && __softirq_needs_break(start))
+			break;
 	}
 
 	if (__this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
 	local_irq_disable();
 
-	pending = local_softirq_pending();
-	if (pending) {
-		if (!__softirq_needs_break(start) && --max_restart)
-			goto restart;
+	if (pending)
+		or_softirq_pending(pending);
+	else if ((pending = local_softirq_pending()) &&
+		 !__softirq_needs_break(start) &&
+		 --max_restart)
+		goto restart;
 
-		wakeup_softirqd();
-	}
+	wakeup_softirqd();
 
 	lockdep_softirq_end(in_hardirq);
 	account_irq_exit_time(current);


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-09  9:09 [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs qianjun.kernel
  2020-09-11 15:55 ` peterz
@ 2020-09-11 16:46 ` Qais Yousef
  2020-09-11 18:28   ` peterz
  1 sibling, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2020-09-11 16:46 UTC (permalink / raw)
  To: qianjun.kernel
  Cc: tglx, peterz, will, luto, linux-kernel, laoar.shao, urezki,
	John Dias, Wei Wang, Quentin Perret

On 09/09/20 17:09, qianjun.kernel@gmail.com wrote:
> From: jun qian <qianjun.kernel@gmail.com>
> 
> When get the pending softirqs, it need to process all the pending
> softirqs in the while loop. If the processing time of each pending
> softirq is need more than 2 msec in this loop, or one of the softirq
> will running a long time, according to the original code logic, it
> will process all the pending softirqs without wakeuping ksoftirqd,
> which will cause a relatively large scheduling delay on the
> corresponding CPU, which we do not wish to see. The patch will check
> the total time to process pending softirq, if the time exceeds 2 ms
> we need to wakeup the ksofirqd to aviod large sched delay.
> 
> Signed-off-by: jun qian <qianjun.kernel@gmail.com>

In Android there's a patch that tries to avoid scheduling an RT task on a cpu
that is running softirqs. I wonder if this patch helps with this case.

https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0

John, Wei, is this something of interest to you?

IIUC this patch will make sure the total softirq duration is capped at 2ms,
rather than capping each call at 2ms.

I presume if there's a single handler that takes a lot of time then this won't
help. But in that case, one can argue there's a potential bug with this
handler.

Cheers

--
Qais Yousef

> ---
>  kernel/softirq.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 69 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c4201b7f..1f696c8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -25,6 +25,7 @@
>  #include <linux/smpboot.h>
>  #include <linux/tick.h>
>  #include <linux/irq.h>
> +#include <linux/sched/clock.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/irq.h>
> @@ -199,18 +200,17 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
>  
>  /*
>   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> - * but break the loop if need_resched() is set or after 2 ms.
> - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> - * certain cases, such as stop_machine(), jiffies may cease to
> - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> - * well to make sure we eventually return from this method.
> + * but break the loop if need_resched() is set or after MAX_SOFTIRQ_TIME_NS
> + * ns. In the loop, if the processing time of the softirq has exceeded
> + * MAX_SOFTIRQ_TIME_NS ns, we also need to break the loop to wakeup the
> + * ksofirqd.
>   *
>   * These limits have been established via experimentation.
>   * The two things to balance is latency against fairness -
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME_NS 2000000
>  #define MAX_SOFTIRQ_RESTART 10
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -246,15 +246,20 @@ static inline void lockdep_softirq_end(bool in_hardirq)
>  static inline void lockdep_softirq_end(bool in_hardirq) { }
>  #endif
>  
> +DEFINE_PER_CPU(__u32, pending_new_flag);
> +DEFINE_PER_CPU(__u32, pending_next_bit);
> +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1)
> +
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> +	u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
>  	unsigned long old_flags = current->flags;
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	struct softirq_action *h;
>  	bool in_hardirq;
> -	__u32 pending;
> -	int softirq_bit;
> +	__u32 pending, pending_left, pending_new;
> +	int softirq_bit, next_bit;
> +	unsigned long flags;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  
>  	h = softirq_vec;
>  
> -	while ((softirq_bit = ffs(pending))) {
> -		unsigned int vec_nr;
> +	next_bit = per_cpu(pending_next_bit, smp_processor_id());
> +	per_cpu(pending_new_flag, smp_processor_id()) = 0;
> +
> +	pending_left = pending &
> +		(SOFTIRQ_PENDING_MASK << next_bit);
> +	pending_new = pending &
> +		(SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit));
> +
> +	/*
> +	 * In order to be fair, we shold process the pengding bits by the
> +	 * last processing order.
> +	 */
> +	while ((softirq_bit = ffs(pending_left)) ||
> +		(softirq_bit = ffs(pending_new))) {
>  		int prev_count;
> +		unsigned int vec_nr = 0;
>  
> +		/*
> +		 * when the left pengding bits have been handled, we should
> +		 * to reset the h to softirq_vec.
> +		 */
> +		if (!ffs(pending_left)) {
> +			if (per_cpu(pending_new_flag, smp_processor_id()) == 0) {
> +				h = softirq_vec;
> +				per_cpu(pending_new_flag, smp_processor_id()) = 1;
> +			}
> +		}
>  		h += softirq_bit - 1;
>  
>  		vec_nr = h - softirq_vec;
> @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  			preempt_count_set(prev_count);
>  		}
>  		h++;
> -		pending >>= softirq_bit;
> +
> +		if (ffs(pending_left))
> +			pending_left >>= softirq_bit;
> +		else
> +			pending_new >>= softirq_bit;
> +
> +		/*
> +		 * the softirq's action has been run too much time,
> +		 * so it may need to wakeup the ksoftirqd
> +		 */
> +		if (need_resched() && sched_clock() > end) {
> +			/*
> +			 * Ensure that the remaining pending bits will be
> +			 * handled.
> +			 */
> +			local_irq_save(flags);
> +			if (ffs(pending_left))
> +				or_softirq_pending((pending_left << (vec_nr + 1)) |
> +							pending_new);
> +			else
> +				or_softirq_pending(pending_new << (vec_nr + 1));
> +			local_irq_restore(flags);
> +			per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1;
> +			break;
> +		}
>  	}
>  
> +	/* reset the pending_next_bit */
> +	per_cpu(pending_next_bit, smp_processor_id()) = 0;
> +
>  	if (__this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();
>  	local_irq_disable();
>  
>  	pending = local_softirq_pending();
>  	if (pending) {
> -		if (time_before(jiffies, end) && !need_resched() &&
> -		    --max_restart)
> +		if (!need_resched() && --max_restart &&
> +		    sched_clock() <= end)
>  			goto restart;
>  
>  		wakeup_softirqd();
> -- 
> 1.8.3.1
> 


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-11 16:46 ` Qais Yousef
@ 2020-09-11 18:28   ` peterz
  2020-09-14 11:27     ` Qais Yousef
       [not found]     ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com>
  0 siblings, 2 replies; 9+ messages in thread
From: peterz @ 2020-09-11 18:28 UTC (permalink / raw)
  To: Qais Yousef
  Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao,
	urezki, John Dias, Wei Wang, Quentin Perret

On Fri, Sep 11, 2020 at 05:46:45PM +0100, Qais Yousef wrote:
> On 09/09/20 17:09, qianjun.kernel@gmail.com wrote:
> > From: jun qian <qianjun.kernel@gmail.com>
> > 
> > When get the pending softirqs, it need to process all the pending
> > softirqs in the while loop. If the processing time of each pending
> > softirq is need more than 2 msec in this loop, or one of the softirq
> > will running a long time, according to the original code logic, it
> > will process all the pending softirqs without wakeuping ksoftirqd,
> > which will cause a relatively large scheduling delay on the
> > corresponding CPU, which we do not wish to see. The patch will check
> > the total time to process pending softirq, if the time exceeds 2 ms
> > we need to wakeup the ksofirqd to aviod large sched delay.
> > 
> > Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> 
> In Android there's a patch that tries to avoid schedling an RT task on a cpu
> that is running softirqs. I wonder if this patch helps with this case.
> 
> https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0
> 
> John, Wei, is this something of interest to you?

Urgh.. that's pretty gross. I think the sane approach is indeed getting
softirqs to react to need_resched() better.


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-11 15:55 ` peterz
@ 2020-09-12  7:17   ` jun qian
  0 siblings, 0 replies; 9+ messages in thread
From: jun qian @ 2020-09-12  7:17 UTC (permalink / raw)
  To: peterz
  Cc: Thomas Gleixner, will, luto, linux-kernel, Yafang Shao,
	Uladzislau Rezki, frederic

<peterz@infradead.org> wrote on Fri, Sep 11, 2020 at 11:55 PM:
>
> On Wed, Sep 09, 2020 at 05:09:31PM +0800, qianjun.kernel@gmail.com wrote:
> > From: jun qian <qianjun.kernel@gmail.com>
> >
> > When get the pending softirqs, it need to process all the pending
> > softirqs in the while loop. If the processing time of each pending
> > softirq is need more than 2 msec in this loop, or one of the softirq
> > will running a long time, according to the original code logic, it
> > will process all the pending softirqs without wakeuping ksoftirqd,
> > which will cause a relatively large scheduling delay on the
> > corresponding CPU, which we do not wish to see. The patch will check
> > the total time to process pending softirq, if the time exceeds 2 ms
> > we need to wakeup the ksofirqd to aviod large sched delay.
>
> But what is all that unreadaable gibberish with pending_new_{flag,bit} ?
>
> Random comments below..
>
>
> > +#define MAX_SOFTIRQ_TIME_NS 2000000
>
>         2*NSEC_PER_MSEC
>
>
> > +DEFINE_PER_CPU(__u32, pending_new_flag);
> > +DEFINE_PER_CPU(__u32, pending_next_bit);
>
> __u32 is for userspace ABI, this is not it, use u32
>
> > +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1)
> > +
> >  asmlinkage __visible void __softirq_entry __do_softirq(void)
> >  {
> > -     unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> > +     u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
> >       unsigned long old_flags = current->flags;
> >       int max_restart = MAX_SOFTIRQ_RESTART;
> >       struct softirq_action *h;
> >       bool in_hardirq;
> > -     __u32 pending;
> > -     int softirq_bit;
> > +     __u32 pending, pending_left, pending_new;
> > +     int softirq_bit, next_bit;
> > +     unsigned long flags;
> >
> >       /*
> >        * Mask out PF_MEMALLOC as the current task context is borrowed for the
> > @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> >
> >       h = softirq_vec;
> >
> > -     while ((softirq_bit = ffs(pending))) {
> > -             unsigned int vec_nr;
> > +     next_bit = per_cpu(pending_next_bit, smp_processor_id());
> > +     per_cpu(pending_new_flag, smp_processor_id()) = 0;
>
>         __this_cpu_read() / __this_cpu_write()
>
> > +
> > +     pending_left = pending &
> > +             (SOFTIRQ_PENDING_MASK << next_bit);
> > +     pending_new = pending &
> > +             (SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit));
>
> The second mask is the inverse of the first.
>
> > +     /*
> > +      * In order to be fair, we shold process the pengding bits by the
> > +      * last processing order.
> > +      */
> > +     while ((softirq_bit = ffs(pending_left)) ||
> > +             (softirq_bit = ffs(pending_new))) {
> >               int prev_count;
> > +             unsigned int vec_nr = 0;
> >
> > +             /*
> > +              * when the left pengding bits have been handled, we should
> > +              * to reset the h to softirq_vec.
> > +              */
> > +             if (!ffs(pending_left)) {
> > +                     if (per_cpu(pending_new_flag, smp_processor_id()) == 0) {
> > +                             h = softirq_vec;
> > +                             per_cpu(pending_new_flag, smp_processor_id()) = 1;
> > +                     }
> > +             }
> >               h += softirq_bit - 1;
> >
> >               vec_nr = h - softirq_vec;
> > @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> >                       preempt_count_set(prev_count);
> >               }
> >               h++;
> > -             pending >>= softirq_bit;
> > +
> > +             if (ffs(pending_left))
>
> This is the _third_ ffs(pending_left), those things are _expensive_ (on
> some archs, see include/asm-generic/bitops/__ffs.h).
>
> > +                     pending_left >>= softirq_bit;
> > +             else
> > +                     pending_new >>= softirq_bit;
> > +
> > +             /*
> > +              * the softirq's action has been run too much time,
> > +              * so it may need to wakeup the ksoftirqd
> > +              */
> > +             if (need_resched() && sched_clock() > end) {
> > +                     /*
> > +                      * Ensure that the remaining pending bits will be
> > +                      * handled.
> > +                      */
> > +                     local_irq_save(flags);
> > +                     if (ffs(pending_left))
>
> *fourth*...
>
> > +                             or_softirq_pending((pending_left << (vec_nr + 1)) |
> > +                                                     pending_new);
> > +                     else
> > +                             or_softirq_pending(pending_new << (vec_nr + 1));
> > +                     local_irq_restore(flags);
> > +                     per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1;
> > +                     break;
> > +             }
> >       }
> >
> > +     /* reset the pending_next_bit */
> > +     per_cpu(pending_next_bit, smp_processor_id()) = 0;
> > +
> >       if (__this_cpu_read(ksoftirqd) == current)
> >               rcu_softirq_qs();
> >       local_irq_disable();
> >
> >       pending = local_softirq_pending();
> >       if (pending) {
> > -             if (time_before(jiffies, end) && !need_resched() &&
> > -                 --max_restart)
> > +             if (!need_resched() && --max_restart &&
> > +                 sched_clock() <= end)
> >                       goto restart;
> >
> >               wakeup_softirqd();
>
> This really wants to be a number of separate patches; and I quickly lost
> the plot in your code. Instead of cleaning things up, you're making an
> even bigger mess of things.
>
> That said, I _think_ I've managed to decode what you want. See the
> completely untested patches attached.
>
>

Thanks a lot for your suggestion. I will rewrite the patch following your
suggestion, using multiple patches to clarify the ideas.


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-11 18:28   ` peterz
@ 2020-09-14 11:27     ` Qais Yousef
  2020-09-14 14:14       ` peterz
       [not found]     ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com>
  1 sibling, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2020-09-14 11:27 UTC (permalink / raw)
  To: peterz
  Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao,
	urezki, John Dias, Wei Wang, Quentin Perret

On 09/11/20 20:28, peterz@infradead.org wrote:
> On Fri, Sep 11, 2020 at 05:46:45PM +0100, Qais Yousef wrote:
> > On 09/09/20 17:09, qianjun.kernel@gmail.com wrote:
> > > From: jun qian <qianjun.kernel@gmail.com>
> > > 
> > > When get the pending softirqs, it need to process all the pending
> > > softirqs in the while loop. If the processing time of each pending
> > > softirq is need more than 2 msec in this loop, or one of the softirq
> > > will running a long time, according to the original code logic, it
> > > will process all the pending softirqs without wakeuping ksoftirqd,
> > > which will cause a relatively large scheduling delay on the
> > > corresponding CPU, which we do not wish to see. The patch will check
> > > the total time to process pending softirq, if the time exceeds 2 ms
> > > we need to wakeup the ksofirqd to aviod large sched delay.
> > > 
> > > Signed-off-by: jun qian <qianjun.kernel@gmail.com>
> > 
> > In Android there's a patch that tries to avoid schedling an RT task on a cpu
> > that is running softirqs. I wonder if this patch helps with this case.
> > 
> > https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0
> > 
> > John, Wei, is this something of interest to you?
> 
> Urgh.. that's pretty gross. I think the sane approach is indeed getting
> softirqs to react to need_resched() better.

What does PREEMPT_RT do to deal with softirq delays?

I have tried playing with enabling threadirqs, which AFAIU should make softirqs
preemptible, right?

I realize this patch is still missing from mainline at least:

	https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch

Would this be a heavy-handed approach to make available for non-PREEMPT_RT
kernels?

I only worry about potential NET_RX throughput issues, which by the way is
protected with preempt_disable() currently in mainline; see netif_rx_ni().

I am guessing here, but I suspect this NET_RX softirq is one source of big
delays when network activity is high.

Thanks

--
Qais Yousef


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
       [not found]     ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com>
@ 2020-09-14 11:41       ` Qais Yousef
  0 siblings, 0 replies; 9+ messages in thread
From: Qais Yousef @ 2020-09-14 11:41 UTC (permalink / raw)
  To: John Dias
  Cc: Peter Zijlstra, qianjun.kernel, tglx, will, luto, LKML,
	laoar.shao, urezki, Wei Wang, Quentin Perret

On 09/11/20 12:14, John Dias wrote:
> I agree that the rt-softirq interaction patches are a gross hack (and I
> wrote the original versions, so it's my grossness). The problem is that
> even a 1ms delay in executing those low-latency audio threads is enough to
> cause obvious glitching in audio under very real circumstances, which is
> not an acceptable user experience -- and some softirq handlers run for >1ms
> on their own. (And 1ms is a high upper bound, not a median target.)

AFAIK professional audio apps have the toughest limit of sub 10ms. 120Hz
screens impose a stricter limit on the display pipeline too, which has to
finish its frame in 8ms.

1ms is too short and approaches the PREEMPT_RT realm.

Is it possible to expand on the use case here? What application requires this
constraint?

Thanks

--
Qais Yousef


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-14 11:27     ` Qais Yousef
@ 2020-09-14 14:14       ` peterz
  2020-09-14 15:28         ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: peterz @ 2020-09-14 14:14 UTC (permalink / raw)
  To: Qais Yousef
  Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao,
	urezki, John Dias, Wei Wang, Quentin Perret

On Mon, Sep 14, 2020 at 12:27:35PM +0100, Qais Yousef wrote:
> What does PREEMPT_RT do to deal with softirqs delays?

Makes the lot preemptible, you found the patch below.

> I have tried playing with enabling threadirqs, which AFAIU should make softirqs
> preemptible, right?

Not yet,..

> I realize this patch is still missing from mainline at least:
> 
> 	https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch
> 
> Would this be a heavy handed approach to make available for non PREEMPT_RT
> kernels?

Not sure, I suspect it relies on migrate_disable(), which is
preempt_disable() on !RT and then we're back to square one.

> I only worry about potential NET_RX throughput issues. Which by the way is
> protected with preempt_disable currently in mainline. See netif_rx_ni().

So preempt_disable() isn't necessarily a problem, you just want it to
terminate soonish after need_resched() becomes true. Also, I'm having a
wee problem getting from net_rx_action() to netif_rx_ni()

> I am guessing here, but I suspect this NET_RX softirq is one source of big
> delays when network activity is high.

Well, one approach is to more aggressively limit how long softirq
processing can run. Current measures are very soft in that regard.


* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
  2020-09-14 14:14       ` peterz
@ 2020-09-14 15:28         ` Qais Yousef
  0 siblings, 0 replies; 9+ messages in thread
From: Qais Yousef @ 2020-09-14 15:28 UTC (permalink / raw)
  To: peterz
  Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao,
	urezki, John Dias, Wei Wang, Quentin Perret

On 09/14/20 16:14, peterz@infradead.org wrote:
> On Mon, Sep 14, 2020 at 12:27:35PM +0100, Qais Yousef wrote:
> > What does PREEMPT_RT do to deal with softirqs delays?
> 
> Makes the lot preemptible, you found the patch below.
> 
> > I have tried playing with enabling threadirqs, which AFAIU should make softirqs
> > preemptible, right?
> 
> Not yet,..
> 
> > I realize this patch is still missing from mainline at least:
> > 
> > 	https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch
> > 
> > Would this be a heavy handed approach to make available for non PREEMPT_RT
> > kernels?
> 
> Not sure, I suspect it relies on migrate_disable(), which is
> preempt_disable() on !RT and then we're back to square one.

I think it will depend on local_bh_disable(). I didn't dig into the patch
above, but I believe it's doing that for RT.

Or maybe there's another aspect I am not aware of that relies on
migrate_disable() too..

> 
> > I only worry about potential NET_RX throughput issues. Which by the way is
> > protected with preempt_disable currently in mainline. See netif_rx_ni().
> 
> So preempt_disable() isn't necessairily a problem, you just want it to

Yes. But high network traffic will make this a busy softirq, and it won't work
with the patch above. I assume the patch above will have to fix that along
with it.

	https://lore.kernel.org/netdev/20170616172400.10809-1-bigeasy@linutronix.de/

For the time being, it's just another potential path that could introduce
latencies.

I can't follow the whole thing either, but if 5G modems end up there, I can see
this being a big source of noise when the user is downloading a big file,
assuming 5G lives up to its reputation of 400+ Mbps in practice.

So there might be a tangible trade-off between better softirq latencies and
better network throughput.

> terminate soonish after need_resched() becomes true. Also, I'm having a
> wee problem getting from net_rx_action() to netif_rx_ni()

I can investigate this direction :)

> 
> > I am guessing here, but I suspect this NET_RX softirq is one source of big
> > delays when network activity is high.
> 
> Well, one approach is to more agressively limit how long softirq
> processing can run. Current measures are very soft in that regard.

Which is what this patch does, although it doesn't take into account a single
softirq exceeding the quota, IIUC. But the need_resched() bit above should
address that.

Thanks

--
Qais Yousef

