From: peterz@infradead.org
To: qianjun.kernel@gmail.com
Cc: tglx@linutronix.de, will@kernel.org, luto@kernel.org,
	linux-kernel@vger.kernel.org, laoar.shao@gmail.com,
	urezki@gmail.com, frederic@kernel.org
Subject: Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
Date: Fri, 11 Sep 2020 17:55:55 +0200	[thread overview]
Message-ID: <20200911155555.GX2674@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20200909090931.8836-1-qianjun.kernel@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4585 bytes --]

On Wed, Sep 09, 2020 at 05:09:31PM +0800, qianjun.kernel@gmail.com wrote:
> From: jun qian <qianjun.kernel@gmail.com>
> 
> When the pending softirqs are fetched, they are all processed in the
> while loop. If each pending softirq needs more than 2 msec in this
> loop, or one of the softirqs runs for a long time, then according to
> the original code logic all the pending softirqs are processed without
> waking up ksoftirqd, which causes a relatively large scheduling delay
> on the corresponding CPU, which we do not wish to see. This patch
> checks the total time spent processing the pending softirqs; if it
> exceeds 2 ms, we wake up ksoftirqd to avoid a large sched delay.

But what is all that unreadable gibberish with pending_new_{flag,bit}?

Random comments below..


> +#define MAX_SOFTIRQ_TIME_NS 2000000

	2*NSEC_PER_MSEC
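
i.e. spell the constant so the units stay visible. A sketch of the
define (NSEC_PER_MSEC comes via <linux/time64.h>):

	#define MAX_SOFTIRQ_TIME_NS	(2 * NSEC_PER_MSEC)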


> +DEFINE_PER_CPU(__u32, pending_new_flag);
> +DEFINE_PER_CPU(__u32, pending_next_bit);

__u32 is for userspace ABI, this is not it, use u32
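
i.e. the declarations above would become:

	DEFINE_PER_CPU(u32, pending_new_flag);
	DEFINE_PER_CPU(u32, pending_next_bit);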

> +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1)
> +
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> +	u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS;
>  	unsigned long old_flags = current->flags;
>  	int max_restart = MAX_SOFTIRQ_RESTART;
>  	struct softirq_action *h;
>  	bool in_hardirq;
> -	__u32 pending;
> -	int softirq_bit;
> +	__u32 pending, pending_left, pending_new;
> +	int softirq_bit, next_bit;
> +	unsigned long flags;
>  
>  	/*
>  	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
> @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  
>  	h = softirq_vec;
>  
> -	while ((softirq_bit = ffs(pending))) {
> -		unsigned int vec_nr;
> +	next_bit = per_cpu(pending_next_bit, smp_processor_id());
> +	per_cpu(pending_new_flag, smp_processor_id()) = 0;

	__this_cpu_read() / __this_cpu_write()
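
i.e. something like this (untested sketch):

	next_bit = __this_cpu_read(pending_next_bit);
	__this_cpu_write(pending_new_flag, 0);

which also avoids the repeated explicit smp_processor_id() lookups.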

> +
> +	pending_left = pending &
> +		(SOFTIRQ_PENDING_MASK << next_bit);
> +	pending_new = pending &
> +		(SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit));

The second mask is the inverse of the first.
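
That is, within the low NR_SOFTIRQS bits,

	SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit)
		== SOFTIRQ_PENDING_MASK & ~(SOFTIRQ_PENDING_MASK << next_bit)

so the split could be written without the second shift (sketch):

	pending_left = pending & (SOFTIRQ_PENDING_MASK << next_bit);
	pending_new  = pending & ~pending_left;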

> +	/*
> +	 * In order to be fair, we should process the pending bits in the
> +	 * order left over from the last processing run.
> +	 */
> +	while ((softirq_bit = ffs(pending_left)) ||
> +		(softirq_bit = ffs(pending_new))) {
>  		int prev_count;
> +		unsigned int vec_nr = 0;
>  
> +		/*
> +		 * When the leftover pending bits have been handled, we
> +		 * should reset h to softirq_vec.
> +		 */
> +		if (!ffs(pending_left)) {
> +			if (per_cpu(pending_new_flag, smp_processor_id()) == 0) {
> +				h = softirq_vec;
> +				per_cpu(pending_new_flag, smp_processor_id()) = 1;
> +			}
> +		}
>  		h += softirq_bit - 1;
>  
>  		vec_nr = h - softirq_vec;
> @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
>  			preempt_count_set(prev_count);
>  		}
>  		h++;
> -		pending >>= softirq_bit;
> +
> +		if (ffs(pending_left))

This is the _third_ ffs(pending_left), those things are _expensive_ (on
some archs, see include/asm-generic/bitops/__ffs.h).
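
The first result can simply be remembered instead; e.g. (sketch, the
from_left flag is made up for illustration):

	bool from_left;

	softirq_bit = ffs(pending_left);
	from_left = softirq_bit != 0;
	if (!from_left)
		softirq_bit = ffs(pending_new);

and then every later "if (ffs(pending_left))" becomes "if (from_left)".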

> +			pending_left >>= softirq_bit;
> +		else
> +			pending_new >>= softirq_bit;
> +
> +		/*
> +		 * The softirq actions have been running for too long, so
> +		 * we may need to wake up ksoftirqd.
> +		 */
> +		if (need_resched() && sched_clock() > end) {
> +			/*
> +			 * Ensure that the remaining pending bits will be
> +			 * handled.
> +			 */
> +			local_irq_save(flags);
> +			if (ffs(pending_left))

*fourth*...

> +				or_softirq_pending((pending_left << (vec_nr + 1)) |
> +							pending_new);
> +			else
> +				or_softirq_pending(pending_new << (vec_nr + 1));
> +			local_irq_restore(flags);
> +			per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1;
> +			break;
> +		}
>  	}
>  
> +	/* reset the pending_next_bit */
> +	per_cpu(pending_next_bit, smp_processor_id()) = 0;
> +
>  	if (__this_cpu_read(ksoftirqd) == current)
>  		rcu_softirq_qs();
>  	local_irq_disable();
>  
>  	pending = local_softirq_pending();
>  	if (pending) {
> -		if (time_before(jiffies, end) && !need_resched() &&
> -		    --max_restart)
> +		if (!need_resched() && --max_restart &&
> +		    sched_clock() <= end)
>  			goto restart;
>  
>  		wakeup_softirqd();

This really wants to be a number of separate patches; and I quickly lost
the plot in your code. Instead of cleaning things up, you're making an
even bigger mess of things.

That said, I _think_ I've managed to decode what you want. See the
completely untested patches attached.



[-- Attachment #2: peterz-softirq-fix-loop.patch --]
[-- Type: text/x-diff, Size: 1456 bytes --]

Subject: softirq: Rewrite softirq processing loop
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:00:03 CEST 2020

Simplify the softirq processing loop by using the bitmap APIs

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -258,9 +258,9 @@ asmlinkage __visible void __softirq_entr
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
 	struct softirq_action *h;
+	unsigned long pending;
+	unsigned int vec_nr;
 	bool in_hardirq;
-	__u32 pending;
-	int softirq_bit;
 
 	/*
 	 * Mask out PF_MEMALLOC as the current task context is borrowed for the
@@ -281,15 +281,13 @@ asmlinkage __visible void __softirq_entr
 
 	local_irq_enable();
 
-	h = softirq_vec;
+	for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) {
+		unsigned int prev_count;
 
-	while ((softirq_bit = ffs(pending))) {
-		unsigned int vec_nr;
-		int prev_count;
+		__clear_bit(vec_nr, &pending);
 
-		h += softirq_bit - 1;
+		h = softirq_vec + vec_nr;
 
-		vec_nr = h - softirq_vec;
 		prev_count = preempt_count();
 
 		kstat_incr_softirqs_this_cpu(vec_nr);
@@ -303,8 +301,6 @@ asmlinkage __visible void __softirq_entr
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
-		h++;
-		pending >>= softirq_bit;
 	}
 
 	if (__this_cpu_read(ksoftirqd) == current)
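
For illustration, the rewritten loop then has this shape (condensed
sketch of the patched function, tracing and preempt checks trimmed):

	unsigned long pending = local_softirq_pending();
	unsigned int vec_nr;

	for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) {
		struct softirq_action *h = softirq_vec + vec_nr;

		__clear_bit(vec_nr, &pending);
		kstat_incr_softirqs_this_cpu(vec_nr);
		h->action(h);
	}

Clearing each bit from the local copy is what later lets 'pending'
double as "vectors not yet serviced" when the loop is left early (see
the last patch).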

[-- Attachment #3: peterz-softirq-timo.patch --]
[-- Type: text/x-diff, Size: 1552 bytes --]

Subject: softirq: Use sched_clock() based timeout
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:30:01 CEST 2020

Replace the jiffies based timeout with a sched_clock() based one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -25,6 +25,7 @@
 #include <linux/smpboot.h>
 #include <linux/tick.h>
 #include <linux/irq.h>
+#include <linux/sched/clock.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/irq.h>
@@ -216,7 +217,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -254,9 +255,9 @@ static inline void lockdep_softirq_end(b
 
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
-	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 	unsigned long old_flags = current->flags;
 	int max_restart = MAX_SOFTIRQ_RESTART;
+	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
 	unsigned int vec_nr;
@@ -309,7 +310,7 @@ asmlinkage __visible void __softirq_entr
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
+		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
 		    --max_restart)
 			goto restart;
 

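The pattern is the usual monotonic-timestamp one: take one ns-resolution
stamp up front and compare deltas, instead of an absolute jiffies
deadline whose granularity is a whole tick (10 ms at HZ=100). A minimal
sketch:

	u64 start = sched_clock();

	for (;;) {
		/* ... handle one vector ... */

		if (sched_clock() - start >= MAX_SOFTIRQ_TIME)
			break;		/* the 2 ms budget is consumed */
	}
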
[-- Attachment #4: peterz-softirq-needs-break.patch --]
[-- Type: text/x-diff, Size: 2703 bytes --]

Subject: softirq: Factor loop termination condition
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:17:20 CEST 2020


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -204,22 +204,6 @@ void __local_bh_enable_ip(unsigned long
 }
 EXPORT_SYMBOL(__local_bh_enable_ip);
 
-/*
- * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
- * certain cases, such as stop_machine(), jiffies may cease to
- * increment and so we need the MAX_SOFTIRQ_RESTART limit as
- * well to make sure we eventually return from this method.
- *
- * These limits have been established via experimentation.
- * The two things to balance is latency against fairness -
- * we want to handle softirqs as soon as possible, but they
- * should not be able to lock up the box.
- */
-#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
-#define MAX_SOFTIRQ_RESTART 10
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 /*
  * When we run softirqs from irq_exit() and thus on the hardirq stack we need
@@ -253,10 +237,33 @@ static inline bool lockdep_softirq_start
 static inline void lockdep_softirq_end(bool in_hardirq) { }
 #endif
 
+/*
+ * We restart softirq processing but break the loop if need_resched() is set or
+ * after 2 ms. The MAX_SOFTIRQ_RESTART limit guarantees loop termination if
+ * sched_clock() were ever to stall.
+ *
+ * These limits have been established via experimentation.  The two things to
+ * balance are latency against fairness - we want to handle softirqs as soon as
+ * possible, but they should not be able to lock up the box.
+ */
+#define MAX_SOFTIRQ_TIME	2*NSEC_PER_MSEC
+#define MAX_SOFTIRQ_RESTART	10
+
+static inline bool __softirq_needs_break(u64 start)
+{
+	if (need_resched())
+		return true;
+
+	if (sched_clock() - start >= MAX_SOFTIRQ_TIME)
+		return true;
+
+	return false;
+}
+
 asmlinkage __visible void __softirq_entry __do_softirq(void)
 {
+	unsigned int max_restart = MAX_SOFTIRQ_RESTART;
 	unsigned long old_flags = current->flags;
-	int max_restart = MAX_SOFTIRQ_RESTART;
 	u64 start = sched_clock();
 	struct softirq_action *h;
 	unsigned long pending;
@@ -310,8 +317,7 @@ asmlinkage __visible void __softirq_entr
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() &&
-		    --max_restart)
+		if (!__softirq_needs_break(start) && --max_restart)
 			goto restart;
 
 		wakeup_softirqd();
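
Pulling the condition into __softirq_needs_break() means the exact same
test can also be evaluated between vectors, inside the processing loop,
which is what the next patch does:

	/* inside the loop, between vectors (added by the next patch): */
	if (pending && __softirq_needs_break(start))
		break;

	/* at restart time: */
	if (!__softirq_needs_break(start) && --max_restart)
		goto restart;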

[-- Attachment #5: peterz-softirq-break-more.patch --]
[-- Type: text/x-diff, Size: 1118 bytes --]

Subject: softirq: Allow early break
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Sep 11 17:50:17 CEST 2020

Allow terminating the softirq processing loop without finishing the
vectors.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/softirq.c |   16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -309,19 +309,23 @@ asmlinkage __visible void __softirq_entr
 			       prev_count, preempt_count());
 			preempt_count_set(prev_count);
 		}
+
+		if (pending && __softirq_needs_break(start))
+			break;
 	}
 
 	if (__this_cpu_read(ksoftirqd) == current)
 		rcu_softirq_qs();
 	local_irq_disable();
 
-	pending = local_softirq_pending();
-	if (pending) {
-		if (!__softirq_needs_break(start) && --max_restart)
-			goto restart;
+	if (pending)
+		or_softirq_pending(pending);
+	else if ((pending = local_softirq_pending()) &&
+		 !__softirq_needs_break(start) &&
+		 --max_restart)
+		goto restart;
 
-		wakeup_softirqd();
-	}
+	wakeup_softirqd();
 
 	lockdep_softirq_end(in_hardirq);
 	account_irq_exit_time(current);
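
Taken together, the four patches give __do_softirq() roughly this shape
(a condensed, untested sketch; lockdep, tracing and accounting trimmed):

	asmlinkage __visible void __softirq_entry __do_softirq(void)
	{
		unsigned int max_restart = MAX_SOFTIRQ_RESTART;
		u64 start = sched_clock();
		unsigned long pending;
		unsigned int vec_nr;

	restart:
		pending = local_softirq_pending();
		set_softirq_pending(0);
		local_irq_enable();

		for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) {
			struct softirq_action *h = softirq_vec + vec_nr;

			__clear_bit(vec_nr, &pending);
			h->action(h);

			if (pending && __softirq_needs_break(start))
				break;
		}

		local_irq_disable();

		if (pending)
			or_softirq_pending(pending);	/* repost what we didn't run */
		else if ((pending = local_softirq_pending()) &&
			 !__softirq_needs_break(start) && --max_restart)
			goto restart;

		wakeup_softirqd();
	}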

