* [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs @ 2020-09-09 9:09 qianjun.kernel 2020-09-11 15:55 ` peterz 2020-09-11 16:46 ` Qais Yousef 0 siblings, 2 replies; 9+ messages in thread From: qianjun.kernel @ 2020-09-09 9:09 UTC (permalink / raw) To: tglx, peterz, will, luto, linux-kernel; +Cc: laoar.shao, urezki, jun qian From: jun qian <qianjun.kernel@gmail.com> When the pending softirqs are fetched, they are all processed in one while loop. If the pending softirqs together need more than 2 msec of processing time in this loop, or one of the softirqs runs for a long time, the original code logic still processes all of the pending softirqs without waking up ksoftirqd, which causes a relatively large scheduling delay on the corresponding CPU, which we do not wish to see. This patch checks the total time spent processing the pending softirqs; if it exceeds 2 ms, ksoftirqd is woken up to avoid a large scheduling delay. Signed-off-by: jun qian <qianjun.kernel@gmail.com> --- kernel/softirq.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 69 insertions(+), 14 deletions(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index c4201b7f..1f696c8 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -25,6 +25,7 @@ #include <linux/smpboot.h> #include <linux/tick.h> #include <linux/irq.h> +#include <linux/sched/clock.h> #define CREATE_TRACE_POINTS #include <trace/events/irq.h> @@ -199,18 +200,17 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt) /* * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, - * but break the loop if need_resched() is set or after 2 ms. - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in - * certain cases, such as stop_machine(), jiffies may cease to - * increment and so we need the MAX_SOFTIRQ_RESTART limit as - * well to make sure we eventually return from this method.
+ * but break the loop if need_resched() is set or after MAX_SOFTIRQ_TIME_NS + * ns. In the loop, if the processing time of the softirq has exceeded + * MAX_SOFTIRQ_TIME_NS ns, we also need to break the loop to wakeup the + * ksofirqd. * * These limits have been established via experimentation. * The two things to balance is latency against fairness - * we want to handle softirqs as soon as possible, but they * should not be able to lock up the box. */ -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) +#define MAX_SOFTIRQ_TIME_NS 2000000 #define MAX_SOFTIRQ_RESTART 10 #ifdef CONFIG_TRACE_IRQFLAGS @@ -246,15 +246,20 @@ static inline void lockdep_softirq_end(bool in_hardirq) static inline void lockdep_softirq_end(bool in_hardirq) { } #endif +DEFINE_PER_CPU(__u32, pending_new_flag); +DEFINE_PER_CPU(__u32, pending_next_bit); +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1) + asmlinkage __visible void __softirq_entry __do_softirq(void) { - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; + u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS; unsigned long old_flags = current->flags; int max_restart = MAX_SOFTIRQ_RESTART; struct softirq_action *h; bool in_hardirq; - __u32 pending; - int softirq_bit; + __u32 pending, pending_left, pending_new; + int softirq_bit, next_bit; + unsigned long flags; /* * Mask out PF_MEMALLOC as the current task context is borrowed for the @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) h = softirq_vec; - while ((softirq_bit = ffs(pending))) { - unsigned int vec_nr; + next_bit = per_cpu(pending_next_bit, smp_processor_id()); + per_cpu(pending_new_flag, smp_processor_id()) = 0; + + pending_left = pending & + (SOFTIRQ_PENDING_MASK << next_bit); + pending_new = pending & + (SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit)); + + /* + * In order to be fair, we shold process the pengding bits by the + * last processing order. 
+ */ + while ((softirq_bit = ffs(pending_left)) || + (softirq_bit = ffs(pending_new))) { int prev_count; + unsigned int vec_nr = 0; + /* + * when the left pengding bits have been handled, we should + * to reset the h to softirq_vec. + */ + if (!ffs(pending_left)) { + if (per_cpu(pending_new_flag, smp_processor_id()) == 0) { + h = softirq_vec; + per_cpu(pending_new_flag, smp_processor_id()) = 1; + } + } h += softirq_bit - 1; vec_nr = h - softirq_vec; @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) preempt_count_set(prev_count); } h++; - pending >>= softirq_bit; + + if (ffs(pending_left)) + pending_left >>= softirq_bit; + else + pending_new >>= softirq_bit; + + /* + * the softirq's action has been run too much time, + * so it may need to wakeup the ksoftirqd + */ + if (need_resched() && sched_clock() > end) { + /* + * Ensure that the remaining pending bits will be + * handled. + */ + local_irq_save(flags); + if (ffs(pending_left)) + or_softirq_pending((pending_left << (vec_nr + 1)) | + pending_new); + else + or_softirq_pending(pending_new << (vec_nr + 1)); + local_irq_restore(flags); + per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1; + break; + } } + /* reset the pending_next_bit */ + per_cpu(pending_next_bit, smp_processor_id()) = 0; + if (__this_cpu_read(ksoftirqd) == current) rcu_softirq_qs(); local_irq_disable(); pending = local_softirq_pending(); if (pending) { - if (time_before(jiffies, end) && !need_resched() && - --max_restart) + if (!need_resched() && --max_restart && + sched_clock() <= end) goto restart; wakeup_softirqd(); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs 2020-09-09 9:09 [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs qianjun.kernel @ 2020-09-11 15:55 ` peterz 2020-09-12 7:17 ` jun qian 2020-09-11 16:46 ` Qais Yousef 1 sibling, 1 reply; 9+ messages in thread From: peterz @ 2020-09-11 15:55 UTC (permalink / raw) To: qianjun.kernel Cc: tglx, will, luto, linux-kernel, laoar.shao, urezki, frederic [-- Attachment #1: Type: text/plain, Size: 4585 bytes --] On Wed, Sep 09, 2020 at 05:09:31PM +0800, qianjun.kernel@gmail.com wrote: > From: jun qian <qianjun.kernel@gmail.com> > > When get the pending softirqs, it need to process all the pending > softirqs in the while loop. If the processing time of each pending > softirq is need more than 2 msec in this loop, or one of the softirq > will running a long time, according to the original code logic, it > will process all the pending softirqs without wakeuping ksoftirqd, > which will cause a relatively large scheduling delay on the > corresponding CPU, which we do not wish to see. The patch will check > the total time to process pending softirq, if the time exceeds 2 ms > we need to wakeup the ksofirqd to aviod large sched delay. But what is all that unreadable gibberish with pending_new_{flag,bit} ? Random comments below..
> +#define MAX_SOFTIRQ_TIME_NS 2000000 2*NSEC_PER_MSEC > +DEFINE_PER_CPU(__u32, pending_new_flag); > +DEFINE_PER_CPU(__u32, pending_next_bit); __u32 is for userspace ABI, this is not it, use u32 > +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1) > + > asmlinkage __visible void __softirq_entry __do_softirq(void) > { > - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > + u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS; > unsigned long old_flags = current->flags; > int max_restart = MAX_SOFTIRQ_RESTART; > struct softirq_action *h; > bool in_hardirq; > - __u32 pending; > - int softirq_bit; > + __u32 pending, pending_left, pending_new; > + int softirq_bit, next_bit; > + unsigned long flags; > > /* > * Mask out PF_MEMALLOC as the current task context is borrowed for the > @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > h = softirq_vec; > > - while ((softirq_bit = ffs(pending))) { > - unsigned int vec_nr; > + next_bit = per_cpu(pending_next_bit, smp_processor_id()); > + per_cpu(pending_new_flag, smp_processor_id()) = 0; __this_cpu_read() / __this_cpu_write() > + > + pending_left = pending & > + (SOFTIRQ_PENDING_MASK << next_bit); > + pending_new = pending & > + (SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit)); The second mask is the inverse of the first. > + /* > + * In order to be fair, we shold process the pengding bits by the > + * last processing order. > + */ > + while ((softirq_bit = ffs(pending_left)) || > + (softirq_bit = ffs(pending_new))) { > int prev_count; > + unsigned int vec_nr = 0; > > + /* > + * when the left pengding bits have been handled, we should > + * to reset the h to softirq_vec. 
> + */ > + if (!ffs(pending_left)) { > + if (per_cpu(pending_new_flag, smp_processor_id()) == 0) { > + h = softirq_vec; > + per_cpu(pending_new_flag, smp_processor_id()) = 1; > + } > + } > h += softirq_bit - 1; > > vec_nr = h - softirq_vec; > @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > preempt_count_set(prev_count); > } > h++; > - pending >>= softirq_bit; > + > + if (ffs(pending_left)) This is the _third_ ffs(pending_left), those things are _expensive_ (on some archs, see include/asm-generic/bitops/__ffs.h). > + pending_left >>= softirq_bit; > + else > + pending_new >>= softirq_bit; > + > + /* > + * the softirq's action has been run too much time, > + * so it may need to wakeup the ksoftirqd > + */ > + if (need_resched() && sched_clock() > end) { > + /* > + * Ensure that the remaining pending bits will be > + * handled. > + */ > + local_irq_save(flags); > + if (ffs(pending_left)) *fourth*... > + or_softirq_pending((pending_left << (vec_nr + 1)) | > + pending_new); > + else > + or_softirq_pending(pending_new << (vec_nr + 1)); > + local_irq_restore(flags); > + per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1; > + break; > + } > } > > + /* reset the pending_next_bit */ > + per_cpu(pending_next_bit, smp_processor_id()) = 0; > + > if (__this_cpu_read(ksoftirqd) == current) > rcu_softirq_qs(); > local_irq_disable(); > > pending = local_softirq_pending(); > if (pending) { > - if (time_before(jiffies, end) && !need_resched() && > - --max_restart) > + if (!need_resched() && --max_restart && > + sched_clock() <= end) > goto restart; > > wakeup_softirqd(); This really wants to be a number of separate patches; and I quickly lost the plot in your code. Instead of cleaning things up, you're making an even bigger mess of things. That said, I _think_ I've managed to decode what you want. See the completely untested patches attached. 
[-- Attachment #2: peterz-softirq-fix-loop.patch --] [-- Type: text/x-diff, Size: 1456 bytes --] Subject: softirq: Rewrite softirq processing loop From: Peter Zijlstra <peterz@infradead.org> Date: Fri Sep 11 17:00:03 CEST 2020 Simplify the softirq processing loop by using the bitmap APIs Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/softirq.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -258,9 +258,9 @@ asmlinkage __visible void __softirq_entr unsigned long old_flags = current->flags; int max_restart = MAX_SOFTIRQ_RESTART; struct softirq_action *h; + unsigned long pending; + unsigned int vec_nr; bool in_hardirq; - __u32 pending; - int softirq_bit; /* * Mask out PF_MEMALLOC as the current task context is borrowed for the @@ -281,15 +281,13 @@ asmlinkage __visible void __softirq_entr local_irq_enable(); - h = softirq_vec; + for_each_set_bit(vec_nr, &pending, NR_SOFTIRQS) { + unsigned int prev_count; - while ((softirq_bit = ffs(pending))) { - unsigned int vec_nr; - int prev_count; + __clear_bit(vec_nr, &pending); - h += softirq_bit - 1; + h = softirq_vec + vec_nr; - vec_nr = h - softirq_vec; prev_count = preempt_count(); kstat_incr_softirqs_this_cpu(vec_nr); @@ -303,8 +301,6 @@ asmlinkage __visible void __softirq_entr prev_count, preempt_count()); preempt_count_set(prev_count); } - h++; - pending >>= softirq_bit; } if (__this_cpu_read(ksoftirqd) == current) [-- Attachment #3: peterz-softirq-timo.patch --] [-- Type: text/x-diff, Size: 1552 bytes --] Subject: softirq: Use sched_clock() based timeout From: Peter Zijlstra <peterz@infradead.org> Date: Fri Sep 11 17:30:01 CEST 2020 Replace the jiffies based timeout with a sched_clock() based one. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/softirq.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -25,6 +25,7 @@ #include <linux/smpboot.h> #include <linux/tick.h> #include <linux/irq.h> +#include <linux/sched/clock.h> #define CREATE_TRACE_POINTS #include <trace/events/irq.h> @@ -216,7 +217,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip); * we want to handle softirqs as soon as possible, but they * should not be able to lock up the box. */ -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) +#define MAX_SOFTIRQ_TIME 2*NSEC_PER_MSEC #define MAX_SOFTIRQ_RESTART 10 #ifdef CONFIG_TRACE_IRQFLAGS @@ -254,9 +255,9 @@ static inline void lockdep_softirq_end(b asmlinkage __visible void __softirq_entry __do_softirq(void) { - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; unsigned long old_flags = current->flags; int max_restart = MAX_SOFTIRQ_RESTART; + u64 start = sched_clock(); struct softirq_action *h; unsigned long pending; unsigned int vec_nr; @@ -309,7 +310,7 @@ asmlinkage __visible void __softirq_entr pending = local_softirq_pending(); if (pending) { - if (time_before(jiffies, end) && !need_resched() && + if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() && --max_restart) goto restart; [-- Attachment #4: peterz-softirq-needs-break.patch --] [-- Type: text/x-diff, Size: 2703 bytes --] Subject: softirq: Factor loop termination condition From: Peter Zijlstra <peterz@infradead.org> Date: Fri Sep 11 17:17:20 CEST 2020 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/softirq.c | 44 +++++++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 19 deletions(-) --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -204,22 +204,6 @@ void __local_bh_enable_ip(unsigned long } EXPORT_SYMBOL(__local_bh_enable_ip); -/* - * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, - * but break the loop if need_resched() is set or after 2 ms. 
- * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in - * certain cases, such as stop_machine(), jiffies may cease to - * increment and so we need the MAX_SOFTIRQ_RESTART limit as - * well to make sure we eventually return from this method. - * - * These limits have been established via experimentation. - * The two things to balance is latency against fairness - - * we want to handle softirqs as soon as possible, but they - * should not be able to lock up the box. - */ -#define MAX_SOFTIRQ_TIME 2*NSEC_PER_MSEC -#define MAX_SOFTIRQ_RESTART 10 - #ifdef CONFIG_TRACE_IRQFLAGS /* * When we run softirqs from irq_exit() and thus on the hardirq stack we need @@ -253,10 +237,33 @@ static inline bool lockdep_softirq_start static inline void lockdep_softirq_end(bool in_hardirq) { } #endif +/* + * We restart softirq processing but break the loop if need_resched() is set or + * after 2 ms. The MAX_SOFTIRQ_RESTART guarantees a loop termination if + * sched_clock() were ever to stall. + * + * These limits have been established via experimentation. The two things to + * balance is latency against fairness - we want to handle softirqs as soon as + * possible, but they should not be able to lock up the box. 
+ */ +#define MAX_SOFTIRQ_TIME 2*NSEC_PER_MSEC +#define MAX_SOFTIRQ_RESTART 10 + +static inline bool __softirq_needs_break(u64 start) +{ + if (need_resched()) + return true; + + if (sched_clock() - start >= MAX_SOFTIRQ_TIME) + return true; + + return false; +} + asmlinkage __visible void __softirq_entry __do_softirq(void) { + unsigned int max_restart = MAX_SOFTIRQ_RESTART; unsigned long old_flags = current->flags; - int max_restart = MAX_SOFTIRQ_RESTART; u64 start = sched_clock(); struct softirq_action *h; unsigned long pending; @@ -310,8 +317,7 @@ asmlinkage __visible void __softirq_entr pending = local_softirq_pending(); if (pending) { - if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() && - --max_restart) + if (!__softirq_needs_break(start) && --max_restart) goto restart; wakeup_softirqd(); [-- Attachment #5: peterz-softirq-break-more.patch --] [-- Type: text/x-diff, Size: 1118 bytes --] Subject: softirq: Allow early break From: Peter Zijlstra <peterz@infradead.org> Date: Fri Sep 11 17:50:17 CEST 2020 Allow terminating the softirq processing loop without finishing the vectors. 
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/softirq.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -309,19 +309,23 @@ asmlinkage __visible void __softirq_entr prev_count, preempt_count()); preempt_count_set(prev_count); } + + if (pending && __softirq_needs_break(start)) + break; } if (__this_cpu_read(ksoftirqd) == current) rcu_softirq_qs(); local_irq_disable(); - pending = local_softirq_pending(); - if (pending) { - if (!__softirq_needs_break(start) && --max_restart) - goto restart; + if (pending) + or_softirq_pending(pending); + else if ((pending = local_softirq_pending()) && + !__softirq_needs_break(start) && + --max_restart) + goto restart; - wakeup_softirqd(); - } + wakeup_softirqd(); lockdep_softirq_end(in_hardirq); account_irq_exit_time(current); ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs 2020-09-11 15:55 ` peterz @ 2020-09-12 7:17 ` jun qian 0 siblings, 0 replies; 9+ messages in thread From: jun qian @ 2020-09-12 7:17 UTC (permalink / raw) To: peterz Cc: Thomas Gleixner, will, luto, linux-kernel, Yafang Shao, Uladzislau Rezki, frederic <peterz@infradead.org> wrote on Fri, Sep 11, 2020 at 11:55 PM: > > On Wed, Sep 09, 2020 at 05:09:31PM +0800, qianjun.kernel@gmail.com wrote: > > From: jun qian <qianjun.kernel@gmail.com> > > > > When get the pending softirqs, it need to process all the pending > > softirqs in the while loop. If the processing time of each pending > > softirq is need more than 2 msec in this loop, or one of the softirq > > will running a long time, according to the original code logic, it > > will process all the pending softirqs without wakeuping ksoftirqd, > > which will cause a relatively large scheduling delay on the > > corresponding CPU, which we do not wish to see. The patch will check > > the total time to process pending softirq, if the time exceeds 2 ms > > we need to wakeup the ksofirqd to aviod large sched delay. > > But what is all that unreadaable gibberish with pending_new_{flag,bit} ? > > Random comments below..
> > > > +#define MAX_SOFTIRQ_TIME_NS 2000000 > > 2*NSEC_PER_MSEC > > > > +DEFINE_PER_CPU(__u32, pending_new_flag); > > +DEFINE_PER_CPU(__u32, pending_next_bit); > > __u32 is for userspace ABI, this is not it, use u32 > > > +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1) > > + > > asmlinkage __visible void __softirq_entry __do_softirq(void) > > { > > - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > > + u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS; > > unsigned long old_flags = current->flags; > > int max_restart = MAX_SOFTIRQ_RESTART; > > struct softirq_action *h; > > bool in_hardirq; > > - __u32 pending; > > - int softirq_bit; > > + __u32 pending, pending_left, pending_new; > > + int softirq_bit, next_bit; > > + unsigned long flags; > > > > /* > > * Mask out PF_MEMALLOC as the current task context is borrowed for the > > @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > > > h = softirq_vec; > > > > - while ((softirq_bit = ffs(pending))) { > > - unsigned int vec_nr; > > + next_bit = per_cpu(pending_next_bit, smp_processor_id()); > > + per_cpu(pending_new_flag, smp_processor_id()) = 0; > > __this_cpu_read() / __this_cpu_write() > > > + > > + pending_left = pending & > > + (SOFTIRQ_PENDING_MASK << next_bit); > > + pending_new = pending & > > + (SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit)); > > The second mask is the inverse of the first. > > > + /* > > + * In order to be fair, we shold process the pengding bits by the > > + * last processing order. > > + */ > > + while ((softirq_bit = ffs(pending_left)) || > > + (softirq_bit = ffs(pending_new))) { > > int prev_count; > > + unsigned int vec_nr = 0; > > > > + /* > > + * when the left pengding bits have been handled, we should > > + * to reset the h to softirq_vec. 
> > + */ > > + if (!ffs(pending_left)) { > > + if (per_cpu(pending_new_flag, smp_processor_id()) == 0) { > > + h = softirq_vec; > > + per_cpu(pending_new_flag, smp_processor_id()) = 1; > > + } > > + } > > h += softirq_bit - 1; > > > > vec_nr = h - softirq_vec; > > @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > preempt_count_set(prev_count); > > } > > h++; > > - pending >>= softirq_bit; > > + > > + if (ffs(pending_left)) > > This is the _third_ ffs(pending_left), those things are _expensive_ (on > some archs, see include/asm-generic/bitops/__ffs.h). > > > + pending_left >>= softirq_bit; > > + else > > + pending_new >>= softirq_bit; > > + > > + /* > > + * the softirq's action has been run too much time, > > + * so it may need to wakeup the ksoftirqd > > + */ > > + if (need_resched() && sched_clock() > end) { > > + /* > > + * Ensure that the remaining pending bits will be > > + * handled. > > + */ > > + local_irq_save(flags); > > + if (ffs(pending_left)) > > *fourth*... > > > + or_softirq_pending((pending_left << (vec_nr + 1)) | > > + pending_new); > > + else > > + or_softirq_pending(pending_new << (vec_nr + 1)); > > + local_irq_restore(flags); > > + per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1; > > + break; > > + } > > } > > > > + /* reset the pending_next_bit */ > > + per_cpu(pending_next_bit, smp_processor_id()) = 0; > > + > > if (__this_cpu_read(ksoftirqd) == current) > > rcu_softirq_qs(); > > local_irq_disable(); > > > > pending = local_softirq_pending(); > > if (pending) { > > - if (time_before(jiffies, end) && !need_resched() && > > - --max_restart) > > + if (!need_resched() && --max_restart && > > + sched_clock() <= end) > > goto restart; > > > > wakeup_softirqd(); > > This really wants to be a number of separate patches; and I quickly lost > the plot in your code. Instead of cleaning things up, you're making an > even bigger mess of things. > > That said, I _think_ I've managed to decode what you want. 
See the > completely untested patches attached. > > Thanks a lot for your suggestion. I will rewrite the patch following your suggestion, using multiple patches to clarify the ideas. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs 2020-09-09 9:09 [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs qianjun.kernel 2020-09-11 15:55 ` peterz @ 2020-09-11 16:46 ` Qais Yousef 2020-09-11 18:28 ` peterz 1 sibling, 1 reply; 9+ messages in thread From: Qais Yousef @ 2020-09-11 16:46 UTC (permalink / raw) To: qianjun.kernel Cc: tglx, peterz, will, luto, linux-kernel, laoar.shao, urezki, John Dias, Wei Wang, Quentin Perret On 09/09/20 17:09, qianjun.kernel@gmail.com wrote: > From: jun qian <qianjun.kernel@gmail.com> > > When get the pending softirqs, it need to process all the pending > softirqs in the while loop. If the processing time of each pending > softirq is need more than 2 msec in this loop, or one of the softirq > will running a long time, according to the original code logic, it > will process all the pending softirqs without wakeuping ksoftirqd, > which will cause a relatively large scheduling delay on the > corresponding CPU, which we do not wish to see. The patch will check > the total time to process pending softirq, if the time exceeds 2 ms > we need to wakeup the ksofirqd to aviod large sched delay. > > Signed-off-by: jun qian <qianjun.kernel@gmail.com> In Android there's a patch that tries to avoid scheduling an RT task on a cpu that is running softirqs. I wonder if this patch helps with this case. https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0 John, Wei, is this something of interest to you? IIUC this patch will make sure the total softirq duration is 2ms rather than 2ms per call. I presume if there's a single handler that takes a lot of time then this won't help. But in that case, one can argue there's a potential bug with this handler.
Cheers -- Qais Yousef > --- > kernel/softirq.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 69 insertions(+), 14 deletions(-) > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index c4201b7f..1f696c8 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -25,6 +25,7 @@ > #include <linux/smpboot.h> > #include <linux/tick.h> > #include <linux/irq.h> > +#include <linux/sched/clock.h> > > #define CREATE_TRACE_POINTS > #include <trace/events/irq.h> > @@ -199,18 +200,17 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt) > > /* > * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, > - * but break the loop if need_resched() is set or after 2 ms. > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in > - * certain cases, such as stop_machine(), jiffies may cease to > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as > - * well to make sure we eventually return from this method. > + * but break the loop if need_resched() is set or after MAX_SOFTIRQ_TIME_NS > + * ns. In the loop, if the processing time of the softirq has exceeded > + * MAX_SOFTIRQ_TIME_NS ns, we also need to break the loop to wakeup the > + * ksofirqd. > * > * These limits have been established via experimentation. > * The two things to balance is latency against fairness - > * we want to handle softirqs as soon as possible, but they > * should not be able to lock up the box. 
> */ > -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > +#define MAX_SOFTIRQ_TIME_NS 2000000 > #define MAX_SOFTIRQ_RESTART 10 > > #ifdef CONFIG_TRACE_IRQFLAGS > @@ -246,15 +246,20 @@ static inline void lockdep_softirq_end(bool in_hardirq) > static inline void lockdep_softirq_end(bool in_hardirq) { } > #endif > > +DEFINE_PER_CPU(__u32, pending_new_flag); > +DEFINE_PER_CPU(__u32, pending_next_bit); > +#define SOFTIRQ_PENDING_MASK ((1UL << NR_SOFTIRQS) - 1) > + > asmlinkage __visible void __softirq_entry __do_softirq(void) > { > - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > + u64 end = sched_clock() + MAX_SOFTIRQ_TIME_NS; > unsigned long old_flags = current->flags; > int max_restart = MAX_SOFTIRQ_RESTART; > struct softirq_action *h; > bool in_hardirq; > - __u32 pending; > - int softirq_bit; > + __u32 pending, pending_left, pending_new; > + int softirq_bit, next_bit; > + unsigned long flags; > > /* > * Mask out PF_MEMALLOC as the current task context is borrowed for the > @@ -277,10 +282,33 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > h = softirq_vec; > > - while ((softirq_bit = ffs(pending))) { > - unsigned int vec_nr; > + next_bit = per_cpu(pending_next_bit, smp_processor_id()); > + per_cpu(pending_new_flag, smp_processor_id()) = 0; > + > + pending_left = pending & > + (SOFTIRQ_PENDING_MASK << next_bit); > + pending_new = pending & > + (SOFTIRQ_PENDING_MASK >> (NR_SOFTIRQS - next_bit)); > + > + /* > + * In order to be fair, we shold process the pengding bits by the > + * last processing order. > + */ > + while ((softirq_bit = ffs(pending_left)) || > + (softirq_bit = ffs(pending_new))) { > int prev_count; > + unsigned int vec_nr = 0; > > + /* > + * when the left pengding bits have been handled, we should > + * to reset the h to softirq_vec. 
> + */ > + if (!ffs(pending_left)) { > + if (per_cpu(pending_new_flag, smp_processor_id()) == 0) { > + h = softirq_vec; > + per_cpu(pending_new_flag, smp_processor_id()) = 1; > + } > + } > h += softirq_bit - 1; > > vec_nr = h - softirq_vec; > @@ -298,17 +326,44 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > preempt_count_set(prev_count); > } > h++; > - pending >>= softirq_bit; > + > + if (ffs(pending_left)) > + pending_left >>= softirq_bit; > + else > + pending_new >>= softirq_bit; > + > + /* > + * the softirq's action has been run too much time, > + * so it may need to wakeup the ksoftirqd > + */ > + if (need_resched() && sched_clock() > end) { > + /* > + * Ensure that the remaining pending bits will be > + * handled. > + */ > + local_irq_save(flags); > + if (ffs(pending_left)) > + or_softirq_pending((pending_left << (vec_nr + 1)) | > + pending_new); > + else > + or_softirq_pending(pending_new << (vec_nr + 1)); > + local_irq_restore(flags); > + per_cpu(pending_next_bit, smp_processor_id()) = vec_nr + 1; > + break; > + } > } > > + /* reset the pending_next_bit */ > + per_cpu(pending_next_bit, smp_processor_id()) = 0; > + > if (__this_cpu_read(ksoftirqd) == current) > rcu_softirq_qs(); > local_irq_disable(); > > pending = local_softirq_pending(); > if (pending) { > - if (time_before(jiffies, end) && !need_resched() && > - --max_restart) > + if (!need_resched() && --max_restart && > + sched_clock() <= end) > goto restart; > > wakeup_softirqd(); > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs 2020-09-11 16:46 ` Qais Yousef @ 2020-09-11 18:28 ` peterz 2020-09-14 11:27 ` Qais Yousef [not found] ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com> 0 siblings, 2 replies; 9+ messages in thread From: peterz @ 2020-09-11 18:28 UTC (permalink / raw) To: Qais Yousef Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao, urezki, John Dias, Wei Wang, Quentin Perret On Fri, Sep 11, 2020 at 05:46:45PM +0100, Qais Yousef wrote: > On 09/09/20 17:09, qianjun.kernel@gmail.com wrote: > > From: jun qian <qianjun.kernel@gmail.com> > > > > When get the pending softirqs, it need to process all the pending > > softirqs in the while loop. If the processing time of each pending > > softirq is need more than 2 msec in this loop, or one of the softirq > > will running a long time, according to the original code logic, it > > will process all the pending softirqs without wakeuping ksoftirqd, > > which will cause a relatively large scheduling delay on the > > corresponding CPU, which we do not wish to see. The patch will check > > the total time to process pending softirq, if the time exceeds 2 ms > > we need to wakeup the ksofirqd to aviod large sched delay. > > > > Signed-off-by: jun qian <qianjun.kernel@gmail.com> > > In Android there's a patch that tries to avoid schedling an RT task on a cpu > that is running softirqs. I wonder if this patch helps with this case. > > https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0 > > John, Wei, is this something of interest to you? Urgh.. that's pretty gross. I think the sane approach is indeed getting softirqs to react to need_resched() better. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs 2020-09-11 18:28 ` peterz @ 2020-09-14 11:27 ` Qais Yousef 2020-09-14 14:14 ` peterz [not found] ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com> 1 sibling, 1 reply; 9+ messages in thread From: Qais Yousef @ 2020-09-14 11:27 UTC (permalink / raw) To: peterz Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao, urezki, John Dias, Wei Wang, Quentin Perret On 09/11/20 20:28, peterz@infradead.org wrote: > On Fri, Sep 11, 2020 at 05:46:45PM +0100, Qais Yousef wrote: > > On 09/09/20 17:09, qianjun.kernel@gmail.com wrote: > > > From: jun qian <qianjun.kernel@gmail.com> > > > > > > When get the pending softirqs, it need to process all the pending > > > softirqs in the while loop. If the processing time of each pending > > > softirq is need more than 2 msec in this loop, or one of the softirq > > > will running a long time, according to the original code logic, it > > > will process all the pending softirqs without wakeuping ksoftirqd, > > > which will cause a relatively large scheduling delay on the > > > corresponding CPU, which we do not wish to see. The patch will check > > > the total time to process pending softirq, if the time exceeds 2 ms > > > we need to wakeup the ksofirqd to aviod large sched delay. > > > > > > Signed-off-by: jun qian <qianjun.kernel@gmail.com> > > > > In Android there's a patch that tries to avoid schedling an RT task on a cpu > > that is running softirqs. I wonder if this patch helps with this case. > > > > https://android.googlesource.com/kernel/msm/+/5c3f54c34acf4d9ed01530288d4a98acff815d79%5E%21/#F0 > > > > John, Wei, is this something of interest to you? > > Urgh.. that's pretty gross. I think the sane approach is indeed getting > softirqs to react to need_resched() better. What does PREEMPT_RT do to deal with softirq delays? I have tried playing with enabling threadirqs, which AFAIU should make softirqs preemptible, right?
I realize this patch is still missing from mainline at least: https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch Would this be a heavy handed approach to make available for non PREEMPT_RT kernels? I only worry about potential NET_RX throughput issues. Which by the way is protected with preempt_disable currently in mainline. See netif_rx_ni(). I am guessing here, but I suspect this NET_RX softirq is one source of big delays when network activity is high. Thanks -- Qais Yousef ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
From: peterz @ 2020-09-14 14:14 UTC
To: Qais Yousef
Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao, urezki,
    John Dias, Wei Wang, Quentin Perret

On Mon, Sep 14, 2020 at 12:27:35PM +0100, Qais Yousef wrote:
> What does PREEMPT_RT do to deal with softirqs delays?

Makes the lot preemptible, you found the patch below.

> I have tried playing with enabling threadirqs, which AFAIU should make
> softirqs preemptible, right?

Not yet,..

> I realize this patch is still missing from mainline at least:
>
> https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch
>
> Would this be a heavy handed approach to make available for non PREEMPT_RT
> kernels?

Not sure, I suspect it relies on migrate_disable(), which is
preempt_disable() on !RT and then we're back to square one.

> I only worry about potential NET_RX throughput issues. Which by the way is
> protected with preempt_disable currently in mainline. See netif_rx_ni().

So preempt_disable() isn't necessarily a problem, you just want it to
terminate soonish after need_resched() becomes true. Also, I'm having a
wee problem getting from net_rx_action() to netif_rx_ni()

> I am guessing here, but I suspect this NET_RX softirq is one source of big
> delays when network activity is high.

Well, one approach is to more aggressively limit how long softirq
processing can run. Current measures are very soft in that regard.
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
From: Qais Yousef @ 2020-09-14 15:28 UTC
To: peterz
Cc: qianjun.kernel, tglx, will, luto, linux-kernel, laoar.shao, urezki,
    John Dias, Wei Wang, Quentin Perret

On 09/14/20 16:14, peterz@infradead.org wrote:
> On Mon, Sep 14, 2020 at 12:27:35PM +0100, Qais Yousef wrote:
> > What does PREEMPT_RT do to deal with softirqs delays?
>
> Makes the lot preemptible, you found the patch below.
>
> > I have tried playing with enabling threadirqs, which AFAIU should make
> > softirqs preemptible, right?
>
> Not yet,..
>
> > I realize this patch is still missing from mainline at least:
> >
> > https://gitlab.com/kalilinux/packages/linux/blob/a17bad0db9da44cd73f594794a58cc5646393b13/debian/patches-rt/softirq-Add-preemptible-softirq.patch
> >
> > Would this be a heavy handed approach to make available for non PREEMPT_RT
> > kernels?
>
> Not sure, I suspect it relies on migrate_disable(), which is
> preempt_disable() on !RT and then we're back to square one.

I think it will depend on local_bh_disable(). I didn't dig into the patch
above, but I believe it's doing that for RT. Or maybe there's another
aspect I am not aware of that relies on migrate_disable() too..

> > I only worry about potential NET_RX throughput issues. Which by the way is
> > protected with preempt_disable currently in mainline. See netif_rx_ni().
>
> So preempt_disable() isn't necessarily a problem, you just want it to

Yes. But high network traffic will make this a busy softirq. And it won't
work with the patch above. But I assume the above will have to fix that
with it.

https://lore.kernel.org/netdev/20170616172400.10809-1-bigeasy@linutronix.de/

For the time being, it's just another potential path that could introduce
latencies.

I can't follow the whole thing either, but if 5G modems end up there, I can
see this being a big source of noise when the user is downloading a big
file, assuming 5G lives up to its reputation of 400+ Mbps in practice. So
there might be a tangible trade-off between better softirq latencies and
better network throughput.

> terminate soonish after need_resched() becomes true. Also, I'm having a
> wee problem getting from net_rx_action() to netif_rx_ni()

I can investigate this direction :)

> > I am guessing here, but I suspect this NET_RX softirq is one source of big
> > delays when network activity is high.
>
> Well, one approach is to more aggressively limit how long softirq
> processing can run. Current measures are very soft in that regard.

Which is this patch, although it doesn't take into account a single softirq
exceeding the quota, IIUC. But the need_resched() bit above should address
that.

Thanks

--
Qais Yousef
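[Editor's note: besides the time budget, the V6 patch also records which vector to resume from when the loop bails out early, so low-numbered softirqs cannot perpetually starve higher-numbered ones — the `pending_left` / `pending_new` split quoted at the top of the page. A rough user-space model of that bit arithmetic follows; `next_vec` is an illustrative name, not a kernel function.]

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 10
#define SOFTIRQ_PENDING_MASK ((1u << NR_SOFTIRQS) - 1)

/* Pick the next softirq vector to service. Bits at or above next_bit
 * (the resume point saved when the previous pass bailed out) are
 * preferred; only when none remain do we wrap around to the bits that
 * were already serviced last time. This mirrors the patch's split of
 * pending into pending_left (>= next_bit) and pending_new (< next_bit). */
static unsigned int next_vec(uint32_t pending, unsigned int next_bit)
{
    uint32_t left = pending & (SOFTIRQ_PENDING_MASK << next_bit);
    uint32_t pick = left ? left : (pending & SOFTIRQ_PENDING_MASK);

    return (unsigned int)__builtin_ctz(pick);
}
```

For example, with vectors 0, 4, and 6 pending and a saved resume point of 5, vector 6 is serviced before wrapping back to 0 — plain `ffs()` would always restart at vector 0.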
[parent not found: <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com>]
* Re: [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs
From: Qais Yousef @ 2020-09-14 11:41 UTC
To: John Dias
Cc: Peter Zijlstra, qianjun.kernel, tglx, will, luto, LKML, laoar.shao,
    urezki, Wei Wang, Quentin Perret

On 09/11/20 12:14, John Dias wrote:
> I agree that the rt-softirq interaction patches are a gross hack (and I
> wrote the original versions, so it's my grossness). The problem is that
> even a 1ms delay in executing those low-latency audio threads is enough to
> cause obvious glitching in audio under very real circumstances, which is
> not an acceptable user experience -- and some softirq handlers run for >1ms
> on their own. (And 1ms is a high upper bound, not a median target.)

AFAIK professional audio apps have the toughest limit of sub 10ms. 120Hz
screens impose a stricter limit on the display pipeline too, which must
finish a frame in about 8ms.

1ms is too short and approaches PREEMPT_RT realm. Is it possible to expand
on the use case here? What application requires this constraint?

Thanks

--
Qais Yousef
end of thread, other threads: [~2020-09-14 15:29 UTC | newest]

Thread overview: 9+ messages; links below jump to the message on this page:

2020-09-09  9:09 [PATCH V6 1/1] Softirq:avoid large sched delay from the pending softirqs qianjun.kernel
2020-09-11 15:55 ` peterz
2020-09-12  7:17   ` jun qian
2020-09-11 16:46 ` Qais Yousef
2020-09-11 18:28   ` peterz
2020-09-14 11:27     ` Qais Yousef
2020-09-14 14:14       ` peterz
2020-09-14 15:28         ` Qais Yousef
     [not found]     ` <CA+njcd3HFV5Gqtt9qzTAzpnA4-4ngPBQ7T0gwgc0Fm9_VoJLcQ@mail.gmail.com>
2020-09-14 11:41       ` Qais Yousef