* [RFC/RFT PATCH 0/2] Forced idle and Non-RCU local softirq pending
@ 2022-12-15 18:42 Srinivas Pandruvada
  2022-12-15 18:42 ` [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq Srinivas Pandruvada
  2022-12-15 18:43 ` [RFC/RFT PATCH 2/2] sched/core: Add max duration to play_idle_precise() Srinivas Pandruvada
  0 siblings, 2 replies; 13+ messages in thread
From: Srinivas Pandruvada @ 2022-12-15 18:42 UTC (permalink / raw)
  To: rafael, peterz, frederic, daniel.lezcano
  Cc: linux-pm, linux-kernel, len.brown, Srinivas Pandruvada

Linux has had support for idle injection for a while. To inject idle
time, play_idle_precise() is used.
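
For reference, these are the relevant signatures before this series
(as declared in include/linux/cpu.h; see also the patch 0002 diff below):

	void play_idle_precise(u64 duration_ns, u64 latency_ns);

	static inline void play_idle(unsigned long duration_us)
	{
		play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX);
	}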

When idle time is injected using play_idle_precise(), there are a couple of issues:

1. Sometimes there are warnings in the kernel log:

[147777.095484] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[147777.099719] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #288!!!
[147777.103725] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #288!!!

2. Softirq processing is delayed

A sample kernel trace is in the commit log of patch 0001.

There were offline discussions with Frederic and Peter on this issue.
Frederic sent a test patch with some TODOs, which I tried to address.
The solution proposed here is: if ksoftirqd is pending, break out of the
do_idle() loop and give it 1 jiffy to run via schedule_timeout(1).
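
In outline, the reworked play_idle_precise() loop looks like this
(a simplified sketch of patch 0001; see the full diff in that patch):

	do {
		/* (re)start the hrtimer for the remaining idle time */

		while (!READ_ONCE(it.done) &&
		       !task_is_running(__this_cpu_read(ksoftirqd)))
			do_idle();

		/* let ksoftirqd run for 1 jiffy if it became runnable */
		if (!READ_ONCE(it.done) &&
		    task_is_running(__this_cpu_read(ksoftirqd))) {
			__set_current_state(TASK_UNINTERRUPTIBLE);
			schedule_timeout(1);
		}
	} while (!READ_ONCE(it.done));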

The conversation is pasted here to establish context:

On Sat, Sep 18, 2021 at 08:55:48AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 17, 2021 at 11:42:21PM +0200, Frederic Weisbecker wrote:
> > On Mon, Sep 13, 2021 at 02:58:59AM +0000, Pandruvada, Srinivas wrote:
> > > Hi Frederic,
> > > 
> > > Peter suggested contacting you regarding some issues with forced idle
> > > and softirqs. You may have some changes in the works or suggestions.
> > > 
> > > We are trying to use idle injection on some CPUs for thermal and
> > > performance reasons. This is done via the Linux idle_injection interface
> > > (powercap/idle_inject.c), which calls the scheduler function
> > > play_idle_precise(). This results in calling can_stop_idle_tick() via
> > > tick_nohz_idle_stop_tick(), which results in printing of:
> > > 
> > > [  185.765383] NOHZ tick-stop error: Non-RCU local softirq work is pending, CPU 207 handler #08!!!
> > > 
> > > 
> > > So when the tick is about to be stopped, either this work needs to be
> > > migrated or we wait for the softirq to be executed and then disable the
> > > tick on the CPU. Please suggest.
> > 
> > You can't blindly migrate softirqs because they often depend on the CPU they
> > are queued on. So you need to wait for them to execute.
> > 
> > As for how to adapt the warning to take idle injection into consideration,
> > I need to understand something first: how come we reach this path without
> > need_resched() set?
> 
> It might be set, but the idle inject thread wins over ksoftirqd, it
> being FIFO.

Ah ok.

> > Also looking at play_idle_precise(), we only ever escape the idle loop once
> > the idle inject timer has fired. The need for resched is never checked to break
> > the loop.
> 
> do_idle() has a schedule() loop in it, it will happily schedule.

Oops, forgot my basics...

> The thing is that the idle injection thread is typically the highest
> priority runnable thread and as such will starve things (on purpose).
> 
> Only higher prio FIFO, any DEADLINE or the STOP thread can effectively
> preempt idle injection (and actual IRQs, of course).

I see... In fact need_resched() shouldn't even be set in this case I guess...

> 
> So I suppose an IRQ can happen, not finish the softirq in its tail, try
> and punt to ksoftirqd and not get scheduled because idle (injection)
> wins on priority.
> 
> The question is what do we want to do there... we could just run the
> softirq crap from the idle injection thread, seeing how the work
> shouldn't be there in the first place, but since it is, it needs doing.
> 
> Feels gross tho...

How about the following, equally gross, solution (untested)?

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index d17b0a5ce6ac..882c48441469 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,6 +52,12 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
+/* FIXME: handle __cpuidle / instrumentation_begin()/end() */
+static bool idle_loop_should_break(void)
+{
+	return need_resched() || task_is_running(__this_cpu_read(ksoftirqd));
+}
+
 static noinline int __cpuidle cpu_idle_poll(void)
 {
 	trace_cpu_idle(0, smp_processor_id());
@@ -59,7 +65,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 	rcu_idle_enter();
 	local_irq_enable();
 
-	while (!tif_need_resched() &&
+	while (!idle_loop_should_break() &&
 	       (cpu_idle_force_poll || tick_check_broadcast_expired()))
 		cpu_relax();
 
@@ -177,7 +183,7 @@ static void cpuidle_idle_call(void)
 	 * Check if the idle task must be rescheduled. If it is the
 	 * case, exit the function after re-enabling the local irq.
 	 */
-	if (need_resched()) {
+	if (idle_loop_should_break()) {
 		local_irq_enable();
 		return;
 	}
@@ -279,7 +285,7 @@ static void do_idle(void)
 	__current_set_polling();
 	tick_nohz_idle_enter();
 
-	while (!need_resched()) {
+	while (!idle_loop_should_break()) {
 		rmb();
 
 		local_irq_disable();
@@ -373,25 +379,31 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	WARN_ON_ONCE(!duration_ns);
 	WARN_ON_ONCE(current->mm);
 
-	rcu_sleep_check();
-	preempt_disable();
-	current->flags |= PF_IDLE;
-	cpuidle_use_deepest_state(latency_ns);
+	do {
+		rcu_sleep_check();
+		preempt_disable();
+		current->flags |= PF_IDLE;
+		cpuidle_use_deepest_state(latency_ns);
 
-	it.done = 0;
-	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
-	it.timer.function = idle_inject_timer_fn;
-	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
-		      HRTIMER_MODE_REL_PINNED_HARD);
+		it.done = 0;
+		hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
+		it.timer.function = idle_inject_timer_fn;
+		hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
+			      HRTIMER_MODE_REL_PINNED_HARD);
 
-	while (!READ_ONCE(it.done))
-		do_idle();
+		while (!READ_ONCE(it.done) && !task_is_running(__this_cpu_read(ksoftirqd)))
+			do_idle();
+
+		cpuidle_use_deepest_state(0);
+		current->flags &= ~PF_IDLE;
 
-	cpuidle_use_deepest_state(0);
-	current->flags &= ~PF_IDLE;
+		preempt_fold_need_resched();
+		preempt_enable();
 
-	preempt_fold_need_resched();
-	preempt_enable();
+		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
+		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd)))
+			schedule_timeout(1);
+	} while (!READ_ONCE(it.done));
 }
 EXPORT_SYMBOL_GPL(play_idle_precise);
 


> > How about the following, equally gross, solution (untested)?
> > 
> It causes an NMI watchdog lockup on the CPU where the idle
> injection is done. Attached is the dump.
> 
> I had to add the following diff on top to avoid the lockup. With this I
> no longer see the
> "NOHZ tick-stop error: Non-RCU local softirq work is pending," warning.
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index a747a36330a8..e1ec5157a671 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -394,13 +394,18 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
>                 while (!READ_ONCE(it.done) && !task_is_running(__this_cpu_read(ksoftirqd)))
>                         do_idle();
>  
> +               hrtimer_cancel(&it.timer);
> +
>                 cpuidle_use_deepest_state(0);
>                 current->flags &= ~PF_IDLE;
>  
>                 preempt_fold_need_resched();
>                 preempt_enable();
>                 /* Give ksoftirqd 1 jiffy to get a chance to start its job */
> +               if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> +                       __set_current_state(TASK_UNINTERRUPTIBLE);
>                         schedule_timeout(1);
> +               }
>         } while (!READ_ONCE(it.done));
>  }
>  EXPORT_SYMBOL_GPL(play_idle_precise);

Ah right.

Also, beware of a few details:

1) This can loop forever if there is long and sustained softirq activity.
   So we need to define some timeout. This also means play_idle_precise()
   should return an error.

<Patch 0002 adds a maximum limit as a parameter.>

2) Do you need to make that loop interruptible? I don't know if the idle
   injection request comes directly from userspace or from some kernel thread.
<This is done via a kernel thread. User space can't interrupt it; it can
change the idle percentage, which will be picked up the next time by
powercap/idle_inject.>

3) Do you need to subtract the time spent waiting for softirq execution
   from the idle injection time? Probably not; I guess it depends on the role
   played by this idle injection, but I figured I should ask.
<It doesn't need to be that accurate, as this is for thermal control.>

4) An interrupt can fire in the middle of the idle injection, raising a softirq.
   In this case you need to re-inject the remaining idle time.
   E.g.: imagine you program a 3-second idle injection. You sleep 1 second, an
   interrupt fires and raises a softirq, you schedule out, then once the softirq
   is handled you need to reprogram 2 seconds.
<Handled in patch 0001 by restarting the timer for the remaining time.>
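
For reference, the relevant excerpt from patch 0001:

	remaining = hrtimer_get_remaining(&it.timer);
	hrtimer_cancel(&it.timer);
	/* the next loop iteration restarts the timer with 'remaining':
	 * hrtimer_start(&it.timer, remaining, HRTIMER_MODE_REL_PINNED_HARD); */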

5) We still need to handle __cpuidle sections.
<Added the __cpuidle annotation per your FIXME, but I don't understand the
need for noinstr and instrumentation_begin()/end().>


Frederic Weisbecker (1):
  sched/core: Check and schedule ksoftirq

Srinivas Pandruvada (1):
  sched/core: Define max duration to play_idle_precise()

 drivers/powercap/idle_inject.c |  4 ++-
 include/linux/cpu.h            |  4 +--
 kernel/sched/idle.c            | 66 ++++++++++++++++++++++++----------
 3 files changed, 52 insertions(+), 22 deletions(-)

-- 
2.38.1


* [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-15 18:42 [RFC/RFT PATCH 0/2] Forced idle and Non-RCU local softirq pending Srinivas Pandruvada
@ 2022-12-15 18:42 ` Srinivas Pandruvada
  2022-12-16 11:19   ` Peter Zijlstra
  2022-12-15 18:43 ` [RFC/RFT PATCH 2/2] sched/core: Add max duration to play_idle_precise() Srinivas Pandruvada
  1 sibling, 1 reply; 13+ messages in thread
From: Srinivas Pandruvada @ 2022-12-15 18:42 UTC (permalink / raw)
  To: rafael, peterz, frederic, daniel.lezcano
  Cc: linux-pm, linux-kernel, len.brown, Srinivas Pandruvada

From: Frederic Weisbecker <frederic@kernel.org>

It is possible that ksoftirqd is woken up while racing with forced
idle injection via play_idle_precise() called from a FIFO thread.

In that case, need_resched() can be false while ksoftirqd is pending.
This results in the warning:

[147777.095484] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Also, softirq processing can be delayed by up to the duration of the
forced idle.

For example, in the following trace (with extra debug trace points added):

<idle>-0 [004] 207.087742: sched_wakeup: ksoftirqd/4
<idle>-0 [004] 207.087743: sched_switch: swapper/4:0 [120] R ==> idle_inject/4:37 [49]

idle_inject/4-37 [004] 207.087744: play_idle_enter
idle_inject/4-37 [004] 207.087746: bputs: can_stop_idle_tick.isra.16: soft irq pending

(The warning is printed at this point:
NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!)

...
...
idle_inject/4-37 [004] 207.112127: play_idle_exit: state=24000000 cpu_id=4

idle_inject/4-37 [004] 207.112134: sched_switch: idle_inject/4:37 [49] S ==> ksoftirqd/4:39 [120]
ksoftirqd/4-39   [004] 207.112142: softirq_entry: vec=3 [action=NET_RX]
ksoftirqd/4-39   [004] 207.112213: irq_handler_entry: irq=142 name=enp0s

Delayed by: 207.112142 - 207.087742 = 0.024400s
(250 HZ configuration), while the idle duration for play_idle_precise() is 24 ms.
So the softirq ran only after the forced idle time expired.

The solution here is to give the softirq a chance to run. Currently
do_idle() checks need_resched() and breaks. Here, in addition to
need_resched(), a check for task_is_running(__this_cpu_read(ksoftirqd))
is added. So if ksoftirqd is runnable, break out of the do_idle() loop
and give it 1 jiffy to run via schedule_timeout(1). If the required idle
time has not expired, start the hrtimer again and call do_idle() again.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 kernel/sched/idle.c | 55 ++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f26ab2675f7d..77d6168288cf 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -49,6 +49,11 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
+static bool __cpuidle idle_loop_should_break(void)
+{
+	return need_resched() || task_is_running(__this_cpu_read(ksoftirqd));
+}
+
 static noinline int __cpuidle cpu_idle_poll(void)
 {
 	trace_cpu_idle(0, smp_processor_id());
@@ -56,7 +61,7 @@ static noinline int __cpuidle cpu_idle_poll(void)
 	ct_idle_enter();
 	local_irq_enable();
 
-	while (!tif_need_resched() &&
+	while (!idle_loop_should_break() &&
 	       (cpu_idle_force_poll || tick_check_broadcast_expired()))
 		cpu_relax();
 
@@ -174,7 +179,7 @@ static void cpuidle_idle_call(void)
 	 * Check if the idle task must be rescheduled. If it is the
 	 * case, exit the function after re-enabling the local irq.
 	 */
-	if (need_resched()) {
+	if (idle_loop_should_break()) {
 		local_irq_enable();
 		return;
 	}
@@ -276,7 +281,7 @@ static void do_idle(void)
 	__current_set_polling();
 	tick_nohz_idle_enter();
 
-	while (!need_resched()) {
+	while (!idle_loop_should_break()) {
 		rmb();
 
 		local_irq_disable();
@@ -358,6 +363,7 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer)
 void play_idle_precise(u64 duration_ns, u64 latency_ns)
 {
 	struct idle_timer it;
+	ktime_t remaining;
 
 	/*
 	 * Only FIFO tasks can disable the tick since they don't need the forced
@@ -370,25 +376,38 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	WARN_ON_ONCE(!duration_ns);
 	WARN_ON_ONCE(current->mm);
 
-	rcu_sleep_check();
-	preempt_disable();
-	current->flags |= PF_IDLE;
-	cpuidle_use_deepest_state(latency_ns);
+	remaining = ns_to_ktime(duration_ns);
 
-	it.done = 0;
-	hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
-	it.timer.function = idle_inject_timer_fn;
-	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
-		      HRTIMER_MODE_REL_PINNED_HARD);
+	do {
+		rcu_sleep_check();
+		preempt_disable();
+		current->flags |= PF_IDLE;
+		cpuidle_use_deepest_state(latency_ns);
 
-	while (!READ_ONCE(it.done))
-		do_idle();
+		it.done = 0;
+		hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD);
+		it.timer.function = idle_inject_timer_fn;
+		hrtimer_start(&it.timer, remaining, HRTIMER_MODE_REL_PINNED_HARD);
+
+		while (!READ_ONCE(it.done) && !task_is_running(__this_cpu_read(ksoftirqd)))
+			do_idle();
 
-	cpuidle_use_deepest_state(0);
-	current->flags &= ~PF_IDLE;
+		remaining = hrtimer_get_remaining(&it.timer);
 
-	preempt_fold_need_resched();
-	preempt_enable();
+		hrtimer_cancel(&it.timer);
+
+		cpuidle_use_deepest_state(0);
+		current->flags &= ~PF_IDLE;
+
+		preempt_fold_need_resched();
+		preempt_enable();
+
+		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
+		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
+			__set_current_state(TASK_UNINTERRUPTIBLE);
+			schedule_timeout(1);
+		}
+	} while (!READ_ONCE(it.done));
 }
 EXPORT_SYMBOL_GPL(play_idle_precise);
 
-- 
2.38.1



* [RFC/RFT PATCH 2/2] sched/core: Add max duration to play_idle_precise()
  2022-12-15 18:42 [RFC/RFT PATCH 0/2] Forced idle and Non-RCU local softirq pending Srinivas Pandruvada
  2022-12-15 18:42 ` [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq Srinivas Pandruvada
@ 2022-12-15 18:43 ` Srinivas Pandruvada
  1 sibling, 0 replies; 13+ messages in thread
From: Srinivas Pandruvada @ 2022-12-15 18:43 UTC (permalink / raw)
  To: rafael, peterz, frederic, daniel.lezcano
  Cc: linux-pm, linux-kernel, len.brown, Srinivas Pandruvada

When there is a storm of softirqs, it is possible that the caller
spends a lot more time in play_idle_precise() than the actual
idle time requested.

Add a parameter to play_idle_precise() to specify a maximum time after
which to exit the loop in play_idle_precise(), even if the total forced
idle time is less than desired. If it exits before the required forced
idle duration, return an error to the caller; otherwise return 0.

For powercap/idle_inject, the maximum wait is capped at twice the idle duration.
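
A caller would then look like this (hypothetical usage sketch; per the
diff below, idle_inject itself currently ignores the return value):

	int ret;

	ret = play_idle_precise(duration_ns, 2 * duration_ns, latency_ns);
	if (ret == -EAGAIN)
		pr_debug("idle injection cut short by softirq pressure\n");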

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/powercap/idle_inject.c |  4 +++-
 include/linux/cpu.h            |  4 ++--
 kernel/sched/idle.c            | 13 +++++++++++--
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c
index fe86a09e3b67..2764155d184e 100644
--- a/drivers/powercap/idle_inject.c
+++ b/drivers/powercap/idle_inject.c
@@ -132,6 +132,7 @@ static void idle_inject_fn(unsigned int cpu)
 {
 	struct idle_inject_device *ii_dev;
 	struct idle_inject_thread *iit;
+	u64 idle_duration_ns;
 
 	ii_dev = per_cpu(idle_inject_device, cpu);
 	iit = per_cpu_ptr(&idle_inject_thread, cpu);
@@ -140,8 +141,9 @@ static void idle_inject_fn(unsigned int cpu)
 	 * Let the smpboot main loop know that the task should not run again.
 	 */
 	iit->should_run = 0;
+	idle_duration_ns = READ_ONCE(ii_dev->idle_duration_us) * NSEC_PER_USEC;
 
-	play_idle_precise(READ_ONCE(ii_dev->idle_duration_us) * NSEC_PER_USEC,
+	play_idle_precise(idle_duration_ns, idle_duration_ns << 1,
 			  READ_ONCE(ii_dev->latency_us) * NSEC_PER_USEC);
 }
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 314802f98b9d..9969940553e5 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -190,11 +190,11 @@ void arch_cpu_idle_dead(void);
 int cpu_report_state(int cpu);
 int cpu_check_up_prepare(int cpu);
 void cpu_set_state_online(int cpu);
-void play_idle_precise(u64 duration_ns, u64 latency_ns);
+int play_idle_precise(u64 duration_ns, u64 duration_ns_max, u64 latency_ns);
 
 static inline void play_idle(unsigned long duration_us)
 {
-	play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX);
+	play_idle_precise(duration_us * NSEC_PER_USEC, 0, U64_MAX);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 77d6168288cf..c98b77ff84f5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -360,10 +360,10 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-void play_idle_precise(u64 duration_ns, u64 latency_ns)
+int play_idle_precise(u64 duration_ns, u64 duration_ns_max, u64 latency_ns)
 {
 	struct idle_timer it;
-	ktime_t remaining;
+	ktime_t remaining, end_time;
 
 	/*
 	 * Only FIFO tasks can disable the tick since they don't need the forced
@@ -378,6 +378,8 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 
 	remaining = ns_to_ktime(duration_ns);
 
+	end_time = ktime_add_ns(ktime_get(), duration_ns_max);
+
 	do {
 		rcu_sleep_check();
 		preempt_disable();
@@ -402,12 +404,19 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 		preempt_fold_need_resched();
 		preempt_enable();
 
+		if (!READ_ONCE(it.done) && duration_ns_max) {
+			if (ktime_after(ktime_get(), end_time))
+				return -EAGAIN;
+		}
+
 		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
 		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
 			__set_current_state(TASK_UNINTERRUPTIBLE);
 			schedule_timeout(1);
 		}
 	} while (!READ_ONCE(it.done));
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(play_idle_precise);
 
-- 
2.38.1



* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-15 18:42 ` [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq Srinivas Pandruvada
@ 2022-12-16 11:19   ` Peter Zijlstra
  2022-12-16 16:58     ` srinivas pandruvada
  2022-12-16 22:07     ` Frederic Weisbecker
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2022-12-16 11:19 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rafael, frederic, daniel.lezcano, linux-pm, linux-kernel, len.brown

On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> +		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> +		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> +			__set_current_state(TASK_UNINTERRUPTIBLE);
> +			schedule_timeout(1);
> +		}

That's absolutely disgusting :-/


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-16 11:19   ` Peter Zijlstra
@ 2022-12-16 16:58     ` srinivas pandruvada
  2022-12-16 22:07     ` Frederic Weisbecker
  1 sibling, 0 replies; 13+ messages in thread
From: srinivas pandruvada @ 2022-12-16 16:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rafael, frederic, daniel.lezcano, linux-pm, linux-kernel, len.brown

On Fri, 2022-12-16 at 12:19 +0100, Peter Zijlstra wrote:
> On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> > +               /* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > +               if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > +                       __set_current_state(TASK_UNINTERRUPTIBLE);
> > +                       schedule_timeout(1);
> > +               }
> 
> That's absolutely disgusting :-/
What is the alternative? Process softirqs in this task's context one time?

Thanks,
Srinivas


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-16 11:19   ` Peter Zijlstra
  2022-12-16 16:58     ` srinivas pandruvada
@ 2022-12-16 22:07     ` Frederic Weisbecker
  2022-12-19 11:33       ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2022-12-16 22:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srinivas Pandruvada, rafael, daniel.lezcano, linux-pm,
	linux-kernel, len.brown

On Fri, Dec 16, 2022 at 12:19:34PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> > +		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > +		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > +			schedule_timeout(1);
> > +		}
> 
> That's absolutely disgusting :-/

I know, and I hate checking task_is_running(__this_cpu_read(ksoftirqd))
everywhere in idle. And in fact it doesn't work because some cpuidle drivers
also do need_resched() checks.

I guess that either we assume that the idle injection is more important
than serving softirqs and we silence the warnings accordingly, or we
arrange for idle injection to have a lower prio than ksoftirqd.

Thoughts?


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-16 22:07     ` Frederic Weisbecker
@ 2022-12-19 11:33       ` Peter Zijlstra
  2022-12-19 11:46         ` Sebastian Andrzej Siewior
  2022-12-19 22:54         ` srinivas pandruvada
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2022-12-19 11:33 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Srinivas Pandruvada, rafael, daniel.lezcano, linux-pm,
	linux-kernel, len.brown, Thomas Gleixner,
	Sebastian Andrzej Siewior

On Fri, Dec 16, 2022 at 11:07:48PM +0100, Frederic Weisbecker wrote:
> On Fri, Dec 16, 2022 at 12:19:34PM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> > > +		/* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > > +		if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > > +			__set_current_state(TASK_UNINTERRUPTIBLE);
> > > +			schedule_timeout(1);
> > > +		}
> > 
> > That's absolutely disgusting :-/
> 
> I know, and I hate checking task_is_running(__this_cpu_read(ksoftirqd))
> everywhere in idle. And in fact it doesn't work because some cpuidle drivers
> also do need_resched() checks.

quite a few indeed.

> I guess that either we assume that the idle injection is more important
> than serving softirqs and we silence the warnings accordingly, or we arrange
> for idle injection to have a lower prio than ksoftirqd.

ksoftirqd is typically a CFS task while idle injection is required to be
a FIFO (typically FIFO-1) task -- so that would require lifting
ksoftirqd to FIFO, and that has other problems.

I'm guessing the problem case is where idle injection comes in while
ksoftirqd is running (and preempted), because in that case you cannot run
the softirq stuff in-place.

The 'right' thing to do would be to PI boost ksoftirqd in that case, I
suppose. Perhaps something like so; it would boost ksoftirqd when it's
running, and otherwise run the ksoftirqd work in-situ.

I've not really thought hard about this though, so perhaps I completely
wrecked things.


---
 include/linux/interrupt.h   |  3 +++
 kernel/sched/build_policy.c |  1 +
 kernel/sched/idle.c         |  4 ++++
 kernel/softirq.c            | 20 +++++++++++++++++---
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index a92bce40b04b..a2db811d6947 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -606,6 +606,9 @@ extern void raise_softirq_irqoff(unsigned int nr);
 extern void raise_softirq(unsigned int nr);
 
 DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
+DECLARE_PER_CPU(struct rt_mutex, ksoftirq_lock);
+
+extern bool __run_ksoftirqd(unsigned int cpu);
 
 static inline struct task_struct *this_cpu_ksoftirqd(void)
 {
diff --git a/kernel/sched/build_policy.c b/kernel/sched/build_policy.c
index d9dc9ab3773f..731c595e551c 100644
--- a/kernel/sched/build_policy.c
+++ b/kernel/sched/build_policy.c
@@ -28,6 +28,7 @@
 #include <linux/suspend.h>
 #include <linux/tsacct_kern.h>
 #include <linux/vtime.h>
+#include <linux/interrupt.h>
 
 #include <uapi/linux/sched/types.h>
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f26ab2675f7d..093bfdfca2f1 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -370,6 +370,10 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	WARN_ON_ONCE(!duration_ns);
 	WARN_ON_ONCE(current->mm);
 
+	rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
+	__run_ksoftirqd(smp_processor_id());
+	rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
+
 	rcu_sleep_check();
 	preempt_disable();
 	current->flags |= PF_IDLE;
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913c067d..a2cb600383a4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -59,6 +59,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct rt_mutex, ksoftirq_lock);
 
 const char * const softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
@@ -912,6 +913,7 @@ void __init softirq_init(void)
 			&per_cpu(tasklet_vec, cpu).head;
 		per_cpu(tasklet_hi_vec, cpu).tail =
 			&per_cpu(tasklet_hi_vec, cpu).head;
+		rt_mutex_init(&per_cpu(ksoftirq_lock, cpu));
 	}
 
 	open_softirq(TASKLET_SOFTIRQ, tasklet_action);
@@ -923,7 +925,7 @@ static int ksoftirqd_should_run(unsigned int cpu)
 	return local_softirq_pending();
 }
 
-static void run_ksoftirqd(unsigned int cpu)
+bool __run_ksoftirqd(unsigned int cpu)
 {
 	ksoftirqd_run_begin();
 	if (local_softirq_pending()) {
@@ -933,10 +935,22 @@ static void run_ksoftirqd(unsigned int cpu)
 		 */
 		__do_softirq();
 		ksoftirqd_run_end();
-		cond_resched();
-		return;
+		return true;
 	}
 	ksoftirqd_run_end();
+	return false;
+}
+
+static void run_ksoftirqd(unsigned int cpu)
+{
+	bool ran;
+
+	rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
+	ran = __run_ksoftirqd(cpu);
+	rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
+
+	if (ran)
+		cond_resched();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-19 11:33       ` Peter Zijlstra
@ 2022-12-19 11:46         ` Sebastian Andrzej Siewior
  2022-12-19 14:56           ` Peter Zijlstra
  2022-12-19 22:54         ` srinivas pandruvada
  1 sibling, 1 reply; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-12-19 11:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Srinivas Pandruvada, rafael, daniel.lezcano,
	linux-pm, linux-kernel, len.brown, Thomas Gleixner

On 2022-12-19 12:33:22 [+0100], Peter Zijlstra wrote:
> ksoftirq is typically a CFS task while idle injection is required to be
> a FIFO (typically FIFO-1) task -- so that would require lifting
> ksoftirqd to FIFO and that has other problems.
> 
> I'm guessing the problem case is where idle injection comes in while
> ksoftirq is running (and preempted), because in that case you cannot run
> the softirq stuff in-place.
> 
> The 'right' thing to do would be to PI boost ksoftirqd in that case I
> suppose. Perhaps something like so, it would boost ksoftirq when it's
> running, and otherwise runs the ksoftirqd thing in-situ.
> 
> I've not really throught hard about this though, so perhaps I completely
> wrecked things.

I don't know why you intend to run ksoftirqd, but in general it breaks RT
left and right and we attempt to avoid running ksoftirqd as much as
possible. If it runs and you go and boost it then it probably gets even
worse from an RT point of view.

ksoftirqd runs softirqs raised from hardirq context. Everything else is
handled within a local_bh_disable()+enable loop. We already have the
boost-ksoftirqd hammer, which is the per-CPU BKL called
local_bh_disable(). In general everything should be moved out of it.
For timers we have the ktimerd thread, which needs cleanup.
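
For context, on PREEMPT_RT this serialization looks roughly like the
following (illustrative sketch, not exact kernel code):

	/* local_bh_disable() takes a per-CPU lock on PREEMPT_RT, so
	 * BH-disabled sections serialize against softirq handling on
	 * this CPU; pending softirqs are then flushed from
	 * local_bh_enable() in the calling task's context. */
	local_bh_disable();
	/* ... code that must not race with softirqs on this CPU ... */
	local_bh_enable();	/* may run pending softirqs here */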

Sebastian


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-19 11:46         ` Sebastian Andrzej Siewior
@ 2022-12-19 14:56           ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2022-12-19 14:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Frederic Weisbecker, Srinivas Pandruvada, rafael, daniel.lezcano,
	linux-pm, linux-kernel, len.brown, Thomas Gleixner

On Mon, Dec 19, 2022 at 12:46:54PM +0100, Sebastian Andrzej Siewior wrote:
> On 2022-12-19 12:33:22 [+0100], Peter Zijlstra wrote:
> > ksoftirq is typically a CFS task while idle injection is required to be
> > a FIFO (typically FIFO-1) task -- so that would require lifting
> > ksoftirqd to FIFO and that has other problems.
> > 
> > I'm guessing the problem case is where idle injection comes in while
> > ksoftirq is running (and preempted), because in that case you cannot run
> > the softirq stuff in-place.
> > 
> > The 'right' thing to do would be to PI boost ksoftirqd in that case I
> > suppose. Perhaps something like so, it would boost ksoftirq when it's
> > running, and otherwise runs the ksoftirqd thing in-situ.
> > 
> > I've not really throught hard about this though, so perhaps I completely
> > wrecked things.
> 
> I don't know why you intend to run ksoftirqd but in general it breaks RT

So the upstream problem is where a softirq is punted to ksoftirqd (obviously
from the hardirq tail) and idle-injection comes in and either:

 - runs before ksoftirqd has had a chance to run, or
 - preempts ksoftirqd.

In both cases we'll appear to go idle with softirqs pending -- which
triggers a pesky warning, because that is obviously undesirable.

In the first case we can run 'ksoftirqd' in-context, but then need to
serialize against the actual ksoftirqd. In the second case we need to
serialize against ksoftirqd and ensure it is complete before proceeding
with going 'idle'.

Since idle-injection runs FIFO and ksoftirqd typically does not, some
form of PI is required.

> left and right and we attempt to avoid running ksoftirqd as much as
> possible. If it runs and you go and boost it then it probably gets even
> worse from an RT point of view.

So if you never run ksoftirqd, the above problem doesn't happen. If
ksoftirqd *can* run, you still need to deal with it somehow. Boosting
ksoftirqd to whatever priority the idle-injection thread has should not
be worse than anything the idle-injection already does, no?

Specifically, without idle-injection involved the patch as proposed (+-
fixes required to make it compile and work) should be a no-op.

> ksoftirqd runs softirqs raised from hardirq context. Everything else is
> handled within a local_bh_disable()+enable loop. We already have the
> boost-ksoftirqd hammer, which is the per-CPU BKL called
> local_bh_disable(). In general everything should be moved out of it.
> For timers we have the ktimerd thread, which needs cleanup.




* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-19 11:33       ` Peter Zijlstra
  2022-12-19 11:46         ` Sebastian Andrzej Siewior
@ 2022-12-19 22:54         ` srinivas pandruvada
  2022-12-20 20:51           ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: srinivas pandruvada @ 2022-12-19 22:54 UTC (permalink / raw)
  To: Peter Zijlstra, Frederic Weisbecker
  Cc: rafael, daniel.lezcano, linux-pm, linux-kernel, len.brown,
	Thomas Gleixner, Sebastian Andrzej Siewior

On Mon, 2022-12-19 at 12:33 +0100, Peter Zijlstra wrote:
> On Fri, Dec 16, 2022 at 11:07:48PM +0100, Frederic Weisbecker wrote:
> > On Fri, Dec 16, 2022 at 12:19:34PM +0100, Peter Zijlstra wrote:
> > > On Thu, Dec 15, 2022 at 10:42:59AM -0800, Srinivas Pandruvada wrote:
> > > > +               /* Give ksoftirqd 1 jiffy to get a chance to start its job */
> > > > +               if (!READ_ONCE(it.done) && task_is_running(__this_cpu_read(ksoftirqd))) {
> > > > +                       __set_current_state(TASK_UNINTERRUPTIBLE);
> > > > +                       schedule_timeout(1);
> > > > +               }
> > > 
> > > That's absolutely disgusting :-/
> > 
> > I know, and I hate checking task_is_running(__this_cpu_read(ksoftirqd))
> > everywhere in idle. And in fact it doesn't work because some cpuidle
> > drivers also do need_resched() checks.
> 
> quite a few indeed.
> 
> > I guess that either we assume that the idle injection is more important
> > than serving softirqs and we silence the warnings accordingly, or we
> > arrange for idle injection to have a lower prio than ksoftirqd.
> 
> ksoftirqd is typically a CFS task while idle injection is required to be
> a FIFO (typically FIFO-1) task -- so that would require lifting
> ksoftirqd to FIFO, and that has other problems.
> 
> I'm guessing the problem case is where idle injection comes in while
> ksoftirqd is running (and preempted), because in that case you cannot run
> the softirq stuff in-place.
> 
> The 'right' thing to do would be to PI boost ksoftirqd in that case, I
> suppose. Perhaps something like so; it would boost ksoftirqd when it's
> running, and otherwise run the ksoftirqd work in-situ.
> 
> I've not really thought hard about this though, so perhaps I completely
> wrecked things.
I had to fix some compile issues in the patch, but was able to test. This
fixes the case where a softirq is pending before play_idle(). For example:

<idle>-0[004]   254.440116: softirq_raise:   vec=9 [action=RCU]
<idle>-0[004]   254.440116: softirq_exit:    vec=9 [action=RCU]
<idle>-0[004]   254.440116: sched_waking:    comm=ksoftirqd/4 pid=41 prio=120 target_cpu=004
<idle>-0[004]   254.440117: sched_wakeup:    ksoftirqd/4:41 [120]<CANT FIND FIELD success> CPU:004
<idle>-0[004]   254.440119: sched_switch:    swapper/4:0 [120] R ==> kidle_inj/4:1180 [49]
kidle_inj/4-1180[004]   254.440120: sched_kthread_work_execute_start: work struct 0xffffd1dcbfd25230: function clamp_idle_injection_func
kidle_inj/4-1180[004]   254.440121: play_idle_enter: state=24000000 cpu_id=4
kidle_inj/4-1180[004]   254.440122: bputs: __run_ksoftirqd: running from play_idle

"calling __do_softirq() here"

kidle_inj/4-1180[004]   254.440122: softirq_entry:   vec=9 [action=RCU]
kidle_inj/4-1180[004]   254.440125: softirq_raise:   vec=9 [action=RCU]
kidle_inj/4-1180[004]   254.440126: softirq_exit:    vec=9 [action=RCU]

But after ksoftirqd_run_end(), which re-enables local irqs, this may
cause more softirqs to become pending. Here the RCU softirq entries
continue:

kidle_inj/4-1180  [004]254.440141: softirq_exit:    vec=9 [action=RCU]
kidle_inj/4-1180  [004]254.440141: softirq_entry:   vec=9 [action=RCU]
kidle_inj/4-1180  [004]254.440143: softirq_raise:   vec=9 [action=RCU]
kidle_inj/4-1180  [004]254.440143: softirq_exit:    vec=9 [action=RCU]
kidle_inj/4-1180  [004]254.440144: bputs: can_stop_idle_tick.isra.0: soft irq pending

Another warning above.


This log is with a ping test.

Adding an additional __run_ksoftirqd() inside the do_idle()
while (!need_resched()) loop doesn't help.

Adding a check to can_stop_idle_tick() before reporting helps. But
since __do_softirq() can't guarantee that there are no pending
softirqs, there is still a chance of getting another warning.
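
Something like the following, for example (hypothetical sketch against the
warning site in can_stop_idle_tick(), kernel/time/tick-sched.c; not a
posted patch):

	if (unlikely(local_softirq_pending())) {
		/* hypothetical: skip the warning when ksoftirqd is
		 * already runnable and will handle the pending work */
		if (!task_is_running(__this_cpu_read(ksoftirqd)))
			pr_warn("NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #%02x!!!\n",
				(unsigned int)local_softirq_pending());
		return false;
	}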
 
Thanks,
Srinivas

> 
> 
> ---
>  include/linux/interrupt.h   |  3 +++
>  kernel/sched/build_policy.c |  1 +
>  kernel/sched/idle.c         |  4 ++++
>  kernel/softirq.c            | 20 +++++++++++++++++---
>  4 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index a92bce40b04b..a2db811d6947 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -606,6 +606,9 @@ extern void raise_softirq_irqoff(unsigned int nr);
>  extern void raise_softirq(unsigned int nr);
>  
>  DECLARE_PER_CPU(struct task_struct *, ksoftirqd);
> +DECLARE_PER_CPU(struct rt_mutex, ksoftirq_lock);
> +
> +extern bool __run_ksoftirqd(unsigned int cpu);
>  
>  static inline struct task_struct *this_cpu_ksoftirqd(void)
>  {
> diff --git a/kernel/sched/build_policy.c b/kernel/sched/build_policy.c
> index d9dc9ab3773f..731c595e551c 100644
> --- a/kernel/sched/build_policy.c
> +++ b/kernel/sched/build_policy.c
> @@ -28,6 +28,7 @@
>  #include <linux/suspend.h>
>  #include <linux/tsacct_kern.h>
>  #include <linux/vtime.h>
> +#include <linux/interrupt.h>
>  
>  #include <uapi/linux/sched/types.h>
>  
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index f26ab2675f7d..093bfdfca2f1 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -370,6 +370,10 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
>         WARN_ON_ONCE(!duration_ns);
>         WARN_ON_ONCE(current->mm);
>  
> +       rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> +       __run_ksoftirqd(smp_processor_id());
> +       rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> +
>         rcu_sleep_check();
>         preempt_disable();
>         current->flags |= PF_IDLE;
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..a2cb600383a4 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -59,6 +59,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat);
>  static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
>  
>  DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
> +DEFINE_PER_CPU(struct rt_mutex, ksoftirq_lock);
>  
>  const char * const softirq_to_name[NR_SOFTIRQS] = {
>         "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL",
> @@ -912,6 +913,7 @@ void __init softirq_init(void)
>                         &per_cpu(tasklet_vec, cpu).head;
>                 per_cpu(tasklet_hi_vec, cpu).tail =
>                         &per_cpu(tasklet_hi_vec, cpu).head;
> +               rt_mutex_init(&per_cpu(ksoftirq_lock, cpu));
>         }
>  
>         open_softirq(TASKLET_SOFTIRQ, tasklet_action);
> @@ -923,7 +925,7 @@ static int ksoftirqd_should_run(unsigned int cpu)
>         return local_softirq_pending();
>  }
>  
> -static void run_ksoftirqd(unsigned int cpu)
> +bool __run_ksoftirqd(unsigned int cpu)
>  {
>         ksoftirqd_run_begin();
>         if (local_softirq_pending()) {
> @@ -933,10 +935,22 @@ static void run_ksoftirqd(unsigned int cpu)
>                  */
>                 __do_softirq();
>                 ksoftirqd_run_end();
> -               cond_resched();
> -               return;
> +               return true;
>         }
>         ksoftirqd_run_end();
> +       return false;
> +}
> +
> +static void run_ksoftirqd(unsigned int cpu)
> +{
> +       bool ran;
> +
> +       rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> +       ran = __run_ksoftirqd(cpu);
> +       rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> +
> +       if (ran)
> +               cond_resched();
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU



* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-19 22:54         ` srinivas pandruvada
@ 2022-12-20 20:51           ` Peter Zijlstra
  2022-12-20 21:18             ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2022-12-20 20:51 UTC (permalink / raw)
  To: srinivas pandruvada
  Cc: Frederic Weisbecker, rafael, daniel.lezcano, linux-pm,
	linux-kernel, len.brown, Thomas Gleixner,
	Sebastian Andrzej Siewior

On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada wrote:

> But after ksoftirqd_run_end(), which re-enables local irqs, this may
> cause more softirqs to become pending. Here the RCU softirq entries continue:

Right you are.. what about if we spell the idle.c thing like so instead?

Then we repeat the softirq thing every time we drop out of the idle loop
for a reschedule.

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f26ab2675f7d..6dce49813bcc 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
 	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
 		      HRTIMER_MODE_REL_PINNED_HARD);
 
-	while (!READ_ONCE(it.done))
+	while (!READ_ONCE(it.done)) {
+		rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
+		__run_ksoftirqd(smp_processor_id());
+		rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
+
 		do_idle();
+	}
 
 	cpuidle_use_deepest_state(0);
 	current->flags &= ~PF_IDLE;


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-20 20:51           ` Peter Zijlstra
@ 2022-12-20 21:18             ` Peter Zijlstra
  2023-01-10  2:33               ` srinivas pandruvada
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2022-12-20 21:18 UTC (permalink / raw)
  To: srinivas pandruvada
  Cc: Frederic Weisbecker, rafael, daniel.lezcano, linux-pm,
	linux-kernel, len.brown, Thomas Gleixner,
	Sebastian Andrzej Siewior

On Tue, Dec 20, 2022 at 09:51:09PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada wrote:
> 
> > But after ksoftirqd_run_end(), which re-enables local irqs, this may
> > cause more softirqs to become pending. Here the RCU softirq entries continue:
> 
> Right you are.. what about if we spell the idle.c thing like so instead?
> 
> Then we repeat the softirq thing every time we drop out of the idle loop
> for a reschedule.

Uff, that obviously can't work because we already have preemption
disabled here; this needs a bit more work. I think it's possible to
rearrange things a bit.

I'll try and have a look tomorrow, but the kids have their xmas play at
school so who knows what I'll get done.

> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index f26ab2675f7d..6dce49813bcc 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
>  	hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
>  		      HRTIMER_MODE_REL_PINNED_HARD);
>  
> -	while (!READ_ONCE(it.done))
> +	while (!READ_ONCE(it.done)) {
> +		rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> +		__run_ksoftirqd(smp_processor_id());
> +		rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> +
>  		do_idle();
> +	}
>  
>  	cpuidle_use_deepest_state(0);
>  	current->flags &= ~PF_IDLE;


* Re: [RFC/RFT PATCH 1/2] sched/core: Check and schedule ksoftirq
  2022-12-20 21:18             ` Peter Zijlstra
@ 2023-01-10  2:33               ` srinivas pandruvada
  0 siblings, 0 replies; 13+ messages in thread
From: srinivas pandruvada @ 2023-01-10  2:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, rafael, daniel.lezcano, linux-pm,
	linux-kernel, len.brown, Thomas Gleixner,
	Sebastian Andrzej Siewior

On Tue, 2022-12-20 at 22:18 +0100, Peter Zijlstra wrote:
> On Tue, Dec 20, 2022 at 09:51:09PM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 19, 2022 at 02:54:55PM -0800, srinivas pandruvada
> > wrote:
> > 
> > > But after ksoftirqd_run_end(), which re-enables local irqs, this may
> > > cause more softirqs to become pending. Here the RCU softirq entries
> > > continue:
> > 
> > Right you are.. what about if we spell the idle.c thing like so instead?
> > 
> > Then we repeat the softirq thing every time we drop out of the idle loop
> > for a reschedule.
> 
> Uff, that obviously can't work because we already have preemption
> disabled here; this needs a bit more work. I think it's possible to
> rearrange things a bit.
Didn't work.
Also, when __do_softirq() returns, softirqs can be pending again. I think
that if local_softirq_pending() is set, we can break out of the do_idle() loop.
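
I.e., something like this (hypothetical sketch, not tested):

	static bool idle_loop_should_break(void)
	{
		/* break out of do_idle() whenever softirqs are pending
		 * locally, not only when ksoftirqd is already runnable */
		return need_resched() || local_softirq_pending();
	}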

Thanks,
Srinivas

> 
> I'll try and have a look tomorrow, but the kids have their xmas play
> at
> school so who knows what I'll get done.
> 
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index f26ab2675f7d..6dce49813bcc 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -381,8 +381,13 @@ void play_idle_precise(u64 duration_ns, u64 latency_ns)
> >         hrtimer_start(&it.timer, ns_to_ktime(duration_ns),
> >                       HRTIMER_MODE_REL_PINNED_HARD);
> >  
> > -       while (!READ_ONCE(it.done))
> > +       while (!READ_ONCE(it.done)) {
> > +               rt_mutex_lock(&per_cpu(ksoftirq_lock, cpu));
> > +               __run_ksoftirqd(smp_processor_id());
> > +               rt_mutex_unlock(&per_cpu(ksoftirq_lock, cpu));
> > +
> >                 do_idle();
> > +       }
> >  
> >         cpuidle_use_deepest_state(0);
> >         current->flags &= ~PF_IDLE;

