linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] sched/core: introduce sched_core_idle_cpu()
@ 2023-06-29  4:02 Cruz Zhao
  2023-06-29 10:09 ` Frederic Weisbecker
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Cruz Zhao @ 2023-06-29  4:02 UTC
  To: gregkh, jirislaby, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang1.zhang, jstultz, clingutla, nsaenzju, tglx,
	frederic
  Cc: linux-kernel

With the introduction of core scheduling, a new idle state is defined:
force idle, in which a CPU runs the idle task while nr_running is
greater than zero.

If a CPU is in the force idle state, idle_cpu() returns zero. This
result makes sense in some scenarios, e.g., load balancing, showacpu()
when dumping, and judging whether the RCU boost kthread is starving.

But it causes errors in other scenarios, e.g., tick_irq_exit(): in
force idle, rq->curr == rq->idle but rq->nr_running > 0, so idle_cpu()
returns 0. In tick_irq_exit(), if idle_cpu() returns 0,
tick_nohz_irq_exit() is not called, and ts->idle_active, which was set
to 0 in tick_nohz_irq_enter(), is not set back to 1. Because
update_ts_time_stats() only updates ts->idle_sleeptime while
ts->idle_active is 1, ts->idle_sleeptime ends up lower than the actual
value, and so the idle time reported in /proc/stat is lower than the
actual value as well.
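
For illustration only, the accounting chain described above can be
sketched as a small user-space model. This is not kernel code: struct
model_ts, the model_*() helpers and the timestamps are hypothetical
stand-ins, and only the idle_active / idle_entrytime / idle_sleeptime
bookkeeping named in this changelog is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three tick-sched fields discussed above. */
struct model_ts {
	bool idle_active;        /* is idle time currently being accounted? */
	uint64_t idle_entrytime; /* start of the current accounted idle span */
	uint64_t idle_sleeptime; /* accumulated idle time */
};

/* Models update_ts_time_stats(): only accumulates while idle_active is set. */
void model_update_time_stats(struct model_ts *ts, uint64_t now)
{
	assert(now >= ts->idle_entrytime);
	if (ts->idle_active) {
		ts->idle_sleeptime += now - ts->idle_entrytime;
		ts->idle_entrytime = now;
	}
}

/* Models (re)starting idle accounting. */
void model_start_idle(struct model_ts *ts, uint64_t now)
{
	ts->idle_entrytime = now;
	ts->idle_active = true;
}

/* Models the interrupt entry path: flush the stats, then clear idle_active. */
void model_irq_enter(struct model_ts *ts, uint64_t now)
{
	model_update_time_stats(ts, now);
	ts->idle_active = false;
}

/*
 * Models the interrupt exit path: idle accounting is restarted only if the
 * idle predicate used on exit reports the CPU as idle.
 */
void model_irq_exit(struct model_ts *ts, uint64_t now, bool exit_sees_idle)
{
	if (exit_sees_idle)
		model_start_idle(ts, now);
}

/*
 * One interrupt arrives at t=10 and exits at t=12 while the CPU runs the
 * idle task the whole time; the stats are read at t=100.
 */
uint64_t model_idle_time(bool exit_sees_idle)
{
	struct model_ts ts = {0};

	model_start_idle(&ts, 0);
	model_irq_enter(&ts, 10);
	model_irq_exit(&ts, 12, exit_sees_idle);
	model_update_time_stats(&ts, 100);
	return ts.idle_sleeptime;
}
```

In this model, model_idle_time(true) accounts both idle spans, while
model_idle_time(false) stops accumulating after the interrupt, which
corresponds to the undercounted idle time described above.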

To solve this problem, introduce sched_core_idle_cpu(), which returns 1
in force idle. We audited all users of idle_cpu() and replaced
idle_cpu() with sched_core_idle_cpu() in tick_irq_exit().

v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
tick_irq_exit(), and update the commit log accordingly.

Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/lkml/1687631295-126383-1-git-send-email-CruzZhao@linux.alibaba.com
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 13 +++++++++++++
 kernel/softirq.c      |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b09a83bfad8b..73e61c0f10a7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2430,9 +2430,11 @@ extern void sched_core_free(struct task_struct *tsk);
 extern void sched_core_fork(struct task_struct *p);
 extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
 				unsigned long uaddr);
+extern int sched_core_idle_cpu(int cpu);
 #else
 static inline void sched_core_free(struct task_struct *tsk) { }
 static inline void sched_core_fork(struct task_struct *p) { }
+static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
 #endif
 
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 71c1a0f232b4..c80088956987 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7286,6 +7286,19 @@ struct task_struct *idle_task(int cpu)
 	return cpu_rq(cpu)->idle;
 }
 
+#ifdef CONFIG_SCHED_CORE
+int sched_core_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (sched_core_enabled(rq) && rq->curr == rq->idle)
+		return 1;
+
+	return idle_cpu(cpu);
+}
+
+#endif
+
 #ifdef CONFIG_SMP
 /*
  * This function computes an effective utilization for the given CPU, to be
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c8a6913c067d..98b98991ce45 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
 	int cpu = smp_processor_id();
 
 	/* Make sure that timer wheel updates are propagated */
-	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
 		if (!in_hardirq())
 			tick_nohz_irq_exit();
 	}
-- 
2.27.0



* Re: [PATCH v3] sched/core: introduce sched_core_idle_cpu()
  2023-06-29  4:02 [PATCH v3] sched/core: introduce sched_core_idle_cpu() Cruz Zhao
@ 2023-06-29 10:09 ` Frederic Weisbecker
  2023-07-03  2:07 ` Joel Fernandes
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Frederic Weisbecker @ 2023-06-29 10:09 UTC
  To: Cruz Zhao
  Cc: gregkh, jirislaby, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang1.zhang, jstultz, clingutla, nsaenzju, tglx,
	linux-kernel

On Thu, Jun 29, 2023 at 12:02:04PM +0800, Cruz Zhao wrote:
> As core scheduling introduced, a new state of idle is defined as
> force idle, running idle task but nr_running greater than zero.
> 
> If a cpu is in force idle state, idle_cpu() will return zero. This
> result makes sense in some scenarios, e.g., load balance,
> showacpu when dumping, and judge the RCU boost kthread is starving.
> 
> But this will cause error in other scenarios, e.g., tick_irq_exit():
> When force idle, rq->curr == rq->idle but rq->nr_running > 0, results
> that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu()
> is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
> not become 1, which became 0 in tick_nohz_irq_enter().
> ts->idle_sleeptime won't update in function update_ts_time_stats(), if
> ts->idle_active is 0, which should be 1. And this bug will result that
> ts->idle_sleeptime is less than the actual value, and finally will
> result that the idle time in /proc/stat is less than the actual value.
> 
> To solve this problem, we introduce sched_core_idle_cpu(), which
> returns 1 when force idle. We audit all users of idle_cpu(), and
> change idle_cpu() into sched_core_idle_cpu() in function
> tick_irq_exit().
> 
> v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
> function tick_irq_exit(). And modify the corresponding commit log.
> 
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Joel Fernandes <joel@joelfernandes.org>

Please wait for people to actually provide you with their Reviewed-by: tags
before adding them.

Aside from that, the patch looks good, so you can add this one:

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>



* Re: [PATCH v3] sched/core: introduce sched_core_idle_cpu()
  2023-06-29  4:02 [PATCH v3] sched/core: introduce sched_core_idle_cpu() Cruz Zhao
  2023-06-29 10:09 ` Frederic Weisbecker
@ 2023-07-03  2:07 ` Joel Fernandes
  2023-07-04  5:39 ` Aaron Lu
  2023-07-17 12:56 ` [tip: sched/core] " tip-bot2 for Cruz Zhao
  3 siblings, 0 replies; 7+ messages in thread
From: Joel Fernandes @ 2023-07-03  2:07 UTC
  To: Cruz Zhao, gregkh, jirislaby, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, quic_neeraju, josh, boqun.feng,
	mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	clingutla, nsaenzju, tglx, frederic
  Cc: linux-kernel

On 6/29/2023 12:02 AM, Cruz Zhao wrote:
> As core scheduling introduced, a new state of idle is defined as
> force idle, running idle task but nr_running greater than zero.
> 
> If a cpu is in force idle state, idle_cpu() will return zero. This
> result makes sense in some scenarios, e.g., load balance,
> showacpu when dumping, and judge the RCU boost kthread is starving.
> 
> But this will cause error in other scenarios, e.g., tick_irq_exit():
> When force idle, rq->curr == rq->idle but rq->nr_running > 0, results
> that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu()
> is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
> not become 1, which became 0 in tick_nohz_irq_enter().
> ts->idle_sleeptime won't update in function update_ts_time_stats(), if
> ts->idle_active is 0, which should be 1. And this bug will result that
> ts->idle_sleeptime is less than the actual value, and finally will
> result that the idle time in /proc/stat is less than the actual value.
> 
> To solve this problem, we introduce sched_core_idle_cpu(), which
> returns 1 when force idle. We audit all users of idle_cpu(), and
> change idle_cpu() into sched_core_idle_cpu() in function
> tick_irq_exit().

Reviewed-by: Joel Fernandes <joel@joelfernandes.org>

 - Joel


> 
> v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
> function tick_irq_exit(). And modify the corresponding commit log.
> 
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
> Link: https://lore.kernel.org/lkml/1687631295-126383-1-git-send-email-CruzZhao@linux.alibaba.com
> ---
>  include/linux/sched.h |  2 ++
>  kernel/sched/core.c   | 13 +++++++++++++
>  kernel/softirq.c      |  2 +-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b09a83bfad8b..73e61c0f10a7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2430,9 +2430,11 @@ extern void sched_core_free(struct task_struct *tsk);
>  extern void sched_core_fork(struct task_struct *p);
>  extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
>  				unsigned long uaddr);
> +extern int sched_core_idle_cpu(int cpu);
>  #else
>  static inline void sched_core_free(struct task_struct *tsk) { }
>  static inline void sched_core_fork(struct task_struct *p) { }
> +static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
>  #endif
>  
>  extern void sched_set_stop_task(int cpu, struct task_struct *stop);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 71c1a0f232b4..c80088956987 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7286,6 +7286,19 @@ struct task_struct *idle_task(int cpu)
>  	return cpu_rq(cpu)->idle;
>  }
>  
> +#ifdef CONFIG_SCHED_CORE
> +int sched_core_idle_cpu(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (sched_core_enabled(rq) && rq->curr == rq->idle)
> +		return 1;
> +
> +	return idle_cpu(cpu);
> +}
> +
> +#endif
> +
>  #ifdef CONFIG_SMP
>  /*
>   * This function computes an effective utilization for the given CPU, to be
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..98b98991ce45 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
>  	int cpu = smp_processor_id();
>  
>  	/* Make sure that timer wheel updates are propagated */
> -	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> +	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>  		if (!in_hardirq())
>  			tick_nohz_irq_exit();
>  	}


* Re: [PATCH v3] sched/core: introduce sched_core_idle_cpu()
  2023-06-29  4:02 [PATCH v3] sched/core: introduce sched_core_idle_cpu() Cruz Zhao
  2023-06-29 10:09 ` Frederic Weisbecker
  2023-07-03  2:07 ` Joel Fernandes
@ 2023-07-04  5:39 ` Aaron Lu
  2023-07-04 13:19   ` Joel Fernandes
  2023-07-12  2:41   ` cruzzhao
  2023-07-17 12:56 ` [tip: sched/core] " tip-bot2 for Cruz Zhao
  3 siblings, 2 replies; 7+ messages in thread
From: Aaron Lu @ 2023-07-04  5:39 UTC
  To: Cruz Zhao
  Cc: gregkh, jirislaby, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang1.zhang, jstultz, clingutla, nsaenzju, tglx,
	frederic, linux-kernel

On Thu, Jun 29, 2023 at 12:02:04PM +0800, Cruz Zhao wrote:
> As core scheduling introduced, a new state of idle is defined as
> force idle, running idle task but nr_running greater than zero.
> 
> If a cpu is in force idle state, idle_cpu() will return zero. This
> result makes sense in some scenarios, e.g., load balance,
> showacpu when dumping, and judge the RCU boost kthread is starving.
> 
> But this will cause error in other scenarios, e.g., tick_irq_exit():
> When force idle, rq->curr == rq->idle but rq->nr_running > 0, results
> that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu()
> is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
> not become 1, which became 0 in tick_nohz_irq_enter().
> ts->idle_sleeptime won't update in function update_ts_time_stats(), if
> ts->idle_active is 0, which should be 1. And this bug will result that
> ts->idle_sleeptime is less than the actual value, and finally will
> result that the idle time in /proc/stat is less than the actual value.
> 
> To solve this problem, we introduce sched_core_idle_cpu(), which
> returns 1 when force idle. We audit all users of idle_cpu(), and
> change idle_cpu() into sched_core_idle_cpu() in function
> tick_irq_exit().
> 
> v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
> function tick_irq_exit(). And modify the corresponding commit log.
> 
> Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> Reviewed-by: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
> Link: https://lore.kernel.org/lkml/1687631295-126383-1-git-send-email-CruzZhao@linux.alibaba.com
> ---
>  include/linux/sched.h |  2 ++
>  kernel/sched/core.c   | 13 +++++++++++++
>  kernel/softirq.c      |  2 +-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b09a83bfad8b..73e61c0f10a7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2430,9 +2430,11 @@ extern void sched_core_free(struct task_struct *tsk);
>  extern void sched_core_fork(struct task_struct *p);
>  extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
>  				unsigned long uaddr);
> +extern int sched_core_idle_cpu(int cpu);
>  #else
>  static inline void sched_core_free(struct task_struct *tsk) { }
>  static inline void sched_core_fork(struct task_struct *p) { }
> +static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
>  #endif
>  
>  extern void sched_set_stop_task(int cpu, struct task_struct *stop);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 71c1a0f232b4..c80088956987 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7286,6 +7286,19 @@ struct task_struct *idle_task(int cpu)
>  	return cpu_rq(cpu)->idle;
>  }
>  
> +#ifdef CONFIG_SCHED_CORE
> +int sched_core_idle_cpu(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (sched_core_enabled(rq) && rq->curr == rq->idle)
> +		return 1;

If the intention is to consider forced idle cpus as idle, then should
the above condition be written as:

	if (sched_core_enabled(rq) && rq->core->core_forceidle_count)
		return 1;
?

Otherwise, as long as a single cookied task is running, all normally idle
cpus are regarded as forced idle here and 1 is returned, whereas
previously idle_cpu() was called for those cpus and, if they had a wakeup
task pending, they were not regarded as idle. So this looks like a
behaviour change.
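
To illustrate the difference, here is a rough user-space model of the
three predicates. It is only a sketch: struct model_rq and the model_*()
names are hypothetical, the booleans stand in for sched_core_enabled(rq)
and rq->curr == rq->idle, core_forceidle_count is flattened into the
per-cpu struct, and the idle_cpu() model assumes it checks rq->curr,
nr_running and a pending wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened stand-in for the runqueue state involved. */
struct model_rq {
	bool core_enabled;                 /* sched_core_enabled(rq) */
	bool curr_is_idle;                 /* rq->curr == rq->idle */
	unsigned int nr_running;           /* runnable tasks on this rq */
	bool ttwu_pending;                 /* a wakeup is pending for this cpu */
	unsigned int core_forceidle_count; /* stands in for rq->core->core_forceidle_count */
};

/* Assumed shape of idle_cpu(): idle task running, nothing runnable or pending. */
int model_idle_cpu(const struct model_rq *rq)
{
	assert(rq);
	if (!rq->curr_is_idle)
		return 0;
	if (rq->nr_running)
		return 0;
	if (rq->ttwu_pending)
		return 0;
	return 1;
}

/* The condition as posted in v3. */
int model_v3_idle_cpu(const struct model_rq *rq)
{
	assert(rq);
	if (rq->core_enabled && rq->curr_is_idle)
		return 1;
	return model_idle_cpu(rq);
}

/* The condition suggested in this reply. */
int model_forceidle_idle_cpu(const struct model_rq *rq)
{
	assert(rq);
	if (rq->core_enabled && rq->core_forceidle_count)
		return 1;
	return model_idle_cpu(rq);
}
```

For a cpu that is running the idle task with a wakeup pending and no
forced idle in its core, the model of idle_cpu() and the suggested
variant return 0, while the v3 condition returns 1. For a forced idle
cpu (idle task running, nr_running > 0, core_forceidle_count > 0), both
variants return 1.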

Thanks,
Aaron

> +
> +	return idle_cpu(cpu);
> +}
> +
> +#endif
> +
>  #ifdef CONFIG_SMP
>  /*
>   * This function computes an effective utilization for the given CPU, to be
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index c8a6913c067d..98b98991ce45 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
>  	int cpu = smp_processor_id();
>  
>  	/* Make sure that timer wheel updates are propagated */
> -	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
> +	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>  		if (!in_hardirq())
>  			tick_nohz_irq_exit();
>  	}
> -- 
> 2.27.0
> 


* Re: [PATCH v3] sched/core: introduce sched_core_idle_cpu()
  2023-07-04  5:39 ` Aaron Lu
@ 2023-07-04 13:19   ` Joel Fernandes
  2023-07-12  2:41   ` cruzzhao
  1 sibling, 0 replies; 7+ messages in thread
From: Joel Fernandes @ 2023-07-04 13:19 UTC
  To: Aaron Lu
  Cc: Cruz Zhao, gregkh, jirislaby, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, vschneid, paulmck, quic_neeraju, josh, boqun.feng,
	mathieu.desnoyers, jiangshanlai, qiang1.zhang, jstultz,
	clingutla, nsaenzju, tglx, frederic, linux-kernel

On Tue, Jul 4, 2023 at 1:40 AM Aaron Lu <aaron.lu@intel.com> wrote:
>
> On Thu, Jun 29, 2023 at 12:02:04PM +0800, Cruz Zhao wrote:
> > As core scheduling introduced, a new state of idle is defined as
> > force idle, running idle task but nr_running greater than zero.
> >
> > If a cpu is in force idle state, idle_cpu() will return zero. This
> > result makes sense in some scenarios, e.g., load balance,
> > showacpu when dumping, and judge the RCU boost kthread is starving.
> >
> > But this will cause error in other scenarios, e.g., tick_irq_exit():
> > When force idle, rq->curr == rq->idle but rq->nr_running > 0, results
> > that idle_cpu() returns 0. In function tick_irq_exit(), if idle_cpu()
> > is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will
> > not become 1, which became 0 in tick_nohz_irq_enter().
> > ts->idle_sleeptime won't update in function update_ts_time_stats(), if
> > ts->idle_active is 0, which should be 1. And this bug will result that
> > ts->idle_sleeptime is less than the actual value, and finally will
> > result that the idle time in /proc/stat is less than the actual value.
> >
> > To solve this problem, we introduce sched_core_idle_cpu(), which
> > returns 1 when force idle. We audit all users of idle_cpu(), and
> > change idle_cpu() into sched_core_idle_cpu() in function
> > tick_irq_exit().
> >
> > v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
> > function tick_irq_exit(). And modify the corresponding commit log.
> >
> > Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
> > Reviewed-by: Peter Zijlstra <peterz@infradead.org>
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> > Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
> > Link: https://lore.kernel.org/lkml/1687631295-126383-1-git-send-email-CruzZhao@linux.alibaba.com
> > ---
> >  include/linux/sched.h |  2 ++
> >  kernel/sched/core.c   | 13 +++++++++++++
> >  kernel/softirq.c      |  2 +-
> >  3 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b09a83bfad8b..73e61c0f10a7 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2430,9 +2430,11 @@ extern void sched_core_free(struct task_struct *tsk);
> >  extern void sched_core_fork(struct task_struct *p);
> >  extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
> >                               unsigned long uaddr);
> > +extern int sched_core_idle_cpu(int cpu);
> >  #else
> >  static inline void sched_core_free(struct task_struct *tsk) { }
> >  static inline void sched_core_fork(struct task_struct *p) { }
> > +static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
> >  #endif
> >
> >  extern void sched_set_stop_task(int cpu, struct task_struct *stop);
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 71c1a0f232b4..c80088956987 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7286,6 +7286,19 @@ struct task_struct *idle_task(int cpu)
> >       return cpu_rq(cpu)->idle;
> >  }
> >
> > +#ifdef CONFIG_SCHED_CORE
> > +int sched_core_idle_cpu(int cpu)
> > +{
> > +     struct rq *rq = cpu_rq(cpu);
> > +
> > +     if (sched_core_enabled(rq) && rq->curr == rq->idle)
> > +             return 1;
>
> If the intention is to consider forced idle cpus as idle, then should
> the above condition written as:
>
>         if (sched_core_enabled(rq) && rq->core->core_forceidle_count)
>                 return 1;
> ?
>
> Or as long as a single cookied task is running, all normal idle cpus are
> regarded forced idle here and 1 is returned while previously, idle_cpu()
> is called for those cpus and if they have wakeup task pending, they are
> not regarded as idle so looks like a behaviour change.
>

Ah you're right, great insight. _sigh_ I should not have missed that
during review. It will change idle_cpu() behavior if core sched is
enabled...

 - Joel


* Re: [PATCH v3] sched/core: introduce sched_core_idle_cpu()
  2023-07-04  5:39 ` Aaron Lu
  2023-07-04 13:19   ` Joel Fernandes
@ 2023-07-12  2:41   ` cruzzhao
  1 sibling, 0 replies; 7+ messages in thread
From: cruzzhao @ 2023-07-12  2:41 UTC
  To: Aaron Lu
  Cc: gregkh, jirislaby, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	paulmck, quic_neeraju, joel, josh, boqun.feng, mathieu.desnoyers,
	jiangshanlai, qiang1.zhang, jstultz, clingutla, nsaenzju, tglx,
	frederic, linux-kernel



On 2023/7/4 1:39 PM, Aaron Lu wrote:
>> +#ifdef CONFIG_SCHED_CORE
>> +int sched_core_idle_cpu(int cpu)
>> +{
>> +	struct rq *rq = cpu_rq(cpu);
>> +
>> +	if (sched_core_enabled(rq) && rq->curr == rq->idle)
>> +		return 1;
> 
> If the intention is to consider forced idle cpus as idle, then should
> the above condition written as:
> 
> 	if (sched_core_enabled(rq) && rq->core->core_forceidle_count)
> 		return 1;
> ?
> 
> Or as long as a single cookied task is running, all normal idle cpus are
> regarded forced idle here and 1 is returned while previously, idle_cpu()
> is called for those cpus and if they have wakeup task pending, they are
> not regarded as idle so looks like a behaviour change.
> 
> Thanks,
> Aaron
> 

I'll fix this problem in the next version.

Best,
Cruz Zhao

>> +
>> +	return idle_cpu(cpu);
>> +}
>> +
>> +#endif
>> +
>>  #ifdef CONFIG_SMP
>>  /*
>>   * This function computes an effective utilization for the given CPU, to be
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index c8a6913c067d..98b98991ce45 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -630,7 +630,7 @@ static inline void tick_irq_exit(void)
>>  	int cpu = smp_processor_id();
>>  
>>  	/* Make sure that timer wheel updates are propagated */
>> -	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>> +	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
>>  		if (!in_hardirq())
>>  			tick_nohz_irq_exit();
>>  	}
>> -- 
>> 2.27.0
>>


* [tip: sched/core] sched/core: introduce sched_core_idle_cpu()
  2023-06-29  4:02 [PATCH v3] sched/core: introduce sched_core_idle_cpu() Cruz Zhao
                   ` (2 preceding siblings ...)
  2023-07-04  5:39 ` Aaron Lu
@ 2023-07-17 12:56 ` tip-bot2 for Cruz Zhao
  3 siblings, 0 replies; 7+ messages in thread
From: tip-bot2 for Cruz Zhao @ 2023-07-17 12:56 UTC
  To: linux-tip-commits
  Cc: Cruz Zhao, Peter Zijlstra (Intel),
	Frederic Weisbecker, Joel Fernandes, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     548796e2e70b44b4661fd7feee6eb239245ff1f8
Gitweb:        https://git.kernel.org/tip/548796e2e70b44b4661fd7feee6eb239245ff1f8
Author:        Cruz Zhao <CruzZhao@linux.alibaba.com>
AuthorDate:    Thu, 29 Jun 2023 12:02:04 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 13 Jul 2023 15:21:50 +02:00

sched/core: introduce sched_core_idle_cpu()

With the introduction of core scheduling, a new idle state is defined:
force idle, in which a CPU runs the idle task while nr_running is
greater than zero.

If a CPU is in the force idle state, idle_cpu() returns zero. This
result makes sense in some scenarios, e.g., load balancing, showacpu()
when dumping, and judging whether the RCU boost kthread is starving.

But it causes errors in other scenarios, e.g., tick_irq_exit(): in
force idle, rq->curr == rq->idle but rq->nr_running > 0, so idle_cpu()
returns 0. In tick_irq_exit(), if idle_cpu() returns 0,
tick_nohz_irq_exit() is not called, and ts->idle_active, which was set
to 0 in tick_nohz_irq_enter(), is not set back to 1. Because
update_ts_time_stats() only updates ts->idle_sleeptime while
ts->idle_active is 1, ts->idle_sleeptime ends up lower than the actual
value, and so the idle time reported in /proc/stat is lower than the
actual value as well.

To solve this problem, introduce sched_core_idle_cpu(), which returns 1
in force idle. We audited all users of idle_cpu() and replaced
idle_cpu() with sched_core_idle_cpu() in tick_irq_exit().

v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
tick_irq_exit(), and update the commit log accordingly.

Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Link: https://lore.kernel.org/r/1688011324-42406-1-git-send-email-CruzZhao@linux.alibaba.com
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 13 +++++++++++++
 kernel/softirq.c      |  2 +-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 609bde8..efc9f4b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2433,9 +2433,11 @@ extern void sched_core_free(struct task_struct *tsk);
 extern void sched_core_fork(struct task_struct *p);
 extern int sched_core_share_pid(unsigned int cmd, pid_t pid, enum pid_type type,
 				unsigned long uaddr);
+extern int sched_core_idle_cpu(int cpu);
 #else
 static inline void sched_core_free(struct task_struct *tsk) { }
 static inline void sched_core_fork(struct task_struct *p) { }
+static inline int sched_core_idle_cpu(int cpu) { return idle_cpu(cpu); }
 #endif
 
 extern void sched_set_stop_task(int cpu, struct task_struct *stop);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2291f9d..83e3654 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7383,6 +7383,19 @@ struct task_struct *idle_task(int cpu)
 	return cpu_rq(cpu)->idle;
 }
 
+#ifdef CONFIG_SCHED_CORE
+int sched_core_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (sched_core_enabled(rq) && rq->curr == rq->idle)
+		return 1;
+
+	return idle_cpu(cpu);
+}
+
+#endif
+
 #ifdef CONFIG_SMP
 /*
  * This function computes an effective utilization for the given CPU, to be
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 807b34c..210cf5f 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -612,7 +612,7 @@ static inline void tick_irq_exit(void)
 	int cpu = smp_processor_id();
 
 	/* Make sure that timer wheel updates are propagated */
-	if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
+	if ((sched_core_idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) {
 		if (!in_hardirq())
 			tick_nohz_irq_exit();
 	}

