linux-kernel.vger.kernel.org archive mirror
* [PATCH Resend 0/3] sched: fix nr_busy_cpus
@ 2012-12-03 12:26 Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle Vincent Guittot
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Vincent Guittot @ 2012-12-03 12:26 UTC (permalink / raw)
  To: linux-kernel, linaro-dev, peterz, mingo; +Cc: ccross, Vincent Guittot

The nr_busy_cpus field of sched_group_power is sometimes non-zero even though
the platform is fully idle. This series fixes 3 use cases:
 - when the SCHED softirq is raised on an idle core for idle load balance but
   the platform doesn't leave the cpuidle state
 - when some CPUs enter the idle state while all CPUs are still booting
 - when a CPU is unplugged and/or replugged

Vincent Guittot (3):
  sched: fix nr_busy_cpus with coupled cpuidle
  sched: fix init NOHZ_IDLE flag
  sched: fix update NOHZ_IDLE flag

 kernel/sched/core.c      |    1 +
 kernel/sched/fair.c      |    2 +-
 kernel/time/tick-sched.c |    2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

-- 
1.7.9.5



* [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
  2012-12-03 12:26 [PATCH Resend 0/3] sched: fix nr_busy_cpus Vincent Guittot
@ 2012-12-03 12:26 ` Vincent Guittot
  2013-01-24 16:44   ` Frederic Weisbecker
  2012-12-03 12:26 ` [PATCH Resend 2/3] sched: fix init NOHZ_IDLE flag Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 3/3] sched: fix update " Vincent Guittot
  2 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2012-12-03 12:26 UTC (permalink / raw)
  To: linux-kernel, linaro-dev, peterz, mingo; +Cc: ccross, Vincent Guittot

With the coupled cpuidle driver (but probably also with other drivers),
a CPU loops in a temporary safe state while waiting for the other CPUs of its
cluster to be ready to enter the coupled C-state. If an IRQ or a softirq
occurs, the CPU stays in this internal loop if there is no need
to resched. The SCHED softirq clears the NOHZ_IDLE flag and increments
nr_busy_cpus. If there is no need to resched, set_cpu_sd_state_idle is never
called, because the CPU remains in this internal loop in a cpuidle state.
Call set_cpu_sd_state_idle in tick_nohz_irq_exit, which is used
to handle such a situation.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/time/tick-sched.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955d35b..b8d74ea 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void)
 	if (!ts->inidle)
 		return;
 
+	set_cpu_sd_state_idle();
+
 	/* Cancel the timer because CPU already waken up from the C-states*/
 	menu_hrtimer_cancel();
 	__tick_nohz_idle_enter(ts);
-- 
1.7.9.5



* [PATCH Resend 2/3] sched: fix init NOHZ_IDLE flag
  2012-12-03 12:26 [PATCH Resend 0/3] sched: fix nr_busy_cpus Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle Vincent Guittot
@ 2012-12-03 12:26 ` Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 3/3] sched: fix update " Vincent Guittot
  2 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2012-12-03 12:26 UTC (permalink / raw)
  To: linux-kernel, linaro-dev, peterz, mingo; +Cc: ccross, Vincent Guittot

On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is non-zero when the
platform is fully idle. The root cause seems to be the following:
during the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for the other CPUs to boot. But the nr_busy_cpus
field is initialized later, on the assumption that all CPUs are in the busy
state, whereas some CPUs have already set their NOHZ_IDLE flag.
Clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to
have a coherent configuration.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bae620a..77a01c8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5875,6 +5875,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 
 	update_group_power(sd, cpu);
 	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }
 
 int __weak arch_sd_sibling_asym_packing(void)
-- 
1.7.9.5



* [PATCH Resend 3/3] sched: fix update NOHZ_IDLE flag
  2012-12-03 12:26 [PATCH Resend 0/3] sched: fix nr_busy_cpus Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle Vincent Guittot
  2012-12-03 12:26 ` [PATCH Resend 2/3] sched: fix init NOHZ_IDLE flag Vincent Guittot
@ 2012-12-03 12:26 ` Vincent Guittot
  2 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2012-12-03 12:26 UTC (permalink / raw)
  To: linux-kernel, linaro-dev, peterz, mingo; +Cc: ccross, Vincent Guittot

The function nohz_kick_needed modifies the NOHZ_IDLE flag, which is used to
update the nr_busy_cpus of the sched_group.
When the sched_domains are updated (because of the unplug of a CPU, for
example), a null_domain is attached to the CPUs. We have to test
likely(!on_null_domain(cpu)) first in order to detect such an initialization
step and to not modify the NOHZ_IDLE flag.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 24a5588..1ef57a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6311,7 +6311,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu))
 		nohz_balancer_kick(cpu);
 #endif
 }
-- 
1.7.9.5



* Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
  2012-12-03 12:26 ` [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle Vincent Guittot
@ 2013-01-24 16:44   ` Frederic Weisbecker
  2013-01-24 17:55     ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2013-01-24 16:44 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: linux-kernel, linaro-dev, peterz, mingo, ccross

2012/12/3 Vincent Guittot <vincent.guittot@linaro.org>:
> With the coupled cpuidle driver (but probably also with other drivers),
> a CPU loops in a temporary safe state while waiting for other CPUs of its
> cluster to be ready to enter the coupled C-state. If an IRQ or a softirq
> occurs, the CPU will stay in this internal loop if there is no need
> to resched. The SCHED softirq clears the NOHZ and increases
> nr_busy_cpus. If there is no need to resched, we will not call
> set_cpu_sd_state_idle because of this internal loop in a cpuidle state.
> We have to call set_cpu_sd_state_idle in tick_nohz_irq_exit which is used
> to handle such situation.

I'm a bit confused with this.

set_cpu_sd_state_busy() is only called from nohz_kick_needed(), which
checks idle_cpu() before doing anything. So if no task is going to be
scheduled, idle_cpu() prevents set_cpu_sd_state_busy() from being called.

I'm probably missing something.

Thanks.

>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/time/tick-sched.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 955d35b..b8d74ea 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void)
>         if (!ts->inidle)
>                 return;
>
> +       set_cpu_sd_state_idle();
> +
>         /* Cancel the timer because CPU already waken up from the C-states*/
>         menu_hrtimer_cancel();
>         __tick_nohz_idle_enter(ts);
> --
> 1.7.9.5
>


* Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
  2013-01-24 16:44   ` Frederic Weisbecker
@ 2013-01-24 17:55     ` Vincent Guittot
       [not found]       ` <CAKfTPtBbdKM3R__M+x7P9v4cLObbr=FF6QE+-mEkgx0DtcQefA@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2013-01-24 17:55 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: linux-kernel, linaro-dev, peterz, mingo, ccross

On 24 January 2013 17:44, Frederic Weisbecker <fweisbec@gmail.com> wrote:
> 2012/12/3 Vincent Guittot <vincent.guittot@linaro.org>:
>> With the coupled cpuidle driver (but probably also with other drivers),
>> a CPU loops in a temporary safe state while waiting for other CPUs of its
>> cluster to be ready to enter the coupled C-state. If an IRQ or a softirq
>> occurs, the CPU will stay in this internal loop if there is no need
>> to resched. The SCHED softirq clears the NOHZ and increases
>> nr_busy_cpus. If there is no need to resched, we will not call
>> set_cpu_sd_state_idle because of this internal loop in a cpuidle state.
>> We have to call set_cpu_sd_state_idle in tick_nohz_irq_exit which is used
>> to handle such situation.
>
> I'm a bit confused with this.
>
> set_cpu_sd_state_busy() is only called from nohz_kick_needed(). And it
> checks idle_cpu() before doing anything. So if no task is going to be
> scheduled, idle_cpu() prevents from calling set_cpu_sd_state_busy().
>
> I'm probably missing something.

Hi Frederic

I can't find the trace that I had saved with the issue, but IIRC
the sequence is:
 - The CPU is kicked for ILB.
 - The wake_list of the CPU becomes non-empty, so the CPU is not idle.
 - The CPU wakes up, updates its timer framework and calls nohz_kick_needed,
   then executes the ILB sequence.
 - We don't leave the cpuidle driver function because we don't need
   to resched, so we don't clear the busy state.

I'm going to look for the saved trace to check the sequence above

Vincent

>
> Thanks.
>
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/time/tick-sched.c |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> index 955d35b..b8d74ea 100644
>> --- a/kernel/time/tick-sched.c
>> +++ b/kernel/time/tick-sched.c
>> @@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void)
>>         if (!ts->inidle)
>>                 return;
>>
>> +       set_cpu_sd_state_idle();
>> +
>>         /* Cancel the timer because CPU already waken up from the C-states*/
>>         menu_hrtimer_cancel();
>>         __tick_nohz_idle_enter(ts);
>> --
>> 1.7.9.5
>>


* Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
       [not found]       ` <CAKfTPtBbdKM3R__M+x7P9v4cLObbr=FF6QE+-mEkgx0DtcQefA@mail.gmail.com>
@ 2013-01-25 13:00         ` Frederic Weisbecker
       [not found]           ` <CAKfTPtCY7v7Er-bHj+-k3AQst075WOwfqY9rpVHv3k4nsM4E8Q@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2013-01-25 13:00 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: linaro-dev, linux-kernel, ccross, mingo, peterz

2013/1/25 Vincent Guittot <vincent.guittot@linaro.org>:
> This sequence is not the right one
>
>> I'm going to look for the saved trace to check the sequence above
>
> I haven't been able to reproduce the bug that this patch was supposed to
> solve. Patches 2 and 3 seem enough to fix the nr_busy_cpus field. I will
> continue to try to reproduce it, but it seems that it was a side effect of
> the 2 other fixes of the series

Ok. I just checked again as well and I can't find a scenario where
this can happen. If you find it out or trigger the bug again, don't
hesitate to resend this patch.

Thanks.


* Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
       [not found]           ` <CAKfTPtCY7v7Er-bHj+-k3AQst075WOwfqY9rpVHv3k4nsM4E8Q@mail.gmail.com>
@ 2013-01-25 14:05             ` Frederic Weisbecker
  0 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2013-01-25 14:05 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: linaro-dev, linux-kernel, ccross, peterz, mingo

2013/1/25 Vincent Guittot <vincent.guittot@linaro.org>:
>
> On 25 Jan 2013 13:00, "Frederic Weisbecker" <fweisbec@gmail.com> wrote:
>
>
>>
>> 2013/1/25 Vincent Guittot <vincent.guittot@linaro.org>:
>> > This sequence is not the right one
>> >
>> >> I'm going to look for the saved trace to check the sequence above
>> >
>> > I haven't been able to reproduce the bug that this patch was supposed to
>> > solve. Patches 2 and 3 seem enough to fix the nr_busy_cpus field. I will
>> > continue to try to reproduce it, but it seems that it was a side effect
>> > of the 2 other fixes of the series
>>
>> Ok. I just checked again as well and I can't find a scenario where
>> this can happen. If you find it out or trigger the bug again, don't
>> hesitate to resend this patch.
>
> Ok. I'm going to update the patch series without this patch

Actually your second patch may cause this, as it clears the NOHZ_IDLE
flag on CPUs that are idle on boot and which could stay that way for a
while. And your second patch is spotting something serious. I'll reply
on it after more thoughts.

