* [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine
@ 2016-03-10 12:04 Thomas Gleixner
  2016-03-10 12:04 ` [patch 01/15] cpu/hotplug: Document states better Thomas Gleixner
                   ` (14 more replies)
  0 siblings, 15 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

The following series contains:

    - cleanup of the notifier maze in the scheduler and conversion
      to the state machine. 

    - Handling of cpu active is disentangled from cpu online and moved to the
      end of the hotplug process.

Patches are against tip:master rather than against tip:smp/hotplug because
there is an interaction with the accounting fix pending in sched/urgent.
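
For readers not familiar with the new hotplug core, the difference between the
old notifier scheme and the state machine is roughly the following. This is an
illustrative sketch only: foo_online()/foo_offline() are placeholders, and the
dynamic CPUHP_AP_ONLINE_DYN registration shown here is not what this series
does; the scheduler gets built-in states instead.

/* Old style: one multiplexing callback dispatching on the action code */
static int foo_cpu_notifier(struct notifier_block *nb,
			    unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
		return notifier_from_errno(foo_online(cpu));
	case CPU_DOWN_PREPARE:
		return notifier_from_errno(foo_offline(cpu));
	default:
		return NOTIFY_DONE;
	}
}

/*
 * New style: a symmetric startup/teardown pair attached to a hotplug state.
 * The core invokes startup callbacks in state order on bringup and teardown
 * callbacks in reverse order on removal.
 */
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
			foo_online, foo_offline);
/* ret < 0 on error; a dynamic state request returns the allocated state */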

Thanks,

	tglx
---
 arch/powerpc/kernel/smp.c  |    2 
 arch/s390/kernel/smp.c     |    2 
 include/linux/cpu.h        |   18 --
 include/linux/cpuhotplug.h |    2 
 include/linux/cpumask.h    |    6 
 include/linux/sched.h      |    4 
 kernel/cpu.c               |   77 ++++++--
 kernel/sched/core.c        |  399 +++++++++++++++++++--------------------------
 kernel/sched/fair.c        |   15 -
 kernel/sched/sched.h       |    4 
 10 files changed, 248 insertions(+), 281 deletions(-)


* [patch 01/15] cpu/hotplug: Document states better
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 03/15] sched: Make set_cpu_rq_start_time() a built in hotplug state Thomas Gleixner
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: cpu-hotplug--Document-states-better.patch --]
[-- Type: text/plain, Size: 3097 bytes --]

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c |   47 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1171,6 +1171,10 @@ static struct cpuhp_step cpuhp_bp_states
 		.teardown		= NULL,
 		.cant_stop		= true,
 	},
+	/*
+	 * Preparatory and dead notifiers. Will be replaced once the notifiers
+	 * are converted to states.
+	 */
 	[CPUHP_NOTIFY_PREPARE] = {
 		.name			= "notify:prepare",
 		.startup		= notify_prepare,
@@ -1178,12 +1182,17 @@ static struct cpuhp_step cpuhp_bp_states
 		.skip_onerr		= true,
 		.cant_stop		= true,
 	},
+	/* Kicks the plugged cpu into life */
 	[CPUHP_BRINGUP_CPU] = {
 		.name			= "cpu:bringup",
 		.startup		= bringup_cpu,
 		.teardown		= NULL,
 		.cant_stop		= true,
 	},
+	/*
+	 * Handled on the control processor until the plugged processor manages
+	 * this itself.
+	 */
 	[CPUHP_TEARDOWN_CPU] = {
 		.name			= "cpu:teardown",
 		.startup		= NULL,
@@ -1196,6 +1205,23 @@ static struct cpuhp_step cpuhp_bp_states
 /* Application processor state steps */
 static struct cpuhp_step cpuhp_ap_states[] = {
 #ifdef CONFIG_SMP
+	/* Final state before CPU kills itself */
+	[CPUHP_AP_IDLE_DEAD] = {
+		.name			= "idle:dead",
+	},
+	/*
+	 * Last state before CPU enters the idle loop to die. Transient state
+	 * for synchronization.
+	 */
+	[CPUHP_AP_OFFLINE] = {
+		.name			= "ap:offline",
+		.cant_stop		= true,
+	},
+	/*
+	 * Low level startup/teardown notifiers. Run with interrupts
+	 * disabled. Will be removed once the notifiers are converted to
+	 * states.
+	 */
 	[CPUHP_AP_NOTIFY_STARTING] = {
 		.name			= "notify:starting",
 		.startup		= notify_starting,
@@ -1203,17 +1229,32 @@ static struct cpuhp_step cpuhp_ap_states
 		.skip_onerr		= true,
 		.cant_stop		= true,
 	},
+	/* Entry state on starting. Interrupts enabled from here on. Transient
+	 * state for synchronization */
+	[CPUHP_AP_ONLINE] = {
+		.name			= "ap:online",
+	},
+	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot:threads",
 		.startup		= smpboot_unpark_threads,
 		.teardown		= smpboot_park_threads,
 	},
+	/*
+	 * Online/down_prepare notifiers. Will be removed once the notifiers
+	 * are converted to states.
+	 */
 	[CPUHP_AP_NOTIFY_ONLINE] = {
 		.name			= "notify:online",
 		.startup		= notify_online,
 		.teardown		= notify_down_prepare,
 	},
 #endif
+	/*
+	 * The dynamically registered state space is here
+	 */
+
+	/* CPU is fully up and running. */
 	[CPUHP_ONLINE] = {
 		.name			= "online",
 		.startup		= NULL,
@@ -1231,7 +1272,11 @@ static int cpuhp_cb_check(enum cpuhp_sta
 
 static bool cpuhp_is_ap_state(enum cpuhp_state state)
 {
-	return state > CPUHP_BRINGUP_CPU;
+	/*
+	 * The extra check for CPUHP_TEARDOWN_CPU is only for documentation
+	 * purposes as that state is handled explicitly in cpu_down.
+	 */
+	return state > CPUHP_BRINGUP_CPU && state != CPUHP_TEARDOWN_CPU;
 }
 
 static struct cpuhp_step *cpuhp_get_step(enum cpuhp_state state)


* [patch 03/15] sched: Make set_cpu_rq_start_time() a built in hotplug state
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
  2016-03-10 12:04 ` [patch 01/15] cpu/hotplug: Document states better Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 05/15] sched: Consolidate the notifier maze Thomas Gleixner
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched--Make-set_cpu_rq_start_time---a-built-in-hotplug-state.patch --]
[-- Type: text/plain, Size: 2580 bytes --]

Start disentangling the maze of hotplug notifiers in the scheduler.

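For context on ordering: the state table is walked in enum order, startup
callbacks ascending on bringup and teardown callbacks descending on removal,
so placing CPUHP_AP_SCHED_STARTING right before CPUHP_AP_NOTIFY_STARTING makes
sched_cpu_starting() run before the remaining STARTING notifiers. A simplified
sketch of that walk (assumption: it only mirrors the logic in kernel/cpu.c,
without rollback and the BP/AP split):

struct step {
	const char *name;
	int (*startup)(unsigned int cpu);
	int (*teardown)(unsigned int cpu);
};

static int bringup_states(const struct step *steps, int nr, unsigned int cpu)
{
	int i, ret;

	/* ascending: CPUHP_OFFLINE -> CPUHP_ONLINE */
	for (i = 0; i < nr; i++) {
		if (!steps[i].startup)
			continue;
		ret = steps[i].startup(cpu);	/* e.g. sched_cpu_starting(cpu) */
		if (ret)
			return ret;	/* the real code rolls back completed states */
	}
	return 0;
}

static void teardown_states(const struct step *steps, int nr, unsigned int cpu)
{
	int i;

	/* descending on CPU removal */
	for (i = nr - 1; i >= 0; i--)
		if (steps[i].teardown)
			steps[i].teardown(cpu);
}
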
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuhotplug.h |    1 +
 include/linux/sched.h      |    1 +
 kernel/cpu.c               |    6 ++++++
 kernel/sched/core.c        |   16 +++++++++-------
 4 files changed, 17 insertions(+), 7 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -8,6 +8,7 @@ enum cpuhp_state {
 	CPUHP_BRINGUP_CPU,
 	CPUHP_AP_IDLE_DEAD,
 	CPUHP_AP_OFFLINE,
+	CPUHP_AP_SCHED_STARTING,
 	CPUHP_AP_NOTIFY_STARTING,
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -373,6 +373,7 @@ extern void cpu_init (void);
 extern void trap_init(void);
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
+extern int sched_cpu_starting(unsigned int cpu);
 
 extern void sched_show_task(struct task_struct *p);
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1217,6 +1217,12 @@ static struct cpuhp_step cpuhp_ap_states
 		.name			= "ap:offline",
 		.cant_stop		= true,
 	},
+	/* First state is scheduler control. Interrupts are disabled */
+	[CPUHP_AP_SCHED_STARTING] = {
+		.name			= "sched:starting",
+		.startup		= sched_cpu_starting,
+		.teardown		= NULL,
+	},
 	/*
 	 * Low level startup/teardown notifiers. Run with interrupts
 	 * disabled. Will be removed once the notifiers are converted to
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5711,10 +5711,10 @@ static struct notifier_block migration_n
 	.priority = CPU_PRI_MIGRATION,
 };
 
-static void set_cpu_rq_start_time(void)
+static void set_cpu_rq_start_time(unsigned int cpu)
 {
-	int cpu = smp_processor_id();
 	struct rq *rq = cpu_rq(cpu);
+
 	rq->age_stamp = sched_clock_cpu(cpu);
 }
 
@@ -5724,10 +5724,6 @@ static int sched_cpu_active(struct notif
 	int cpu = (long)hcpu;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_STARTING:
-		set_cpu_rq_start_time();
-		return NOTIFY_OK;
-
 	case CPU_DOWN_FAILED:
 		set_cpu_active(cpu, true);
 		return NOTIFY_OK;
@@ -5749,6 +5745,12 @@ static int sched_cpu_inactive(struct not
 	}
 }
 
+int sched_cpu_starting(unsigned int cpu)
+{
+	set_cpu_rq_start_time(cpu);
+	return 0;
+}
+
 static int __init migration_init(void)
 {
 	void *cpu = (void *)(long)smp_processor_id();
@@ -7659,7 +7661,7 @@ void __init sched_init(void)
 	if (cpu_isolated_map == NULL)
 		zalloc_cpumask_var(&cpu_isolated_map, GFP_NOWAIT);
 	idle_thread_set_boot_cpu();
-	set_cpu_rq_start_time();
+	set_cpu_rq_start_time(smp_processor_id());
 #endif
 	init_sched_fair_class();
 


* [patch 05/15] sched: Consolidate the notifier maze
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
  2016-03-10 12:04 ` [patch 01/15] cpu/hotplug: Document states better Thomas Gleixner
  2016-03-10 12:04 ` [patch 03/15] sched: Make set_cpu_rq_start_time() a built in hotplug state Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 04/15] sched: Allow hotplug notifiers to be setup early Thomas Gleixner
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched--Consolidate-the-notifier-maze.patch --]
[-- Type: text/plain, Size: 7644 bytes --]

We can maintain the ordering of the scheduler cpu hotplug functionality nicely
in one notifier. Get rid of the maze.

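Condensed from the diff below, the ordering which the single notifier now
encodes with plain function calls instead of notifier priorities is (sketch
only; locking and the suspend/resume details are abbreviated):

/* CPU_ONLINE / CPU_DOWN_FAILED */
set_cpu_active(cpu, true);
sched_domains_numa_masks_set(cpu);
cpuset_cpu_active(action & CPU_TASKS_FROZEN);

/* CPU_DOWN_PREPARE - a deadline bandwidth overflow re-activates the cpu */
set_cpu_active(cpu, false);
ret = cpuset_cpu_inactive(cpu, action & CPU_TASKS_FROZEN);
if (ret)
	set_cpu_active(cpu, true);

/* CPU_DEAD */
sched_domains_numa_masks_clear(cpu);
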
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpu.h |   12 +--
 kernel/sched/core.c |  174 ++++++++++++++++++++--------------------------------
 2 files changed, 73 insertions(+), 113 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -61,19 +61,15 @@ struct notifier_block;
 enum {
 	/*
 	 * SCHED_ACTIVE marks a cpu which is coming up active during
-	 * CPU_ONLINE and CPU_DOWN_FAILED and must be the first
-	 * notifier.  CPUSET_ACTIVE adjusts cpuset according to
-	 * cpu_active mask right after SCHED_ACTIVE.  During
-	 * CPU_DOWN_PREPARE, SCHED_INACTIVE and CPUSET_INACTIVE are
-	 * ordered in the similar way.
+	 * CPU_ONLINE and CPU_DOWN_FAILED and must be the first notifier. It also
+	 * adjusts cpuset according to the cpu_active mask right after activating
+	 * the cpu. During CPU_DOWN_PREPARE, SCHED_INACTIVE reverses the operation.
 	 *
 	 * This ordering guarantees consistent cpu_active mask and
 	 * migration behavior to all cpu notifiers.
 	 */
 	CPU_PRI_SCHED_ACTIVE	= INT_MAX,
-	CPU_PRI_CPUSET_ACTIVE	= INT_MAX - 1,
-	CPU_PRI_SCHED_INACTIVE	= INT_MIN + 1,
-	CPU_PRI_CPUSET_INACTIVE	= INT_MIN,
+	CPU_PRI_SCHED_INACTIVE	= INT_MIN,
 
 	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5720,39 +5720,6 @@ static void set_cpu_rq_start_time(unsign
 	rq->age_stamp = sched_clock_cpu(cpu);
 }
 
-static int sched_cpu_active(struct notifier_block *nfb,
-				      unsigned long action, void *hcpu)
-{
-	int cpu = (long)hcpu;
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_FAILED:
-		set_cpu_active(cpu, true);
-		return NOTIFY_OK;
-
-	default:
-		return NOTIFY_DONE;
-	}
-}
-
-static int sched_cpu_inactive(struct notifier_block *nfb,
-					unsigned long action, void *hcpu)
-{
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_PREPARE:
-		set_cpu_active((long)hcpu, false);
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
-	}
-}
-
-int sched_cpu_starting(unsigned int cpu)
-{
-	set_cpu_rq_start_time(cpu);
-	return 0;
-}
-
 static cpumask_var_t sched_domains_tmpmask; /* sched_domains_mutex */
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -6895,10 +6862,13 @@ static void sched_init_numa(void)
 	init_numa_topology_type();
 }
 
-static void sched_domains_numa_masks_set(int cpu)
+static void sched_domains_numa_masks_set(unsigned int cpu)
 {
-	int i, j;
 	int node = cpu_to_node(cpu);
+	int i, j;
+
+	if (!sched_smp_initialized)
+		return;
 
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
@@ -6908,54 +6878,23 @@ static void sched_domains_numa_masks_set
 	}
 }
 
-static void sched_domains_numa_masks_clear(int cpu)
+static void sched_domains_numa_masks_clear(unsigned int cpu)
 {
 	int i, j;
+
+	if (!sched_smp_initialized)
+		return;
+
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++)
 			cpumask_clear_cpu(cpu, sched_domains_numa_masks[i][j]);
 	}
 }
 
-/*
- * Update sched_domains_numa_masks[level][node] array when new cpus
- * are onlined.
- */
-static int sched_domains_numa_masks_update(struct notifier_block *nfb,
-					   unsigned long action,
-					   void *hcpu)
-{
-	int cpu = (long)hcpu;
-
-	if (!sched_smp_initialized)
-		return NOTIFY_DONE;
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
-		sched_domains_numa_masks_set(cpu);
-		break;
-
-	case CPU_DEAD:
-		sched_domains_numa_masks_clear(cpu);
-		break;
-
-	default:
-		return NOTIFY_DONE;
-	}
-
-	return NOTIFY_OK;
-}
 #else
-static inline void sched_init_numa(void)
-{
-}
-
-static int sched_domains_numa_masks_update(struct notifier_block *nfb,
-					   unsigned long action,
-					   void *hcpu)
-{
-	return 0;
-}
+static inline void sched_init_numa(void) { }
+static void sched_domains_numa_masks_set(unsigned int cpu) { }
+static void sched_domains_numa_masks_clear(unsigned int cpu) { }
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
@@ -7345,16 +7284,12 @@ static int num_cpus_frozen;	/* used to m
  * If we come here as part of a suspend/resume, don't touch cpusets because we
  * want to restore it back to its original state upon resume anyway.
  */
-static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
-			     void *hcpu)
+static void cpuset_cpu_active(bool frozen)
 {
 	if (!sched_smp_initialized)
-		return NOTIFY_DONE;
-
-	switch (action) {
-	case CPU_ONLINE_FROZEN:
-	case CPU_DOWN_FAILED_FROZEN:
+		return;
 
+	if (frozen) {
 		/*
 		 * num_cpus_frozen tracks how many CPUs are involved in suspend
 		 * resume sequence. As long as this is not the last online
@@ -7364,38 +7299,28 @@ static int cpuset_cpu_active(struct noti
 		num_cpus_frozen--;
 		if (likely(num_cpus_frozen)) {
 			partition_sched_domains(1, NULL, NULL);
-			break;
+			return;
 		}
-
 		/*
 		 * This is the last CPU online operation. So fall through and
 		 * restore the original sched domains by considering the
 		 * cpuset configurations.
 		 */
-
-	case CPU_ONLINE:
-		cpuset_update_active_cpus(true);
-		break;
-	default:
-		return NOTIFY_DONE;
 	}
-	return NOTIFY_OK;
+	cpuset_update_active_cpus(true);
 }
 
-static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
-			       void *hcpu)
+static int cpuset_cpu_inactive(unsigned int cpu, bool frozen)
 {
 	unsigned long flags;
-	long cpu = (long)hcpu;
 	struct dl_bw *dl_b;
 	bool overflow;
 	int cpus;
 
 	if (!sched_smp_initialized)
-		return NOTIFY_DONE;
+		return 0;
 
-	switch (action) {
-	case CPU_DOWN_PREPARE:
+	if (!frozen) {
 		rcu_read_lock_sched();
 		dl_b = dl_bw_of(cpu);
 
@@ -7407,17 +7332,60 @@ static int cpuset_cpu_inactive(struct no
 		rcu_read_unlock_sched();
 
 		if (overflow)
-			return notifier_from_errno(-EBUSY);
+			return -EBUSY;
 		cpuset_update_active_cpus(false);
-		break;
-	case CPU_DOWN_PREPARE_FROZEN:
+	} else {
 		num_cpus_frozen++;
 		partition_sched_domains(1, NULL, NULL);
-		break;
+	}
+	return 0;
+}
+
+static int sched_cpu_active(struct notifier_block *nfb, unsigned long action,
+			    void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		set_cpu_active(cpu, true);
+		sched_domains_numa_masks_set(cpu);
+		cpuset_cpu_active(action & CPU_TASKS_FROZEN);
+		return NOTIFY_OK;
 	default:
 		return NOTIFY_DONE;
 	}
-	return NOTIFY_OK;
+}
+
+static int sched_cpu_inactive(struct notifier_block *nfb,
+					unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	int ret;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_PREPARE:
+		set_cpu_active(cpu, false);
+		ret = cpuset_cpu_inactive(cpu, action & CPU_TASKS_FROZEN);
+		if (ret) {
+			set_cpu_active(cpu, true);
+			return notifier_from_errno(ret);
+		}
+		return NOTIFY_OK;
+
+	case CPU_DEAD:
+		sched_domains_numa_masks_clear(cpu);
+		return NOTIFY_OK;
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+int sched_cpu_starting(unsigned int cpu)
+{
+	set_cpu_rq_start_time(cpu);
+	return 0;
 }
 
 void __init sched_init_smp(void)
@@ -7469,10 +7437,6 @@ static int __init migration_init(void)
 	cpu_notifier(sched_cpu_active, CPU_PRI_SCHED_ACTIVE);
 	cpu_notifier(sched_cpu_inactive, CPU_PRI_SCHED_INACTIVE);
 
-	hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
-	hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
-	hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
-
 	return 0;
 }
 early_initcall(migration_init);


* [patch 04/15] sched: Allow hotplug notifiers to be setup early
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-03-10 12:04 ` [patch 05/15] sched: Consolidate the notifier maze Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 06/15] sched: Move sched_domains_numa_masks_clear() to DOWN_PREPARE Thomas Gleixner
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched--Allow-hotplug-notifiers-to-be-setup-early.patch --]
[-- Type: text/plain, Size: 3589 bytes --]

Prevent the SMP scheduler related notifiers from being executed before the SMP
scheduler is initialized and install them early.

This is a preparatory change for further consolidation of the hotplug notifier
maze.

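The pattern is a simple early-out flag: the notifiers are registered
unconditionally from early_initcall(), but bail out until sched_init_smp() has
set sched_smp_initialized. A minimal sketch (the flag and the guard are taken
from the diff below, the callback body is elided):

static bool sched_smp_initialized __read_mostly;

static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
			     void *hcpu)
{
	/*
	 * Scheduling domains and cpusets are only valid after
	 * sched_init_smp() has completed.
	 */
	if (!sched_smp_initialized)
		return NOTIFY_DONE;
	/* ... */
}
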
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   59 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 23 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5256,6 +5256,8 @@ int task_can_attach(struct task_struct *
 
 #ifdef CONFIG_SMP
 
+static bool sched_smp_initialized __read_mostly;
+
 #ifdef CONFIG_NUMA_BALANCING
 /* Migrate current task p to target_cpu */
 int migrate_task_to(struct task_struct *p, int target_cpu)
@@ -5751,25 +5753,6 @@ int sched_cpu_starting(unsigned int cpu)
 	return 0;
 }
 
-static int __init migration_init(void)
-{
-	void *cpu = (void *)(long)smp_processor_id();
-	int err;
-
-	/* Initialize migration for the boot CPU */
-	err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
-	BUG_ON(err == NOTIFY_BAD);
-	migration_call(&migration_notifier, CPU_ONLINE, cpu);
-	register_cpu_notifier(&migration_notifier);
-
-	/* Register cpu active notifiers */
-	cpu_notifier(sched_cpu_active, CPU_PRI_SCHED_ACTIVE);
-	cpu_notifier(sched_cpu_inactive, CPU_PRI_SCHED_INACTIVE);
-
-	return 0;
-}
-early_initcall(migration_init);
-
 static cpumask_var_t sched_domains_tmpmask; /* sched_domains_mutex */
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -6944,6 +6927,9 @@ static int sched_domains_numa_masks_upda
 {
 	int cpu = (long)hcpu;
 
+	if (!sched_smp_initialized)
+		return NOTIFY_DONE;
+
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_ONLINE:
 		sched_domains_numa_masks_set(cpu);
@@ -7362,6 +7348,9 @@ static int num_cpus_frozen;	/* used to m
 static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
 			     void *hcpu)
 {
+	if (!sched_smp_initialized)
+		return NOTIFY_DONE;
+
 	switch (action) {
 	case CPU_ONLINE_FROZEN:
 	case CPU_DOWN_FAILED_FROZEN:
@@ -7402,6 +7391,9 @@ static int cpuset_cpu_inactive(struct no
 	bool overflow;
 	int cpus;
 
+	if (!sched_smp_initialized)
+		return NOTIFY_DONE;
+
 	switch (action) {
 	case CPU_DOWN_PREPARE:
 		rcu_read_lock_sched();
@@ -7449,10 +7441,6 @@ void __init sched_init_smp(void)
 		cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
 	mutex_unlock(&sched_domains_mutex);
 
-	hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
-	hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
-	hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
-
 	init_hrtick();
 
 	/* Move init over to a non-isolated CPU */
@@ -7463,7 +7451,32 @@ void __init sched_init_smp(void)
 
 	init_sched_rt_class();
 	init_sched_dl_class();
+	sched_smp_initialized = true;
 }
+
+static int __init migration_init(void)
+{
+	void *cpu = (void *)(long)smp_processor_id();
+	int err;
+
+	/* Initialize migration for the boot CPU */
+	err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
+	BUG_ON(err == NOTIFY_BAD);
+	migration_call(&migration_notifier, CPU_ONLINE, cpu);
+	register_cpu_notifier(&migration_notifier);
+
+	/* Register cpu active notifiers */
+	cpu_notifier(sched_cpu_active, CPU_PRI_SCHED_ACTIVE);
+	cpu_notifier(sched_cpu_inactive, CPU_PRI_SCHED_INACTIVE);
+
+	hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
+	hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
+	hotcpu_notifier(cpuset_cpu_inactive, CPU_PRI_CPUSET_INACTIVE);
+
+	return 0;
+}
+early_initcall(migration_init);
+
 #else
 void __init sched_init_smp(void)
 {


* [patch 06/15] sched: Move sched_domains_numa_masks_clear() to DOWN_PREPARE
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-03-10 12:04 ` [patch 04/15] sched: Allow hotplug notifiers to be setup early Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 07/15] sched/hotplug: Convert cpu_[in]active notifiers to state machine Thomas Gleixner
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched--Move-sched_domains_numa_masks_clear---to-DOWN_PREPARE.patch --]
[-- Type: text/plain, Size: 516 bytes --]

This is the last operation on the cpu before vanishing. No point in calling
that on CPU_DEAD.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7372,9 +7372,6 @@ static int sched_cpu_inactive(struct not
 			set_cpu_active(cpu, true);
 			return notifier_from_errno(ret);
 		}
-		return NOTIFY_OK;
-
-	case CPU_DEAD:
 		sched_domains_numa_masks_clear(cpu);
 		return NOTIFY_OK;
 	default:


* [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false)
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (5 preceding siblings ...)
  2016-03-10 12:04 ` [patch 07/15] sched/hotplug: Convert cpu_[in]active notifiers to state machine Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] sched/hotplug: " tip-bot for Peter Zijlstra
  2016-05-06 13:06   ` tip-bot for Peter Zijlstra
  2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: peterz-cpuhp-stuff-1.patch --]
[-- Type: text/plain, Size: 1948 bytes --]

From: Peter Zijlstra <peterz@infradead.org>

The sync_rcu stuff is specifically for clearing bits in the active
mask, such that everybody will observe the bit cleared and will not
consider the cleared CPU for load-balancing etc.

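To make that concrete: the wait is for read-side sections of the kind sketched
below, which sample cpu_active_mask under rcu_read_lock() or with preemption
disabled. This is illustrative only, not a specific call site touched by this
patch; candidate_mask and consider_for_balancing() are placeholders:

rcu_read_lock();
for_each_cpu_and(cpu, candidate_mask, cpu_active_mask)
	consider_for_balancing(cpu);	/* must not pick the outgoing cpu */
rcu_read_unlock();

Once the synchronize_rcu()/sync_sched() in sched_cpu_deactivate() has
returned, no such section can still observe the cleared cpu as active.
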
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c        |   15 ---------------
 kernel/sched/core.c |   14 ++++++++++++++
 2 files changed, 14 insertions(+), 15 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -691,21 +691,6 @@ static int takedown_cpu(unsigned int cpu
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	int err;
 
-	/*
-	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
-	 * and RCU users of this state to go away such that all new such users
-	 * will observe it.
-	 *
-	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
-	 * not imply sync_sched(), so wait for both.
-	 *
-	 * Do sync before park smpboot threads to take care the rcu boost case.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT))
-		synchronize_rcu_mult(call_rcu, call_rcu_sched);
-	else
-		synchronize_rcu();
-
 	/* Park the hotplug thread */
 	kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread);
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7345,6 +7345,20 @@ int sched_cpu_deactivate(unsigned int cp
 	int ret;
 
 	set_cpu_active(cpu, false);
+	/*
+	 * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
+	 * users of this state to go away such that all new such users will
+	 * observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so wait for both.
+	 *
+	 * Do sync before park smpboot threads to take care the rcu boost case.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT))
+		synchronize_rcu_mult(call_rcu, call_rcu_sched);
+	else
+		synchronize_rcu();
 
 	if (!sched_smp_initialized)
 		return 0;


* [patch 07/15] sched/hotplug: Convert cpu_[in]active notifiers to state machine
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (4 preceding siblings ...)
  2016-03-10 12:04 ` [patch 06/15] sched: Move sched_domains_numa_masks_clear() to DOWN_PREPARE Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-03-10 12:04 ` [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false) Thomas Gleixner
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched_hotplug__Convert_cpu__in_active_notifiers_to_state_machine.patch --]
[-- Type: text/plain, Size: 6029 bytes --]

Now that we reduced everything into single notifiers, it's simple to move them
into the hotplug state machine space.

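Worth noting: the suspend/resume information which the notifier action used to
carry (the CPU_*_FROZEN variants) is now read from the hotplug core's
cpuhp_tasks_frozen flag, so the state callbacks keep the plain
int (*)(unsigned int cpu) signature. Sketch of the idiom (foo_cpu_online() is
a placeholder; the real scheduler variants are in the diff below):

int foo_cpu_online(unsigned int cpu)
{
	if (cpuhp_tasks_frozen) {
		/* part of a suspend/resume cycle: keep it cheap */
		return 0;
	}
	/* real hot-add work */
	return 0;
}
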
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpu.h        |   12 --------
 include/linux/cpuhotplug.h |    1 
 include/linux/sched.h      |    2 +
 kernel/cpu.c               |    8 ++++-
 kernel/sched/core.c        |   67 ++++++++++++++-------------------------------
 5 files changed, 30 insertions(+), 60 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,18 +59,6 @@ struct notifier_block;
  * CPU notifier priorities.
  */
 enum {
-	/*
-	 * SCHED_ACTIVE marks a cpu which is coming up active during
-	 * CPU_ONLINE and CPU_DOWN_FAILED and must be the first notifier. It also
-	 * adjusts cpuset according to the cpu_active mask right after activating
-	 * the cpu. During CPU_DOWN_PREPARE, SCHED_INACTIVE reverses the operation.
-	 *
-	 * This ordering guarantees consistent cpu_active mask and
-	 * migration behavior to all cpu notifiers.
-	 */
-	CPU_PRI_SCHED_ACTIVE	= INT_MAX,
-	CPU_PRI_SCHED_INACTIVE	= INT_MIN,
-
 	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
 	CPU_PRI_MIGRATION	= 10,
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -13,6 +13,7 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
+	CPUHP_AP_ACTIVE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_NOTIFY_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -374,6 +374,8 @@ extern void trap_init(void);
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
 extern int sched_cpu_starting(unsigned int cpu);
+extern int sched_cpu_activate(unsigned int cpu);
+extern int sched_cpu_deactivate(unsigned int cpu);
 
 extern void sched_show_task(struct task_struct *p);
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -904,8 +904,6 @@ void cpuhp_online_idle(enum cpuhp_state
 
 	st->state = CPUHP_AP_ONLINE_IDLE;
 
-	/* The cpu is marked online, set it active now */
-	set_cpu_active(cpu, true);
 	/* Unpark the stopper thread and the hotplug thread of this cpu */
 	stop_machine_unpark(cpu);
 	kthread_unpark(st->thread);
@@ -1240,6 +1238,12 @@ static struct cpuhp_step cpuhp_ap_states
 	[CPUHP_AP_ONLINE] = {
 		.name			= "ap:online",
 	},
+	/* First state is scheduler control. Interrupts are enabled */
+	[CPUHP_AP_ACTIVE] = {
+		.name			= "sched:active",
+		.startup		= sched_cpu_activate,
+		.teardown		= sched_cpu_deactivate,
+	},
 	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot:threads",
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6867,9 +6867,6 @@ static void sched_domains_numa_masks_set
 	int node = cpu_to_node(cpu);
 	int i, j;
 
-	if (!sched_smp_initialized)
-		return;
-
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
 			if (node_distance(j, node) <= sched_domains_numa_distance[i])
@@ -6882,9 +6879,6 @@ static void sched_domains_numa_masks_cle
 {
 	int i, j;
 
-	if (!sched_smp_initialized)
-		return;
-
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++)
 			cpumask_clear_cpu(cpu, sched_domains_numa_masks[i][j]);
@@ -7284,12 +7278,9 @@ static int num_cpus_frozen;	/* used to m
  * If we come here as part of a suspend/resume, don't touch cpusets because we
  * want to restore it back to its original state upon resume anyway.
  */
-static void cpuset_cpu_active(bool frozen)
+static void cpuset_cpu_active(void)
 {
-	if (!sched_smp_initialized)
-		return;
-
-	if (frozen) {
+	if (cpuhp_tasks_frozen) {
 		/*
 		 * num_cpus_frozen tracks how many CPUs are involved in suspend
 		 * resume sequence. As long as this is not the last online
@@ -7310,17 +7301,14 @@ static void cpuset_cpu_active(bool froze
 	cpuset_update_active_cpus(true);
 }
 
-static int cpuset_cpu_inactive(unsigned int cpu, bool frozen)
+static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	unsigned long flags;
 	struct dl_bw *dl_b;
 	bool overflow;
 	int cpus;
 
-	if (!sched_smp_initialized)
-		return 0;
-
-	if (!frozen) {
+	if (!cpuhp_tasks_frozen) {
 		rcu_read_lock_sched();
 		dl_b = dl_bw_of(cpu);
 
@@ -7341,42 +7329,33 @@ static int cpuset_cpu_inactive(unsigned
 	return 0;
 }
 
-static int sched_cpu_active(struct notifier_block *nfb, unsigned long action,
-			    void *hcpu)
+int sched_cpu_activate(unsigned int cpu)
 {
-	unsigned int cpu = (unsigned long)hcpu;
+	set_cpu_active(cpu, true);
 
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_FAILED:
-	case CPU_ONLINE:
-		set_cpu_active(cpu, true);
+	if (sched_smp_initialized) {
 		sched_domains_numa_masks_set(cpu);
-		cpuset_cpu_active(action & CPU_TASKS_FROZEN);
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
+		cpuset_cpu_active();
 	}
+	return 0;
 }
 
-static int sched_cpu_inactive(struct notifier_block *nfb,
-					unsigned long action, void *hcpu)
+int sched_cpu_deactivate(unsigned int cpu)
 {
-	unsigned int cpu = (unsigned long)hcpu;
 	int ret;
 
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DOWN_PREPARE:
-		set_cpu_active(cpu, false);
-		ret = cpuset_cpu_inactive(cpu, action & CPU_TASKS_FROZEN);
-		if (ret) {
-			set_cpu_active(cpu, true);
-			return notifier_from_errno(ret);
-		}
-		sched_domains_numa_masks_clear(cpu);
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
+	set_cpu_active(cpu, false);
+
+	if (!sched_smp_initialized)
+		return 0;
+
+	ret = cpuset_cpu_inactive(cpu);
+	if (ret) {
+		set_cpu_active(cpu, true);
+		return ret;
 	}
+	sched_domains_numa_masks_clear(cpu);
+	return 0;
 }
 
 int sched_cpu_starting(unsigned int cpu)
@@ -7430,10 +7409,6 @@ static int __init migration_init(void)
 	migration_call(&migration_notifier, CPU_ONLINE, cpu);
 	register_cpu_notifier(&migration_notifier);
 
-	/* Register cpu active notifiers */
-	cpu_notifier(sched_cpu_active, CPU_PRI_SCHED_ACTIVE);
-	cpu_notifier(sched_cpu_inactive, CPU_PRI_SCHED_INACTIVE);
-
 	return 0;
 }
 early_initcall(migration_init);


* [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (6 preceding siblings ...)
  2016-03-10 12:04 ` [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false) Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
                     ` (2 more replies)
  2016-03-10 12:04 ` [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state Thomas Gleixner
                   ` (6 subsequent siblings)
  14 siblings, 3 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched-migration--Move-calc_load_migrate---into-CPU_DYING.patch --]
[-- Type: text/plain, Size: 733 bytes --]

It really does not matter when we fold the load for the outgoing cpu. It's
almost dead anyway, so there is no harm if we fail to fold the few
microseconds which are required for going fully away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |    3 ---
 1 file changed, 3 deletions(-)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5685,9 +5685,6 @@ migration_call(struct notifier_block *nf
 		migrate_tasks(rq);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
-	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif


* [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (7 preceding siblings ...)
  2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:06   ` tip-bot for Thomas Gleixner
  2016-03-10 12:04 ` [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state Thomas Gleixner
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched-migration--Move-prepare-transition-to-SCHED_STARTING-state.patch --]
[-- Type: text/plain, Size: 1528 bytes --]

We can piggyback that on the SCHED_STARTING state. It's not required before
the cpu actually comes online. Name the function properly as it has nothing to
do with migration.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5662,11 +5662,6 @@ migration_call(struct notifier_block *nf
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_UP_PREPARE:
-		rq->calc_load_update = calc_load_update;
-		account_reset_rq(rq);
-		break;
-
 	case CPU_ONLINE:
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
@@ -7372,9 +7367,19 @@ int sched_cpu_deactivate(unsigned int cp
 	return 0;
 }
 
+static void sched_rq_cpu_starting(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	rq->calc_load_update = calc_load_update;
+	account_reset_rq(rq);
+	update_max_interval();
+}
+
 int sched_cpu_starting(unsigned int cpu)
 {
 	set_cpu_rq_start_time(cpu);
+	sched_rq_cpu_starting(cpu);
 	return 0;
 }
 
@@ -7415,11 +7420,8 @@ void __init sched_init_smp(void)
 static int __init migration_init(void)
 {
 	void *cpu = (void *)(long)smp_processor_id();
-	int err;
 
-	/* Initialize migration for the boot CPU */
-	err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
-	BUG_ON(err == NOTIFY_BAD);
+	sched_rq_cpu_starting(smp_processor_id());
 	migration_call(&migration_notifier, CPU_ONLINE, cpu);
 	register_cpu_notifier(&migration_notifier);
 


* [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (8 preceding siblings ...)
  2016-03-10 12:04 ` [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  2016-03-10 12:04 ` [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Thomas Gleixner
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched-migration--Move-CPU_ONLINE-into-scheduler-state.patch --]
[-- Type: text/plain, Size: 1788 bytes --]

The alleged requirement that the migration notifier has a lower priority than
perf is completely undocumented and there is no indication at all that this is
true. perf does not even handle the CPU_ONLINE notification and perf really
has nothing to do with migration.

Move the CPU_ONLINE code into the sched_cpu_activate() state callback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5662,17 +5662,6 @@ migration_call(struct notifier_block *nf
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_ONLINE:
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-
-			set_rq_online(rq);
-		}
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DYING:
 		sched_ttwu_pending();
@@ -7323,12 +7312,34 @@ static int cpuset_cpu_inactive(unsigned
 
 int sched_cpu_activate(unsigned int cpu)
 {
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
 	set_cpu_active(cpu, true);
 
 	if (sched_smp_initialized) {
 		sched_domains_numa_masks_set(cpu);
 		cpuset_cpu_active();
 	}
+
+	/*
+	 * Put the rq online, if not already. This happens:
+	 *
+	 * 1) In the early boot process, because we build the real domains
+	 *    after all cpus have been brought up.
+	 *
+	 * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
+	 *    domains.
+	 */
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_online(rq);
+	}
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+
+	update_max_interval();
+
 	return 0;
 }
 


* [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (9 preceding siblings ...)
  2016-03-10 12:04 ` [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  2016-03-10 12:04 ` [patch 14/15] sched/fair: Make ilb_notifier an explicit call Thomas Gleixner
                   ` (3 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched_hotplug__Move_migration_CPU_DYING_to_sched_cpu_dying__.patch --]
[-- Type: text/plain, Size: 3781 bytes --]

Remove the hotplug notifier and make it an explicit state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpu.h   |    2 -
 include/linux/sched.h |    1 
 kernel/cpu.c          |    2 -
 kernel/sched/core.c   |   70 ++++++++++++++------------------------------------
 4 files changed, 22 insertions(+), 53 deletions(-)

--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,9 +59,7 @@ struct notifier_block;
  * CPU notifier priorities.
  */
 enum {
-	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
-	CPU_PRI_MIGRATION	= 10,
 
 	/* bring up workqueues before normal notifiers and down after */
 	CPU_PRI_WORKQUEUE_UP	= 5,
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -374,6 +374,7 @@ extern void trap_init(void);
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
 extern int sched_cpu_starting(unsigned int cpu);
+extern int sched_cpu_dying(unsigned int cpu);
 extern int sched_cpu_activate(unsigned int cpu);
 extern int sched_cpu_deactivate(unsigned int cpu);
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1204,7 +1204,7 @@ static struct cpuhp_step cpuhp_ap_states
 	[CPUHP_AP_SCHED_STARTING] = {
 		.name			= "sched:starting",
 		.startup		= sched_cpu_starting,
-		.teardown		= NULL,
+		.teardown		= sched_cpu_dying,
 	},
 	/*
 	 * Low level startup/teardown notifiers. Run with interrupts
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5649,51 +5649,6 @@ static void set_rq_offline(struct rq *rq
 	}
 }
 
-/*
- * migration_call - callback that gets triggered when a CPU is added.
- * Here we can start up the necessary migration thread for the new CPU.
- */
-static int
-migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (long)hcpu;
-	unsigned long flags;
-	struct rq *rq = cpu_rq(cpu);
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-
-#ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DYING:
-		sched_ttwu_pending();
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-			set_rq_offline(rq);
-		}
-		migrate_tasks(rq);
-		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		calc_load_migrate(rq);
-		break;
-#endif
-	}
-
-	update_max_interval();
-
-	return NOTIFY_OK;
-}
-
-/*
- * Register at high priority so that task migration (migrate_all_tasks)
- * happens before everything else.  This has to be lower priority than
- * the notifier in the perf_event subsystem, though.
- */
-static struct notifier_block migration_notifier = {
-	.notifier_call = migration_call,
-	.priority = CPU_PRI_MIGRATION,
-};
-
 static void set_cpu_rq_start_time(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7391,6 +7346,26 @@ int sched_cpu_starting(unsigned int cpu)
 	return 0;
 }
 
+int sched_cpu_dying(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	/* Handle pending wakeups and then migrate everything off */
+	sched_ttwu_pending();
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_offline(rq);
+	}
+	migrate_tasks(rq);
+	BUG_ON(rq->nr_running != 1);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	calc_load_migrate(rq);
+	update_max_interval();
+	return 0;
+}
+
 void __init sched_init_smp(void)
 {
 	cpumask_var_t non_isolated_cpus;
@@ -7427,12 +7402,7 @@ void __init sched_init_smp(void)
 
 static int __init migration_init(void)
 {
-	void *cpu = (void *)(long)smp_processor_id();
-
 	sched_rq_cpu_starting(smp_processor_id());
-	migration_call(&migration_notifier, CPU_ONLINE, cpu);
-	register_cpu_notifier(&migration_notifier);
-
 	return 0;
 }
 early_initcall(migration_init);


* [patch 14/15] sched/fair: Make ilb_notifier an explicit call
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (10 preceding siblings ...)
  2016-03-10 12:04 ` [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  2016-03-10 12:04 ` [patch 13/15] sched/hotplug: Make activate() the last hotplug step Thomas Gleixner
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched_fair__Make_ilb_notifier_an_explicit_call.patch --]
[-- Type: text/plain, Size: 2133 bytes --]

No need for an extra notifier.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c  |    1 +
 kernel/sched/fair.c  |   15 +--------------
 kernel/sched/sched.h |    4 ++++
 3 files changed, 6 insertions(+), 14 deletions(-)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7363,6 +7363,7 @@ int sched_cpu_dying(unsigned int cpu)
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 	calc_load_migrate(rq);
 	update_max_interval();
+	nohz_balance_exit_idle(cpu);
 	return 0;
 }
 
Index: b/kernel/sched/fair.c
===================================================================
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7602,7 +7602,7 @@ static void nohz_balancer_kick(void)
 	return;
 }
 
-static inline void nohz_balance_exit_idle(int cpu)
+void nohz_balance_exit_idle(unsigned int cpu)
 {
 	if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
 		/*
@@ -7675,18 +7675,6 @@ void nohz_balance_enter_idle(int cpu)
 	atomic_inc(&nohz.nr_cpus);
 	set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }
-
-static int sched_ilb_notifier(struct notifier_block *nfb,
-					unsigned long action, void *hcpu)
-{
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DYING:
-		nohz_balance_exit_idle(smp_processor_id());
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
-	}
-}
 #endif
 
 static DEFINE_SPINLOCK(balancing);
@@ -8486,7 +8474,6 @@ void show_numa_stats(struct task_struct
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
-	cpu_notifier(sched_ilb_notifier, 0);
 #endif
 #endif /* SMP */
 
Index: b/kernel/sched/sched.h
===================================================================
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1688,6 +1688,10 @@ enum rq_nohz_flag_bits {
 };
 
 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)
+
+extern void nohz_balance_exit_idle(unsigned int cpu);
+#else
+static inline void nohz_balance_exit_idle(unsigned int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING


* [patch 13/15] sched/hotplug: Make activate() the last hotplug step
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (11 preceding siblings ...)
  2016-03-10 12:04 ` [patch 14/15] sched/fair: Make ilb_notifier an explicit call Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  2016-03-10 12:04 ` [patch 15/15] sched: Make hrtick_notifier an explicit call Thomas Gleixner
  2016-04-04  7:54 ` [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Peter Zijlstra
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched_hotplug__Make_activate___the_last_hotplug_step.patch --]
[-- Type: text/plain, Size: 1803 bytes --]

The scheduler can handle per-cpu threads before the cpu is set active, and it
does not allow user-space threads on the cpu before active is set. Attaching
to the scheduling domains is likewise not required before user-space threads
can be handled.

Move the activation to the end of the hotplug state space. That also means
that deactivation is the first action when a cpu is shut down.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuhotplug.h |    2 +-
 kernel/cpu.c               |   13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -13,11 +13,11 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
-	CPUHP_AP_ACTIVE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_NOTIFY_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
+	CPUHP_AP_ACTIVE,
 	CPUHP_ONLINE,
 };
 
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1223,12 +1223,6 @@ static struct cpuhp_step cpuhp_ap_states
 	[CPUHP_AP_ONLINE] = {
 		.name			= "ap:online",
 	},
-	/* First state is scheduler control. Interrupts are enabled */
-	[CPUHP_AP_ACTIVE] = {
-		.name			= "sched:active",
-		.startup		= sched_cpu_activate,
-		.teardown		= sched_cpu_deactivate,
-	},
 	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot:threads",
@@ -1249,6 +1243,13 @@ static struct cpuhp_step cpuhp_ap_states
 	 * The dynamically registered state space is here
 	 */
 
+	/* Last state is scheduler control setting the cpu active */
+	[CPUHP_AP_ACTIVE] = {
+		.name			= "sched:active",
+		.startup		= sched_cpu_activate,
+		.teardown		= sched_cpu_deactivate,
+	},
+
 	/* CPU is fully up and running. */
 	[CPUHP_ONLINE] = {
 		.name			= "online",


* [patch 15/15] sched: Make hrtick_notifier an explicit call
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (12 preceding siblings ...)
  2016-03-10 12:04 ` [patch 13/15] sched/hotplug: Make activate() the last hotplug step Thomas Gleixner
@ 2016-03-10 12:04 ` Thomas Gleixner
  2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  2016-04-04  7:54 ` [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Peter Zijlstra
  14 siblings, 2 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-03-10 12:04 UTC (permalink / raw)
  To: LKML; +Cc: Peter Zijlstra, Ingo Molnar, rt

[-- Attachment #1: sched__Make_hrtick_notifier_an_explicit_call.patch --]
[-- Type: text/plain, Size: 1973 bytes --]

No need for an extra notifier. We don't need to handle all these states. It's
sufficient to kill the timer when the cpu dies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c |   34 +---------------------------------
 1 file changed, 1 insertion(+), 33 deletions(-)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -381,29 +381,6 @@ void hrtick_start(struct rq *rq, u64 del
 	}
 }
 
-static int
-hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (int)(long)hcpu;
-
-	switch (action) {
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		hrtick_clear(cpu_rq(cpu));
-		return NOTIFY_OK;
-	}
-
-	return NOTIFY_DONE;
-}
-
-static __init void init_hrtick(void)
-{
-	hotcpu_notifier(hotplug_hrtick, 0);
-}
 #else
 /*
  * Called to set the hrtick timer state.
@@ -420,10 +397,6 @@ void hrtick_start(struct rq *rq, u64 del
 	hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
 		      HRTIMER_MODE_REL_PINNED);
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif /* CONFIG_SMP */
 
 static void init_rq_hrtick(struct rq *rq)
@@ -447,10 +420,6 @@ static inline void hrtick_clear(struct r
 static inline void init_rq_hrtick(struct rq *rq)
 {
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif	/* CONFIG_SCHED_HRTICK */
 
 /*
@@ -7364,6 +7333,7 @@ int sched_cpu_dying(unsigned int cpu)
 	calc_load_migrate(rq);
 	update_max_interval();
 	nohz_balance_exit_idle(cpu);
+	hrtick_clear(rq);
 	return 0;
 }
 
@@ -7388,8 +7358,6 @@ void __init sched_init_smp(void)
 		cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
 	mutex_unlock(&sched_domains_mutex);
 
-	init_hrtick();
-
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
 		BUG();


* Re: [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine
  2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
                   ` (13 preceding siblings ...)
  2016-03-10 12:04 ` [patch 15/15] sched: Make hrtick_notifier an explicit call Thomas Gleixner
@ 2016-04-04  7:54 ` Peter Zijlstra
  14 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2016-04-04  7:54 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, rt

On Thu, Mar 10, 2016 at 12:04:36PM -0000, Thomas Gleixner wrote:
> The following series contains:
> 
>     - cleanup of the notifier maze in the scheduler and conversion
>       to the state machine. 
> 
>     - Handling of cpu active is disentangled from cpu online and moved to the
>       end of the hotplug process.
> 
> Patches are against tip:master rather than against tip:smp/hotplug because
> there is an interaction with the accounting fix pending in sched/urgent.

I'm thinking you want this through the hotplug tree, so

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* [tip:smp/hotplug] sched/hotplug: Move sync_rcu to be with set_cpu_active(false)
  2016-03-10 12:04 ` [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false) Thomas Gleixner
@ 2016-05-05 11:24   ` tip-bot for Peter Zijlstra
  2016-05-06 13:06   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-05-05 11:24 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, hpa, tglx, mingo, linux-kernel

Commit-ID:  c080b5a62379f0d26a5f3bc3eb80c93fdc888be4
Gitweb:     http://git.kernel.org/tip/c080b5a62379f0d26a5f3bc3eb80c93fdc888be4
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 10 Mar 2016 12:54:14 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:53 +0200

sched/hotplug: Move sync_rcu to be with set_cpu_active(false)

The sync_rcu stuff is specifically for clearing bits in the active
mask, such that everybody will observe the bit cleared and will not
consider the cleared CPU for load-balancing etc.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.169219710@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/cpu.c        | 15 ---------------
 kernel/sched/core.c | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 15402b7..c134a35 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -703,21 +703,6 @@ static int takedown_cpu(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	int err;
 
-	/*
-	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
-	 * and RCU users of this state to go away such that all new such users
-	 * will observe it.
-	 *
-	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
-	 * not imply sync_sched(), so wait for both.
-	 *
-	 * Do sync before park smpboot threads to take care the rcu boost case.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT))
-		synchronize_rcu_mult(call_rcu, call_rcu_sched);
-	else
-		synchronize_rcu();
-
 	/* Park the smpboot threads */
 	kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread);
 	smpboot_park_threads(cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 73bcd93..0a31078 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7112,6 +7112,20 @@ int sched_cpu_deactivate(unsigned int cpu)
 	int ret;
 
 	set_cpu_active(cpu, false);
+	/*
+	 * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
+	 * users of this state to go away such that all new such users will
+	 * observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so wait for both.
+	 *
+	 * Do sync before park smpboot threads to take care the rcu boost case.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT))
+		synchronize_rcu_mult(call_rcu, call_rcu_sched);
+	else
+		synchronize_rcu();
 
 	if (!sched_smp_initialized)
 		return 0;

^ permalink raw reply related	[flat|nested] 38+ messages in thread
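
As an aside for readers following the ordering argument in the changelog above: the point is to clear the active bit first and only then wait out a grace period, so every reader that could still have seen the bit set has left its read-side section before teardown continues, while all new readers already observe the bit cleared. A user-space toy model of that ordering is sketched below; the flag, the counter-based "grace period" and all names are illustrative assumptions and deliberately much simpler than the kernel's RCU.

/* Toy model: clear the active flag, then wait for pre-existing readers. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool cpu_active = true;
static atomic_int  readers_in_section;

/* Reader: only considers the CPU for "load balancing" while inside the section. */
static void *reader(void *arg)
{
	long picked = 0;
	(void)arg;

	for (int i = 0; i < 100000; i++) {
		atomic_fetch_add(&readers_in_section, 1);	/* rcu_read_lock() stand-in */
		if (atomic_load(&cpu_active))
			picked++;				/* cpu still eligible */
		atomic_fetch_sub(&readers_in_section, 1);	/* rcu_read_unlock() stand-in */
	}
	printf("reader picked the cpu %ld times\n", picked);
	return NULL;
}

/* Toy "grace period": wait until no reader is currently inside a section. */
static void toy_synchronize(void)
{
	while (atomic_load(&readers_in_section) != 0)
		usleep(100);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reader, NULL);

	atomic_store(&cpu_active, false);	/* set_cpu_active(cpu, false) stand-in */
	toy_synchronize();			/* synchronize_rcu() stand-in */
	/* Safe to tear down: every reader from here on sees !cpu_active. */
	printf("teardown may proceed\n");

	pthread_join(t, NULL);
	return 0;
}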

* [tip:smp/hotplug] sched/migration: Move prepare transition to SCHED_STARTING state
  2016-03-10 12:04 ` [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state Thomas Gleixner
@ 2016-05-05 11:24   ` tip-bot for Thomas Gleixner
  2016-05-06 13:06   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:24 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, tglx, hpa, mingo, linux-kernel

Commit-ID:  42ee97d78a69d580d240e79b69a3bb151472270c
Gitweb:     http://git.kernel.org/tip/42ee97d78a69d580d240e79b69a3bb151472270c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:15 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:54 +0200

sched/migration: Move prepare transition to SCHED_STARTING state

We can piggyback that on the SCHED_STARTING state. It's not required before
the cpu actually comes online. Name the function properly as it has nothing to
do with migration.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.248226511@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/sched/core.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0a31078..bafc308 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5424,11 +5424,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_UP_PREPARE:
-		rq->calc_load_update = calc_load_update;
-		account_reset_rq(rq);
-		break;
-
 	case CPU_ONLINE:
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
@@ -7139,9 +7134,19 @@ int sched_cpu_deactivate(unsigned int cpu)
 	return 0;
 }
 
+static void sched_rq_cpu_starting(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	rq->calc_load_update = calc_load_update;
+	account_reset_rq(rq);
+	update_max_interval();
+}
+
 int sched_cpu_starting(unsigned int cpu)
 {
 	set_cpu_rq_start_time(cpu);
+	sched_rq_cpu_starting(cpu);
 	return 0;
 }
 
@@ -7182,11 +7187,8 @@ void __init sched_init_smp(void)
 static int __init migration_init(void)
 {
 	void *cpu = (void *)(long)smp_processor_id();
-	int err;
 
-	/* Initialize migration for the boot CPU */
-	err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
-	BUG_ON(err == NOTIFY_BAD);
+	sched_rq_cpu_starting(smp_processor_id());
 	migration_call(&migration_notifier, CPU_ONLINE, cpu);
 	register_cpu_notifier(&migration_notifier);
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
@ 2016-05-05 11:24   ` tip-bot for Thomas Gleixner
  2016-05-06 13:06   ` tip-bot for Thomas Gleixner
  2016-07-12  4:37   ` [patch 10/15] " Anton Blanchard
  2 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:24 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, hpa, tglx, linux-kernel, peterz

Commit-ID:  5dd44d07d3d373f0f3981b506152e3d7df5f5b75
Gitweb:     http://git.kernel.org/tip/5dd44d07d3d373f0f3981b506152e3d7df5f5b75
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:16 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:54 +0200

sched/migration: Move calc_load_migrate() into CPU_DYING

It really does not matter when we fold the load for the outgoing cpu. It's
almost dead anyway, so there is no harm if we fail to fold the load for the
few microseconds which are required for going fully away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.328739226@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/sched/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bafc308..688e8a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5447,9 +5447,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		migrate_tasks(rq);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
-	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/migration: Move CPU_ONLINE into scheduler state
  2016-03-10 12:04 ` [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state Thomas Gleixner
@ 2016-05-05 11:25   ` tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, tglx, peterz, hpa, mingo

Commit-ID:  359aef5a0f04cf4bb62ae9e85a09b1b3608577ce
Gitweb:     http://git.kernel.org/tip/359aef5a0f04cf4bb62ae9e85a09b1b3608577ce
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:17 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:54 +0200

sched/migration: Move CPU_ONLINE into scheduler state

The alleged requirement that the migration notifier has a lower priority than
perf is completely undocumented and there is no indication at all that this is
true. perf does not even handle the CPU_ONLINE notification and perf really
has nothing to do with migration.

Move the CPU_ONLINE code into the sched_cpu_activate() state callback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.421743581@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/sched/core.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 688e8a8..8d8d903 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5424,17 +5424,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_ONLINE:
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-
-			set_rq_online(rq);
-		}
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DYING:
 		sched_ttwu_pending();
@@ -7090,12 +7079,34 @@ static int cpuset_cpu_inactive(unsigned int cpu)
 
 int sched_cpu_activate(unsigned int cpu)
 {
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
 	set_cpu_active(cpu, true);
 
 	if (sched_smp_initialized) {
 		sched_domains_numa_masks_set(cpu);
 		cpuset_cpu_active();
 	}
+
+	/*
+	 * Put the rq online, if not already. This happens:
+	 *
+	 * 1) In the early boot process, because we build the real domains
+	 *    after all cpus have been brought up.
+	 *
+	 * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
+	 *    domains.
+	 */
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_online(rq);
+	}
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+
+	update_max_interval();
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
  2016-03-10 12:04 ` [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Thomas Gleixner
@ 2016-05-05 11:25   ` tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, tglx, peterz, linux-kernel, hpa

Commit-ID:  9c793cb19c6ab8812c80a4897e6d7554fa9f77f1
Gitweb:     http://git.kernel.org/tip/9c793cb19c6ab8812c80a4897e6d7554fa9f77f1
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:18 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:54 +0200

sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()

Remove the hotplug notifier and make it an explicit state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.502222097@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/cpu.h   |  2 --
 include/linux/sched.h |  1 +
 kernel/cpu.c          |  2 +-
 kernel/sched/core.c   | 70 +++++++++++++++------------------------------------
 4 files changed, 22 insertions(+), 53 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index b22b000..21597dc 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,9 +59,7 @@ struct notifier_block;
  * CPU notifier priorities.
  */
 enum {
-	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
-	CPU_PRI_MIGRATION	= 10,
 
 	/* bring up workqueues before normal notifiers and down after */
 	CPU_PRI_WORKQUEUE_UP	= 5,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1e5f961..0e9e18a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -373,6 +373,7 @@ extern void trap_init(void);
 extern void update_process_times(int user);
 extern void scheduler_tick(void);
 extern int sched_cpu_starting(unsigned int cpu);
+extern int sched_cpu_dying(unsigned int cpu);
 extern int sched_cpu_activate(unsigned int cpu);
 extern int sched_cpu_deactivate(unsigned int cpu);
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c134a35..d6eeb8c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1223,7 +1223,7 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	[CPUHP_AP_SCHED_STARTING] = {
 		.name			= "sched:starting",
 		.startup		= sched_cpu_starting,
-		.teardown		= NULL,
+		.teardown		= sched_cpu_dying,
 	},
 	/*
 	 * Low level startup/teardown notifiers. Run with interrupts
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d8d903..db92285 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5411,51 +5411,6 @@ static void set_rq_offline(struct rq *rq)
 	}
 }
 
-/*
- * migration_call - callback that gets triggered when a CPU is added.
- * Here we can start up the necessary migration thread for the new CPU.
- */
-static int
-migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (long)hcpu;
-	unsigned long flags;
-	struct rq *rq = cpu_rq(cpu);
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-
-#ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DYING:
-		sched_ttwu_pending();
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-			set_rq_offline(rq);
-		}
-		migrate_tasks(rq);
-		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		calc_load_migrate(rq);
-		break;
-#endif
-	}
-
-	update_max_interval();
-
-	return NOTIFY_OK;
-}
-
-/*
- * Register at high priority so that task migration (migrate_all_tasks)
- * happens before everything else.  This has to be lower priority than
- * the notifier in the perf_event subsystem, though.
- */
-static struct notifier_block migration_notifier = {
-	.notifier_call = migration_call,
-	.priority = CPU_PRI_MIGRATION,
-};
-
 static void set_cpu_rq_start_time(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7158,6 +7113,26 @@ int sched_cpu_starting(unsigned int cpu)
 	return 0;
 }
 
+int sched_cpu_dying(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	/* Handle pending wakeups and then migrate everything off */
+	sched_ttwu_pending();
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_offline(rq);
+	}
+	migrate_tasks(rq);
+	BUG_ON(rq->nr_running != 1);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	calc_load_migrate(rq);
+	update_max_interval();
+	return 0;
+}
+
 void __init sched_init_smp(void)
 {
 	cpumask_var_t non_isolated_cpus;
@@ -7194,12 +7169,7 @@ void __init sched_init_smp(void)
 
 static int __init migration_init(void)
 {
-	void *cpu = (void *)(long)smp_processor_id();
-
 	sched_rq_cpu_starting(smp_processor_id());
-	migration_call(&migration_notifier, CPU_ONLINE, cpu);
-	register_cpu_notifier(&migration_notifier);
-
 	return 0;
 }
 early_initcall(migration_init);

^ permalink raw reply related	[flat|nested] 38+ messages in thread
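
The conversion above replaces an ad-hoc notifier with an entry in the ordered hotplug state table: the startup callback runs while a cpu is brought up, and the teardown callback runs in reverse table order while it is shut down, which is why filling in sched_cpu_dying as the teardown of the sched:starting state is all that is needed. A minimal user-space sketch of that table-walking idea follows; the struct layout and function names are illustrative assumptions, not the kernel's cpuhp implementation.

/* Minimal model of an ordered hotplug state table with startup/teardown. */
#include <stdio.h>

struct step {
	const char *name;
	int (*startup)(unsigned int cpu);
	int (*teardown)(unsigned int cpu);
};

static int sched_starting(unsigned int cpu)
{
	printf("cpu%u: sched:starting\n", cpu);
	return 0;
}

static int sched_dying(unsigned int cpu)
{
	printf("cpu%u: sched:dying\n", cpu);
	return 0;
}

static struct step steps[] = {
	{ "sched:starting", sched_starting, sched_dying },
	/* further states would follow here, in bring-up order */
};

#define NR_STEPS (sizeof(steps) / sizeof(steps[0]))

static void cpu_up(unsigned int cpu)
{
	/* bring-up walks the table forward */
	for (unsigned int i = 0; i < NR_STEPS; i++)
		if (steps[i].startup)
			steps[i].startup(cpu);
}

static void cpu_down(unsigned int cpu)
{
	/* tear-down walks the table backward */
	for (unsigned int i = NR_STEPS; i-- > 0; )
		if (steps[i].teardown)
			steps[i].teardown(cpu);
}

int main(void)
{
	cpu_up(1);
	cpu_down(1);
	return 0;
}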

* [tip:smp/hotplug] sched/hotplug: Make activate() the last hotplug step
  2016-03-10 12:04 ` [patch 13/15] sched/hotplug: Make activate() the last hotplug step Thomas Gleixner
@ 2016-05-05 11:25   ` tip-bot for Thomas Gleixner
  2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:25 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, peterz, linux-kernel, mingo, hpa

Commit-ID:  b4f43a28647291a7dc1773924b77bc5ee7eccb16
Gitweb:     http://git.kernel.org/tip/b4f43a28647291a7dc1773924b77bc5ee7eccb16
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:19 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:54 +0200

sched/hotplug: Make activate() the last hotplug step

The scheduler can handle per cpu threads before the cpu is set to active and
it does not allow user space threads on the cpu before active is
set. Attaching to the scheduling domains is also not required before user
space threads can be handled.

Move the activation to the end of the hotplug state space. That also means
that deactivation is the first action when a cpu is shut down.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.597477199@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/cpuhotplug.h |  2 +-
 kernel/cpu.c               | 13 +++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9e07468..386374d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -13,11 +13,11 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
-	CPUHP_AP_ACTIVE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_NOTIFY_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
+	CPUHP_AP_ACTIVE,
 	CPUHP_ONLINE,
 };
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d6eeb8c..6180dd6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1242,12 +1242,6 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	[CPUHP_AP_ONLINE] = {
 		.name			= "ap:online",
 	},
-	/* First state is scheduler control. Interrupts are enabled */
-	[CPUHP_AP_ACTIVE] = {
-		.name			= "sched:active",
-		.startup		= sched_cpu_activate,
-		.teardown		= sched_cpu_deactivate,
-	},
 	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot:threads",
@@ -1269,6 +1263,13 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	 * The dynamically registered state space is here
 	 */
 
+	/* Last state is scheduler control setting the cpu active */
+	[CPUHP_AP_ACTIVE] = {
+		.name			= "sched:active",
+		.startup		= sched_cpu_activate,
+		.teardown		= sched_cpu_deactivate,
+	},
+
 	/* CPU is fully up and running. */
 	[CPUHP_ONLINE] = {
 		.name			= "online",

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/fair: Make ilb_notifier an explicit call
  2016-03-10 12:04 ` [patch 14/15] sched/fair: Make ilb_notifier an explicit call Thomas Gleixner
@ 2016-05-05 11:26   ` tip-bot for Thomas Gleixner
  2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, tglx, hpa, mingo, linux-kernel

Commit-ID:  95217bbf05fe4197e810757261332413fc5764e5
Gitweb:     http://git.kernel.org/tip/95217bbf05fe4197e810757261332413fc5764e5
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:20 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:55 +0200

sched/fair: Make ilb_notifier an explicit call

No need for an extra notifier.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.693720241@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/sched/core.c  |  1 +
 kernel/sched/fair.c  | 15 +--------------
 kernel/sched/sched.h |  4 ++++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db92285..8d59c31 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7130,6 +7130,7 @@ int sched_cpu_dying(unsigned int cpu)
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 	calc_load_migrate(rq);
 	update_max_interval();
+	nohz_balance_exit_idle(cpu);
 	return 0;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e6..8b6db36 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7711,7 +7711,7 @@ static void nohz_balancer_kick(void)
 	return;
 }
 
-static inline void nohz_balance_exit_idle(int cpu)
+void nohz_balance_exit_idle(unsigned int cpu)
 {
 	if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
 		/*
@@ -7784,18 +7784,6 @@ void nohz_balance_enter_idle(int cpu)
 	atomic_inc(&nohz.nr_cpus);
 	set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }
-
-static int sched_ilb_notifier(struct notifier_block *nfb,
-					unsigned long action, void *hcpu)
-{
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DYING:
-		nohz_balance_exit_idle(smp_processor_id());
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
-	}
-}
 #endif
 
 static DEFINE_SPINLOCK(balancing);
@@ -8600,7 +8588,6 @@ __init void init_sched_fair_class(void)
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
-	cpu_notifier(sched_ilb_notifier, 0);
 #endif
 #endif /* SMP */
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..16a27b6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,10 @@ enum rq_nohz_flag_bits {
 };
 
 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)
+
+extern void nohz_balance_exit_idle(unsigned int cpu);
+#else
+static inline void nohz_balance_exit_idle(unsigned int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING

^ permalink raw reply related	[flat|nested] 38+ messages in thread
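
One small point worth spelling out from the sched.h hunk above: exporting the function under CONFIG_NO_HZ_COMMON and providing an empty static inline stub otherwise keeps the single call site in sched_cpu_dying() free of #ifdefs. Below is a hedged user-space sketch of that pattern; the config macro and function names are illustrative assumptions, not the kernel build system.

/* ifdef-free caller: the stub compiles away when the feature is disabled. */
#include <stdio.h>

#define CONFIG_NO_HZ_COMMON 1	/* flip to 0 to model a !NO_HZ_COMMON build */

#if CONFIG_NO_HZ_COMMON
static void nohz_balance_exit_idle(unsigned int cpu)
{
	printf("cpu%u: leaving nohz idle balancing\n", cpu);
}
#else
static inline void nohz_balance_exit_idle(unsigned int cpu) { }
#endif

static void sched_cpu_dying_model(unsigned int cpu)
{
	/* the caller needs no conditional compilation */
	nohz_balance_exit_idle(cpu);
}

int main(void)
{
	sched_cpu_dying_model(3);
	return 0;
}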

* [tip:smp/hotplug] sched: Make hrtick_notifier an explicit call
  2016-03-10 12:04 ` [patch 15/15] sched: Make hrtick_notifier an explicit call Thomas Gleixner
@ 2016-05-05 11:26   ` tip-bot for Thomas Gleixner
  2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-05 11:26 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, peterz, mingo, linux-kernel, hpa

Commit-ID:  3a85aca729d63a1d09795b1e18b45890fe2ab0bf
Gitweb:     http://git.kernel.org/tip/3a85aca729d63a1d09795b1e18b45890fe2ab0bf
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:21 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 5 May 2016 13:17:55 +0200

sched: Make hrtick_notifier an explicit call

No need for an extra notifier. We don't need to handle all these states. It's
sufficient to kill the timer when the cpu dies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.770528462@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 kernel/sched/core.c | 34 +---------------------------------
 1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d59c31..2500ac1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -249,29 +249,6 @@ void hrtick_start(struct rq *rq, u64 delay)
 	}
 }
 
-static int
-hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (int)(long)hcpu;
-
-	switch (action) {
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		hrtick_clear(cpu_rq(cpu));
-		return NOTIFY_OK;
-	}
-
-	return NOTIFY_DONE;
-}
-
-static __init void init_hrtick(void)
-{
-	hotcpu_notifier(hotplug_hrtick, 0);
-}
 #else
 /*
  * Called to set the hrtick timer state.
@@ -288,10 +265,6 @@ void hrtick_start(struct rq *rq, u64 delay)
 	hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
 		      HRTIMER_MODE_REL_PINNED);
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif /* CONFIG_SMP */
 
 static void init_rq_hrtick(struct rq *rq)
@@ -315,10 +288,6 @@ static inline void hrtick_clear(struct rq *rq)
 static inline void init_rq_hrtick(struct rq *rq)
 {
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif	/* CONFIG_SCHED_HRTICK */
 
 /*
@@ -7131,6 +7100,7 @@ int sched_cpu_dying(unsigned int cpu)
 	calc_load_migrate(rq);
 	update_max_interval();
 	nohz_balance_exit_idle(cpu);
+	hrtick_clear(rq);
 	return 0;
 }
 
@@ -7155,8 +7125,6 @@ void __init sched_init_smp(void)
 		cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
 	mutex_unlock(&sched_domains_mutex);
 
-	init_hrtick();
-
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
 		BUG();

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/hotplug: Move sync_rcu to be with set_cpu_active(false)
  2016-03-10 12:04 ` [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false) Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] sched/hotplug: " tip-bot for Peter Zijlstra
@ 2016-05-06 13:06   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Peter Zijlstra @ 2016-05-06 13:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, tglx, mingo, peterz

Commit-ID:  b2454caa8977ade27292a71f2def5e403e24b4d5
Gitweb:     http://git.kernel.org/tip/b2454caa8977ade27292a71f2def5e403e24b4d5
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 10 Mar 2016 12:54:14 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:24 +0200

sched/hotplug: Move sync_rcu to be with set_cpu_active(false)

The sync_rcu stuff is specifically for clearing bits in the active
mask, such that everybody will observe the bit cleared and will not
consider the cleared CPU for load-balancing etc.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.169219710@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c        | 15 ---------------
 kernel/sched/core.c | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 15402b7..c134a35 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -703,21 +703,6 @@ static int takedown_cpu(unsigned int cpu)
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	int err;
 
-	/*
-	 * By now we've cleared cpu_active_mask, wait for all preempt-disabled
-	 * and RCU users of this state to go away such that all new such users
-	 * will observe it.
-	 *
-	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
-	 * not imply sync_sched(), so wait for both.
-	 *
-	 * Do sync before park smpboot threads to take care the rcu boost case.
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT))
-		synchronize_rcu_mult(call_rcu, call_rcu_sched);
-	else
-		synchronize_rcu();
-
 	/* Park the smpboot threads */
 	kthread_park(per_cpu_ptr(&cpuhp_state, cpu)->thread);
 	smpboot_park_threads(cpu);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 73bcd93..0a31078 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7112,6 +7112,20 @@ int sched_cpu_deactivate(unsigned int cpu)
 	int ret;
 
 	set_cpu_active(cpu, false);
+	/*
+	 * We've cleared cpu_active_mask, wait for all preempt-disabled and RCU
+	 * users of this state to go away such that all new such users will
+	 * observe it.
+	 *
+	 * For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
+	 * not imply sync_sched(), so wait for both.
+	 *
+	 * Do sync before park smpboot threads to take care the rcu boost case.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT))
+		synchronize_rcu_mult(call_rcu, call_rcu_sched);
+	else
+		synchronize_rcu();
 
 	if (!sched_smp_initialized)
 		return 0;

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/migration: Move prepare transition to SCHED_STARTING state
  2016-03-10 12:04 ` [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:06   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, tglx, hpa, peterz, mingo

Commit-ID:  94baf7a5d882cde0b4d591f4ab89cc32ee39ac6a
Gitweb:     http://git.kernel.org/tip/94baf7a5d882cde0b4d591f4ab89cc32ee39ac6a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:15 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:24 +0200

sched/migration: Move prepare transition to SCHED_STARTING state

We can piggyback that on the SCHED_STARTING state. It's not required before
the cpu actually comes online. Name the function properly as it has nothing to
do with migration.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.248226511@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0a31078..bafc308 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5424,11 +5424,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_UP_PREPARE:
-		rq->calc_load_update = calc_load_update;
-		account_reset_rq(rq);
-		break;
-
 	case CPU_ONLINE:
 		/* Update our root-domain */
 		raw_spin_lock_irqsave(&rq->lock, flags);
@@ -7139,9 +7134,19 @@ int sched_cpu_deactivate(unsigned int cpu)
 	return 0;
 }
 
+static void sched_rq_cpu_starting(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	rq->calc_load_update = calc_load_update;
+	account_reset_rq(rq);
+	update_max_interval();
+}
+
 int sched_cpu_starting(unsigned int cpu)
 {
 	set_cpu_rq_start_time(cpu);
+	sched_rq_cpu_starting(cpu);
 	return 0;
 }
 
@@ -7182,11 +7187,8 @@ void __init sched_init_smp(void)
 static int __init migration_init(void)
 {
 	void *cpu = (void *)(long)smp_processor_id();
-	int err;
 
-	/* Initialize migration for the boot CPU */
-	err = migration_call(&migration_notifier, CPU_UP_PREPARE, cpu);
-	BUG_ON(err == NOTIFY_BAD);
+	sched_rq_cpu_starting(smp_processor_id());
 	migration_call(&migration_notifier, CPU_ONLINE, cpu);
 	register_cpu_notifier(&migration_notifier);
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:06   ` tip-bot for Thomas Gleixner
  2016-07-12  4:37   ` [patch 10/15] " Anton Blanchard
  2 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:06 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, peterz, linux-kernel, hpa, tglx

Commit-ID:  e9cd8fa4fcfd67c95db9b87c0fff88fa23cb00e5
Gitweb:     http://git.kernel.org/tip/e9cd8fa4fcfd67c95db9b87c0fff88fa23cb00e5
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:16 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:25 +0200

sched/migration: Move calc_load_migrate() into CPU_DYING

It really does not matter when we fold the load for the outgoing cpu. It's
almost dead anyway, so there is no harm if we fail to fold the load for the
few microseconds which are required for going fully away.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.328739226@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bafc308..688e8a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5447,9 +5447,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 		migrate_tasks(rq);
 		BUG_ON(rq->nr_running != 1); /* the migration thread */
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
-	case CPU_DEAD:
 		calc_load_migrate(rq);
 		break;
 #endif

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/migration: Move CPU_ONLINE into scheduler state
  2016-03-10 12:04 ` [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:07 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, mingo, tglx, hpa, linux-kernel

Commit-ID:  7d97669933eb94245ec9b715753753ec5ca8f646
Gitweb:     http://git.kernel.org/tip/7d97669933eb94245ec9b715753753ec5ca8f646
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:17 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:25 +0200

sched/migration: Move CPU_ONLINE into scheduler state

The alleged requirement that the migration notifier has a lower priority than
perf is completely undocumented and there is no indication at all that this is
true. perf does not even handle the CPU_ONLINE notification and perf really
has nothing to do with migration.

Move the CPU_ONLINE code into the sched_cpu_activate() state callback.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.421743581@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 688e8a8..8d8d903 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5424,17 +5424,6 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 
-	case CPU_ONLINE:
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-
-			set_rq_online(rq);
-		}
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		break;
-
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_DYING:
 		sched_ttwu_pending();
@@ -7090,12 +7079,34 @@ static int cpuset_cpu_inactive(unsigned int cpu)
 
 int sched_cpu_activate(unsigned int cpu)
 {
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
 	set_cpu_active(cpu, true);
 
 	if (sched_smp_initialized) {
 		sched_domains_numa_masks_set(cpu);
 		cpuset_cpu_active();
 	}
+
+	/*
+	 * Put the rq online, if not already. This happens:
+	 *
+	 * 1) In the early boot process, because we build the real domains
+	 *    after all cpus have been brought up.
+	 *
+	 * 2) At runtime, if cpuset_cpu_active() fails to rebuild the
+	 *    domains.
+	 */
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_online(rq);
+	}
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+
+	update_max_interval();
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
  2016-03-10 12:04 ` [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:07 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, mingo, linux-kernel, hpa, tglx

Commit-ID:  f2785ddb5367e217365099294b89d6a84668069e
Gitweb:     http://git.kernel.org/tip/f2785ddb5367e217365099294b89d6a84668069e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:18 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:25 +0200

sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()

Remove the hotplug notifier and make it an explicit state.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.502222097@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpu.h   |  2 --
 include/linux/sched.h |  6 +++++
 kernel/cpu.c          |  2 +-
 kernel/sched/core.c   | 72 ++++++++++++++++-----------------------------------
 4 files changed, 29 insertions(+), 53 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index b22b000..21597dc 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,9 +59,7 @@ struct notifier_block;
  * CPU notifier priorities.
  */
 enum {
-	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
-	CPU_PRI_MIGRATION	= 10,
 
 	/* bring up workqueues before normal notifiers and down after */
 	CPU_PRI_WORKQUEUE_UP	= 5,
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1e5f961..47835cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -376,6 +376,12 @@ extern int sched_cpu_starting(unsigned int cpu);
 extern int sched_cpu_activate(unsigned int cpu);
 extern int sched_cpu_deactivate(unsigned int cpu);
 
+#ifdef CONFIG_HOTPLUG_CPU
+extern int sched_cpu_dying(unsigned int cpu);
+#else
+# define sched_cpu_dying	NULL
+#endif
+
 extern void sched_show_task(struct task_struct *p);
 
 #ifdef CONFIG_LOCKUP_DETECTOR
diff --git a/kernel/cpu.c b/kernel/cpu.c
index c134a35..d6eeb8c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1223,7 +1223,7 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	[CPUHP_AP_SCHED_STARTING] = {
 		.name			= "sched:starting",
 		.startup		= sched_cpu_starting,
-		.teardown		= NULL,
+		.teardown		= sched_cpu_dying,
 	},
 	/*
 	 * Low level startup/teardown notifiers. Run with interrupts
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d8d903..a9a65ed 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5411,51 +5411,6 @@ static void set_rq_offline(struct rq *rq)
 	}
 }
 
-/*
- * migration_call - callback that gets triggered when a CPU is added.
- * Here we can start up the necessary migration thread for the new CPU.
- */
-static int
-migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (long)hcpu;
-	unsigned long flags;
-	struct rq *rq = cpu_rq(cpu);
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-
-#ifdef CONFIG_HOTPLUG_CPU
-	case CPU_DYING:
-		sched_ttwu_pending();
-		/* Update our root-domain */
-		raw_spin_lock_irqsave(&rq->lock, flags);
-		if (rq->rd) {
-			BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
-			set_rq_offline(rq);
-		}
-		migrate_tasks(rq);
-		BUG_ON(rq->nr_running != 1); /* the migration thread */
-		raw_spin_unlock_irqrestore(&rq->lock, flags);
-		calc_load_migrate(rq);
-		break;
-#endif
-	}
-
-	update_max_interval();
-
-	return NOTIFY_OK;
-}
-
-/*
- * Register at high priority so that task migration (migrate_all_tasks)
- * happens before everything else.  This has to be lower priority than
- * the notifier in the perf_event subsystem, though.
- */
-static struct notifier_block migration_notifier = {
-	.notifier_call = migration_call,
-	.priority = CPU_PRI_MIGRATION,
-};
-
 static void set_cpu_rq_start_time(unsigned int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7158,6 +7113,28 @@ int sched_cpu_starting(unsigned int cpu)
 	return 0;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+int sched_cpu_dying(unsigned int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long flags;
+
+	/* Handle pending wakeups and then migrate everything off */
+	sched_ttwu_pending();
+	raw_spin_lock_irqsave(&rq->lock, flags);
+	if (rq->rd) {
+		BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+		set_rq_offline(rq);
+	}
+	migrate_tasks(rq);
+	BUG_ON(rq->nr_running != 1);
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+	calc_load_migrate(rq);
+	update_max_interval();
+	return 0;
+}
+#endif
+
 void __init sched_init_smp(void)
 {
 	cpumask_var_t non_isolated_cpus;
@@ -7194,12 +7171,7 @@ void __init sched_init_smp(void)
 
 static int __init migration_init(void)
 {
-	void *cpu = (void *)(long)smp_processor_id();
-
 	sched_rq_cpu_starting(smp_processor_id());
-	migration_call(&migration_notifier, CPU_ONLINE, cpu);
-	register_cpu_notifier(&migration_notifier);
-
 	return 0;
 }
 early_initcall(migration_init);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/hotplug: Make activate() the last hotplug step
  2016-03-10 12:04 ` [patch 13/15] sched/hotplug: Make activate() the last hotplug step Thomas Gleixner
  2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:07   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:07 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, linux-kernel, tglx, peterz, mingo

Commit-ID:  aaddd7d1c740ab3c5efaad7a34650b6dc680c21c
Gitweb:     http://git.kernel.org/tip/aaddd7d1c740ab3c5efaad7a34650b6dc680c21c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:19 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:25 +0200

sched/hotplug: Make activate() the last hotplug step

The scheduler can handle per cpu threads before the cpu is set to active and
it does not allow user space threads on the cpu before active is
set. Attaching to the scheduling domains is also not required before user
space threads can be handled.

Move the activation to the end of the hotplug state space. That also means
that deactivation is the first action when a cpu is shut down.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.597477199@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/cpuhotplug.h |  2 +-
 kernel/cpu.c               | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9e07468..386374d 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -13,11 +13,11 @@ enum cpuhp_state {
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,
 	CPUHP_AP_ONLINE_IDLE,
-	CPUHP_AP_ACTIVE,
 	CPUHP_AP_SMPBOOT_THREADS,
 	CPUHP_AP_NOTIFY_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
+	CPUHP_AP_ACTIVE,
 	CPUHP_ONLINE,
 };
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d6eeb8c..d948e44 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1242,12 +1242,6 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	[CPUHP_AP_ONLINE] = {
 		.name			= "ap:online",
 	},
-	/* First state is scheduler control. Interrupts are enabled */
-	[CPUHP_AP_ACTIVE] = {
-		.name			= "sched:active",
-		.startup		= sched_cpu_activate,
-		.teardown		= sched_cpu_deactivate,
-	},
 	/* Handle smpboot threads park/unpark */
 	[CPUHP_AP_SMPBOOT_THREADS] = {
 		.name			= "smpboot:threads",
@@ -1269,6 +1263,15 @@ static struct cpuhp_step cpuhp_ap_states[] = {
 	 * The dynamically registered state space is here
 	 */
 
+#ifdef CONFIG_SMP
+	/* Last state is scheduler control setting the cpu active */
+	[CPUHP_AP_ACTIVE] = {
+		.name			= "sched:active",
+		.startup		= sched_cpu_activate,
+		.teardown		= sched_cpu_deactivate,
+	},
+#endif
+
 	/* CPU is fully up and running. */
 	[CPUHP_ONLINE] = {
 		.name			= "online",

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched/fair: Make ilb_notifier an explicit call
  2016-03-10 12:04 ` [patch 14/15] sched/fair: Make ilb_notifier an explicit call Thomas Gleixner
  2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, peterz, mingo, hpa, tglx

Commit-ID:  20a5c8cc74ade5027c2b0e2bc724278afd6054f3
Gitweb:     http://git.kernel.org/tip/20a5c8cc74ade5027c2b0e2bc724278afd6054f3
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:20 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:26 +0200

sched/fair: Make ilb_notifier an explicit call

No need for an extra notifier.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.693720241@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c  |  1 +
 kernel/sched/fair.c  | 15 +--------------
 kernel/sched/sched.h |  4 ++++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a9a65ed..28ffd68 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7131,6 +7131,7 @@ int sched_cpu_dying(unsigned int cpu)
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 	calc_load_migrate(rq);
 	update_max_interval();
+	nohz_balance_exit_idle(cpu);
 	return 0;
 }
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e6..8b6db36 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7711,7 +7711,7 @@ static void nohz_balancer_kick(void)
 	return;
 }
 
-static inline void nohz_balance_exit_idle(int cpu)
+void nohz_balance_exit_idle(unsigned int cpu)
 {
 	if (unlikely(test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))) {
 		/*
@@ -7784,18 +7784,6 @@ void nohz_balance_enter_idle(int cpu)
 	atomic_inc(&nohz.nr_cpus);
 	set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
 }
-
-static int sched_ilb_notifier(struct notifier_block *nfb,
-					unsigned long action, void *hcpu)
-{
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_DYING:
-		nohz_balance_exit_idle(smp_processor_id());
-		return NOTIFY_OK;
-	default:
-		return NOTIFY_DONE;
-	}
-}
 #endif
 
 static DEFINE_SPINLOCK(balancing);
@@ -8600,7 +8588,6 @@ __init void init_sched_fair_class(void)
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
-	cpu_notifier(sched_ilb_notifier, 0);
 #endif
 #endif /* SMP */
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec2e8d2..16a27b6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1743,6 +1743,10 @@ enum rq_nohz_flag_bits {
 };
 
 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)
+
+extern void nohz_balance_exit_idle(unsigned int cpu);
+#else
+static inline void nohz_balance_exit_idle(unsigned int cpu) { }
 #endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [tip:smp/hotplug] sched: Make hrtick_notifier an explicit call
  2016-03-10 12:04 ` [patch 15/15] sched: Make hrtick_notifier an explicit call Thomas Gleixner
  2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
@ 2016-05-06 13:08   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-05-06 13:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, linux-kernel, hpa, peterz, tglx

Commit-ID:  e5ef27d0f5acf9f1db2882d7546a41c021f66820
Gitweb:     http://git.kernel.org/tip/e5ef27d0f5acf9f1db2882d7546a41c021f66820
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Thu, 10 Mar 2016 12:54:21 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 6 May 2016 14:58:26 +0200

sched: Make hrtick_notifier an explicit call

No need for an extra notifier. We don't need to handle all these states. It's
sufficient to kill the timer when the cpu dies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160310120025.770528462@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/sched/core.c | 34 +---------------------------------
 1 file changed, 1 insertion(+), 33 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 28ffd68..9c710ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -249,29 +249,6 @@ void hrtick_start(struct rq *rq, u64 delay)
 	}
 }
 
-static int
-hotplug_hrtick(struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-	int cpu = (int)(long)hcpu;
-
-	switch (action) {
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DOWN_PREPARE:
-	case CPU_DOWN_PREPARE_FROZEN:
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
-		hrtick_clear(cpu_rq(cpu));
-		return NOTIFY_OK;
-	}
-
-	return NOTIFY_DONE;
-}
-
-static __init void init_hrtick(void)
-{
-	hotcpu_notifier(hotplug_hrtick, 0);
-}
 #else
 /*
  * Called to set the hrtick timer state.
@@ -288,10 +265,6 @@ void hrtick_start(struct rq *rq, u64 delay)
 	hrtimer_start(&rq->hrtick_timer, ns_to_ktime(delay),
 		      HRTIMER_MODE_REL_PINNED);
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif /* CONFIG_SMP */
 
 static void init_rq_hrtick(struct rq *rq)
@@ -315,10 +288,6 @@ static inline void hrtick_clear(struct rq *rq)
 static inline void init_rq_hrtick(struct rq *rq)
 {
 }
-
-static inline void init_hrtick(void)
-{
-}
 #endif	/* CONFIG_SCHED_HRTICK */
 
 /*
@@ -7132,6 +7101,7 @@ int sched_cpu_dying(unsigned int cpu)
 	calc_load_migrate(rq);
 	update_max_interval();
 	nohz_balance_exit_idle(cpu);
+	hrtick_clear(rq);
 	return 0;
 }
 #endif
@@ -7157,8 +7127,6 @@ void __init sched_init_smp(void)
 		cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
 	mutex_unlock(&sched_domains_mutex);
 
-	init_hrtick();
-
 	/* Move init over to a non-isolated CPU */
 	if (set_cpus_allowed_ptr(current, non_isolated_cpus) < 0)
 		BUG();

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
  2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
  2016-05-06 13:06   ` tip-bot for Thomas Gleixner
@ 2016-07-12  4:37   ` Anton Blanchard
  2016-07-12 16:33     ` Thomas Gleixner
  2 siblings, 1 reply; 38+ messages in thread
From: Anton Blanchard @ 2016-07-12  4:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, rt, Michael Ellerman,
	Vaidyanathan Srinivasan, shreyas

Hi Thomas,

> It really does not matter when we fold the load for the outgoing cpu.
> It's almost dead anyway, so there is no harm if we fail to fold the
> few microseconds which are required for going fully away.

We are seeing the load average shoot up when hot unplugging CPUs (+1
for every CPU we offline) on ppc64. This reproduces on bare metal as
well as inside a KVM guest. A bisect points at this commit.

As an example, a completely idle box with 128 CPUS and 112 hot
unplugged:

# uptime
 04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54

Anton

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-07-12  4:37   ` [patch 10/15] " Anton Blanchard
@ 2016-07-12 16:33     ` Thomas Gleixner
  2016-07-12 18:49       ` Vaidyanathan Srinivasan
                         ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Thomas Gleixner @ 2016-07-12 16:33 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: LKML, Peter Zijlstra, Ingo Molnar, rt, Michael Ellerman,
	Vaidyanathan Srinivasan, shreyas

Anton,

On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > It really does not matter when we fold the load for the outgoing cpu.
> > It's almost dead anyway, so there is no harm if we fail to fold the
> > few microseconds which are required for going fully away.
> 
> We are seeing the load average shoot up when hot unplugging CPUs (+1
> for every CPU we offline) on ppc64. This reproduces on bare metal as
> well as inside a KVM guest. A bisect points at this commit.
> 
> As an example, a completely idle box with 128 CPUS and 112 hot
> unplugged:
> 
> # uptime
>  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54

Yes, it's an off-by-one, as we now call that function from the task which is
tearing down the cpu. Does the patch below fix it?

Thanks,

	tglx

8<----------------------

Subject: sched/migration: Correct off by one in load migration
From: Thomas Gleixner <tglx@linutronix.de>

The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result, a cpu unplug leaks a load of 1 into the global load
accounting mechanism.

Fix it by adjusting for the currently running thread which calls
calc_load_migrate().

Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51d7105f529a..97ee9ac7e97c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5394,13 +5394,15 @@ void idle_task_exit(void)
 /*
  * Since this CPU is going 'away' for a while, fold any nr_active delta
  * we might have. Assumes we're called after migrate_tasks() so that the
- * nr_active count is stable.
+ * nr_active count is stable. We need to take the teardown thread which
+ * is calling this into account, so we hand in adjust = 1 to the load
+ * calculation.
  *
  * Also see the comment "Global load-average calculations".
  */
 static void calc_load_migrate(struct rq *rq)
 {
-	long delta = calc_load_fold_active(rq);
+	long delta = calc_load_fold_active(rq, 1);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 }
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index b0b93fd33af9..a2d6eb71f06b 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
 	loads[2] = (avenrun[2] + offset) << shift;
 }
 
-long calc_load_fold_active(struct rq *this_rq)
+long calc_load_fold_active(struct rq *this_rq, long adjust)
 {
 	long nr_active, delta = 0;
 
-	nr_active = this_rq->nr_running;
+	nr_active = this_rq->nr_running - adjust;
 	nr_active += (long)this_rq->nr_uninterruptible;
 
 	if (nr_active != this_rq->calc_load_active) {
@@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
 	 * We're going into NOHZ mode, if there's any pending delta, fold it
 	 * into the pending idle delta.
 	 */
-	delta = calc_load_fold_active(this_rq);
+	delta = calc_load_fold_active(this_rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
@@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
-	delta  = calc_load_fold_active(this_rq);
+	delta  = calc_load_fold_active(this_rq, 0);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7cbeb92a1cb9..898c0d2f18fe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
 extern atomic_long_t calc_load_tasks;
 
 extern void calc_global_load_tick(struct rq *this_rq);
-extern long calc_load_fold_active(struct rq *this_rq);
+extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 
 #ifdef CONFIG_SMP
 extern void cpu_load_update_active(struct rq *this_rq);

^ permalink raw reply related	[flat|nested] 38+ messages in thread
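
To make the off-by-one concrete: on an otherwise idle runqueue the only runnable task at this point is the thread tearing the cpu down, so folding nr_running without an adjustment moves a load of 1 into calc_load_tasks for every unplug, while adjust = 1 discounts the caller and folds 0. A small user-space sketch of that arithmetic follows; the rq fields and numbers are illustrative assumptions, not the scheduler's data structures.

/* Toy fold of an idle runqueue whose only runnable task is the teardown thread. */
#include <stdio.h>

struct rq {
	long nr_running;		/* the teardown thread itself */
	long nr_uninterruptible;
	long calc_load_active;		/* what was folded last time */
};

static long calc_load_fold_active(struct rq *rq, long adjust)
{
	long nr_active, delta = 0;

	nr_active = rq->nr_running - adjust;
	nr_active += rq->nr_uninterruptible;

	if (nr_active != rq->calc_load_active) {
		delta = nr_active - rq->calc_load_active;
		rq->calc_load_active = nr_active;
	}
	return delta;
}

int main(void)
{
	struct rq rq  = { .nr_running = 1, .nr_uninterruptible = 0,
			  .calc_load_active = 0 };
	struct rq rq2 = rq;

	/* Before the fix: the teardown thread is counted, +1 leaks per unplug. */
	printf("delta without adjust: %ld\n", calc_load_fold_active(&rq, 0));
	/* With the fix: adjust = 1 discounts the caller, nothing leaks. */
	printf("delta with adjust:    %ld\n", calc_load_fold_active(&rq2, 1));
	return 0;
}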

* Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-07-12 16:33     ` Thomas Gleixner
@ 2016-07-12 18:49       ` Vaidyanathan Srinivasan
  2016-07-12 20:05       ` Shilpasri G Bhat
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Vaidyanathan Srinivasan @ 2016-07-12 18:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anton Blanchard, LKML, Peter Zijlstra, Ingo Molnar, rt,
	Michael Ellerman, shreyas

* Thomas Gleixner <tglx@linutronix.de> [2016-07-12 18:33:56]:

> Anton,
> 
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > > It really does not matter when we fold the load for the outgoing cpu.
> > > It's almost dead anyway, so there is no harm if we fail to fold the
> > > few microseconds which are required for going fully away.
> > 
> > We are seeing the load average shoot up when hot unplugging CPUs (+1
> > for every CPU we offline) on ppc64. This reproduces on bare metal as
> > well as inside a KVM guest. A bisect points at this commit.
> > 
> > As an example, a completely idle box with 128 CPUS and 112 hot
> > unplugged:
> > 
> > # uptime
> >  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54
> 
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?

Hi Thomas,

Yes this patch fixes the issue.  I was able to recreate the problem
and also verify with this patch on 4.7.0-rc7.

> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------------
> 
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a cpu unplug leakes a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard <anton@samba.org>

Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
> 
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;

	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}

	return delta;

Does the above calculation still hold when we pass adjust=1 and bump
nr_active down? It tested fine, though :)

--Vaidy

^ permalink raw reply	[flat|nested] 38+ messages in thread
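
To make the question above concrete: calc_load_migrate() now runs in the
teardown (stopper) thread on the outgoing CPU, so that thread is still counted
in rq->nr_running when the fold happens. Subtracting it with adjust = 1
therefore cannot drive nr_active negative at that call site, and every other
caller in the patch passes adjust = 0. The standalone sketch below (plain
userspace C with a made-up fake_rq struct and made-up numbers, not kernel
code) mimics calc_load_fold_active() to show the +1 that leaks into the global
count when the adjustment is missing:

	/*
	 * Standalone illustration, not kernel code: models the fold on an
	 * outgoing CPU after migrate_tasks(), where only the teardown thread
	 * is still runnable.
	 */
	#include <stdio.h>

	struct fake_rq {
		long nr_running;		/* includes the teardown thread */
		long nr_uninterruptible;
		long calc_load_active;		/* last value folded globally */
	};

	static long fold_active(struct fake_rq *rq, long adjust)
	{
		long nr_active, delta = 0;

		nr_active = rq->nr_running - adjust;
		nr_active += rq->nr_uninterruptible;

		if (nr_active != rq->calc_load_active) {
			delta = nr_active - rq->calc_load_active;
			rq->calc_load_active = nr_active;
		}
		return delta;
	}

	int main(void)
	{
		struct fake_rq without = { 1, 0, 0 }, with = { 1, 0, 0 };

		/* Old behaviour: the teardown thread itself gets folded in. */
		printf("adjust = 0: delta = %ld\n", fold_active(&without, 0));
		/* Fixed behaviour: calc_load_migrate() passes adjust = 1. */
		printf("adjust = 1: delta = %ld\n", fold_active(&with, 1));
		return 0;
	}

Compiled on its own this prints a delta of 1 for the unadjusted fold and 0 for
the adjusted one; that per-CPU 1 is exactly what accumulated once per offlined
CPU in the reports above.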

* Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-07-12 16:33     ` Thomas Gleixner
  2016-07-12 18:49       ` Vaidyanathan Srinivasan
@ 2016-07-12 20:05       ` Shilpasri G Bhat
  2016-07-13  7:49       ` Peter Zijlstra
  2016-07-13 13:40       ` [tip:sched/urgent] sched/core: Correct off by one bug in load migration calculation tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 38+ messages in thread
From: Shilpasri G Bhat @ 2016-07-12 20:05 UTC (permalink / raw)
  To: Thomas Gleixner, Anton Blanchard
  Cc: LKML, Peter Zijlstra, Ingo Molnar, rt, Michael Ellerman,
	Vaidyanathan Srinivasan, shreyas

Hi,

On 07/12/2016 10:03 PM, Thomas Gleixner wrote:
> Anton,
> 
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
>>> It really does not matter when we fold the load for the outgoing cpu.
>>> It's almost dead anyway, so there is no harm if we fail to fold the
>>> few microseconds which are required for going fully away.
>>
>> We are seeing the load average shoot up when hot unplugging CPUs (+1
>> for every CPU we offline) on ppc64. This reproduces on bare metal as
>> well as inside a KVM guest. A bisect points at this commit.
>>
>> As an example, a completely idle box with 128 CPUS and 112 hot
>> unplugged:
>>
>> # uptime
>>  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54
> 
> Yes, it's an off-by-one, as we now call that from the task which is tearing
> down the CPU. Does the patch below fix it?

I tested your patch and verified that offlining CPUs on an idle box no longer
increases the load average.

# uptime
01:27:44 up 10 min,  1 user,  load average: 0.00, 0.18, 0.18

# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-127

# ppc64_cpu --cores-on=2

# lscpu | grep -Ei "on-line|off-line"
On-line CPU(s) list:   0-15
Off-line CPU(s) list:  16-127

# sleep 60
# uptime
 01:28:52 up 11 min,  1 user,  load average: 0.11, 0.19, 0.18

Thanks and Regards,
Shilpa

> 
> Thanks,
> 
> 	tglx
> 
> 8<----------------------
> 
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
> 
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
> 
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;
> 
>  	if (nr_active != this_rq->calc_load_active) {
> @@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
>  	 * We're going into NOHZ mode, if there's any pending delta, fold it
>  	 * into the pending idle delta.
>  	 */
> -	delta = calc_load_fold_active(this_rq);
> +	delta = calc_load_fold_active(this_rq, 0);
>  	if (delta) {
>  		int idx = calc_load_write_idx();
> 
> @@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
>  	if (time_before(jiffies, this_rq->calc_load_update))
>  		return;
> 
> -	delta  = calc_load_fold_active(this_rq);
> +	delta  = calc_load_fold_active(this_rq, 0);
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7cbeb92a1cb9..898c0d2f18fe 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
>  extern atomic_long_t calc_load_tasks;
> 
>  extern void calc_global_load_tick(struct rq *this_rq);
> -extern long calc_load_fold_active(struct rq *this_rq);
> +extern long calc_load_fold_active(struct rq *this_rq, long adjust);
> 
>  #ifdef CONFIG_SMP
>  extern void cpu_load_update_active(struct rq *this_rq);
> 
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread
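
The size of the regression in the earlier reports follows directly from that
per-CPU leak: with 112 CPUs offlined, the global calc_load_tasks count stays
112 too high, and the exponentially averaged loads settle on that value. A
rough floating-point simulation of the usual 5-second load-average update
(approximate userspace math, not the kernel's fixed-point calc_load() code)
shows the effect:

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		const double tick = 5.0;	/* sample period in seconds */
		const double leaked = 112.0;	/* one stale task per offlined CPU */
		const double period[3] = { 60.0, 300.0, 900.0 };
		double avenrun[3] = { 0.0, 0.0, 0.0 };

		for (double t = tick; t <= 3600.0; t += tick) {	/* one idle hour */
			for (int i = 0; i < 3; i++) {
				double e = exp(-tick / period[i]);
				avenrun[i] = avenrun[i] * e + leaked * (1.0 - e);
			}
		}
		printf("1/5/15 min: %.2f %.2f %.2f\n",
		       avenrun[0], avenrun[1], avenrun[2]);
		return 0;
	}

After an idle hour the 1-minute figure sits at roughly 112, in line with the
112.43 Anton reported; the 5- and 15-minute figures trail according to
whatever load preceded the unplug. With the fix nothing is leaked and the
averages stay near zero, as in the uptime output above.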

* Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING
  2016-07-12 16:33     ` Thomas Gleixner
  2016-07-12 18:49       ` Vaidyanathan Srinivasan
  2016-07-12 20:05       ` Shilpasri G Bhat
@ 2016-07-13  7:49       ` Peter Zijlstra
  2016-07-13 13:40       ` [tip:sched/urgent] sched/core: Correct off by one bug in load migration calculation tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2016-07-13  7:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anton Blanchard, LKML, Ingo Molnar, rt, Michael Ellerman,
	Vaidyanathan Srinivasan, shreyas

On Tue, Jul 12, 2016 at 06:33:56PM +0200, Thomas Gleixner wrote:

> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a CPU unplug leaks a load of 1 into the global load
> accounting mechanism.
> 
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
> 
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
>  
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
>  
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;
>  
>  	if (nr_active != this_rq->calc_load_active) {

Yeah, I think this is the only sensible approach.

How do you want to route this?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [tip:sched/urgent] sched/core: Correct off by one bug in load migration calculation
  2016-07-12 16:33     ` Thomas Gleixner
                         ` (2 preceding siblings ...)
  2016-07-13  7:49       ` Peter Zijlstra
@ 2016-07-13 13:40       ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-07-13 13:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, mpe, hpa, peterz, torvalds, svaidy, tglx, anton

Commit-ID:  d60585c5766e9620d5d83e2b25dc042c7bdada2c
Gitweb:     http://git.kernel.org/tip/d60585c5766e9620d5d83e2b25dc042c7bdada2c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 12 Jul 2016 18:33:56 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 13 Jul 2016 14:58:20 +0200

sched/core: Correct off by one bug in load migration calculation

The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result a CPU unplug leaks a load of 1 into the global load
accounting mechanism.

Fix it by adjusting for the currently running thread which calls
calc_load_migrate().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: rt@linutronix.de
Cc: shreyas@linux.vnet.ibm.com
Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING")
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c    | 6 ++++--
 kernel/sched/loadavg.c | 8 ++++----
 kernel/sched/sched.h   | 2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51d7105..97ee9ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5394,13 +5394,15 @@ void idle_task_exit(void)
 /*
  * Since this CPU is going 'away' for a while, fold any nr_active delta
  * we might have. Assumes we're called after migrate_tasks() so that the
- * nr_active count is stable.
+ * nr_active count is stable. We need to take the teardown thread which
+ * is calling this into account, so we hand in adjust = 1 to the load
+ * calculation.
  *
  * Also see the comment "Global load-average calculations".
  */
 static void calc_load_migrate(struct rq *rq)
 {
-	long delta = calc_load_fold_active(rq);
+	long delta = calc_load_fold_active(rq, 1);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 }
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index b0b93fd..a2d6eb7 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
 	loads[2] = (avenrun[2] + offset) << shift;
 }
 
-long calc_load_fold_active(struct rq *this_rq)
+long calc_load_fold_active(struct rq *this_rq, long adjust)
 {
 	long nr_active, delta = 0;
 
-	nr_active = this_rq->nr_running;
+	nr_active = this_rq->nr_running - adjust;
 	nr_active += (long)this_rq->nr_uninterruptible;
 
 	if (nr_active != this_rq->calc_load_active) {
@@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
 	 * We're going into NOHZ mode, if there's any pending delta, fold it
 	 * into the pending idle delta.
 	 */
-	delta = calc_load_fold_active(this_rq);
+	delta = calc_load_fold_active(this_rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
@@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
-	delta  = calc_load_fold_active(this_rq);
+	delta  = calc_load_fold_active(this_rq, 0);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7cbeb92..898c0d2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
 extern atomic_long_t calc_load_tasks;
 
 extern void calc_global_load_tick(struct rq *this_rq);
-extern long calc_load_fold_active(struct rq *this_rq);
+extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 
 #ifdef CONFIG_SMP
 extern void cpu_load_update_active(struct rq *this_rq);

^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-07-13 13:41 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-10 12:04 [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Thomas Gleixner
2016-03-10 12:04 ` [patch 01/15] cpu/hotplug: Document states better Thomas Gleixner
2016-03-10 12:04 ` [patch 03/15] sched: Make set_cpu_rq_start_time() a built in hotplug state Thomas Gleixner
2016-03-10 12:04 ` [patch 05/15] sched: Consolidate the notifier maze Thomas Gleixner
2016-03-10 12:04 ` [patch 04/15] sched: Allow hotplug notifiers to be setup early Thomas Gleixner
2016-03-10 12:04 ` [patch 06/15] sched: Move sched_domains_numa_masks_clear() to DOWN_PREPARE Thomas Gleixner
2016-03-10 12:04 ` [patch 07/15] sched/hotplug: Convert cpu_[in]active notifiers to state machine Thomas Gleixner
2016-03-10 12:04 ` [patch 08/15] sched, hotplug: Move sync_rcu to be with set_cpu_active(false) Thomas Gleixner
2016-05-05 11:24   ` [tip:smp/hotplug] sched/hotplug: " tip-bot for Peter Zijlstra
2016-05-06 13:06   ` tip-bot for Peter Zijlstra
2016-03-10 12:04 ` [patch 10/15] sched/migration: Move calc_load_migrate() into CPU_DYING Thomas Gleixner
2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:06   ` tip-bot for Thomas Gleixner
2016-07-12  4:37   ` [patch 10/15] " Anton Blanchard
2016-07-12 16:33     ` Thomas Gleixner
2016-07-12 18:49       ` Vaidyanathan Srinivasan
2016-07-12 20:05       ` Shilpasri G Bhat
2016-07-13  7:49       ` Peter Zijlstra
2016-07-13 13:40       ` [tip:sched/urgent] sched/core: Correct off by one bug in load migration calculation tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 09/15] sched/migration: Move prepare transition to SCHED_STARTING state Thomas Gleixner
2016-05-05 11:24   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:06   ` tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 11/15] sched/migration: Move CPU_ONLINE into scheduler state Thomas Gleixner
2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:07   ` tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 12/15] sched/hotplug: Move migration CPU_DYING to sched_cpu_dying() Thomas Gleixner
2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:07   ` tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 14/15] sched/fair: Make ilb_notifier an explicit call Thomas Gleixner
2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:08   ` tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 13/15] sched/hotplug: Make activate() the last hotplug step Thomas Gleixner
2016-05-05 11:25   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:07   ` tip-bot for Thomas Gleixner
2016-03-10 12:04 ` [patch 15/15] sched: Make hrtick_notifier an explicit call Thomas Gleixner
2016-05-05 11:26   ` [tip:smp/hotplug] " tip-bot for Thomas Gleixner
2016-05-06 13:08   ` tip-bot for Thomas Gleixner
2016-04-04  7:54 ` [patch 00/15] cpu/hotplug: Convert scheduler to hotplug state machine Peter Zijlstra
