Linux-rt-users archive on lore.kernel.org
 help / color / Atom feed
* [PATCH RT v2 0/3] migrate disable fixes and performance
@ 2019-10-12  6:52 Scott Wood
  2019-10-12  6:52 ` [PATCH RT v2 1/3] sched: migrate_enable: Use select_fallback_rq() Scott Wood
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Scott Wood @ 2019-10-12  6:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Waiman Long, linux-rt-users, linux-kernel

These are the unapplied patches from v1, minus the sched deadline
patch, and with stop_one_cpu_nowait() in place of clobbering
current->state.

Scott Wood (3):
  sched: migrate_enable: Use select_fallback_rq()
  sched: Lazy migrate_disable processing
  sched: migrate_enable: Use stop_one_cpu_nowait()

 include/linux/cpu.h          |   4 -
 include/linux/sched.h        |  11 +--
 include/linux/stop_machine.h |   2 +
 init/init_task.c             |   4 +
 kernel/cpu.c                 | 103 +++++++++--------------
 kernel/sched/core.c          | 195 ++++++++++++++++++-------------------------
 kernel/sched/sched.h         |   4 +
 kernel/stop_machine.c        |   7 +-
 lib/smp_processor_id.c       |   3 +
 9 files changed, 142 insertions(+), 191 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RT v2 1/3] sched: migrate_enable: Use select_fallback_rq()
  2019-10-12  6:52 [PATCH RT v2 0/3] migrate disable fixes and performance Scott Wood
@ 2019-10-12  6:52 ` Scott Wood
  2019-10-12  6:52 ` [PATCH RT v2 2/3] sched: Lazy migrate_disable processing Scott Wood
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Scott Wood @ 2019-10-12  6:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Waiman Long, linux-rt-users, linux-kernel, Scott Wood

migrate_enable() currently open-codes a variant of select_fallback_rq().
However, it does not have the "No more Mr. Nice Guy" fallback and thus
it will pass an invalid CPU to the migration thread if cpus_mask only
contains a CPU that is !active.

Signed-off-by: Scott Wood <swood@redhat.com>
---
This scenario will be more likely after the next patch, since
the migrate_disable_update check goes away.  However, it could happen
anyway if cpus_mask was updated to a CPU other than the one we were
pinned to, and that CPU subsequently became inactive.
---
 kernel/sched/core.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 93b4ae1ecaff..bfd160e90797 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7405,6 +7405,7 @@ void migrate_enable(void)
 	if (p->migrate_disable_update) {
 		struct rq *rq;
 		struct rq_flags rf;
+		int cpu = task_cpu(p);
 
 		rq = task_rq_lock(p, &rf);
 		update_rq_clock(rq);
@@ -7414,21 +7415,15 @@ void migrate_enable(void)
 
 		p->migrate_disable_update = 0;
 
-		WARN_ON(smp_processor_id() != task_cpu(p));
-		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
-			const struct cpumask *cpu_valid_mask = cpu_active_mask;
-			struct migration_arg arg;
-			unsigned int dest_cpu;
-
-			if (p->flags & PF_KTHREAD) {
-				/*
-				 * Kernel threads are allowed on online && !active CPUs
-				 */
-				cpu_valid_mask = cpu_online_mask;
-			}
-			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
-			arg.task = p;
-			arg.dest_cpu = dest_cpu;
+		WARN_ON(smp_processor_id() != cpu);
+		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
+			struct migration_arg arg = { p };
+			struct rq_flags rf;
+
+			rq = task_rq_lock(p, &rf);
+			update_rq_clock(rq);
+			arg.dest_cpu = select_fallback_rq(cpu, p);
+			task_rq_unlock(rq, p, &rf);
 
 			unpin_current_cpu();
 			preempt_lazy_enable();
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RT v2 2/3] sched: Lazy migrate_disable processing
  2019-10-12  6:52 [PATCH RT v2 0/3] migrate disable fixes and performance Scott Wood
  2019-10-12  6:52 ` [PATCH RT v2 1/3] sched: migrate_enable: Use select_fallback_rq() Scott Wood
@ 2019-10-12  6:52 ` Scott Wood
  2019-10-18 18:12   ` Waiman Long
  2019-10-12  6:52 ` [PATCH RT v2 3/3] sched: migrate_enable: Use stop_one_cpu_nowait() Scott Wood
  2019-10-18 11:00 ` [PATCH RT v2 0/3] migrate disable fixes and performance Sebastian Andrzej Siewior
  3 siblings, 1 reply; 7+ messages in thread
From: Scott Wood @ 2019-10-12  6:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Waiman Long, linux-rt-users, linux-kernel, Scott Wood

Avoid overhead on the majority of migrate disable/enable sequences by
only manipulating scheduler data (and grabbing the relevant locks) when
the task actually schedules while migrate-disabled.  A kernel build
showed around a 10% reduction in system time (with CONFIG_NR_CPUS=512).

Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU
count of the number of pinned tasks (including tasks which have not
scheduled in the migrate-disabled section); takedown_cpu() will
wait until that reaches zero (confirmed by take_cpu_down() in stop
machine context to deal with races) before migrating tasks off of the
cpu.

To simplify synchronization, updating cpus_mask is no longer deferred
until migrate_enable().  This lets us not have to worry about
migrate_enable() missing the update if it's on the fast path (didn't
schedule during the migrate disabled section).  It also makes the code
a bit simpler and reduces deviation from mainline.

While the main motivation for this is the performance benefit, lazy
migrate disable also eliminates the restriction on calling
migrate_disable() while atomic but leaving the atomic region prior to
calling migrate_enable() -- though this won't help with local_bh_disable()
(and thus rcutorture) unless something similar is done with the recently
added local_lock.

Signed-off-by: Scott Wood <swood@redhat.com>
---
The speedup is smaller than before, due to commit 659252061477862f
("lib/smp_processor_id: Don't use cpumask_equal()") achieving
an overlapping speedup.
---
 include/linux/cpu.h    |   4 --
 include/linux/sched.h  |  11 +--
 init/init_task.c       |   4 ++
 kernel/cpu.c           | 103 +++++++++++-----------------
 kernel/sched/core.c    | 180 ++++++++++++++++++++-----------------------------
 kernel/sched/sched.h   |   4 ++
 lib/smp_processor_id.c |   3 +
 7 files changed, 128 insertions(+), 181 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f4a772c12d14..2df500fdcbc4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -113,8 +113,6 @@ static inline void cpu_maps_update_done(void)
 extern void cpu_hotplug_enable(void);
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
-extern void pin_current_cpu(void);
-extern void unpin_current_cpu(void);
 
 #else /* CONFIG_HOTPLUG_CPU */
 
@@ -126,8 +124,6 @@ static inline void cpus_read_unlock(void) { }
 static inline void lockdep_assert_cpus_held(void) { }
 static inline void cpu_hotplug_disable(void) { }
 static inline void cpu_hotplug_enable(void) { }
-static inline void pin_current_cpu(void) { }
-static inline void unpin_current_cpu(void) { }
 
 #endif	/* !CONFIG_HOTPLUG_CPU */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7e892e727f12..c6872b38bf77 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -229,6 +229,8 @@
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
+int cpu_nr_pinned(int cpu);
+
 /**
  * struct prev_cputime - snapshot of system and user cputime
  * @utime: time spent in user mode
@@ -661,16 +663,13 @@ struct task_struct {
 	cpumask_t			cpus_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
 	int				migrate_disable;
-	int				migrate_disable_update;
-	int				pinned_on_cpu;
+	bool				migrate_disable_scheduled;
 # ifdef CONFIG_SCHED_DEBUG
-	int				migrate_disable_atomic;
+	int				pinned_on_cpu;
 # endif
-
 #elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
 # ifdef CONFIG_SCHED_DEBUG
 	int				migrate_disable;
-	int				migrate_disable_atomic;
 # endif
 #endif
 #ifdef CONFIG_PREEMPT_RT_FULL
@@ -2074,4 +2073,6 @@ static inline void rseq_syscall(struct pt_regs *regs)
 
 #endif
 
+extern struct task_struct *takedown_cpu_task;
+
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index e402413dc47d..c0c7618fd2fb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -81,6 +81,10 @@ struct task_struct init_task
 	.cpus_ptr	= &init_task.cpus_mask,
 	.cpus_mask	= CPU_MASK_ALL,
 	.nr_cpus_allowed= NR_CPUS,
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE) && \
+    defined(CONFIG_SCHED_DEBUG)
+	.pinned_on_cpu	= -1,
+#endif
 	.mm		= NULL,
 	.active_mm	= &init_mm,
 	.restart_block	= {
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 25afa2bb1a2c..e1bf3c698a32 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -76,11 +76,6 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
 	.fail = CPUHP_INVALID,
 };
 
-#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PREEMPT_RT_FULL)
-static DEFINE_PER_CPU(struct rt_rw_lock, cpuhp_pin_lock) = \
-	__RWLOCK_RT_INITIALIZER(cpuhp_pin_lock);
-#endif
-
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
 static struct lockdep_map cpuhp_state_up_map =
 	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
@@ -287,57 +282,6 @@ void cpu_maps_update_done(void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-/**
- * pin_current_cpu - Prevent the current cpu from being unplugged
- */
-void pin_current_cpu(void)
-{
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin;
-	unsigned int cpu;
-	int ret;
-
-again:
-	cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
-	ret = __read_rt_trylock(cpuhp_pin);
-	if (ret) {
-		current->pinned_on_cpu = smp_processor_id();
-		return;
-	}
-	cpu = smp_processor_id();
-	preempt_lazy_enable();
-	preempt_enable();
-
-	sleeping_lock_inc();
-	__read_rt_lock(cpuhp_pin);
-	sleeping_lock_dec();
-
-	preempt_disable();
-	preempt_lazy_disable();
-	if (cpu != smp_processor_id()) {
-		__read_rt_unlock(cpuhp_pin);
-		goto again;
-	}
-	current->pinned_on_cpu = cpu;
-#endif
-}
-
-/**
- * unpin_current_cpu - Allow unplug of current cpu
- */
-void unpin_current_cpu(void)
-{
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
-
-	if (WARN_ON(current->pinned_on_cpu != smp_processor_id()))
-		cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, current->pinned_on_cpu);
-
-	current->pinned_on_cpu = -1;
-	__read_rt_unlock(cpuhp_pin);
-#endif
-}
-
 DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
 
 void cpus_read_lock(void)
@@ -895,6 +839,15 @@ static int take_cpu_down(void *_param)
 	int err, cpu = smp_processor_id();
 	int ret;
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+	/*
+	 * If any tasks disabled migration before we got here,
+	 * go back and sleep again.
+	 */
+	if (cpu_nr_pinned(cpu))
+		return -EAGAIN;
+#endif
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
@@ -924,11 +877,10 @@ static int take_cpu_down(void *_param)
 	return 0;
 }
 
+struct task_struct *takedown_cpu_task;
+
 static int takedown_cpu(unsigned int cpu)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, cpu);
-#endif
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	int err;
 
@@ -941,17 +893,38 @@ static int takedown_cpu(unsigned int cpu)
 	 */
 	irq_lock_sparse();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	__write_rt_lock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+	WARN_ON_ONCE(takedown_cpu_task);
+	takedown_cpu_task = current;
+
+again:
+	/*
+	 * If a task pins this CPU after we pass this check, take_cpu_down
+	 * will return -EAGAIN.
+	 */
+	for (;;) {
+		int nr_pinned;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		nr_pinned = cpu_nr_pinned(cpu);
+		if (nr_pinned == 0)
+			break;
+		schedule();
+	}
+	set_current_state(TASK_RUNNING);
 #endif
 
 	/*
 	 * So now all preempt/rcu users must observe !cpu_active().
 	 */
 	err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (err == -EAGAIN)
+		goto again;
+#endif
 	if (err) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-		__write_rt_unlock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+		takedown_cpu_task = NULL;
 #endif
 		/* CPU refused to die */
 		irq_unlock_sparse();
@@ -971,8 +944,8 @@ static int takedown_cpu(unsigned int cpu)
 	wait_for_ap_thread(st, false);
 	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	__write_rt_unlock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+	takedown_cpu_task = NULL;
 #endif
 	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
 	irq_unlock_sparse();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bfd160e90797..5cb2a519b8bf 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1126,7 +1126,8 @@ static int migration_cpu_stop(void *data)
 void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
 {
 	cpumask_copy(&p->cpus_mask, new_mask);
-	p->nr_cpus_allowed = cpumask_weight(new_mask);
+	if (p->cpus_ptr == &p->cpus_mask)
+		p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
@@ -1137,8 +1138,7 @@ int __migrate_disabled(struct task_struct *p)
 EXPORT_SYMBOL_GPL(__migrate_disabled);
 #endif
 
-static void __do_set_cpus_allowed_tail(struct task_struct *p,
-				       const struct cpumask *new_mask)
+void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
 	struct rq *rq = task_rq(p);
 	bool queued, running;
@@ -1167,20 +1167,6 @@ static void __do_set_cpus_allowed_tail(struct task_struct *p,
 		set_curr_task(rq, p);
 }
 
-void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
-{
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
-	if (__migrate_disabled(p)) {
-		lockdep_assert_held(&p->pi_lock);
-
-		cpumask_copy(&p->cpus_mask, new_mask);
-		p->migrate_disable_update = 1;
-		return;
-	}
-#endif
-	__do_set_cpus_allowed_tail(p, new_mask);
-}
-
 /*
  * Change a given task's CPU affinity. Migrate the thread to a
  * proper CPU and schedule it away if the CPU it's executing on
@@ -1239,7 +1225,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	}
 
 	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), new_mask) || __migrate_disabled(p))
+	if (cpumask_test_cpu(task_cpu(p), new_mask) ||
+	    p->cpus_ptr != &p->cpus_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
@@ -3502,6 +3489,8 @@ static inline void schedule_debug(struct task_struct *prev)
 	BUG();
 }
 
+static void migrate_disabled_sched(struct task_struct *p);
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -3572,6 +3561,9 @@ static void __sched notrace __schedule(bool preempt)
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
 
+	if (__migrate_disabled(prev))
+		migrate_disabled_sched(prev);
+
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
 	update_rq_clock(rq);
@@ -5822,6 +5814,8 @@ static void migrate_tasks(struct rq *dead_rq, struct rq_flags *rf)
 		BUG_ON(!next);
 		put_prev_task(rq, next);
 
+		WARN_ON_ONCE(__migrate_disabled(next));
+
 		/*
 		 * Rules for changing task_struct::cpus_mask are holding
 		 * both pi_lock and rq->lock, such that holding either
@@ -7317,14 +7311,9 @@ void dump_cpu_task(int cpu)
 static inline void
 migrate_disable_update_cpus_allowed(struct task_struct *p)
 {
-	struct rq *rq;
-	struct rq_flags rf;
-
-	rq = task_rq_lock(p, &rf);
 	p->cpus_ptr = cpumask_of(smp_processor_id());
 	update_nr_migratory(p, -1);
 	p->nr_cpus_allowed = 1;
-	task_rq_unlock(rq, p, &rf);
 }
 
 static inline void
@@ -7342,54 +7331,35 @@ void dump_cpu_task(int cpu)
 
 void migrate_disable(void)
 {
-	struct task_struct *p = current;
+	preempt_disable();
 
-	if (in_atomic() || irqs_disabled()) {
+	if (++current->migrate_disable == 1) {
+		this_rq()->nr_pinned++;
+		preempt_lazy_disable();
 #ifdef CONFIG_SCHED_DEBUG
-		p->migrate_disable_atomic++;
+		WARN_ON_ONCE(current->pinned_on_cpu >= 0);
+		current->pinned_on_cpu = smp_processor_id();
 #endif
-		return;
-	}
-#ifdef CONFIG_SCHED_DEBUG
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
 	}
-#endif
 
-	if (p->migrate_disable) {
-		p->migrate_disable++;
-		return;
-	}
+	preempt_enable();
+}
+EXPORT_SYMBOL(migrate_disable);
 
-	preempt_disable();
-	preempt_lazy_disable();
-	pin_current_cpu();
+static void migrate_disabled_sched(struct task_struct *p)
+{
+	if (p->migrate_disable_scheduled)
+		return;
 
 	migrate_disable_update_cpus_allowed(p);
-	p->migrate_disable = 1;
-
-	preempt_enable();
+	p->migrate_disable_scheduled = 1;
 }
-EXPORT_SYMBOL(migrate_disable);
 
 void migrate_enable(void)
 {
 	struct task_struct *p = current;
-
-	if (in_atomic() || irqs_disabled()) {
-#ifdef CONFIG_SCHED_DEBUG
-		p->migrate_disable_atomic--;
-#endif
-		return;
-	}
-
-#ifdef CONFIG_SCHED_DEBUG
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
-#endif
+	struct rq *rq = this_rq();
+	int cpu = task_cpu(p);
 
 	WARN_ON_ONCE(p->migrate_disable <= 0);
 	if (p->migrate_disable > 1) {
@@ -7399,65 +7369,67 @@ void migrate_enable(void)
 
 	preempt_disable();
 
+#ifdef CONFIG_SCHED_DEBUG
+	WARN_ON_ONCE(current->pinned_on_cpu != cpu);
+	current->pinned_on_cpu = -1;
+#endif
+
+	WARN_ON_ONCE(rq->nr_pinned < 1);
+
 	p->migrate_disable = 0;
+	rq->nr_pinned--;
+	if (rq->nr_pinned == 0 && unlikely(!cpu_active(cpu)) &&
+	    takedown_cpu_task)
+		wake_up_process(takedown_cpu_task);
+
+	if (!p->migrate_disable_scheduled)
+		goto out;
+
+	p->migrate_disable_scheduled = 0;
+
 	migrate_enable_update_cpus_allowed(p);
 
-	if (p->migrate_disable_update) {
-		struct rq *rq;
+	WARN_ON(smp_processor_id() != cpu);
+	if (!is_cpu_allowed(p, cpu)) {
+		struct migration_arg arg = { p };
 		struct rq_flags rf;
-		int cpu = task_cpu(p);
 
 		rq = task_rq_lock(p, &rf);
 		update_rq_clock(rq);
-
-		__do_set_cpus_allowed_tail(p, &p->cpus_mask);
+		arg.dest_cpu = select_fallback_rq(cpu, p);
 		task_rq_unlock(rq, p, &rf);
 
-		p->migrate_disable_update = 0;
+		preempt_lazy_enable();
+		preempt_enable();
 
-		WARN_ON(smp_processor_id() != cpu);
-		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
-			struct migration_arg arg = { p };
-			struct rq_flags rf;
-
-			rq = task_rq_lock(p, &rf);
-			update_rq_clock(rq);
-			arg.dest_cpu = select_fallback_rq(cpu, p);
-			task_rq_unlock(rq, p, &rf);
-
-			unpin_current_cpu();
-			preempt_lazy_enable();
-			preempt_enable();
-
-			sleeping_lock_inc();
-			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
-			sleeping_lock_dec();
-			return;
-		}
+		sleeping_lock_inc();
+		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
+		sleeping_lock_dec();
+		return;
 	}
-	unpin_current_cpu();
+
+out:
 	preempt_lazy_enable();
 	preempt_enable();
 }
 EXPORT_SYMBOL(migrate_enable);
 
-#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
-void migrate_disable(void)
+int cpu_nr_pinned(int cpu)
 {
-#ifdef CONFIG_SCHED_DEBUG
-	struct task_struct *p = current;
+	struct rq *rq = cpu_rq(cpu);
 
-	if (in_atomic() || irqs_disabled()) {
-		p->migrate_disable_atomic++;
-		return;
-	}
+	return rq->nr_pinned;
+}
 
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
+#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
+static void migrate_disabled_sched(struct task_struct *p)
+{
+}
 
-	p->migrate_disable++;
+void migrate_disable(void)
+{
+#ifdef CONFIG_SCHED_DEBUG
+	current->migrate_disable++;
 #endif
 	barrier();
 }
@@ -7468,20 +7440,14 @@ void migrate_enable(void)
 #ifdef CONFIG_SCHED_DEBUG
 	struct task_struct *p = current;
 
-	if (in_atomic() || irqs_disabled()) {
-		p->migrate_disable_atomic--;
-		return;
-	}
-
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
-
 	WARN_ON_ONCE(p->migrate_disable <= 0);
 	p->migrate_disable--;
 #endif
 	barrier();
 }
 EXPORT_SYMBOL(migrate_enable);
+#else
+static void migrate_disabled_sched(struct task_struct *p)
+{
+}
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c1bbd67e971..2e4db417049d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -952,6 +952,10 @@ struct rq {
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state	*idle_state;
 #endif
+
+#if defined(CONFIG_PREEMPT_RT_BASE) && defined(CONFIG_SMP)
+	int			nr_pinned;
+#endif
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index bd9571653288..5f2618d346c4 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -23,6 +23,9 @@ unsigned int check_preemption_disabled(const char *what1, const char *what2)
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
+	if (current->migrate_disable)
+		goto out;
+
 	if (current->nr_cpus_allowed == 1)
 		goto out;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH RT v2 3/3] sched: migrate_enable: Use stop_one_cpu_nowait()
  2019-10-12  6:52 [PATCH RT v2 0/3] migrate disable fixes and performance Scott Wood
  2019-10-12  6:52 ` [PATCH RT v2 1/3] sched: migrate_enable: Use select_fallback_rq() Scott Wood
  2019-10-12  6:52 ` [PATCH RT v2 2/3] sched: Lazy migrate_disable processing Scott Wood
@ 2019-10-12  6:52 ` Scott Wood
  2019-10-18 11:00 ` [PATCH RT v2 0/3] migrate disable fixes and performance Sebastian Andrzej Siewior
  3 siblings, 0 replies; 7+ messages in thread
From: Scott Wood @ 2019-10-12  6:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Waiman Long, linux-rt-users, linux-kernel, Scott Wood

migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).

Signed-off-by: Scott Wood <swood@redhat.com>
---
 include/linux/stop_machine.h |  2 ++
 kernel/sched/core.c          | 22 +++++++++++++---------
 kernel/stop_machine.c        |  7 +++++--
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 6d3635c86dbe..82fc686ddd9e 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -26,6 +26,8 @@ struct cpu_stop_work {
 	cpu_stop_fn_t		fn;
 	void			*arg;
 	struct cpu_stop_done	*done;
+	/* Did not run due to disabled stopper; for nowait debug checks */
+	bool			disabled;
 };
 
 int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5cb2a519b8bf..6383ade320f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1051,6 +1051,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 struct migration_arg {
 	struct task_struct *task;
 	int dest_cpu;
+	bool done;
 };
 
 /*
@@ -1086,6 +1087,11 @@ static int migration_cpu_stop(void *data)
 	struct task_struct *p = arg->task;
 	struct rq *rq = this_rq();
 	struct rq_flags rf;
+	int dest_cpu = arg->dest_cpu;
+
+	/* We don't look at arg after this point. */
+	smp_mb();
+	arg->done = true;
 
 	/*
 	 * The original target CPU might have gone down and we might
@@ -1108,9 +1114,9 @@ static int migration_cpu_stop(void *data)
 	 */
 	if (task_rq(p) == rq) {
 		if (task_on_rq_queued(p))
-			rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
+			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
-			p->wake_cpu = arg->dest_cpu;
+			p->wake_cpu = dest_cpu;
 	}
 	rq_unlock(rq, &rf);
 	raw_spin_unlock(&p->pi_lock);
@@ -7392,6 +7398,7 @@ void migrate_enable(void)
 	WARN_ON(smp_processor_id() != cpu);
 	if (!is_cpu_allowed(p, cpu)) {
 		struct migration_arg arg = { p };
+		struct cpu_stop_work work;
 		struct rq_flags rf;
 
 		rq = task_rq_lock(p, &rf);
@@ -7399,13 +7406,10 @@ void migrate_enable(void)
 		arg.dest_cpu = select_fallback_rq(cpu, p);
 		task_rq_unlock(rq, p, &rf);
 
-		preempt_lazy_enable();
-		preempt_enable();
-
-		sleeping_lock_inc();
-		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
-		sleeping_lock_dec();
-		return;
+		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
+				    &arg, &work);
+		__schedule(true);
+		WARN_ON_ONCE(!arg.done && !work.disabled);
 	}
 
 out:
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 2b5a6754646f..fa53a472dd44 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -85,8 +85,11 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
 	enabled = stopper->enabled;
 	if (enabled)
 		__cpu_stop_queue_work(stopper, work, &wakeq);
-	else if (work->done)
-		cpu_stop_signal_done(work->done);
+	else {
+		work->disabled = true;
+		if (work->done)
+			cpu_stop_signal_done(work->done);
+	}
 	raw_spin_unlock_irqrestore(&stopper->lock, flags);
 
 	wake_up_q(&wakeq);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RT v2 0/3] migrate disable fixes and performance
  2019-10-12  6:52 [PATCH RT v2 0/3] migrate disable fixes and performance Scott Wood
                   ` (2 preceding siblings ...)
  2019-10-12  6:52 ` [PATCH RT v2 3/3] sched: migrate_enable: Use stop_one_cpu_nowait() Scott Wood
@ 2019-10-18 11:00 ` Sebastian Andrzej Siewior
  3 siblings, 0 replies; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-10-18 11:00 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Waiman Long, linux-rt-users, linux-kernel

On 2019-10-12 01:52:11 [-0500], Scott Wood wrote:
> These are the unapplied patches from v1, minus the sched deadline
> patch, and with stop_one_cpu_nowait() in place of clobbering
> current->state.

Thank you. They will be part of the next release.

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RT v2 2/3] sched: Lazy migrate_disable processing
  2019-10-12  6:52 ` [PATCH RT v2 2/3] sched: Lazy migrate_disable processing Scott Wood
@ 2019-10-18 18:12   ` Waiman Long
  2019-10-18 22:11     ` Scott Wood
  0 siblings, 1 reply; 7+ messages in thread
From: Waiman Long @ 2019-10-18 18:12 UTC (permalink / raw)
  To: Scott Wood, Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	linux-rt-users, linux-kernel

On 10/12/19 2:52 AM, Scott Wood wrote:
> Avoid overhead on the majority of migrate disable/enable sequences by
> only manipulating scheduler data (and grabbing the relevant locks) when
> the task actually schedules while migrate-disabled.  A kernel build
> showed around a 10% reduction in system time (with CONFIG_NR_CPUS=512).
>
> Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU
> count of the number of pinned tasks (including tasks which have not
> scheduled in the migrate-disabled section); takedown_cpu() will
> wait until that reaches zero (confirmed by take_cpu_down() in stop
> machine context to deal with races) before migrating tasks off of the
> cpu.
>
> To simplify synchronization, updating cpus_mask is no longer deferred
> until migrate_enable().  This lets us not have to worry about
> migrate_enable() missing the update if it's on the fast path (didn't
> schedule during the migrate disabled section).  It also makes the code
> a bit simpler and reduces deviation from mainline.
>
> While the main motivation for this is the performance benefit, lazy
> migrate disable also eliminates the restriction on calling
> migrate_disable() while atomic but leaving the atomic region prior to
> calling migrate_enable() -- though this won't help with local_bh_disable()
> (and thus rcutorture) unless something similar is done with the recently
> added local_lock.
>
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
> The speedup is smaller than before, due to commit 659252061477862f
> ("lib/smp_processor_id: Don't use cpumask_equal()") achieving
> an overlapping speedup.
> ---
>  include/linux/cpu.h    |   4 --
>  include/linux/sched.h  |  11 +--
>  init/init_task.c       |   4 ++
>  kernel/cpu.c           | 103 +++++++++++-----------------
>  kernel/sched/core.c    | 180 ++++++++++++++++++++-----------------------------
>  kernel/sched/sched.h   |   4 ++
>  lib/smp_processor_id.c |   3 +
>  7 files changed, 128 insertions(+), 181 deletions(-)
>
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index f4a772c12d14..2df500fdcbc4 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -113,8 +113,6 @@ static inline void cpu_maps_update_done(void)
>  extern void cpu_hotplug_enable(void);
>  void clear_tasks_mm_cpumask(int cpu);
>  int cpu_down(unsigned int cpu);
> -extern void pin_current_cpu(void);
> -extern void unpin_current_cpu(void);
>  
>  #else /* CONFIG_HOTPLUG_CPU */
>  
> @@ -126,8 +124,6 @@ static inline void cpus_read_unlock(void) { }
>  static inline void lockdep_assert_cpus_held(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
> -static inline void pin_current_cpu(void) { }
> -static inline void unpin_current_cpu(void) { }
>  
>  #endif	/* !CONFIG_HOTPLUG_CPU */
>  
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7e892e727f12..c6872b38bf77 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -229,6 +229,8 @@
>  extern long io_schedule_timeout(long timeout);
>  extern void io_schedule(void);
>  
> +int cpu_nr_pinned(int cpu);
> +
>  /**
>   * struct prev_cputime - snapshot of system and user cputime
>   * @utime: time spent in user mode
> @@ -661,16 +663,13 @@ struct task_struct {
>  	cpumask_t			cpus_mask;
>  #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
>  	int				migrate_disable;
> -	int				migrate_disable_update;
> -	int				pinned_on_cpu;
> +	bool				migrate_disable_scheduled;
>  # ifdef CONFIG_SCHED_DEBUG
> -	int				migrate_disable_atomic;
> +	int				pinned_on_cpu;
>  # endif
> -
>  #elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
>  # ifdef CONFIG_SCHED_DEBUG
>  	int				migrate_disable;
> -	int				migrate_disable_atomic;
>  # endif
>  #endif
>  #ifdef CONFIG_PREEMPT_RT_FULL
> @@ -2074,4 +2073,6 @@ static inline void rseq_syscall(struct pt_regs *regs)
>  
>  #endif
>  
> +extern struct task_struct *takedown_cpu_task;
> +
>  #endif
> diff --git a/init/init_task.c b/init/init_task.c
> index e402413dc47d..c0c7618fd2fb 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -81,6 +81,10 @@ struct task_struct init_task
>  	.cpus_ptr	= &init_task.cpus_mask,
>  	.cpus_mask	= CPU_MASK_ALL,
>  	.nr_cpus_allowed= NR_CPUS,
> +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE) && \
> +    defined(CONFIG_SCHED_DEBUG)
> +	.pinned_on_cpu	= -1,
> +#endif
>  	.mm		= NULL,
>  	.active_mm	= &init_mm,
>  	.restart_block	= {
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 25afa2bb1a2c..e1bf3c698a32 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -76,11 +76,6 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
>  	.fail = CPUHP_INVALID,
>  };
>  
> -#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PREEMPT_RT_FULL)
> -static DEFINE_PER_CPU(struct rt_rw_lock, cpuhp_pin_lock) = \
> -	__RWLOCK_RT_INITIALIZER(cpuhp_pin_lock);
> -#endif
> -
>  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
>  static struct lockdep_map cpuhp_state_up_map =
>  	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
> @@ -287,57 +282,6 @@ void cpu_maps_update_done(void)
>  
>  #ifdef CONFIG_HOTPLUG_CPU
>  
> -/**
> - * pin_current_cpu - Prevent the current cpu from being unplugged
> - */
> -void pin_current_cpu(void)
> -{
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin;
> -	unsigned int cpu;
> -	int ret;
> -
> -again:
> -	cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> -	ret = __read_rt_trylock(cpuhp_pin);
> -	if (ret) {
> -		current->pinned_on_cpu = smp_processor_id();
> -		return;
> -	}
> -	cpu = smp_processor_id();
> -	preempt_lazy_enable();
> -	preempt_enable();
> -
> -	sleeping_lock_inc();
> -	__read_rt_lock(cpuhp_pin);
> -	sleeping_lock_dec();
> -
> -	preempt_disable();
> -	preempt_lazy_disable();
> -	if (cpu != smp_processor_id()) {
> -		__read_rt_unlock(cpuhp_pin);
> -		goto again;
> -	}
> -	current->pinned_on_cpu = cpu;
> -#endif
> -}
> -
> -/**
> - * unpin_current_cpu - Allow unplug of current cpu
> - */
> -void unpin_current_cpu(void)
> -{
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> -
> -	if (WARN_ON(current->pinned_on_cpu != smp_processor_id()))
> -		cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, current->pinned_on_cpu);
> -
> -	current->pinned_on_cpu = -1;
> -	__read_rt_unlock(cpuhp_pin);
> -#endif
> -}
> -
>  DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
>  
>  void cpus_read_lock(void)
> @@ -895,6 +839,15 @@ static int take_cpu_down(void *_param)
>  	int err, cpu = smp_processor_id();
>  	int ret;
>  
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	/*
> +	 * If any tasks disabled migration before we got here,
> +	 * go back and sleep again.
> +	 */
> +	if (cpu_nr_pinned(cpu))
> +		return -EAGAIN;
> +#endif
> +
>  	/* Ensure this CPU doesn't handle any more interrupts. */
>  	err = __cpu_disable();
>  	if (err < 0)
> @@ -924,11 +877,10 @@ static int take_cpu_down(void *_param)
>  	return 0;
>  }
>  
> +struct task_struct *takedown_cpu_task;
> +
>  static int takedown_cpu(unsigned int cpu)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, cpu);
> -#endif
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  	int err;
>  
> @@ -941,17 +893,38 @@ static int takedown_cpu(unsigned int cpu)
>  	 */
>  	irq_lock_sparse();
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	__write_rt_lock(cpuhp_pin);
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	WARN_ON_ONCE(takedown_cpu_task);
> +	takedown_cpu_task = current;

I don't know how likely it is for more than one task calling
takedown_cpu() concurrently. But if that can happen, there is a
possibility of missed wakeup.

The current code is fragile. How about something like

while (cmpxchg(&takedown_cpu_task, NULL, current) {
    WARN_ON_ONCE(1);
    schedule();
}

Cheers,
Longman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH RT v2 2/3] sched: Lazy migrate_disable processing
  2019-10-18 18:12   ` Waiman Long
@ 2019-10-18 22:11     ` Scott Wood
  0 siblings, 0 replies; 7+ messages in thread
From: Scott Wood @ 2019-10-18 22:11 UTC (permalink / raw)
  To: Waiman Long, Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	linux-rt-users, linux-kernel

On Fri, 2019-10-18 at 14:12 -0400, Waiman Long wrote:
> On 10/12/19 2:52 AM, Scott Wood wrote:
> > Avoid overhead on the majority of migrate disable/enable sequences by
> > only manipulating scheduler data (and grabbing the relevant locks) when
> > the task actually schedules while migrate-disabled.  A kernel build
> > showed around a 10% reduction in system time (with CONFIG_NR_CPUS=512).
> > 
> > Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU
> > count of the number of pinned tasks (including tasks which have not
> > scheduled in the migrate-disabled section); takedown_cpu() will
> > wait until that reaches zero (confirmed by take_cpu_down() in stop
> > machine context to deal with races) before migrating tasks off of the
> > cpu.
> > 
> > To simplify synchronization, updating cpus_mask is no longer deferred
> > until migrate_enable().  This lets us not have to worry about
> > migrate_enable() missing the update if it's on the fast path (didn't
> > schedule during the migrate disabled section).  It also makes the code
> > a bit simpler and reduces deviation from mainline.
> > 
> > While the main motivation for this is the performance benefit, lazy
> > migrate disable also eliminates the restriction on calling
> > migrate_disable() while atomic but leaving the atomic region prior to
> > calling migrate_enable() -- though this won't help with
> > local_bh_disable()
> > (and thus rcutorture) unless something similar is done with the recently
> > added local_lock.
> > 
> > Signed-off-by: Scott Wood <swood@redhat.com>
> > ---
> > The speedup is smaller than before, due to commit 659252061477862f
> > ("lib/smp_processor_id: Don't use cpumask_equal()") achieving
> > an overlapping speedup.
> > ---
> >  include/linux/cpu.h    |   4 --
> >  include/linux/sched.h  |  11 +--
> >  init/init_task.c       |   4 ++
> >  kernel/cpu.c           | 103 +++++++++++-----------------
> >  kernel/sched/core.c    | 180 ++++++++++++++++++++--------------------
> > ---------
> >  kernel/sched/sched.h   |   4 ++
> >  lib/smp_processor_id.c |   3 +
> >  7 files changed, 128 insertions(+), 181 deletions(-)
> > 
> > diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> > index f4a772c12d14..2df500fdcbc4 100644
> > --- a/include/linux/cpu.h
> > +++ b/include/linux/cpu.h
> > @@ -113,8 +113,6 @@ static inline void cpu_maps_update_done(void)
> >  extern void cpu_hotplug_enable(void);
> >  void clear_tasks_mm_cpumask(int cpu);
> >  int cpu_down(unsigned int cpu);
> > -extern void pin_current_cpu(void);
> > -extern void unpin_current_cpu(void);
> >  
> >  #else /* CONFIG_HOTPLUG_CPU */
> >  
> > @@ -126,8 +124,6 @@ static inline void cpus_read_unlock(void) { }
> >  static inline void lockdep_assert_cpus_held(void) { }
> >  static inline void cpu_hotplug_disable(void) { }
> >  static inline void cpu_hotplug_enable(void) { }
> > -static inline void pin_current_cpu(void) { }
> > -static inline void unpin_current_cpu(void) { }
> >  
> >  #endif	/* !CONFIG_HOTPLUG_CPU */
> >  
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 7e892e727f12..c6872b38bf77 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -229,6 +229,8 @@
> >  extern long io_schedule_timeout(long timeout);
> >  extern void io_schedule(void);
> >  
> > +int cpu_nr_pinned(int cpu);
> > +
> >  /**
> >   * struct prev_cputime - snapshot of system and user cputime
> >   * @utime: time spent in user mode
> > @@ -661,16 +663,13 @@ struct task_struct {
> >  	cpumask_t			cpus_mask;
> >  #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
> >  	int				migrate_disable;
> > -	int				migrate_disable_update;
> > -	int				pinned_on_cpu;
> > +	bool				migrate_disable_scheduled;
> >  # ifdef CONFIG_SCHED_DEBUG
> > -	int				migrate_disable_atomic;
> > +	int				pinned_on_cpu;
> >  # endif
> > -
> >  #elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
> >  # ifdef CONFIG_SCHED_DEBUG
> >  	int				migrate_disable;
> > -	int				migrate_disable_atomic;
> >  # endif
> >  #endif
> >  #ifdef CONFIG_PREEMPT_RT_FULL
> > @@ -2074,4 +2073,6 @@ static inline void rseq_syscall(struct pt_regs
> > *regs)
> >  
> >  #endif
> >  
> > +extern struct task_struct *takedown_cpu_task;
> > +
> >  #endif
> > diff --git a/init/init_task.c b/init/init_task.c
> > index e402413dc47d..c0c7618fd2fb 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -81,6 +81,10 @@ struct task_struct init_task
> >  	.cpus_ptr	= &init_task.cpus_mask,
> >  	.cpus_mask	= CPU_MASK_ALL,
> >  	.nr_cpus_allowed= NR_CPUS,
> > +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE) && \
> > +    defined(CONFIG_SCHED_DEBUG)
> > +	.pinned_on_cpu	= -1,
> > +#endif
> >  	.mm		= NULL,
> >  	.active_mm	= &init_mm,
> >  	.restart_block	= {
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 25afa2bb1a2c..e1bf3c698a32 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -76,11 +76,6 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state,
> > cpuhp_state) = {
> >  	.fail = CPUHP_INVALID,
> >  };
> >  
> > -#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PREEMPT_RT_FULL)
> > -static DEFINE_PER_CPU(struct rt_rw_lock, cpuhp_pin_lock) = \
> > -	__RWLOCK_RT_INITIALIZER(cpuhp_pin_lock);
> > -#endif
> > -
> >  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
> >  static struct lockdep_map cpuhp_state_up_map =
> >  	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
> > @@ -287,57 +282,6 @@ void cpu_maps_update_done(void)
> >  
> >  #ifdef CONFIG_HOTPLUG_CPU
> >  
> > -/**
> > - * pin_current_cpu - Prevent the current cpu from being unplugged
> > - */
> > -void pin_current_cpu(void)
> > -{
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -	struct rt_rw_lock *cpuhp_pin;
> > -	unsigned int cpu;
> > -	int ret;
> > -
> > -again:
> > -	cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> > -	ret = __read_rt_trylock(cpuhp_pin);
> > -	if (ret) {
> > -		current->pinned_on_cpu = smp_processor_id();
> > -		return;
> > -	}
> > -	cpu = smp_processor_id();
> > -	preempt_lazy_enable();
> > -	preempt_enable();
> > -
> > -	sleeping_lock_inc();
> > -	__read_rt_lock(cpuhp_pin);
> > -	sleeping_lock_dec();
> > -
> > -	preempt_disable();
> > -	preempt_lazy_disable();
> > -	if (cpu != smp_processor_id()) {
> > -		__read_rt_unlock(cpuhp_pin);
> > -		goto again;
> > -	}
> > -	current->pinned_on_cpu = cpu;
> > -#endif
> > -}
> > -
> > -/**
> > - * unpin_current_cpu - Allow unplug of current cpu
> > - */
> > -void unpin_current_cpu(void)
> > -{
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -	struct rt_rw_lock *cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> > -
> > -	if (WARN_ON(current->pinned_on_cpu != smp_processor_id()))
> > -		cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, current-
> > >pinned_on_cpu);
> > -
> > -	current->pinned_on_cpu = -1;
> > -	__read_rt_unlock(cpuhp_pin);
> > -#endif
> > -}
> > -
> >  DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
> >  
> >  void cpus_read_lock(void)
> > @@ -895,6 +839,15 @@ static int take_cpu_down(void *_param)
> >  	int err, cpu = smp_processor_id();
> >  	int ret;
> >  
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	/*
> > +	 * If any tasks disabled migration before we got here,
> > +	 * go back and sleep again.
> > +	 */
> > +	if (cpu_nr_pinned(cpu))
> > +		return -EAGAIN;
> > +#endif
> > +
> >  	/* Ensure this CPU doesn't handle any more interrupts. */
> >  	err = __cpu_disable();
> >  	if (err < 0)
> > @@ -924,11 +877,10 @@ static int take_cpu_down(void *_param)
> >  	return 0;
> >  }
> >  
> > +struct task_struct *takedown_cpu_task;
> > +
> >  static int takedown_cpu(unsigned int cpu)
> >  {
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -	struct rt_rw_lock *cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, cpu);
> > -#endif
> >  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
> >  	int err;
> >  
> > @@ -941,17 +893,38 @@ static int takedown_cpu(unsigned int cpu)
> >  	 */
> >  	irq_lock_sparse();
> >  
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -	__write_rt_lock(cpuhp_pin);
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	WARN_ON_ONCE(takedown_cpu_task);
> > +	takedown_cpu_task = current;
> 
> I don't know how likely it is for more than one task calling
> takedown_cpu() concurrently. But if that can happen, there is a
> possibility of missed wakeup.

cpus_write_lock() in _cpu_down() prevents this, as does cpu_add_remove_lock.

> The current code is fragile. How about something like
> 
> while (cmpxchg(&takedown_cpu_task, NULL, current) {
>     WARN_ON_ONCE(1);
>     schedule();
> }

It's a debug check to try to let us know if the code ever changes such that
multiple cpus can be taken down in parallel, not an attempt to actually make
that work.

That said, looking at this again, it seems there is a possibility for a
different race -- if __cpu_disable() returns an error, takedown_cpu_task
will be NULLed out without flushing tasks from the cpu.  This could happen
between reads of takedown_cpu_task in migrate_enable(), leading to
wake_up_process(NULL).  Even if we use READ_ONCE(), if we get delayed by an
interrupt long enough, the task pointed to might go away (or, less
seriously, receive a spurious wakeup).  A fix could be to stick a raw lock
around the if+wakeup and around takedown_cpu_task = NULL in the error path.

-Scott



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-12  6:52 [PATCH RT v2 0/3] migrate disable fixes and performance Scott Wood
2019-10-12  6:52 ` [PATCH RT v2 1/3] sched: migrate_enable: Use select_fallback_rq() Scott Wood
2019-10-12  6:52 ` [PATCH RT v2 2/3] sched: Lazy migrate_disable processing Scott Wood
2019-10-18 18:12   ` Waiman Long
2019-10-18 22:11     ` Scott Wood
2019-10-12  6:52 ` [PATCH RT v2 3/3] sched: migrate_enable: Use stop_one_cpu_nowait() Scott Wood
2019-10-18 11:00 ` [PATCH RT v2 0/3] migrate disable fixes and performance Sebastian Andrzej Siewior

Linux-rt-users archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git