linux-rt-users.vger.kernel.org archive mirror
* [RT PATCH 0/8] migrate disable fixes and performance
@ 2019-07-27  5:56 Scott Wood
  2019-07-27  5:56 ` [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep Scott Wood
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

With these patches, a kernel build on a 104-cpu machine took around 75%
less wall time and 85% less system time.  Note that the baseline differs
between v5.2-rt and v5.0-rt: with these patches the two perform
similarly, but without them v5.2-rt is substantially slower.  In v5.0-rt
with a previous version of these patches, lazy migrate disable reduced
kernel build time by around 15-20% wall and 70-75% system.
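
For context, the API whose fast path these patches optimize is used
roughly as follows.  This is only a sketch, not code from the series;
the per-CPU counter is a made-up example and the usual kernel headers
are assumed:

	static DEFINE_PER_CPU(unsigned long, example_counter);

	static void example_user(void)
	{
		/*
		 * Pin the task to its current CPU without disabling
		 * preemption.  On RT this also happens implicitly in
		 * every rt_spin_lock()/rt_spin_unlock() pair, which is
		 * why the fast path matters for a kernel build.
		 */
		migrate_disable();
		this_cpu_inc(example_counter);
		migrate_enable();
	}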

Scott Wood (8):
  sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep
  sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr
  sched: Remove dead __migrate_disabled() check
  sched: migrate disable: Protect cpus_ptr with lock
  sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  sched: migrate_enable: Set state to TASK_RUNNING
  sched: migrate_enable: Use select_fallback_rq()
  sched: Lazy migrate_disable processing

 include/linux/cpu.h      |   4 -
 include/linux/sched.h    |  15 ++--
 init/init_task.c         |   4 +
 kernel/cpu.c             |  97 ++++++++---------------
 kernel/rcu/tree_plugin.h |   2 +-
 kernel/sched/core.c      | 200 +++++++++++++++++++----------------------------
 kernel/sched/deadline.c  |  67 ++++++++--------
 kernel/sched/sched.h     |   4 +
 lib/smp_processor_id.c   |   3 +
 9 files changed, 166 insertions(+), 230 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-07-29 13:26   ` Steven Rostedt
  2019-07-27  5:56 ` [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr Scott Wood
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

Without this, rcu_note_context_switch() will complain if an RCU read
lock is held when migrate_enable() calls stop_one_cpu().
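
For illustration only (not part of the patch), a sequence of the sort
that trips the warning -- the lock here is hypothetical, and on RT a
spinlock is a sleeping rtmutex whose unlock path runs migrate_enable():

	static DEFINE_SPINLOCK(example_lock);

	static void example_trigger(void)
	{
		rcu_read_lock();
		spin_lock(&example_lock);   /* migrate_disable() on RT */
		/* ... affinity changed by another task meanwhile ... */
		spin_unlock(&example_lock); /* migrate_enable() may call
					     * stop_one_cpu(), which blocks
					     * with the RCU read lock held */
		rcu_read_unlock();
	}

sleeping_lock_inc()/dec() around the stop_one_cpu() call tells
rcu_note_context_switch() that this block is the same involuntary kind
of sleep as blocking on a sleeping lock, so it doesn't warn.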

Signed-off-by: Scott Wood <swood@redhat.com>
---
 include/linux/sched.h    | 4 ++--
 kernel/rcu/tree_plugin.h | 2 +-
 kernel/sched/core.c      | 2 ++
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4e218f8d8048..ad23ab939b35 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -673,7 +673,7 @@ struct task_struct {
 	int				migrate_disable_atomic;
 # endif
 #endif
-#ifdef CONFIG_PREEMPT_RT_FULL
+#ifdef CONFIG_PREEMPT_RT_BASE
 	int				sleeping_lock;
 #endif
 
@@ -1873,7 +1873,7 @@ static __always_inline bool need_resched(void)
 	return unlikely(tif_need_resched());
 }
 
-#ifdef CONFIG_PREEMPT_RT_FULL
+#ifdef CONFIG_PREEMPT_RT_BASE
 static inline void sleeping_lock_inc(void)
 {
 	current->sleeping_lock++;
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 23a54e4b649c..7a3aa085ce2c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -292,7 +292,7 @@ void rcu_note_context_switch(bool preempt)
 	barrier(); /* Avoid RCU read-side critical sections leaking down. */
 	trace_rcu_utilization(TPS("Start context switch"));
 	lockdep_assert_irqs_disabled();
-#if defined(CONFIG_PREEMPT_RT_FULL)
+#if defined(CONFIG_PREEMPT_RT_BASE)
 	sleeping_l = t->sleeping_lock;
 #endif
 	WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0 && !sleeping_l);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d3c6542b306f..c3407707e367 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7405,7 +7405,9 @@ void migrate_enable(void)
 			unpin_current_cpu();
 			preempt_lazy_enable();
 			preempt_enable();
+			sleeping_lock_inc();
 			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
+			sleeping_lock_dec();
 			return;
 		}
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
  2019-07-27  5:56 ` [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-17 14:57   ` Sebastian Andrzej Siewior
  2019-07-27  5:56 ` [PATCH RT 3/8] sched: Remove dead __migrate_disabled() check Scott Wood
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

This function is concerned with the long-term cpu mask, not the
transitory mask the task might have while migrate disabled.  Before
this patch, if a task was migrate disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the cpu that the task was running on, then the mask update
would be lost.
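
A hypothetical illustration (not from the patch) of how the update gets
dropped, assuming the target task is migrate-disabled on CPU 2:

	static void example_lost_update(struct task_struct *p)
	{
		/* migrate_disable() set p->cpus_ptr to cpumask_of(2) */
		set_cpus_allowed_ptr(p, cpumask_of(2));
		/*
		 * Without this patch, cpumask_equal(p->cpus_ptr, new_mask)
		 * is true, so __set_cpus_allowed_ptr() bails out early and
		 * p->cpus_mask keeps its old, wider value -- which is what
		 * migrate_enable() will later restore.
		 */
	}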

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c3407707e367..6e643d656d71 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1218,7 +1218,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		goto out;
 	}
 
-	if (cpumask_equal(p->cpus_ptr, new_mask))
+	if (cpumask_equal(&p->cpus_mask, new_mask))
 		goto out;
 
 	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 3/8] sched: Remove dead __migrate_disabled() check
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
  2019-07-27  5:56 ` [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep Scott Wood
  2019-07-27  5:56 ` [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-07-27  5:56 ` [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock Scott Wood
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

This code was unreachable given the __migrate_disabled() branch
to "out" immediately beforehand.

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/core.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6e643d656d71..99a3cfccf4d3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1242,13 +1242,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	if (cpumask_test_cpu(task_cpu(p), new_mask) || __migrate_disabled(p))
 		goto out;
 
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
-	if (__migrate_disabled(p)) {
-		p->migrate_disable_update = 1;
-		goto out;
-	}
-#endif
-
 	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		struct migration_arg arg = { p, dest_cpu };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
                   ` (2 preceding siblings ...)
  2019-07-27  5:56 ` [PATCH RT 3/8] sched: Remove dead __migrate_disabled() check Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-26 16:39   ` Sebastian Andrzej Siewior
  2019-07-27  5:56 ` [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq() Scott Wood
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

Various places assume that cpus_ptr is protected by rq/pi locks,
so don't change it before grabbing those locks.

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 99a3cfccf4d3..38a9a9df5638 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7283,9 +7283,8 @@ void dump_cpu_task(int cpu)
 	struct rq *rq;
 	struct rq_flags rf;
 
-	p->cpus_ptr = cpumask_of(smp_processor_id());
-
 	rq = task_rq_lock(p, &rf);
+	p->cpus_ptr = cpumask_of(smp_processor_id());
 	update_nr_migratory(p, -1);
 	p->nr_cpus_allowed = 1;
 	task_rq_unlock(rq, p, &rf);
@@ -7297,9 +7296,8 @@ void dump_cpu_task(int cpu)
 	struct rq *rq;
 	struct rq_flags rf;
 
-	p->cpus_ptr = &p->cpus_mask;
-
 	rq = task_rq_lock(p, &rf);
+	p->cpus_ptr = &p->cpus_mask;
 	p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
 	update_nr_migratory(p, 1);
 	task_rq_unlock(rq, p, &rf);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
                   ` (3 preceding siblings ...)
  2019-07-27  5:56 ` [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-17 15:10   ` Sebastian Andrzej Siewior
  2019-09-27  8:11   ` Juri Lelli
  2019-07-27  5:56 ` [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING Scott Wood
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

With the changes to migrate disabling, ->set_cpus_allowed() no longer
gets deferred until migrate_enable().  To avoid releasing the bandwidth
while the task may still be executing on the old CPU, move the subtraction
to ->migrate_task_rq().

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/deadline.c | 67 +++++++++++++++++++++++--------------------------
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c18be51f7608..2f18d0cf1b56 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
 	return cpu;
 }
 
+static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p)
+{
+	struct root_domain *src_rd = rq->rd;
+
+	/*
+	 * Migrating a SCHED_DEADLINE task between exclusive
+	 * cpusets (different root_domains) entails a bandwidth
+	 * update. We already made space for us in the destination
+	 * domain (see cpuset_can_attach()).
+	 */
+	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
+		struct dl_bw *src_dl_b;
+
+		src_dl_b = dl_bw_of(cpu_of(rq));
+		/*
+		 * We now free resources of the root_domain we are migrating
+		 * off. In the worst case, sched_setattr() may temporary fail
+		 * until we complete the update.
+		 */
+		raw_spin_lock(&src_dl_b->lock);
+		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+		raw_spin_unlock(&src_dl_b->lock);
+	}
+}
+
 static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
 {
 	struct rq *rq;
 
-	if (p->state != TASK_WAKING)
+	rq = task_rq(p);
+
+	if (p->state != TASK_WAKING) {
+		free_old_cpuset_bw_dl(rq, p);
 		return;
+	}
 
-	rq = task_rq(p);
 	/*
 	 * Since p->state == TASK_WAKING, set_task_cpu() has been called
 	 * from try_to_wake_up(). Hence, p->pi_lock is locked, but
@@ -2220,39 +2248,6 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
 	}
 }
 
-static void set_cpus_allowed_dl(struct task_struct *p,
-				const struct cpumask *new_mask)
-{
-	struct root_domain *src_rd;
-	struct rq *rq;
-
-	BUG_ON(!dl_task(p));
-
-	rq = task_rq(p);
-	src_rd = rq->rd;
-	/*
-	 * Migrating a SCHED_DEADLINE task between exclusive
-	 * cpusets (different root_domains) entails a bandwidth
-	 * update. We already made space for us in the destination
-	 * domain (see cpuset_can_attach()).
-	 */
-	if (!cpumask_intersects(src_rd->span, new_mask)) {
-		struct dl_bw *src_dl_b;
-
-		src_dl_b = dl_bw_of(cpu_of(rq));
-		/*
-		 * We now free resources of the root_domain we are migrating
-		 * off. In the worst case, sched_setattr() may temporary fail
-		 * until we complete the update.
-		 */
-		raw_spin_lock(&src_dl_b->lock);
-		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
-		raw_spin_unlock(&src_dl_b->lock);
-	}
-
-	set_cpus_allowed_common(p, new_mask);
-}
-
 /* Assumes rq->lock is held */
 static void rq_online_dl(struct rq *rq)
 {
@@ -2407,7 +2402,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_dl,
 	.migrate_task_rq	= migrate_task_rq_dl,
-	.set_cpus_allowed       = set_cpus_allowed_dl,
+	.set_cpus_allowed       = set_cpus_allowed_common,
 	.rq_online              = rq_online_dl,
 	.rq_offline             = rq_offline_dl,
 	.task_woken		= task_woken_dl,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
                   ` (4 preceding siblings ...)
  2019-07-27  5:56 ` [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq() Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-17 15:31   ` Sebastian Andrzej Siewior
  2019-07-27  5:56 ` [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq() Scott Wood
  2019-07-27  5:56 ` [PATCH RT 8/8] sched: Lazy migrate_disable processing Scott Wood
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

If migrate_enable() is called while a task is preparing to sleep
(state != TASK_RUNNING), that triggers a debug check in stop_one_cpu().
Explicitly reset state to acknowledge that we're accepting the spurious
wakeup.
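
A hypothetical caller that can end up here (a sketch; the wait queue,
lock, and condition are made up, and on RT the spin_unlock() runs
migrate_enable()):

	static void example_sleeper(void)
	{
		DEFINE_WAIT(wait);

		spin_lock(&example_lock);
		prepare_to_wait(&example_wq, &wait, TASK_UNINTERRUPTIBLE);
		if (!example_condition) {
			spin_unlock(&example_lock); /* migrate_enable() may
						     * call stop_one_cpu()
						     * while state is still
						     * TASK_UNINTERRUPTIBLE */
			schedule();
			spin_lock(&example_lock);
		}
		finish_wait(&example_wq, &wait);
		spin_unlock(&example_lock);
	}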

Signed-off-by: Scott Wood <swood@redhat.com>
---
 kernel/sched/core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 38a9a9df5638..eb27a9bf70d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7396,6 +7396,14 @@ void migrate_enable(void)
 			unpin_current_cpu();
 			preempt_lazy_enable();
 			preempt_enable();
+
+			/*
+			 * Avoid sleeping with an existing non-running
+			 * state.  This will result in a spurious wakeup
+			 * for the calling context.
+			 */
+			__set_current_state(TASK_RUNNING);
+
 			sleeping_lock_inc();
 			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 			sleeping_lock_dec();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
                   ` (5 preceding siblings ...)
  2019-07-27  5:56 ` [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-17 16:00   ` Sebastian Andrzej Siewior
  2019-07-27  5:56 ` [PATCH RT 8/8] sched: Lazy migrate_disable processing Scott Wood
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

migrate_enable() currently open-codes a variant of select_fallback_rq().
However, it does not have the "No more Mr. Nice Guy" fallback and thus
it will pass an invalid CPU to the migration thread if cpus_mask only
contains a CPU that is !active.

Signed-off-by: Scott Wood <swood@redhat.com>
---
This scenario will be more likely after the next patch, since
the migrate_disable_update check goes away.  However, it could happen
anyway if cpus_mask was updated to a CPU other than the one we were
pinned to, and that CPU subsequently became inactive.
---
 kernel/sched/core.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index eb27a9bf70d7..3a2d8251a30c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7368,6 +7368,7 @@ void migrate_enable(void)
 	if (p->migrate_disable_update) {
 		struct rq *rq;
 		struct rq_flags rf;
+		int cpu = task_cpu(p);
 
 		rq = task_rq_lock(p, &rf);
 		update_rq_clock(rq);
@@ -7377,21 +7378,15 @@ void migrate_enable(void)
 
 		p->migrate_disable_update = 0;
 
-		WARN_ON(smp_processor_id() != task_cpu(p));
-		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
-			const struct cpumask *cpu_valid_mask = cpu_active_mask;
-			struct migration_arg arg;
-			unsigned int dest_cpu;
-
-			if (p->flags & PF_KTHREAD) {
-				/*
-				 * Kernel threads are allowed on online && !active CPUs
-				 */
-				cpu_valid_mask = cpu_online_mask;
-			}
-			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
-			arg.task = p;
-			arg.dest_cpu = dest_cpu;
+		WARN_ON(smp_processor_id() != cpu);
+		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
+			struct migration_arg arg = { p };
+			struct rq_flags rf;
+
+			rq = task_rq_lock(p, &rf);
+			update_rq_clock(rq);
+			arg.dest_cpu = select_fallback_rq(cpu, p);
+			task_rq_unlock(rq, p, &rf);
 
 			unpin_current_cpu();
 			preempt_lazy_enable();
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH RT 8/8] sched: Lazy migrate_disable processing
  2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
                   ` (6 preceding siblings ...)
  2019-07-27  5:56 ` [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq() Scott Wood
@ 2019-07-27  5:56 ` Scott Wood
  2019-09-17 16:50   ` Sebastian Andrzej Siewior
  7 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-07-27  5:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users, Scott Wood

Avoid overhead on the majority of migrate disable/enable sequences
by only manipulating scheduler data (and grabbing the relevant locks)
when the task actually schedules while migrate-disabled.  Very large
speedups were seen during a kernel build.

Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU
count of the number of pinned tasks (including tasks which have not
scheduled in the migrate-disabled section); takedown_cpu() will
wait until that reaches zero (confirmed by take_cpu_down() in stop
machine context to deal with races) before migrating tasks off of the
cpu.

To simplify synchronization, updating cpus_mask is no longer deferred
until migrate_enable().  This way migrate_enable() cannot miss the update
when it takes the fast path (i.e. the task did not schedule during the
migrate-disabled section).  It also makes the code a bit simpler and
reduces deviation from mainline.

While the main motivation for this is the performance benefit, lazy
migrate disable also lifts the restriction against calling
migrate_disable() in atomic context and then leaving the atomic region
before calling migrate_enable() -- though this won't help with
local_bh_disable() (and thus rcutorture) unless something similar is done
with the recently added local_lock.
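
For example, the following pattern (a sketch, not from the patch) is now
permitted, whereas previously the migrate_disable() call in atomic
context did not actually pin the task and the mismatched
migrate_enable() triggered warnings:

	preempt_disable();
	migrate_disable();	/* just bumps counters now, safe in atomic */
	preempt_enable();
	/* preemptible again, but still pinned to this CPU */
	migrate_enable();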

Signed-off-by: Scott Wood <swood@redhat.com>
---
 include/linux/cpu.h    |   4 --
 include/linux/sched.h  |  11 +--
 init/init_task.c       |   4 ++
 kernel/cpu.c           |  97 +++++++++----------------
 kernel/sched/core.c    | 192 ++++++++++++++++++++-----------------------------
 kernel/sched/sched.h   |   4 ++
 lib/smp_processor_id.c |   3 +
 7 files changed, 130 insertions(+), 185 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f4a772c12d14..2df500fdcbc4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -113,8 +113,6 @@ static inline void cpu_maps_update_done(void)
 extern void cpu_hotplug_enable(void);
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
-extern void pin_current_cpu(void);
-extern void unpin_current_cpu(void);
 
 #else /* CONFIG_HOTPLUG_CPU */
 
@@ -126,8 +124,6 @@ static inline void cpus_read_unlock(void) { }
 static inline void lockdep_assert_cpus_held(void) { }
 static inline void cpu_hotplug_disable(void) { }
 static inline void cpu_hotplug_enable(void) { }
-static inline void pin_current_cpu(void) { }
-static inline void unpin_current_cpu(void) { }
 
 #endif	/* !CONFIG_HOTPLUG_CPU */
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ad23ab939b35..069c46dde15b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -229,6 +229,8 @@
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
+int cpu_nr_pinned(int cpu);
+
 /**
  * struct prev_cputime - snapshot of system and user cputime
  * @utime: time spent in user mode
@@ -661,16 +663,13 @@ struct task_struct {
 	cpumask_t			cpus_mask;
 #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
 	int				migrate_disable;
-	int				migrate_disable_update;
-	int				pinned_on_cpu;
+	bool				migrate_disable_scheduled;
 # ifdef CONFIG_SCHED_DEBUG
-	int				migrate_disable_atomic;
+	int				pinned_on_cpu;
 # endif
-
 #elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
 # ifdef CONFIG_SCHED_DEBUG
 	int				migrate_disable;
-	int				migrate_disable_atomic;
 # endif
 #endif
 #ifdef CONFIG_PREEMPT_RT_BASE
@@ -2066,4 +2065,6 @@ static inline void rseq_syscall(struct pt_regs *regs)
 
 #endif
 
+extern struct task_struct *takedown_cpu_task;
+
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index e402413dc47d..c0c7618fd2fb 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -81,6 +81,10 @@ struct task_struct init_task
 	.cpus_ptr	= &init_task.cpus_mask,
 	.cpus_mask	= CPU_MASK_ALL,
 	.nr_cpus_allowed= NR_CPUS,
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE) && \
+    defined(CONFIG_SCHED_DEBUG)
+	.pinned_on_cpu	= -1,
+#endif
 	.mm		= NULL,
 	.active_mm	= &init_mm,
 	.restart_block	= {
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 885a195dfbe0..0096acf1a692 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -76,11 +76,6 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
 	.fail = CPUHP_INVALID,
 };
 
-#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PREEMPT_RT_FULL)
-static DEFINE_PER_CPU(struct rt_rw_lock, cpuhp_pin_lock) = \
-	__RWLOCK_RT_INITIALIZER(cpuhp_pin_lock);
-#endif
-
 #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
 static struct lockdep_map cpuhp_state_up_map =
 	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
@@ -287,55 +282,6 @@ void cpu_maps_update_done(void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 
-/**
- * pin_current_cpu - Prevent the current cpu from being unplugged
- */
-void pin_current_cpu(void)
-{
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin;
-	unsigned int cpu;
-	int ret;
-
-again:
-	cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
-	ret = __read_rt_trylock(cpuhp_pin);
-	if (ret) {
-		current->pinned_on_cpu = smp_processor_id();
-		return;
-	}
-	cpu = smp_processor_id();
-	preempt_lazy_enable();
-	preempt_enable();
-
-	__read_rt_lock(cpuhp_pin);
-
-	preempt_disable();
-	preempt_lazy_disable();
-	if (cpu != smp_processor_id()) {
-		__read_rt_unlock(cpuhp_pin);
-		goto again;
-	}
-	current->pinned_on_cpu = cpu;
-#endif
-}
-
-/**
- * unpin_current_cpu - Allow unplug of current cpu
- */
-void unpin_current_cpu(void)
-{
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
-
-	if (WARN_ON(current->pinned_on_cpu != smp_processor_id()))
-		cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, current->pinned_on_cpu);
-
-	current->pinned_on_cpu = -1;
-	__read_rt_unlock(cpuhp_pin);
-#endif
-}
-
 DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
 
 void cpus_read_lock(void)
@@ -893,6 +839,15 @@ static int take_cpu_down(void *_param)
 	int err, cpu = smp_processor_id();
 	int ret;
 
+#ifdef CONFIG_PREEMPT_RT_BASE
+	/*
+	 * If any tasks disabled migration before we got here,
+	 * go back and sleep again.
+	 */
+	if (cpu_nr_pinned(cpu))
+		return -EAGAIN;
+#endif
+
 	/* Ensure this CPU doesn't handle any more interrupts. */
 	err = __cpu_disable();
 	if (err < 0)
@@ -922,11 +877,10 @@ static int take_cpu_down(void *_param)
 	return 0;
 }
 
+struct task_struct *takedown_cpu_task;
+
 static int takedown_cpu(unsigned int cpu)
 {
-#ifdef CONFIG_PREEMPT_RT_FULL
-	struct rt_rw_lock *cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, cpu);
-#endif
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
 	int err;
 
@@ -939,17 +893,34 @@ static int takedown_cpu(unsigned int cpu)
 	 */
 	irq_lock_sparse();
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	__write_rt_lock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+	WARN_ON_ONCE(takedown_cpu_task);
+	takedown_cpu_task = current;
+
+again:
+	for (;;) {
+		int nr_pinned;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		nr_pinned = cpu_nr_pinned(cpu);
+		if (nr_pinned == 0)
+			break;
+		schedule();
+	}
+	set_current_state(TASK_RUNNING);
 #endif
 
 	/*
 	 * So now all preempt/rcu users must observe !cpu_active().
 	 */
 	err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
+#ifdef CONFIG_PREEMPT_RT_BASE
+	if (err == -EAGAIN)
+		goto again;
+#endif
 	if (err) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-		__write_rt_unlock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+		takedown_cpu_task = NULL;
 #endif
 		/* CPU refused to die */
 		irq_unlock_sparse();
@@ -969,8 +940,8 @@ static int takedown_cpu(unsigned int cpu)
 	wait_for_ap_thread(st, false);
 	BUG_ON(st->state != CPUHP_AP_IDLE_DEAD);
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-	__write_rt_unlock(cpuhp_pin);
+#ifdef CONFIG_PREEMPT_RT_BASE
+	takedown_cpu_task = NULL;
 #endif
 	/* Interrupts are moved away from the dying cpu, reenable alloc/free */
 	irq_unlock_sparse();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3a2d8251a30c..a11bfa7939da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1126,7 +1126,8 @@ static int migration_cpu_stop(void *data)
 void set_cpus_allowed_common(struct task_struct *p, const struct cpumask *new_mask)
 {
 	cpumask_copy(&p->cpus_mask, new_mask);
-	p->nr_cpus_allowed = cpumask_weight(new_mask);
+	if (p->cpus_ptr == &p->cpus_mask)
+		p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
@@ -1137,8 +1138,7 @@ int __migrate_disabled(struct task_struct *p)
 EXPORT_SYMBOL_GPL(__migrate_disabled);
 #endif
 
-static void __do_set_cpus_allowed_tail(struct task_struct *p,
-				       const struct cpumask *new_mask)
+void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
 	struct rq *rq = task_rq(p);
 	bool queued, running;
@@ -1167,20 +1167,6 @@ static void __do_set_cpus_allowed_tail(struct task_struct *p,
 		set_curr_task(rq, p);
 }
 
-void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
-{
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
-	if (__migrate_disabled(p)) {
-		lockdep_assert_held(&p->pi_lock);
-
-		cpumask_copy(&p->cpus_mask, new_mask);
-		p->migrate_disable_update = 1;
-		return;
-	}
-#endif
-	__do_set_cpus_allowed_tail(p, new_mask);
-}
-
 /*
  * Change a given task's CPU affinity. Migrate the thread to a
  * proper CPU and schedule it away if the CPU it's executing on
@@ -1239,7 +1225,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 	}
 
 	/* Can the task run on the task's current CPU? If so, we're done */
-	if (cpumask_test_cpu(task_cpu(p), new_mask) || __migrate_disabled(p))
+	if (cpumask_test_cpu(task_cpu(p), new_mask) ||
+	    p->cpus_ptr != &p->cpus_mask)
 		goto out;
 
 	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
@@ -3461,6 +3448,8 @@ static inline void schedule_debug(struct task_struct *prev)
 	BUG();
 }
 
+static void migrate_disabled_sched(struct task_struct *p);
+
 /*
  * __schedule() is the main scheduler function.
  *
@@ -3531,6 +3520,9 @@ static void __sched notrace __schedule(bool preempt)
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
 
+	if (__migrate_disabled(prev))
+		migrate_disabled_sched(prev);
+
 	/* Promote REQ to ACT */
 	rq->clock_update_flags <<= 1;
 	update_rq_clock(rq);
@@ -5781,6 +5773,8 @@ static void migrate_tasks(struct rq *dead_rq, struct rq_flags *rf)
 		BUG_ON(!next);
 		put_prev_task(rq, next);
 
+		WARN_ON_ONCE(__migrate_disabled(next));
+
 		/*
 		 * Rules for changing task_struct::cpus_mask are holding
 		 * both pi_lock and rq->lock, such that holding either
@@ -7280,14 +7274,9 @@ void dump_cpu_task(int cpu)
 static inline void
 migrate_disable_update_cpus_allowed(struct task_struct *p)
 {
-	struct rq *rq;
-	struct rq_flags rf;
-
-	rq = task_rq_lock(p, &rf);
 	p->cpus_ptr = cpumask_of(smp_processor_id());
 	update_nr_migratory(p, -1);
 	p->nr_cpus_allowed = 1;
-	task_rq_unlock(rq, p, &rf);
 }
 
 static inline void
@@ -7305,54 +7294,35 @@ void dump_cpu_task(int cpu)
 
 void migrate_disable(void)
 {
-	struct task_struct *p = current;
+	preempt_disable();
 
-	if (in_atomic() || irqs_disabled()) {
+	if (++current->migrate_disable == 1) {
+		this_rq()->nr_pinned++;
+		preempt_lazy_disable();
 #ifdef CONFIG_SCHED_DEBUG
-		p->migrate_disable_atomic++;
+		WARN_ON_ONCE(current->pinned_on_cpu >= 0);
+		current->pinned_on_cpu = smp_processor_id();
 #endif
-		return;
-	}
-#ifdef CONFIG_SCHED_DEBUG
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
 	}
-#endif
 
-	if (p->migrate_disable) {
-		p->migrate_disable++;
-		return;
-	}
+	preempt_enable();
+}
+EXPORT_SYMBOL(migrate_disable);
 
-	preempt_disable();
-	preempt_lazy_disable();
-	pin_current_cpu();
+static void migrate_disabled_sched(struct task_struct *p)
+{
+	if (p->migrate_disable_scheduled)
+		return;
 
 	migrate_disable_update_cpus_allowed(p);
-	p->migrate_disable = 1;
-
-	preempt_enable();
+	p->migrate_disable_scheduled = 1;
 }
-EXPORT_SYMBOL(migrate_disable);
 
 void migrate_enable(void)
 {
 	struct task_struct *p = current;
-
-	if (in_atomic() || irqs_disabled()) {
-#ifdef CONFIG_SCHED_DEBUG
-		p->migrate_disable_atomic--;
-#endif
-		return;
-	}
-
-#ifdef CONFIG_SCHED_DEBUG
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
-#endif
+	struct rq *rq = this_rq();
+	int cpu = task_cpu(p);
 
 	WARN_ON_ONCE(p->migrate_disable <= 0);
 	if (p->migrate_disable > 1) {
@@ -7362,72 +7332,74 @@ void migrate_enable(void)
 
 	preempt_disable();
 
+#ifdef CONFIG_SCHED_DEBUG
+	WARN_ON_ONCE(current->pinned_on_cpu != cpu);
+	current->pinned_on_cpu = -1;
+#endif
+
+	WARN_ON_ONCE(rq->nr_pinned < 1);
+
 	p->migrate_disable = 0;
+	rq->nr_pinned--;
+	if (rq->nr_pinned == 0 && unlikely(!cpu_active(cpu)) &&
+	    takedown_cpu_task)
+		wake_up_process(takedown_cpu_task);
+
+	if (!p->migrate_disable_scheduled)
+		goto out;
+
+	p->migrate_disable_scheduled = 0;
+
 	migrate_enable_update_cpus_allowed(p);
 
-	if (p->migrate_disable_update) {
-		struct rq *rq;
+	WARN_ON(smp_processor_id() != cpu);
+	if (!is_cpu_allowed(p, cpu)) {
+		struct migration_arg arg = { p };
 		struct rq_flags rf;
-		int cpu = task_cpu(p);
 
 		rq = task_rq_lock(p, &rf);
 		update_rq_clock(rq);
-
-		__do_set_cpus_allowed_tail(p, &p->cpus_mask);
+		arg.dest_cpu = select_fallback_rq(cpu, p);
 		task_rq_unlock(rq, p, &rf);
 
-		p->migrate_disable_update = 0;
-
-		WARN_ON(smp_processor_id() != cpu);
-		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
-			struct migration_arg arg = { p };
-			struct rq_flags rf;
-
-			rq = task_rq_lock(p, &rf);
-			update_rq_clock(rq);
-			arg.dest_cpu = select_fallback_rq(cpu, p);
-			task_rq_unlock(rq, p, &rf);
+		preempt_lazy_enable();
+		preempt_enable();
 
-			unpin_current_cpu();
-			preempt_lazy_enable();
-			preempt_enable();
-
-			/*
-			 * Avoid sleeping with an existing non-running
-			 * state.  This will result in a spurious wakeup
-			 * for the calling context.
-			 */
-			__set_current_state(TASK_RUNNING);
+		/*
+		 * Avoid sleeping with an existing non-running
+		 * state.  This will result in a spurious wakeup
+		 * for the calling context.
+		 */
+		__set_current_state(TASK_RUNNING);
 
-			sleeping_lock_inc();
-			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
-			sleeping_lock_dec();
-			return;
-		}
+		sleeping_lock_inc();
+		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
+		sleeping_lock_dec();
+		return;
 	}
-	unpin_current_cpu();
+
+out:
 	preempt_lazy_enable();
 	preempt_enable();
 }
 EXPORT_SYMBOL(migrate_enable);
 
-#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
-void migrate_disable(void)
+int cpu_nr_pinned(int cpu)
 {
-#ifdef CONFIG_SCHED_DEBUG
-	struct task_struct *p = current;
+	struct rq *rq = cpu_rq(cpu);
 
-	if (in_atomic() || irqs_disabled()) {
-		p->migrate_disable_atomic++;
-		return;
-	}
+	return rq->nr_pinned;
+}
 
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
+#elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
+static void migrate_disabled_sched(struct task_struct *p)
+{
+}
 
-	p->migrate_disable++;
+void migrate_disable(void)
+{
+#ifdef CONFIG_SCHED_DEBUG
+	current->migrate_disable++;
 #endif
 	barrier();
 }
@@ -7438,20 +7410,14 @@ void migrate_enable(void)
 #ifdef CONFIG_SCHED_DEBUG
 	struct task_struct *p = current;
 
-	if (in_atomic() || irqs_disabled()) {
-		p->migrate_disable_atomic--;
-		return;
-	}
-
-	if (unlikely(p->migrate_disable_atomic)) {
-		tracing_off();
-		WARN_ON_ONCE(1);
-	}
-
 	WARN_ON_ONCE(p->migrate_disable <= 0);
 	p->migrate_disable--;
 #endif
 	barrier();
 }
 EXPORT_SYMBOL(migrate_enable);
+#else
+static void migrate_disabled_sched(struct task_struct *p)
+{
+}
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9c1bbd67e971..2e4db417049d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -952,6 +952,10 @@ struct rq {
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state	*idle_state;
 #endif
+
+#if defined(CONFIG_PREEMPT_RT_BASE) && defined(CONFIG_SMP)
+	int			nr_pinned;
+#endif
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/lib/smp_processor_id.c b/lib/smp_processor_id.c
index 60ba93fc42ce..ded40a11c867 100644
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -23,6 +23,9 @@ unsigned int check_preemption_disabled(const char *what1, const char *what2)
 	 * Kernel threads bound to a single CPU can safely use
 	 * smp_processor_id():
 	 */
+	if (current->migrate_disable)
+		goto out;
+
 	if (cpumask_equal(current->cpus_ptr, cpumask_of(this_cpu)))
 		goto out;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep
  2019-07-27  5:56 ` [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep Scott Wood
@ 2019-07-29 13:26   ` Steven Rostedt
  0 siblings, 0 replies; 32+ messages in thread
From: Steven Rostedt @ 2019-07-29 13:26 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Peter Zijlstra,
	Juri Lelli, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On Sat, 27 Jul 2019 00:56:31 -0500
Scott Wood <swood@redhat.com> wrote:

> Without this, rcu_note_context_switch() will complain if an RCU read
> lock is held when migrate_enable() calls stop_one_cpu().
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
>

> index d3c6542b306f..c3407707e367 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7405,7 +7405,9 @@ void migrate_enable(void)
>  			unpin_current_cpu();
>  			preempt_lazy_enable();
>  			preempt_enable();

This looks like it needs a comment to explain why this is.

-- Steve

> +			sleeping_lock_inc();
>  			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> +			sleeping_lock_dec();
>  			return;
>  		}
>  	}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr
  2019-07-27  5:56 ` [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr Scott Wood
@ 2019-09-17 14:57   ` Sebastian Andrzej Siewior
  2019-09-17 15:23     ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 14:57 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:32 [-0500], Scott Wood wrote:
> This function is concerned with the long-term cpu mask, not the
> transitory mask the task might have while migrate disabled.  Before
> this patch, if a task was migrate disabled at the time
> __set_cpus_allowed_ptr() was called, and the new mask happened to be
> equal to the cpu that the task was running on, then the mask update
> would be lost.

lost as in "would not be carried out" I assume.

> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
>  kernel/sched/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index c3407707e367..6e643d656d71 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1218,7 +1218,7 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>  		goto out;
>  	}
>  
> -	if (cpumask_equal(p->cpus_ptr, new_mask))
> +	if (cpumask_equal(&p->cpus_mask, new_mask))
>  		goto out;
>  
>  	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {


Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-07-27  5:56 ` [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq() Scott Wood
@ 2019-09-17 15:10   ` Sebastian Andrzej Siewior
  2019-09-27  8:11   ` Juri Lelli
  1 sibling, 0 replies; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 15:10 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:35 [-0500], Scott Wood wrote:
> With the changes to migrate disabling, ->set_cpus_allowed() no longer
> gets deferred until migrate_enable().  To avoid releasing the bandwidth
> while the task may still be executing on the old CPU, move the subtraction
> to ->migrate_task_rq().

I assume this is still valid for the !RT case.
Could someone from the scheduler/DL department please comment on this
one?

> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
>  kernel/sched/deadline.c | 67 +++++++++++++++++++++++--------------------------
>  1 file changed, 31 insertions(+), 36 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index c18be51f7608..2f18d0cf1b56 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
>  	return cpu;
>  }
>  
> +static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p)
> +{
> +	struct root_domain *src_rd = rq->rd;
> +
> +	/*
> +	 * Migrating a SCHED_DEADLINE task between exclusive
> +	 * cpusets (different root_domains) entails a bandwidth
> +	 * update. We already made space for us in the destination
> +	 * domain (see cpuset_can_attach()).
> +	 */
> +	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
> +		struct dl_bw *src_dl_b;
> +
> +		src_dl_b = dl_bw_of(cpu_of(rq));
> +		/*
> +		 * We now free resources of the root_domain we are migrating
> +		 * off. In the worst case, sched_setattr() may temporary fail
> +		 * until we complete the update.
> +		 */
> +		raw_spin_lock(&src_dl_b->lock);
> +		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
> +		raw_spin_unlock(&src_dl_b->lock);
> +	}
> +}
> +
>  static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
>  {
>  	struct rq *rq;
>  
> -	if (p->state != TASK_WAKING)
> +	rq = task_rq(p);
> +
> +	if (p->state != TASK_WAKING) {
> +		free_old_cpuset_bw_dl(rq, p);
>  		return;
> +	}
>  
> -	rq = task_rq(p);
>  	/*
>  	 * Since p->state == TASK_WAKING, set_task_cpu() has been called
>  	 * from try_to_wake_up(). Hence, p->pi_lock is locked, but
> @@ -2220,39 +2248,6 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
>  	}
>  }
>  
> -static void set_cpus_allowed_dl(struct task_struct *p,
> -				const struct cpumask *new_mask)
> -{
> -	struct root_domain *src_rd;
> -	struct rq *rq;
> -
> -	BUG_ON(!dl_task(p));
> -
> -	rq = task_rq(p);
> -	src_rd = rq->rd;
> -	/*
> -	 * Migrating a SCHED_DEADLINE task between exclusive
> -	 * cpusets (different root_domains) entails a bandwidth
> -	 * update. We already made space for us in the destination
> -	 * domain (see cpuset_can_attach()).
> -	 */
> -	if (!cpumask_intersects(src_rd->span, new_mask)) {
> -		struct dl_bw *src_dl_b;
> -
> -		src_dl_b = dl_bw_of(cpu_of(rq));
> -		/*
> -		 * We now free resources of the root_domain we are migrating
> -		 * off. In the worst case, sched_setattr() may temporary fail
> -		 * until we complete the update.
> -		 */
> -		raw_spin_lock(&src_dl_b->lock);
> -		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
> -		raw_spin_unlock(&src_dl_b->lock);
> -	}
> -
> -	set_cpus_allowed_common(p, new_mask);
> -}
> -
>  /* Assumes rq->lock is held */
>  static void rq_online_dl(struct rq *rq)
>  {
> @@ -2407,7 +2402,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
>  #ifdef CONFIG_SMP
>  	.select_task_rq		= select_task_rq_dl,
>  	.migrate_task_rq	= migrate_task_rq_dl,
> -	.set_cpus_allowed       = set_cpus_allowed_dl,
> +	.set_cpus_allowed       = set_cpus_allowed_common,
>  	.rq_online              = rq_online_dl,
>  	.rq_offline             = rq_offline_dl,
>  	.task_woken		= task_woken_dl,
> -- 
> 1.8.3.1
> 

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr
  2019-09-17 14:57   ` Sebastian Andrzej Siewior
@ 2019-09-17 15:23     ` Scott Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-09-17 15:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Tue, 2019-09-17 at 16:57 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:32 [-0500], Scott Wood wrote:
> > This function is concerned with the long-term cpu mask, not the
> > transitory mask the task might have while migrate disabled.  Before
> > this patch, if a task was migrate disabled at the time
> > __set_cpus_allowed_ptr() was called, and the new mask happened to be
> > equal to the cpu that the task was running on, then the mask update
> > would be lost.
> 
> lost as in "would not be carried out" I assume.

Right.  The old mask would be restored upon migrate_enable() even though
that's no longer the policy that was requested.

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING
  2019-07-27  5:56 ` [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING Scott Wood
@ 2019-09-17 15:31   ` Sebastian Andrzej Siewior
  2019-09-17 17:54     ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 15:31 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:36 [-0500], Scott Wood wrote:
> If migrate_enable() is called while a task is preparing to sleep
> (state != TASK_RUNNING), that triggers a debug check in stop_one_cpu().
> Explicitly reset state to acknowledge that we're accepting the spurious
> wakeup.
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
>  kernel/sched/core.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 38a9a9df5638..eb27a9bf70d7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7396,6 +7396,14 @@ void migrate_enable(void)
>  			unpin_current_cpu();
>  			preempt_lazy_enable();
>  			preempt_enable();
> +
> +			/*
> +			 * Avoid sleeping with an existing non-running
> +			 * state.  This will result in a spurious wakeup
> +			 * for the calling context.
> +			 */
> +			__set_current_state(TASK_RUNNING);

Do you have an example for this? I'm not too sure that we aren't losing a
state by doing this.  We were actually losing it before and got yelled at.  We do:
|rt_spin_unlock()
|{
|         rt_spin_lock_fastunlock();
|         migrate_enable();
|}

So save the state as part of the locking process and lose it as part of
migrate_enable() if the CPU mask was changed. I *think* we need to
preserve that state until after the migration to the other CPU.

>  			sleeping_lock_inc();
>  			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
>  			sleeping_lock_dec();

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()
  2019-07-27  5:56 ` [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq() Scott Wood
@ 2019-09-17 16:00   ` Sebastian Andrzej Siewior
  2019-09-24 18:05     ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 16:00 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:37 [-0500], Scott Wood wrote:
> migrate_enable() currently open-codes a variant of select_fallback_rq().
> However, it does not have the "No more Mr. Nice Guy" fallback and thus
> it will pass an invalid CPU to the migration thread if cpus_mask only
> contains a CPU that is !active.
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
> This scenario will be more likely after the next patch, since
> the migrate_disable_update check goes away.  However, it could happen
> anyway if cpus_mask was updated to a CPU other than the one we were
> pinned to, and that CPU subsequently became inactive.

I'm unclear about the problem / side effect this has (before and after
the change). It is possible (before and after that change) that a CPU is
selected which is invalid / goes offline after the "preempt_enable()"
statement and before stop_one_cpu() does its job, correct?

> ---
>  kernel/sched/core.c | 25 ++++++++++---------------
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index eb27a9bf70d7..3a2d8251a30c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7368,6 +7368,7 @@ void migrate_enable(void)
>  	if (p->migrate_disable_update) {
>  		struct rq *rq;
>  		struct rq_flags rf;
> +		int cpu = task_cpu(p);
>  
>  		rq = task_rq_lock(p, &rf);
>  		update_rq_clock(rq);
> @@ -7377,21 +7378,15 @@ void migrate_enable(void)
>  
>  		p->migrate_disable_update = 0;
>  
> -		WARN_ON(smp_processor_id() != task_cpu(p));
> -		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
> -			const struct cpumask *cpu_valid_mask = cpu_active_mask;
> -			struct migration_arg arg;
> -			unsigned int dest_cpu;
> -
> -			if (p->flags & PF_KTHREAD) {
> -				/*
> -				 * Kernel threads are allowed on online && !active CPUs
> -				 */
> -				cpu_valid_mask = cpu_online_mask;
> -			}
> -			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_mask);
> -			arg.task = p;
> -			arg.dest_cpu = dest_cpu;
> +		WARN_ON(smp_processor_id() != cpu);
> +		if (!cpumask_test_cpu(cpu, &p->cpus_mask)) {
> +			struct migration_arg arg = { p };
> +			struct rq_flags rf;
> +
> +			rq = task_rq_lock(p, &rf);
> +			update_rq_clock(rq);
> +			arg.dest_cpu = select_fallback_rq(cpu, p);
> +			task_rq_unlock(rq, p, &rf);
>  
>  			unpin_current_cpu();
>  			preempt_lazy_enable();

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 8/8] sched: Lazy migrate_disable processing
  2019-07-27  5:56 ` [PATCH RT 8/8] sched: Lazy migrate_disable processing Scott Wood
@ 2019-09-17 16:50   ` Sebastian Andrzej Siewior
  2019-09-17 17:06     ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-17 16:50 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:38 [-0500], Scott Wood wrote:
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 885a195dfbe0..0096acf1a692 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -939,17 +893,34 @@ static int takedown_cpu(unsigned int cpu)
>  	 */
>  	irq_lock_sparse();
>  
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	__write_rt_lock(cpuhp_pin);
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	WARN_ON_ONCE(takedown_cpu_task);
> +	takedown_cpu_task = current;
> +
> +again:
> +	for (;;) {
> +		int nr_pinned;
> +
> +		set_current_state(TASK_UNINTERRUPTIBLE);
> +		nr_pinned = cpu_nr_pinned(cpu);
> +		if (nr_pinned == 0)
> +			break;
> +		schedule();
> +	}

we used to have cpuhp_pin which ensured that once we own the write lock
there will be no more tasks that can enter a migrate_disable() section
on this CPU. It has been placed fairly late to ensure that nothing new
comes in as part of the shutdown process and that it flushes everything
out that is still in a migrate_disable() section.
Now you claim that once the counter reached zero it never increments
again. I would be happier if there was an explicit check for that :)
There is no back off and flush mechanism which means on a busy CPU (as
in heavily lock contended by multiple tasks) this will wait until the
CPU gets idle again.

> +	set_current_state(TASK_RUNNING);
>  #endif

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 8/8] sched: Lazy migrate_disable processing
  2019-09-17 16:50   ` Sebastian Andrzej Siewior
@ 2019-09-17 17:06     ` Scott Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-09-17 17:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Tue, 2019-09-17 at 18:50 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:38 [-0500], Scott Wood wrote:
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 885a195dfbe0..0096acf1a692 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -939,17 +893,34 @@ static int takedown_cpu(unsigned int cpu)
> >  	 */
> >  	irq_lock_sparse();
> >  
> > -#ifdef CONFIG_PREEMPT_RT_FULL
> > -	__write_rt_lock(cpuhp_pin);
> > +#ifdef CONFIG_PREEMPT_RT_BASE
> > +	WARN_ON_ONCE(takedown_cpu_task);
> > +	takedown_cpu_task = current;
> > +
> > +again:
> > +	for (;;) {
> > +		int nr_pinned;
> > +
> > +		set_current_state(TASK_UNINTERRUPTIBLE);
> > +		nr_pinned = cpu_nr_pinned(cpu);
> > +		if (nr_pinned == 0)
> > +			break;
> > +		schedule();
> > +	}
> 
> we used to have cpuhp_pin which ensured that once we own the write lock
> there will be no more tasks that can enter a migrate_disable() section
> on this CPU. It has been placed fairly late to ensure that nothing new
> comes in as part of the shutdown process and that it flushes everything
> out that is still in a migrate_disable() section.
> Now you claim that once the counter reached zero it never increments
> again. I would be happier if there was an explicit check for that :)

I don't claim that.  A check is added in take_cpu_down() to see whether it
went back up, and if so it exits with EAGAIN.  If *that* check succeeds, it
can't go back up because we're in stop-machine context, and any tasks will get
migrated to another CPU before they can run again.  There's also a WARN in
migrate_tasks() if somehow a migrate-disabled task does get encountered.

> There is no back off and flush mechanism which means on a busy CPU (as
> in heavily lock contended by multiple tasks) this will wait until the
> CPU gets idle again.

Not really any different from the reader-biased rwlock that this replaces...

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING
  2019-09-17 15:31   ` Sebastian Andrzej Siewior
@ 2019-09-17 17:54     ` Scott Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-09-17 17:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Tue, 2019-09-17 at 17:31 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:36 [-0500], Scott Wood wrote:
> > If migrate_enable() is called while a task is preparing to sleep
> > (state != TASK_RUNNING), that triggers a debug check in stop_one_cpu().
> > Explicitly reset state to acknowledge that we're accepting the spurious
> > wakeup.
> > 
> > Signed-off-by: Scott Wood <swood@redhat.com>
> > ---
> >  kernel/sched/core.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 38a9a9df5638..eb27a9bf70d7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7396,6 +7396,14 @@ void migrate_enable(void)
> >  			unpin_current_cpu();
> >  			preempt_lazy_enable();
> >  			preempt_enable();
> > +
> > +			/*
> > +			 * Avoid sleeping with an existing non-running
> > +			 * state.  This will result in a spurious wakeup
> > +			 * for the calling context.
> > +			 */
> > +			__set_current_state(TASK_RUNNING);
> 
> Do you have an example for this?

Unfortunately I didn't save the traceback of where I originally saw it, but
I ran it again and hit the warning in prepare_to_wait_event().

>  I'm not too sure that we aren't losing a
> state by doing this.  We were actually losing it before and got yelled at.  We do:
> > rt_spin_unlock()
> > {
> >         rt_spin_lock_fastunlock();
> >         migrate_enable();
> > }
> 
> So save the state as part of the locking process and lose it as part of
> migrate_enable() if the CPU mask was changed. I *think* we need to
> preserve that state until after the migration to the other CPU.

Preserving the state would be ideal, though in most cases occasionally
losing the state should be tolerable as long as it doesn't happen
persistently.  TASK_DEAD is an exception but that doesn't do anything before
schedule() that could end up here.  I suppose there could be some places
that use TASK_UNINTERRUPTIBLE without checking the actual condition after
schedule(), and which do something that can end up here before schedule()...

We can't use saved_state here though, because we can get here inside the
rtmutex code (e.g. from debug_rt_mutex_print_deadlock)... and besides, the
waker won't have WF_LOCK_SLEEPER and so saved_state will get cleared.

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq()
  2019-09-17 16:00   ` Sebastian Andrzej Siewior
@ 2019-09-24 18:05     ` Scott Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-09-24 18:05 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Tue, 2019-09-17 at 18:00 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:37 [-0500], Scott Wood wrote:
> > migrate_enable() currently open-codes a variant of select_fallback_rq().
> > However, it does not have the "No more Mr. Nice Guy" fallback and thus
> > it will pass an invalid CPU to the migration thread if cpus_mask only
> > contains a CPU that is !active.
> > 
> > Signed-off-by: Scott Wood <swood@redhat.com>
> > ---
> > This scenario will be more likely after the next patch, since
> > the migrate_disable_update check goes away.  However, it could happen
> > anyway if cpus_mask was updated to a CPU other than the one we were
> > pinned to, and that CPU subsequently became inactive.
> 
> I'm unclear about the problem / side effect this has (before and after
> the change). It is possible (before and after that change) that a CPU is
> selected which is invalid / goes offline after the "preempt_enable()"
> statement and before stop_one_cpu() does its job, correct?

By "pass an invalid CPU" I don't mean offline; I mean >= nr_cpu_ids which is
what cpumask_any_and() returns if the sets don't intersect (a CPU going
offline is merely a way to end up in that situation).  At one point I
observed that causing a crash.  I guess is_cpu_allowed() returned true by
chance based on the contents of data beyond the end of the cpumask?  That
doesn't seem likely based on what comes after p->cpus_mask (at that point
migrate_disable should be zero), but maybe I had something else at that
point in the struct while developing.  In any case, not something to be
relied on. :-)
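
In other words, the fallback this patch adds guards against roughly the
following (sketch only, not the actual hunk; the mask used in the real code
may differ):

unsigned int cpu;

cpu = cpumask_any_and(cpu_active_mask, &p->cpus_mask);
if (cpu >= nr_cpu_ids) {
	/*
	 * The masks don't intersect, so cpumask_any_and() returned
	 * nr_cpu_ids or more; don't hand that to the stopper thread.
	 * select_fallback_rq()'s "No more Mr. Nice Guy" pass picks a
	 * usable CPU instead.
	 */
	cpu = select_fallback_rq(task_cpu(p), p);
}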

Going offline after selection shouldn't be a problem.  migration_cpu_stop()
won't do anything if is_cpu_allowed() returns false, and we'll get migrated
off the CPU by migrate_tasks().  Even if we get migrated after reading
task_cpu(p) but before queuing the stop machine work, it'll either get
flushed when the stopper thread parks, or rejected due to stopper->enabled
being false.

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock
  2019-07-27  5:56 ` [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock Scott Wood
@ 2019-09-26 16:39   ` Sebastian Andrzej Siewior
  2019-09-26 16:52     ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-26 16:39 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-07-27 00:56:34 [-0500], Scott Wood wrote:
> Various places assume that cpus_ptr is protected by rq/pi locks,
> so don't change it before grabbing those locks.
> 
> Signed-off-by: Scott Wood <swood@redhat.com>

I applied now everything until here and you can take a look at
  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/log/?h=linux-5.2.y-rt-RC

If there are no objections then I would make this a -rt9 and then go
further.

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock
  2019-09-26 16:39   ` Sebastian Andrzej Siewior
@ 2019-09-26 16:52     ` Scott Wood
  2019-09-27 12:19       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-09-26 16:52 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Thu, 2019-09-26 at 18:39 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-07-27 00:56:34 [-0500], Scott Wood wrote:
> > Various places assume that cpus_ptr is protected by rq/pi locks,
> > so don't change it before grabbing those locks.
> > 
> > Signed-off-by: Scott Wood <swood@redhat.com>
> 
> I applied now everything until here and you can take a look at
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/log/?h=linux-5.2.y-rt-RC
> 
> If there are no objections then I would make this a -rt9 and then go
> further.
> 
> Sebastian

Looks good, thanks!

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-07-27  5:56 ` [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq() Scott Wood
  2019-09-17 15:10   ` Sebastian Andrzej Siewior
@ 2019-09-27  8:11   ` Juri Lelli
  2019-09-27 16:40     ` Scott Wood
  1 sibling, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2019-09-27  8:11 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

Hi Scott,

On 27/07/19 00:56, Scott Wood wrote:
> With the changes to migrate disabling, ->set_cpus_allowed() no longer
> gets deferred until migrate_enable().  To avoid releasing the bandwidth
> while the task may still be executing on the old CPU, move the subtraction
> to ->migrate_task_rq().
> 
> Signed-off-by: Scott Wood <swood@redhat.com>
> ---
>  kernel/sched/deadline.c | 67 +++++++++++++++++++++++--------------------------
>  1 file changed, 31 insertions(+), 36 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index c18be51f7608..2f18d0cf1b56 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
>  	return cpu;
>  }
>  
> +static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p)
> +{
> +	struct root_domain *src_rd = rq->rd;
> +
> +	/*
> +	 * Migrating a SCHED_DEADLINE task between exclusive
> +	 * cpusets (different root_domains) entails a bandwidth
> +	 * update. We already made space for us in the destination
> +	 * domain (see cpuset_can_attach()).
> +	 */
> +	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
> +		struct dl_bw *src_dl_b;
> +
> +		src_dl_b = dl_bw_of(cpu_of(rq));
> +		/*
> +		 * We now free resources of the root_domain we are migrating
> +		 * off. In the worst case, sched_setattr() may temporary fail
> +		 * until we complete the update.
> +		 */
> +		raw_spin_lock(&src_dl_b->lock);
> +		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
> +		raw_spin_unlock(&src_dl_b->lock);
> +	}
> +}
> +
>  static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
>  {
>  	struct rq *rq;
>  
> -	if (p->state != TASK_WAKING)
> +	rq = task_rq(p);
> +
> +	if (p->state != TASK_WAKING) {
> +		free_old_cpuset_bw_dl(rq, p);

What happens if a DEADLINE task is moved between cpusets while it was
sleeping? Don't we miss removing from the old cpuset if the task gets
migrated on wakeup?

Thanks,

Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock
  2019-09-26 16:52     ` Scott Wood
@ 2019-09-27 12:19       ` Sebastian Andrzej Siewior
  2019-09-27 20:02         ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-09-27 12:19 UTC (permalink / raw)
  To: Scott Wood
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On 2019-09-26 11:52:42 [-0500], Scott Wood wrote:
> Looks good, thanks!

Thanks, just released.
Moving forward, it would be nice to have some DL-dev feedback on the DL
patch. For the remaining ones, could you please throw Steven's
stress-test-hotplug-cpu script at them? If that one does not complain, I
don't see a reason not to apply the patches (since they improve performance
and do not break anything while doing so).

> -Scott

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-09-27  8:11   ` Juri Lelli
@ 2019-09-27 16:40     ` Scott Wood
  2019-09-30  7:12       ` Juri Lelli
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-09-27 16:40 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On Fri, 2019-09-27 at 10:11 +0200, Juri Lelli wrote:
> Hi Scott,
> 
> On 27/07/19 00:56, Scott Wood wrote:
> > With the changes to migrate disabling, ->set_cpus_allowed() no longer
> > gets deferred until migrate_enable().  To avoid releasing the bandwidth
> > while the task may still be executing on the old CPU, move the
> > subtraction
> > to ->migrate_task_rq().
> > 
> > Signed-off-by: Scott Wood <swood@redhat.com>
> > ---
> >  kernel/sched/deadline.c | 67 +++++++++++++++++++++++-------------------
> > -------
> >  1 file changed, 31 insertions(+), 36 deletions(-)
> > 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index c18be51f7608..2f18d0cf1b56 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
> >  	return cpu;
> >  }
> >  
> > +static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p)
> > +{
> > +	struct root_domain *src_rd = rq->rd;
> > +
> > +	/*
> > +	 * Migrating a SCHED_DEADLINE task between exclusive
> > +	 * cpusets (different root_domains) entails a bandwidth
> > +	 * update. We already made space for us in the destination
> > +	 * domain (see cpuset_can_attach()).
> > +	 */
> > +	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
> > +		struct dl_bw *src_dl_b;
> > +
> > +		src_dl_b = dl_bw_of(cpu_of(rq));
> > +		/*
> > +		 * We now free resources of the root_domain we are migrating
> > +		 * off. In the worst case, sched_setattr() may temporary
> > fail
> > +		 * until we complete the update.
> > +		 */
> > +		raw_spin_lock(&src_dl_b->lock);
> > +		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
> > +		raw_spin_unlock(&src_dl_b->lock);
> > +	}
> > +}
> > +
> >  static void migrate_task_rq_dl(struct task_struct *p, int new_cpu
> > __maybe_unused)
> >  {
> >  	struct rq *rq;
> >  
> > -	if (p->state != TASK_WAKING)
> > +	rq = task_rq(p);
> > +
> > +	if (p->state != TASK_WAKING) {
> > +		free_old_cpuset_bw_dl(rq, p);
> 
> What happens if a DEADLINE task is moved between cpusets while it was
> sleeping? Don't we miss removing from the old cpuset if the task gets
> migrated on wakeup?

In that case set_task_cpu() is called by ttwu after setting state to
TASK_WAKING.  I guess it could be annoying if the task doesn't wake up for a
long time and therefore doesn't release the bandwidth until then.

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock
  2019-09-27 12:19       ` Sebastian Andrzej Siewior
@ 2019-09-27 20:02         ` Scott Wood
  0 siblings, 0 replies; 32+ messages in thread
From: Scott Wood @ 2019-09-27 20:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira, Clark Williams, linux-kernel,
	linux-rt-users

On Fri, 2019-09-27 at 14:19 +0200, Sebastian Andrzej Siewior wrote:
> On 2019-09-26 11:52:42 [-0500], Scott Wood wrote:
> > Looks good, thanks!
> 
> Thanks, just released.
> Moving forward, it would be nice to have some DL-dev feedback on the DL
> patch. For the remaining ones, could you please throw Steven's
> stress-test-hotplug-cpu script at them? If that one does not complain, I
> don't see a reason not to apply the patches (since they improve performance
> and do not break anything while doing so).

I'd been using a quick-and-dirty script that does something similar, and ran
it along with rcutorture, a kernel build, and something that randomly
changes affinities -- though with these loads I have to ignore occasional
RCU forward progress complaints that Paul said were expected with this
version of the RCU code, XFS warnings that happen even on unmodified non-RT, 
and sometimes a transitory netdev timeout that is not that surprising given
the heavy load and affinity stress (I get it without these patches as well,
though it takes longer).

I just ran Steven's script (both alone and with the other stuff above) and
saw no difference.

-Scott


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-09-27 16:40     ` Scott Wood
@ 2019-09-30  7:12       ` Juri Lelli
  2019-09-30 16:24         ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2019-09-30  7:12 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On 27/09/19 11:40, Scott Wood wrote:
> On Fri, 2019-09-27 at 10:11 +0200, Juri Lelli wrote:
> > Hi Scott,
> > 
> > On 27/07/19 00:56, Scott Wood wrote:
> > > With the changes to migrate disabling, ->set_cpus_allowed() no longer
> > > gets deferred until migrate_enable().  To avoid releasing the bandwidth
> > > while the task may still be executing on the old CPU, move the
> > > subtraction
> > > to ->migrate_task_rq().
> > > 
> > > Signed-off-by: Scott Wood <swood@redhat.com>
> > > ---
> > >  kernel/sched/deadline.c | 67 +++++++++++++++++++++++-------------------
> > > -------
> > >  1 file changed, 31 insertions(+), 36 deletions(-)
> > > 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index c18be51f7608..2f18d0cf1b56 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
> > >  	return cpu;
> > >  }
> > >  
> > > +static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p)
> > > +{
> > > +	struct root_domain *src_rd = rq->rd;
> > > +
> > > +	/*
> > > +	 * Migrating a SCHED_DEADLINE task between exclusive
> > > +	 * cpusets (different root_domains) entails a bandwidth
> > > +	 * update. We already made space for us in the destination
> > > +	 * domain (see cpuset_can_attach()).
> > > +	 */
> > > +	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
> > > +		struct dl_bw *src_dl_b;
> > > +
> > > +		src_dl_b = dl_bw_of(cpu_of(rq));
> > > +		/*
> > > +		 * We now free resources of the root_domain we are migrating
> > > +		 * off. In the worst case, sched_setattr() may temporary
> > > fail
> > > +		 * until we complete the update.
> > > +		 */
> > > +		raw_spin_lock(&src_dl_b->lock);
> > > +		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
> > > +		raw_spin_unlock(&src_dl_b->lock);
> > > +	}
> > > +}
> > > +
> > >  static void migrate_task_rq_dl(struct task_struct *p, int new_cpu
> > > __maybe_unused)
> > >  {
> > >  	struct rq *rq;
> > >  
> > > -	if (p->state != TASK_WAKING)
> > > +	rq = task_rq(p);
> > > +
> > > +	if (p->state != TASK_WAKING) {
> > > +		free_old_cpuset_bw_dl(rq, p);
> > 
> > What happens if a DEADLINE task is moved between cpusets while it was
> > sleeping? Don't we miss removing from the old cpuset if the task gets
> > migrated on wakeup?
> 
> In that case set_task_cpu() is called by ttwu after setting state to
> TASK_WAKING.

Right.

> I guess it could be annoying if the task doesn't wake up for a
> long time and therefore doesn't release the bandwidth until then.

Hummm, I was actually more worried about the fact that we call free_old_
cpuset_bw_dl() only if p->state != TASK_WAKING.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-09-30  7:12       ` Juri Lelli
@ 2019-09-30 16:24         ` Scott Wood
  2019-10-01  8:52           ` Juri Lelli
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-09-30 16:24 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:
> On 27/09/19 11:40, Scott Wood wrote:
> > On Fri, 2019-09-27 at 10:11 +0200, Juri Lelli wrote:
> > > Hi Scott,
> > > 
> > > On 27/07/19 00:56, Scott Wood wrote:
> > > > With the changes to migrate disabling, ->set_cpus_allowed() no
> > > > longer
> > > > gets deferred until migrate_enable().  To avoid releasing the
> > > > bandwidth
> > > > while the task may still be executing on the old CPU, move the
> > > > subtraction
> > > > to ->migrate_task_rq().
> > > > 
> > > > Signed-off-by: Scott Wood <swood@redhat.com>
> > > > ---
> > > >  kernel/sched/deadline.c | 67 +++++++++++++++++++++++---------------
> > > > ----
> > > > -------
> > > >  1 file changed, 31 insertions(+), 36 deletions(-)
> > > > 
> > > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > > index c18be51f7608..2f18d0cf1b56 100644
> > > > --- a/kernel/sched/deadline.c
> > > > +++ b/kernel/sched/deadline.c
> > > > @@ -1606,14 +1606,42 @@ static void yield_task_dl(struct rq *rq)
> > > >  	return cpu;
> > > >  }
> > > >  
> > > > +static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct
> > > > *p)
> > > > +{
> > > > +	struct root_domain *src_rd = rq->rd;
> > > > +
> > > > +	/*
> > > > +	 * Migrating a SCHED_DEADLINE task between exclusive
> > > > +	 * cpusets (different root_domains) entails a bandwidth
> > > > +	 * update. We already made space for us in the destination
> > > > +	 * domain (see cpuset_can_attach()).
> > > > +	 */
> > > > +	if (!cpumask_intersects(src_rd->span, p->cpus_ptr)) {
> > > > +		struct dl_bw *src_dl_b;
> > > > +
> > > > +		src_dl_b = dl_bw_of(cpu_of(rq));
> > > > +		/*
> > > > +		 * We now free resources of the root_domain we are
> > > > migrating
> > > > +		 * off. In the worst case, sched_setattr() may
> > > > temporary
> > > > fail
> > > > +		 * until we complete the update.
> > > > +		 */
> > > > +		raw_spin_lock(&src_dl_b->lock);
> > > > +		__dl_sub(src_dl_b, p->dl.dl_bw,
> > > > dl_bw_cpus(task_cpu(p)));
> > > > +		raw_spin_unlock(&src_dl_b->lock);
> > > > +	}
> > > > +}
> > > > +
> > > >  static void migrate_task_rq_dl(struct task_struct *p, int new_cpu
> > > > __maybe_unused)
> > > >  {
> > > >  	struct rq *rq;
> > > >  
> > > > -	if (p->state != TASK_WAKING)
> > > > +	rq = task_rq(p);
> > > > +
> > > > +	if (p->state != TASK_WAKING) {
> > > > +		free_old_cpuset_bw_dl(rq, p);
> > > 
> > > What happens if a DEADLINE task is moved between cpusets while it was
> > > sleeping? Don't we miss removing from the old cpuset if the task gets
> > > migrated on wakeup?
> > 
> > In that case set_task_cpu() is called by ttwu after setting state to
> > TASK_WAKING.
> 
> Right.
> 
> > I guess it could be annoying if the task doesn't wake up for a
> > long time and therefore doesn't release the bandwidth until then.
> 
> Hummm, I was actually more worried about the fact that we call free_old_
> cpuset_bw_dl() only if p->state != TASK_WAKING.

Oh, right. :-P  Not sure what I had in mind there; we want to call it
regardless.

I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something like
this:

	/* rq->lock is not already held in the TASK_WAKING (ttwu) path */
	if (p->state == TASK_WAKING)
		raw_spin_lock(&rq->lock);
	free_old_cpuset_bw_dl(rq, p);
	if (p->state != TASK_WAKING)
		return;

	if (p->dl.dl_non_contending) {
	....	

BTW, is the full cpumask_intersects() necessary or would it suffice to see
that the new cpu is not in the old span?

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-09-30 16:24         ` Scott Wood
@ 2019-10-01  8:52           ` Juri Lelli
  2019-10-09  6:25             ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2019-10-01  8:52 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On 30/09/19 11:24, Scott Wood wrote:
> On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:

[...]

> > Hummm, I was actually more worried about the fact that we call free_old_
> > cpuset_bw_dl() only if p->state != TASK_WAKING.
> 
> Oh, right. :-P  Not sure what I had in mind there; we want to call it
> regardless.
> 
> I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something like

I think we can do with rcu_read_lock_sched() (see dl_task_can_attach()).

> this:
> 
> 	if (p->state == TASK_WAKING)
> 		raw_spin_lock(&rq->lock);
> 	free_old_cpuset_bw_dl(rq, p);
> 	if (p->state != TASK_WAKING)
> 		return;
> 
> 	if (p->dl.dl_non_contending) {
> 	....	
> 
> BTW, is the full cpumask_intersects() necessary or would it suffice to see
> that the new cpu is not in the old span?

Checking new cpu only should be OK.
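
For illustration, the new-cpu-only check could look roughly like this (a
sketch based on the quoted free_old_cpuset_bw_dl(); it assumes new_cpu is
passed down from migrate_task_rq_dl(), which the quoted patch does not do
yet):

static void free_old_cpuset_bw_dl(struct rq *rq, struct task_struct *p,
				  int new_cpu)
{
	struct root_domain *src_rd = rq->rd;
	struct dl_bw *src_dl_b;

	/* Destination CPU outside the old span implies a root_domain change. */
	if (cpumask_test_cpu(new_cpu, src_rd->span))
		return;

	src_dl_b = dl_bw_of(cpu_of(rq));
	raw_spin_lock(&src_dl_b->lock);
	__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
	raw_spin_unlock(&src_dl_b->lock);
}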

Thanks,

Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-10-01  8:52           ` Juri Lelli
@ 2019-10-09  6:25             ` Scott Wood
  2019-10-09  7:27               ` Juri Lelli
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-10-09  6:25 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On Tue, 2019-10-01 at 10:52 +0200, Juri Lelli wrote:
> On 30/09/19 11:24, Scott Wood wrote:
> > On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:
> 
> [...]
> 
> > > Hummm, I was actually more worried about the fact that we call
> > > free_old_
> > > cpuset_bw_dl() only if p->state != TASK_WAKING.
> > 
> > Oh, right. :-P  Not sure what I had in mind there; we want to call it
> > regardless.
> > 
> > I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something like
> 
> I think we can do with rcu_read_lock_sched() (see dl_task_can_attach()).

RCU will keep dl_bw from being freed under us (we're implicitly in an RCU
sched read section due to atomic context).  It won't stop rq->rd from
changing, but that could have happened before we took rq->lock.  If the cpu
the task was running on was removed from the cpuset, and that raced with the
task being moved to a different cpuset, couldn't we end up erroneously
subtracting from the cpu's new root domain (or failing to subtract at all if
the old cpu's new cpuset happens to be the task's new cpuset)?  I don't see
anything that forces tasks off of the cpu when a cpu is removed from a
cpuset (though maybe I'm not looking in the right place), so the race window
could be quite large.  In any case, that's an existing problem that's not
going to get solved in this patchset.

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-10-09  6:25             ` Scott Wood
@ 2019-10-09  7:27               ` Juri Lelli
  2019-10-09 19:12                 ` Scott Wood
  0 siblings, 1 reply; 32+ messages in thread
From: Juri Lelli @ 2019-10-09  7:27 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On 09/10/19 01:25, Scott Wood wrote:
> On Tue, 2019-10-01 at 10:52 +0200, Juri Lelli wrote:
> > On 30/09/19 11:24, Scott Wood wrote:
> > > On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:
> > 
> > [...]
> > 
> > > > Hummm, I was actually more worried about the fact that we call
> > > > free_old_
> > > > cpuset_bw_dl() only if p->state != TASK_WAKING.
> > > 
> > > Oh, right. :-P  Not sure what I had in mind there; we want to call it
> > > regardless.
> > > 
> > > I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something like
> > 
> > I think we can do with rcu_read_lock_sched() (see dl_task_can_attach()).
> 
> RCU will keep dl_bw from being freed under us (we're implicitly in an RCU
> sched read section due to atomic context).  It won't stop rq->rd from
> changing, but that could have happened before we took rq->lock.  If the cpu
> the task was running on was removed from the cpuset, and that raced with the
> task being moved to a different cpuset, couldn't we end up erroneously
> subtracting from the cpu's new root domain (or failing to subtract at all if
> the old cpu's new cpuset happens to be the task's new cpuset)?  I don't see
> anything that forces tasks off of the cpu when a cpu is removed from a
> cpuset (though maybe I'm not looking in the right place), so the race window
> could be quite large.  In any case, that's an existing problem that's not
> going to get solved in this patchset.

OK. So, mainline has got cpuset_read_lock() which should be enough to
guard against changes to rd(s).

I agree that rq->lock is needed here.

Thanks,

Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-10-09  7:27               ` Juri Lelli
@ 2019-10-09 19:12                 ` Scott Wood
  2019-10-10  8:18                   ` Juri Lelli
  0 siblings, 1 reply; 32+ messages in thread
From: Scott Wood @ 2019-10-09 19:12 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On Wed, 2019-10-09 at 09:27 +0200, Juri Lelli wrote:
> On 09/10/19 01:25, Scott Wood wrote:
> > On Tue, 2019-10-01 at 10:52 +0200, Juri Lelli wrote:
> > > On 30/09/19 11:24, Scott Wood wrote:
> > > > On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:
> > > 
> > > [...]
> > > 
> > > > > Hummm, I was actually more worried about the fact that we call
> > > > > free_old_
> > > > > cpuset_bw_dl() only if p->state != TASK_WAKING.
> > > > 
> > > > Oh, right. :-P  Not sure what I had in mind there; we want to call
> > > > it
> > > > regardless.
> > > > 
> > > > I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something
> > > > like
> > > 
> > > I think we can do with rcu_read_lock_sched() (see
> > > dl_task_can_attach()).
> > 
> > RCU will keep dl_bw from being freed under us (we're implicitly in an
> > RCU
> > sched read section due to atomic context).  It won't stop rq->rd from
> > changing, but that could have happened before we took rq->lock.  If the
> > cpu
> > the task was running on was removed from the cpuset, and that raced with
> > the
> > task being moved to a different cpuset, couldn't we end up erroneously
> > subtracting from the cpu's new root domain (or failing to subtract at
> > all if
> > the old cpu's new cpuset happens to be the task's new cpuset)?  I don't
> > see
> > anything that forces tasks off of the cpu when a cpu is removed from a
> > cpuset (though maybe I'm not looking in the right place), so the race
> > window
> > could be quite large.  In any case, that's an existing problem that's
> > not
> > going to get solved in this patchset.
> 
> OK. So, mainline has got cpuset_read_lock() which should be enough to
> guard against changes to rd(s).
> 
> I agree that rq->lock is needed here.

My point was that rq->lock isn't actually helping, because rq->rd could have
changed before rq->lock is acquired (and it's still the old rd that needs
the bandwidth subtraction).  cpuset_mutex/cpuset_rwsem helps somewhat,
though there's still a problem due to the subtraction not happening until
the task is woken up (by which time cpuset_mutex may have been released and
further reconfiguration could have happened).  This would be an issue even
without lazy migrate, since in that case ->set_cpus_allowed() can get
deferred, but this patch makes the window much bigger.

The right solution is probably to explicitly track the rd for which a task
has a pending bandwidth subtraction (if any), and to continue doing it from
set_cpus_allowed() if the task is not migrate-disabled.  In the meantime, I
think we should drop this patch from the patchset -- without it, lazy
migrate disable doesn't really make the race situation any worse than it
already was.
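
A rough sketch of that "track the pending rd" idea (the field and helper
names are hypothetical, and the locking/accounting details would need more
care than shown here):

/*
 * Hypothetical: p->dl.pending_sub_rd records the root_domain that still
 * owes a __dl_sub() for this task; settle it from set_cpus_allowed() when
 * the task is not migrate-disabled, or once the deferred migration runs.
 */
static void dl_settle_pending_sub(struct task_struct *p)
{
	struct root_domain *rd = p->dl.pending_sub_rd;	/* hypothetical field */

	if (!rd)
		return;

	raw_spin_lock(&rd->dl_bw.lock);
	__dl_sub(&rd->dl_bw, p->dl.dl_bw, cpumask_weight(rd->span));
	raw_spin_unlock(&rd->dl_bw.lock);
	p->dl.pending_sub_rd = NULL;
}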

BTW, what happens to the bw addition in dl_task_can_attach() if a subsequent
can_attach fails and the whole operation is cancelled?

-Scott



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq()
  2019-10-09 19:12                 ` Scott Wood
@ 2019-10-10  8:18                   ` Juri Lelli
  0 siblings, 0 replies; 32+ messages in thread
From: Juri Lelli @ 2019-10-10  8:18 UTC (permalink / raw)
  To: Scott Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra, Daniel Bristot de Oliveira, Clark Williams,
	linux-kernel, linux-rt-users

On 09/10/19 14:12, Scott Wood wrote:
> On Wed, 2019-10-09 at 09:27 +0200, Juri Lelli wrote:
> > On 09/10/19 01:25, Scott Wood wrote:
> > > On Tue, 2019-10-01 at 10:52 +0200, Juri Lelli wrote:
> > > > On 30/09/19 11:24, Scott Wood wrote:
> > > > > On Mon, 2019-09-30 at 09:12 +0200, Juri Lelli wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > > Hummm, I was actually more worried about the fact that we call
> > > > > > free_old_
> > > > > > cpuset_bw_dl() only if p->state != TASK_WAKING.
> > > > > 
> > > > > Oh, right. :-P  Not sure what I had in mind there; we want to call
> > > > > it
> > > > > regardless.
> > > > > 
> > > > > I assume we need rq->lock in free_old_cpuset_bw_dl()?  So something
> > > > > like
> > > > 
> > > > I think we can do with rcu_read_lock_sched() (see
> > > > dl_task_can_attach()).
> > > 
> > > RCU will keep dl_bw from being freed under us (we're implicitly in an
> > > RCU
> > > sched read section due to atomic context).  It won't stop rq->rd from
> > > changing, but that could have happened before we took rq->lock.  If the
> > > cpu
> > > the task was running on was removed from the cpuset, and that raced with
> > > the
> > > task being moved to a different cpuset, couldn't we end up erroneously
> > > subtracting from the cpu's new root domain (or failing to subtract at
> > > all if
> > > the old cpu's new cpuset happens to be the task's new cpuset)?  I don't
> > > see
> > > anything that forces tasks off of the cpu when a cpu is removed from a
> > > cpuset (though maybe I'm not looking in the right place), so the race
> > > window
> > > could be quite large.  In any case, that's an existing problem that's
> > > not
> > > going to get solved in this patchset.
> > 
> > OK. So, mainline has got cpuset_read_lock() which should be enough to
> > guard against changes to rd(s).
> > 
> > I agree that rq->lock is needed here.
> 
> My point was that rq->lock isn't actually helping, because rq->rd could have
> changed before rq->lock is acquired (and it's still the old rd that needs
> the bandwidth subtraction).  cpuset_mutex/cpuset_rwsem helps somewhat,
> though there's still a problem due to the subtraction not happening until
> the task is woken up (by which time cpuset_mutex may have been released and
> further reconfiguration could have happened).  This would be an issue even
> without lazy migrate, since in that case ->set_cpus_allowed() can get
> deferred, but this patch makes the window much bigger.
> 
> The right solution is probably to explicitly track the rd for which a task
> has a pending bandwidth subtraction (if any), and to continue doing it from
> set_cpus_allowed() if the task is not migrate-disabled.  In the meantime, I
> think we should drop this patch from the patchset -- without it, lazy
> migrate disable doesn't really make the race situation any worse than it
> already was.

I'm OK with dropping it for now (as we also have other possible issues,
as you point out below), but I really wonder what a solution would be
here. The problem is that if a domain reconfiguration happened while the
task was migrate-disabled, and we let the reconf destroy/rebuild the
domains, the old rd could be gone by the time the task gets migrate-enabled
again, and the task could continue running in a new rd without its bandwidth
being accounted for since the migrate-enable instant, no?

:-/

> BTW, what happens to the bw addition in dl_task_can_attach() if a subsequent
> can_attach fails and the whole operation is cancelled?

Oh, yeah, that doesn't look good. :-(

Maybe we can use cancel_attach() to fix things up?

Thanks,

Juri

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2019-10-10  8:18 UTC | newest]

Thread overview: 32+ messages
2019-07-27  5:56 [RT PATCH 0/8] migrate disable fixes and performance Scott Wood
2019-07-27  5:56 ` [PATCH RT 1/8] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep Scott Wood
2019-07-29 13:26   ` Steven Rostedt
2019-07-27  5:56 ` [PATCH RT 2/8] sched: __set_cpus_allowed_ptr: Check cpus_mask, not cpus_ptr Scott Wood
2019-09-17 14:57   ` Sebastian Andrzej Siewior
2019-09-17 15:23     ` Scott Wood
2019-07-27  5:56 ` [PATCH RT 3/8] sched: Remove dead __migrate_disabled() check Scott Wood
2019-07-27  5:56 ` [PATCH RT 4/8] sched: migrate disable: Protect cpus_ptr with lock Scott Wood
2019-09-26 16:39   ` Sebastian Andrzej Siewior
2019-09-26 16:52     ` Scott Wood
2019-09-27 12:19       ` Sebastian Andrzej Siewior
2019-09-27 20:02         ` Scott Wood
2019-07-27  5:56 ` [PATCH RT 5/8] sched/deadline: Reclaim cpuset bandwidth in .migrate_task_rq() Scott Wood
2019-09-17 15:10   ` Sebastian Andrzej Siewior
2019-09-27  8:11   ` Juri Lelli
2019-09-27 16:40     ` Scott Wood
2019-09-30  7:12       ` Juri Lelli
2019-09-30 16:24         ` Scott Wood
2019-10-01  8:52           ` Juri Lelli
2019-10-09  6:25             ` Scott Wood
2019-10-09  7:27               ` Juri Lelli
2019-10-09 19:12                 ` Scott Wood
2019-10-10  8:18                   ` Juri Lelli
2019-07-27  5:56 ` [PATCH RT 6/8] sched: migrate_enable: Set state to TASK_RUNNING Scott Wood
2019-09-17 15:31   ` Sebastian Andrzej Siewior
2019-09-17 17:54     ` Scott Wood
2019-07-27  5:56 ` [PATCH RT 7/8] sched: migrate_enable: Use select_fallback_rq() Scott Wood
2019-09-17 16:00   ` Sebastian Andrzej Siewior
2019-09-24 18:05     ` Scott Wood
2019-07-27  5:56 ` [PATCH RT 8/8] sched: Lazy migrate_disable processing Scott Wood
2019-09-17 16:50   ` Sebastian Andrzej Siewior
2019-09-17 17:06     ` Scott Wood
