All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
@ 2017-11-15 19:25 joe.korty
  2017-11-17 22:48 ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: joe.korty @ 2017-11-15 19:25 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Steven Rostedt; +Cc: Linux Kernel Mailing List

4.4.86-rt99's patch

  0037-Intrduce-migrate_disable-cpu_light.patch

introduces a place where a task's cpus_allowed mask is
updated without a corresponding update to nr_cpus_allowed.

This path is executed when task affinity is changed while
migrate_disabled() is true.  As there is no code present
to set nr_cpus_allowed when the migrate_disable state is
dropped, the scheduler at that point on may make incorrect
scheduling decisions for this task.

My testing consists of temporarily adding a

 if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
 	printk_ratelimited(...)

stmt to schedule() and running a simple affinity rotation
program I wrote, one that rotates the threads of stress(1).
While rotating, I got the expected kernel error messages.
With this patch applied the messages disappeared.

Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
 	lockdep_assert_held(&p->pi_lock);
 
 	if (__migrate_disabled(p)) {
+		p->nr_cpus_allowed = cpumask_weight(new_mask);
 		cpumask_copy(&p->cpus_allowed, new_mask);
 		return;
 	}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-15 19:25 [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed joe.korty
@ 2017-11-17 22:48 ` Steven Rostedt
  2017-11-20 16:30   ` joe.korty
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2017-11-17 22:48 UTC (permalink / raw)
  To: joe.korty; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Wed, 15 Nov 2017 14:25:29 -0500
joe.korty@concurrent-rt.com wrote:

> 4.4.86-rt99's patch
> 
>   0037-Intrduce-migrate_disable-cpu_light.patch
> 
> introduces a place where a task's cpus_allowed mask is
> updated without a corresponding update to nr_cpus_allowed.
> 
> This path is executed when task affinity is changed while
> migrate_disabled() is true.  As there is no code present
> to set nr_cpus_allowed when the migrate_disable state is
> dropped, the scheduler at that point on may make incorrect
> scheduling decisions for this task.
> 
> My testing consists of temporarily adding a
> 
>  if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
>  	printk_ratelimited(...)

Have you tested v4.9-rt or 4.13-rt if it has the same bug? If it is a
bug in 4.13-rt then it needs to go there first, and then backported to
the stable releases (which I'm actually working on now).

-- Steve

> 
> stmt to schedule() and running a simple affinity rotation
> program I wrote, one that rotates the threads of stress(1).
> While rotating, I got the expected kernel error messages.
> With this patch applied the messages disappeared.
> 
> Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
> 
> Index: b/kernel/sched/core.c
> ===================================================================
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
>  	lockdep_assert_held(&p->pi_lock);
>  
>  	if (__migrate_disabled(p)) {
> +		p->nr_cpus_allowed = cpumask_weight(new_mask);
>  		cpumask_copy(&p->cpus_allowed, new_mask);
>  		return;
>  	}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-17 22:48 ` Steven Rostedt
@ 2017-11-20 16:30   ` joe.korty
  2017-11-21  4:02     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: joe.korty @ 2017-11-20 16:30 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

Hi Steve,
A quick perusal of 4.11.12-rt16 shows that it has an
entirely new version of migrate_disable which to me appears
correct.

In that new implementation, migrate_enable() recalculates
p->nr_cpus_allowed when it switches the task back to
using p->cpus_mask.  This brings the two back into sync
if anything had happened to get them out of sync while
migration was disabled (as would happen on an affinity
change during that disable period).

4.9.47-rt37 has the old implementation and it appears to
have same bug as 4.4-rt though I have yet to test 4.9-rt.

The fix in  these older versions could take one of two
forms: either we recalculate p->nr_cpus_allowed when
migrate_enable goes back to using p->cpus_allowed,
as the 4.11-rt version does, or the one place where we
allow p->nr_cpus_allowed to diverge from p->cpus_allowed
be fixed.  The patch I submitted earlier takes this second
approach.

Regards,
Joe



On Fri, Nov 17, 2017 at 05:48:51PM -0500, Steven Rostedt wrote:
> On Wed, 15 Nov 2017 14:25:29 -0500
> joe.korty@concurrent-rt.com wrote:
> 
> > 4.4.86-rt99's patch
> > 
> >   0037-Intrduce-migrate_disable-cpu_light.patch
> > 
> > introduces a place where a task's cpus_allowed mask is
> > updated without a corresponding update to nr_cpus_allowed.
> > 
> > This path is executed when task affinity is changed while
> > migrate_disabled() is true.  As there is no code present
> > to set nr_cpus_allowed when the migrate_disable state is
> > dropped, the scheduler at that point on may make incorrect
> > scheduling decisions for this task.
> > 
> > My testing consists of temporarily adding a
> > 
> >  if (tsk_nr_cpus_allowed(p) == cpumask_weight(tsk_cpus_allowed(p))
> >  	printk_ratelimited(...)
> 
> Have you tested v4.9-rt or 4.13-rt if it has the same bug? If it is a
> bug in 4.13-rt then it needs to go there first, and then backported to
> the stable releases (which I'm actually working on now).
> 
> -- Steve
> 
> > 
> > stmt to schedule() and running a simple affinity rotation
> > program I wrote, one that rotates the threads of stress(1).
> > While rotating, I got the expected kernel error messages.
> > With this patch applied the messages disappeared.
> > 
> > Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
> > 
> > Index: b/kernel/sched/core.c
> > ===================================================================
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1220,6 +1220,7 @@ void do_set_cpus_allowed(struct task_str
> >  	lockdep_assert_held(&p->pi_lock);
> >  
> >  	if (__migrate_disabled(p)) {
> > +		p->nr_cpus_allowed = cpumask_weight(new_mask);
> >  		cpumask_copy(&p->cpus_allowed, new_mask);
> >  		return;
> >  	}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-20 16:30   ` joe.korty
@ 2017-11-21  4:02     ` Steven Rostedt
  2017-11-21  4:57       ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2017-11-21  4:02 UTC (permalink / raw)
  To: joe.korty; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Mon, 20 Nov 2017 11:30:40 -0500
joe.korty@concurrent-rt.com wrote:

> Hi Steve,
> A quick perusal of 4.11.12-rt16 shows that it has an
> entirely new version of migrate_disable which to me appears
> correct.
> 
> In that new implementation, migrate_enable() recalculates
> p->nr_cpus_allowed when it switches the task back to
> using p->cpus_mask.  This brings the two back into sync
> if anything had happened to get them out of sync while
> migration was disabled (as would happen on an affinity
> change during that disable period).
> 
> 4.9.47-rt37 has the old implementation and it appears to
> have same bug as 4.4-rt though I have yet to test 4.9-rt.
> 
> The fix in  these older versions could take one of two
> forms: either we recalculate p->nr_cpus_allowed when
> migrate_enable goes back to using p->cpus_allowed,
> as the 4.11-rt version does, or the one place where we
> allow p->nr_cpus_allowed to diverge from p->cpus_allowed
> be fixed.  The patch I submitted earlier takes this second
> approach.
> 

Ideally, I would like to stay close to what upstream -rt does. Would
you be able to backport the 4.11-rt patch?

I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
backports. I could easily add this one too.

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-21  4:02     ` Steven Rostedt
@ 2017-11-21  4:57       ` Steven Rostedt
  2017-11-21 14:33         ` joe.korty
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2017-11-21  4:57 UTC (permalink / raw)
  To: joe.korty; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Mon, 20 Nov 2017 23:02:07 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:


> Ideally, I would like to stay close to what upstream -rt does. Would
> you be able to backport the 4.11-rt patch?
> 
> I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
> backports. I could easily add this one too.

Speaking of which. I just backported this patch to 4.4-rt. Is this what
you are talking about?

-- Steve

>From 1dc89be37874bfc7bb4a0ea7c45492d7db39f62b Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Mon, 19 Jun 2017 09:55:47 +0200
Subject: [PATCH] sched/migrate disable: handle updated task-mask mg-dis
 section

If task's cpumask changes while in the task is in a migrate_disable()
section then we don't react on it after a migrate_enable(). It matters
however if current CPU is no longer part of the cpumask. We also miss
the ->set_cpus_allowed() callback.
This patch fixes it by setting task->migrate_disable_update once we this
"delayed" hook.
This bug was introduced while fixing unrelated issue in
migrate_disable() in v4.4-rt3 (update_migrate_disable() got removed
during that).

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/sched.h |    1 
 kernel/sched/core.c   |   59 ++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 6 deletions(-)

Index: stable-rt.git/include/linux/sched.h
===================================================================
--- stable-rt.git.orig/include/linux/sched.h	2017-11-20 23:43:24.214077537 -0500
+++ stable-rt.git/include/linux/sched.h	2017-11-20 23:43:24.154079278 -0500
@@ -1438,6 +1438,7 @@ struct task_struct {
 	unsigned int policy;
 #ifdef CONFIG_PREEMPT_RT_FULL
 	int migrate_disable;
+	int migrate_disable_update;
 # ifdef CONFIG_SCHED_DEBUG
 	int migrate_disable_atomic;
 # endif
Index: stable-rt.git/kernel/sched/core.c
===================================================================
--- stable-rt.git.orig/kernel/sched/core.c	2017-11-20 23:43:24.214077537 -0500
+++ stable-rt.git/kernel/sched/core.c	2017-11-20 23:56:05.071687323 -0500
@@ -1212,18 +1212,14 @@ void set_cpus_allowed_common(struct task
 	p->nr_cpus_allowed = cpumask_weight(new_mask);
 }
 
-void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
+static void __do_set_cpus_allowed_tail(struct task_struct *p,
+				       const struct cpumask *new_mask)
 {
 	struct rq *rq = task_rq(p);
 	bool queued, running;
 
 	lockdep_assert_held(&p->pi_lock);
 
-	if (__migrate_disabled(p)) {
-		cpumask_copy(&p->cpus_allowed, new_mask);
-		return;
-	}
-
 	queued = task_on_rq_queued(p);
 	running = task_current(rq, p);
 
@@ -1246,6 +1242,20 @@ void do_set_cpus_allowed(struct task_str
 		enqueue_task(rq, p, ENQUEUE_RESTORE);
 }
 
+void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
+{
+	if (__migrate_disabled(p)) {
+		lockdep_assert_held(&p->pi_lock);
+
+		cpumask_copy(&p->cpus_allowed, new_mask);
+#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_SMP)
+		p->migrate_disable_update = 1;
+#endif
+		return;
+	}
+	__do_set_cpus_allowed_tail(p, new_mask);
+}
+
 static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
 static DEFINE_MUTEX(sched_down_mutex);
 static cpumask_t sched_down_cpumask;
@@ -3231,6 +3241,43 @@ void migrate_enable(void)
 	 */
 	p->migrate_disable = 0;
 
+	if (p->migrate_disable_update) {
+		unsigned long flags;
+		struct rq *rq;
+
+		rq = task_rq_lock(p, &flags);
+		update_rq_clock(rq);
+
+		__do_set_cpus_allowed_tail(p, &p->cpus_allowed);
+		task_rq_unlock(rq, p, &flags);
+
+		p->migrate_disable_update = 0;
+
+		WARN_ON(smp_processor_id() != task_cpu(p));
+		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_allowed)) {
+			const struct cpumask *cpu_valid_mask = cpu_active_mask;
+			struct migration_arg arg;
+			unsigned int dest_cpu;
+
+			if (p->flags & PF_KTHREAD) {
+				/*
+				 * Kernel threads are allowed on online && !active CPUs
+				 */
+				cpu_valid_mask = cpu_online_mask;
+			}
+			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_allowed);
+			arg.task = p;
+			arg.dest_cpu = dest_cpu;
+
+			unpin_current_cpu();
+			preempt_lazy_enable();
+			preempt_enable();
+			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
+			tlb_migrate_finish(p->mm);
+			return;
+		}
+	}
+
 	unpin_current_cpu();
 	preempt_enable();
 	preempt_lazy_enable();

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-21  4:57       ` Steven Rostedt
@ 2017-11-21 14:33         ` joe.korty
  2017-11-21 15:33           ` joe.korty
  0 siblings, 1 reply; 9+ messages in thread
From: joe.korty @ 2017-11-21 14:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Mon, Nov 20, 2017 at 11:57:51PM -0500, Steven Rostedt wrote:
> On Mon, 20 Nov 2017 23:02:07 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> 
> > Ideally, I would like to stay close to what upstream -rt does. Would
> > you be able to backport the 4.11-rt patch?
> > 
> > I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
> > backports. I could easily add this one too.
> 
> Speaking of which. I just backported this patch to 4.4-rt. Is this what
> you are talking about?

Yes it is.
Thanks for finding that!
Joe

> >From 1dc89be37874bfc7bb4a0ea7c45492d7db39f62b Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Mon, 19 Jun 2017 09:55:47 +0200
> Subject: [PATCH] sched/migrate disable: handle updated task-mask mg-dis
>  section
> 
> If task's cpumask changes while in the task is in a migrate_disable()
> section then we don't react on it after a migrate_enable(). It matters
> however if current CPU is no longer part of the cpumask. We also miss
> the ->set_cpus_allowed() callback.
> This patch fixes it by setting task->migrate_disable_update once we this
> "delayed" hook.
> This bug was introduced while fixing unrelated issue in
> migrate_disable() in v4.4-rt3 (update_migrate_disable() got removed
> during that).
> 
> Cc: stable-rt@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> ---
>  include/linux/sched.h |    1 
>  kernel/sched/core.c   |   59 ++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 54 insertions(+), 6 deletions(-)
> 
> Index: stable-rt.git/include/linux/sched.h
> ===================================================================
> --- stable-rt.git.orig/include/linux/sched.h	2017-11-20 23:43:24.214077537 -0500
> +++ stable-rt.git/include/linux/sched.h	2017-11-20 23:43:24.154079278 -0500
> @@ -1438,6 +1438,7 @@ struct task_struct {
>  	unsigned int policy;
>  #ifdef CONFIG_PREEMPT_RT_FULL
>  	int migrate_disable;
> +	int migrate_disable_update;
>  # ifdef CONFIG_SCHED_DEBUG
>  	int migrate_disable_atomic;
>  # endif
> Index: stable-rt.git/kernel/sched/core.c
> ===================================================================
> --- stable-rt.git.orig/kernel/sched/core.c	2017-11-20 23:43:24.214077537 -0500
> +++ stable-rt.git/kernel/sched/core.c	2017-11-20 23:56:05.071687323 -0500
> @@ -1212,18 +1212,14 @@ void set_cpus_allowed_common(struct task
>  	p->nr_cpus_allowed = cpumask_weight(new_mask);
>  }
>  
> -void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> +static void __do_set_cpus_allowed_tail(struct task_struct *p,
> +				       const struct cpumask *new_mask)
>  {
>  	struct rq *rq = task_rq(p);
>  	bool queued, running;
>  
>  	lockdep_assert_held(&p->pi_lock);
>  
> -	if (__migrate_disabled(p)) {
> -		cpumask_copy(&p->cpus_allowed, new_mask);
> -		return;
> -	}
> -
>  	queued = task_on_rq_queued(p);
>  	running = task_current(rq, p);
>  
> @@ -1246,6 +1242,20 @@ void do_set_cpus_allowed(struct task_str
>  		enqueue_task(rq, p, ENQUEUE_RESTORE);
>  }
>  
> +void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> +{
> +	if (__migrate_disabled(p)) {
> +		lockdep_assert_held(&p->pi_lock);
> +
> +		cpumask_copy(&p->cpus_allowed, new_mask);
> +#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_SMP)
> +		p->migrate_disable_update = 1;
> +#endif
> +		return;
> +	}
> +	__do_set_cpus_allowed_tail(p, new_mask);
> +}
> +
>  static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
>  static DEFINE_MUTEX(sched_down_mutex);
>  static cpumask_t sched_down_cpumask;
> @@ -3231,6 +3241,43 @@ void migrate_enable(void)
>  	 */
>  	p->migrate_disable = 0;
>  
> +	if (p->migrate_disable_update) {
> +		unsigned long flags;
> +		struct rq *rq;
> +
> +		rq = task_rq_lock(p, &flags);
> +		update_rq_clock(rq);
> +
> +		__do_set_cpus_allowed_tail(p, &p->cpus_allowed);
> +		task_rq_unlock(rq, p, &flags);
> +
> +		p->migrate_disable_update = 0;
> +
> +		WARN_ON(smp_processor_id() != task_cpu(p));
> +		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_allowed)) {
> +			const struct cpumask *cpu_valid_mask = cpu_active_mask;
> +			struct migration_arg arg;
> +			unsigned int dest_cpu;
> +
> +			if (p->flags & PF_KTHREAD) {
> +				/*
> +				 * Kernel threads are allowed on online && !active CPUs
> +				 */
> +				cpu_valid_mask = cpu_online_mask;
> +			}
> +			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_allowed);
> +			arg.task = p;
> +			arg.dest_cpu = dest_cpu;
> +
> +			unpin_current_cpu();
> +			preempt_lazy_enable();
> +			preempt_enable();
> +			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> +			tlb_migrate_finish(p->mm);
> +			return;
> +		}
> +	}
> +
>  	unpin_current_cpu();
>  	preempt_enable();
>  	preempt_lazy_enable();

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-21 14:33         ` joe.korty
@ 2017-11-21 15:33           ` joe.korty
  2017-11-29  0:22             ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: joe.korty @ 2017-11-21 15:33 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List

On Tue, Nov 21, 2017 at 09:33:52AM -0500, joe.korty@concurrent-rt.com wrote:
> On Mon, Nov 20, 2017 at 11:57:51PM -0500, Steven Rostedt wrote:
> > On Mon, 20 Nov 2017 23:02:07 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > 
> > > Ideally, I would like to stay close to what upstream -rt does. Would
> > > you be able to backport the 4.11-rt patch?
> > > 
> > > I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
> > > backports. I could easily add this one too.
> > 
> > Speaking of which. I just backported this patch to 4.4-rt. Is this what
> > you are talking about?
> 
> Yes it is.
> Thanks for finding that!
> Joe

I spoke too fast.  You will a variant of my one-liner fix
when you backport the 4.11.12-r16 patch:

    rt-Increase-decrease-the-nr-of-migratory-tasks-when-.patch

to 4.9-rt and 4.4-rt.  The fix of interest is the introduction of

    p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);

to migrate_enable_update_cpus_allowed().

Regards,
Joe

> 
> > >From 1dc89be37874bfc7bb4a0ea7c45492d7db39f62b Mon Sep 17 00:00:00 2001
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Date: Mon, 19 Jun 2017 09:55:47 +0200
> > Subject: [PATCH] sched/migrate disable: handle updated task-mask mg-dis
> >  section
> > 
> > If task's cpumask changes while in the task is in a migrate_disable()
> > section then we don't react on it after a migrate_enable(). It matters
> > however if current CPU is no longer part of the cpumask. We also miss
> > the ->set_cpus_allowed() callback.
> > This patch fixes it by setting task->migrate_disable_update once we this
> > "delayed" hook.
> > This bug was introduced while fixing unrelated issue in
> > migrate_disable() in v4.4-rt3 (update_migrate_disable() got removed
> > during that).
> > 
> > Cc: stable-rt@vger.kernel.org
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> > ---
> >  include/linux/sched.h |    1 
> >  kernel/sched/core.c   |   59 ++++++++++++++++++++++++++++++++++++++++++++------
> >  2 files changed, 54 insertions(+), 6 deletions(-)
> > 
> > Index: stable-rt.git/include/linux/sched.h
> > ===================================================================
> > --- stable-rt.git.orig/include/linux/sched.h	2017-11-20 23:43:24.214077537 -0500
> > +++ stable-rt.git/include/linux/sched.h	2017-11-20 23:43:24.154079278 -0500
> > @@ -1438,6 +1438,7 @@ struct task_struct {
> >  	unsigned int policy;
> >  #ifdef CONFIG_PREEMPT_RT_FULL
> >  	int migrate_disable;
> > +	int migrate_disable_update;
> >  # ifdef CONFIG_SCHED_DEBUG
> >  	int migrate_disable_atomic;
> >  # endif
> > Index: stable-rt.git/kernel/sched/core.c
> > ===================================================================
> > --- stable-rt.git.orig/kernel/sched/core.c	2017-11-20 23:43:24.214077537 -0500
> > +++ stable-rt.git/kernel/sched/core.c	2017-11-20 23:56:05.071687323 -0500
> > @@ -1212,18 +1212,14 @@ void set_cpus_allowed_common(struct task
> >  	p->nr_cpus_allowed = cpumask_weight(new_mask);
> >  }
> >  
> > -void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> > +static void __do_set_cpus_allowed_tail(struct task_struct *p,
> > +				       const struct cpumask *new_mask)
> >  {
> >  	struct rq *rq = task_rq(p);
> >  	bool queued, running;
> >  
> >  	lockdep_assert_held(&p->pi_lock);
> >  
> > -	if (__migrate_disabled(p)) {
> > -		cpumask_copy(&p->cpus_allowed, new_mask);
> > -		return;
> > -	}
> > -
> >  	queued = task_on_rq_queued(p);
> >  	running = task_current(rq, p);
> >  
> > @@ -1246,6 +1242,20 @@ void do_set_cpus_allowed(struct task_str
> >  		enqueue_task(rq, p, ENQUEUE_RESTORE);
> >  }
> >  
> > +void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
> > +{
> > +	if (__migrate_disabled(p)) {
> > +		lockdep_assert_held(&p->pi_lock);
> > +
> > +		cpumask_copy(&p->cpus_allowed, new_mask);
> > +#if defined(CONFIG_PREEMPT_RT_FULL) && defined(CONFIG_SMP)
> > +		p->migrate_disable_update = 1;
> > +#endif
> > +		return;
> > +	}
> > +	__do_set_cpus_allowed_tail(p, new_mask);
> > +}
> > +
> >  static DEFINE_PER_CPU(struct cpumask, sched_cpumasks);
> >  static DEFINE_MUTEX(sched_down_mutex);
> >  static cpumask_t sched_down_cpumask;
> > @@ -3231,6 +3241,43 @@ void migrate_enable(void)
> >  	 */
> >  	p->migrate_disable = 0;
> >  
> > +	if (p->migrate_disable_update) {
> > +		unsigned long flags;
> > +		struct rq *rq;
> > +
> > +		rq = task_rq_lock(p, &flags);
> > +		update_rq_clock(rq);
> > +
> > +		__do_set_cpus_allowed_tail(p, &p->cpus_allowed);
> > +		task_rq_unlock(rq, p, &flags);
> > +
> > +		p->migrate_disable_update = 0;
> > +
> > +		WARN_ON(smp_processor_id() != task_cpu(p));
> > +		if (!cpumask_test_cpu(task_cpu(p), &p->cpus_allowed)) {
> > +			const struct cpumask *cpu_valid_mask = cpu_active_mask;
> > +			struct migration_arg arg;
> > +			unsigned int dest_cpu;
> > +
> > +			if (p->flags & PF_KTHREAD) {
> > +				/*
> > +				 * Kernel threads are allowed on online && !active CPUs
> > +				 */
> > +				cpu_valid_mask = cpu_online_mask;
> > +			}
> > +			dest_cpu = cpumask_any_and(cpu_valid_mask, &p->cpus_allowed);
> > +			arg.task = p;
> > +			arg.dest_cpu = dest_cpu;
> > +
> > +			unpin_current_cpu();
> > +			preempt_lazy_enable();
> > +			preempt_enable();
> > +			stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
> > +			tlb_migrate_finish(p->mm);
> > +			return;
> > +		}
> > +	}
> > +
> >  	unpin_current_cpu();
> >  	preempt_enable();
> >  	preempt_lazy_enable();

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-21 15:33           ` joe.korty
@ 2017-11-29  0:22             ` Steven Rostedt
  2017-11-29 14:24               ` joe.korty
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2017-11-29  0:22 UTC (permalink / raw)
  To: joe.korty
  Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
	Sebastian Andrzej Siewior

On Tue, 21 Nov 2017 10:33:17 -0500
joe.korty@concurrent-rt.com wrote:

> On Tue, Nov 21, 2017 at 09:33:52AM -0500, joe.korty@concurrent-rt.com wrote:
> > On Mon, Nov 20, 2017 at 11:57:51PM -0500, Steven Rostedt wrote:  
> > > On Mon, 20 Nov 2017 23:02:07 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > >   
> > > > Ideally, I would like to stay close to what upstream -rt does. Would
> > > > you be able to backport the 4.11-rt patch?
> > > > 
> > > > I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
> > > > backports. I could easily add this one too.  
> > > 
> > > Speaking of which. I just backported this patch to 4.4-rt. Is this what
> > > you are talking about?  
> > 
> > Yes it is.
> > Thanks for finding that!
> > Joe  
> 
> I spoke too fast.  You will a variant of my one-liner fix
> when you backport the 4.11.12-r16 patch:
> 
>     rt-Increase-decrease-the-nr-of-migratory-tasks-when-.patch
> 
> to 4.9-rt and 4.4-rt.  The fix of interest is the introduction of
> 
>     p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
> 
> to migrate_enable_update_cpus_allowed().

You totally confused me here.

Hmm, that patch isn't marked for stable. I'm guessing that it should be
backported.

Now are you saying your patch still needs to be applied if we backport
this patch? Or does your patch need to be applied to what I have
already done?

I want to release 4.4-rt (and 4.9-rt) this week so let me know.

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed
  2017-11-29  0:22             ` Steven Rostedt
@ 2017-11-29 14:24               ` joe.korty
  0 siblings, 0 replies; 9+ messages in thread
From: joe.korty @ 2017-11-29 14:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, Peter Zijlstra, Linux Kernel Mailing List,
	Sebastian Andrzej Siewior

On Tue, Nov 28, 2017 at 07:22:34PM -0500, Steven Rostedt wrote:
> On Tue, 21 Nov 2017 10:33:17 -0500
> joe.korty@concurrent-rt.com wrote:
> 
> > On Tue, Nov 21, 2017 at 09:33:52AM -0500, joe.korty@concurrent-rt.com wrote:
> > > On Mon, Nov 20, 2017 at 11:57:51PM -0500, Steven Rostedt wrote:  
> > > > On Mon, 20 Nov 2017 23:02:07 -0500
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > >   
> > > > > Ideally, I would like to stay close to what upstream -rt does. Would
> > > > > you be able to backport the 4.11-rt patch?
> > > > > 
> > > > > I'm currently working on releasing 4.9-rt and 4.4-rt with the latest
> > > > > backports. I could easily add this one too.  
> > > > 
> > > > Speaking of which. I just backported this patch to 4.4-rt. Is this what
> > > > you are talking about?  
> > > 
> > > Yes it is.
> > > Thanks for finding that!
> > > Joe  
> > 
> > I spoke too fast.  You will a variant of my one-liner fix
> > when you backport the 4.11.12-r16 patch:
> > 
> >     rt-Increase-decrease-the-nr-of-migratory-tasks-when-.patch
> > 
> > to 4.9-rt and 4.4-rt.  The fix of interest is the introduction of
> > 
> >     p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
> > 
> > to migrate_enable_update_cpus_allowed().
> 
> You totally confused me here.
> 
> Hmm, that patch isn't marked for stable. I'm guessing that it should be
> backported.
> 
> Now are you saying your patch still needs to be applied if we backport
> this patch? Or does your patch need to be applied to what I have
> already done?
> 
> I want to release 4.4-rt (and 4.9-rt) this week so let me know.



Hi Steve,
Just porting that other patch should do the trick.  Or you can just apply
my patch, I know that one works as it has actually been tested.

Joe

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-11-29 14:24 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-15 19:25 [PATCH] 4.4.86-rt99: fix sync breakage between nr_cpus_allowed and cpus_allowed joe.korty
2017-11-17 22:48 ` Steven Rostedt
2017-11-20 16:30   ` joe.korty
2017-11-21  4:02     ` Steven Rostedt
2017-11-21  4:57       ` Steven Rostedt
2017-11-21 14:33         ` joe.korty
2017-11-21 15:33           ` joe.korty
2017-11-29  0:22             ` Steven Rostedt
2017-11-29 14:24               ` joe.korty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.