[RT,21/23] sched: migrate_enable: Busy loop until the migration request is completed

Message ID fd4bda7ad49f46545a03424fd1327dff8a8b8171.1582814004.git.zanussi@kernel.org
State New
Series
  • Linux v4.14.170-rt75-rc2

Commit Message

Tom Zanussi Feb. 27, 2020, 2:33 p.m. UTC
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

v4.14.170-rt75-rc2 stable review patch.
If anyone has any objections, please let me know.

-----------


[ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]

If a user task changes the CPU affinity mask of a running task, it will
dispatch a migration request if the current CPU is no longer allowed. This
might happen shortly before a task enters a migrate_disable() section.
Upon leaving the migrate_disable() section, the task will notice that
the current CPU is no longer allowed and will dispatch its own
migration request to move it off the current CPU.
While invoking __schedule() the first migration request will be
processed and the task returns on the "new" CPU with "arg.done = 0". Its
own migration request will be processed shortly after and will result in
memory corruption if the stack memory designated for the request was
reused for something else in the meantime.

Spin until the migration request has been processed if it was accepted.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
---
 kernel/sched/core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
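
For context, here is a rough, commented sketch of what the migrate_enable()
tail does with this patch applied. It is simplified from the diff at the
bottom of this page (locking, the out label and the dest_cpu selection are
elided), so treat it as an illustration of the race and of the fix rather
than as the literal 4.14-rt code:

	struct task_struct *p = current;
	struct migration_arg arg = { .task = p }; /* lives on this stack frame */
	struct cpu_stop_work work;                /* so does the stopper work item */

	/* arg.dest_cpu is set to a CPU from the new allowed mask (elided). */

	/* Hand the request to the per-CPU stopper thread; it may run later. */
	stop_one_cpu_nowait(cpu, migration_cpu_stop, &arg, &work);
	tlb_migrate_finish(p->mm);
	__schedule(true);

	/*
	 * If the stopper accepted the request (!work.disabled), it will still
	 * dereference &arg and &work when it eventually runs.  Busy-loop on
	 * arg.done so this stack frame cannot be reused, and then corrupted,
	 * while the stopper may still write to it.
	 */
	if (!work.disabled) {
		while (!arg.done)
			cpu_relax();
	}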

Comments

Scott Wood March 3, 2020, 7:56 p.m. UTC | #1
On Thu, 2020-02-27 at 08:33 -0600, zanussi@kernel.org wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> v4.14.170-rt75-rc2 stable review patch.
> If anyone has any objections, please let me know.
> 
> -----------
> 
> 
> [ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]
> 
> If user task changes the CPU affinity mask of a running task it will
> dispatch migration request if the current CPU is no longer allowed. This
> might happen shortly before a task enters a migrate_disable() section.
> Upon leaving the migrate_disable() section, the task will notice that
> the current CPU is no longer allowed and will will dispatch its own
> migration request to move it off the current CPU.
> While invoking __schedule() the first migration request will be
> processed and the task returns on the "new" CPU with "arg.done = 0". Its
> own migration request will be processed shortly after and will result in
> memory corruption if the stack memory, designed for request, was used
> otherwise in the meantime.
> 
> Spin until the migration request has been processed if it was accepted.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> ---
>  kernel/sched/core.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

As I said in https://marc.info/?l=linux-rt-users&m=158258256415340&w=2, if
you take this you should also take the followup 2dcd94b443c5dcbc ("sched:
migrate_enable: Use per-cpu cpu_stop_work").

-Scott
Tom Zanussi March 3, 2020, 8:39 p.m. UTC | #2
Hi Scott,

On Tue, 2020-03-03 at 13:56 -0600, Scott Wood wrote:
> On Thu, 2020-02-27 at 08:33 -0600, zanussi@kernel.org wrote:
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > v4.14.170-rt75-rc2 stable review patch.
> > If anyone has any objections, please let me know.
> > 
> > -----------
> > 
> > 
> > [ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]
> > 
> > If user task changes the CPU affinity mask of a running task it
> > will
> > dispatch migration request if the current CPU is no longer allowed.
> > This
> > might happen shortly before a task enters a migrate_disable()
> > section.
> > Upon leaving the migrate_disable() section, the task will notice
> > that
> > the current CPU is no longer allowed and will will dispatch its own
> > migration request to move it off the current CPU.
> > While invoking __schedule() the first migration request will be
> > processed and the task returns on the "new" CPU with "arg.done =
> > 0". Its
> > own migration request will be processed shortly after and will
> > result in
> > memory corruption if the stack memory, designed for request, was
> > used
> > otherwise in the meantime.
> > 
> > Spin until the migration request has been processed if it was
> > accepted.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> > ---
> >  kernel/sched/core.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> As I said in https://marc.info/?l=linux-rt-users&m=158258256415340&w=
> 2 if
> you take thhis you should take the followup 2dcd94b443c5dcbc ("sched:
> migrate_enable: Use per-cpu cpu_stop_work")
> 

Yes, I didn't forget about this; it's just that I can't apply it to
4.14-rt until 4.19-rt does, otherwise it would be seen as a regression
by someone moving from 4.14-rt to 4.19-rt.

I will be keeping my eye out for when that happens and will apply it to
the next backport release at that point.

Thanks for making sure it wasn't missed in any case.

Tom 


> -Scott
> 
>
Scott Wood March 3, 2020, 9:19 p.m. UTC | #3
On Tue, 2020-03-03 at 14:39 -0600, Tom Zanussi wrote:
> Hi Scott,
> 
> On Tue, 2020-03-03 at 13:56 -0600, Scott Wood wrote:
> > On Thu, 2020-02-27 at 08:33 -0600, zanussi@kernel.org wrote:
> > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > 
> > > v4.14.170-rt75-rc2 stable review patch.
> > > If anyone has any objections, please let me know.
> > > 
> > > -----------
> > > 
> > > 
> > > [ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]
> > > 
> > > If user task changes the CPU affinity mask of a running task it
> > > will
> > > dispatch migration request if the current CPU is no longer allowed.
> > > This
> > > might happen shortly before a task enters a migrate_disable()
> > > section.
> > > Upon leaving the migrate_disable() section, the task will notice
> > > that
> > > the current CPU is no longer allowed and will will dispatch its own
> > > migration request to move it off the current CPU.
> > > While invoking __schedule() the first migration request will be
> > > processed and the task returns on the "new" CPU with "arg.done =
> > > 0". Its
> > > own migration request will be processed shortly after and will
> > > result in
> > > memory corruption if the stack memory, designed for request, was
> > > used
> > > otherwise in the meantime.
> > > 
> > > Spin until the migration request has been processed if it was
> > > accepted.
> > > 
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> > > ---
> > >  kernel/sched/core.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > As I said in https://marc.info/?l=linux-rt-users&m=158258256415340&w=
> > 2 if
> > you take thhis you should take the followup 2dcd94b443c5dcbc ("sched:
> > migrate_enable: Use per-cpu cpu_stop_work")
> > 
> 
> Yes, I didn't forget about this, it's just that I can't apply this to
> 4.14-rt until 4.19-rt does, otherwise it will be seen as a regression
> to someone moving from 4.14-rt to 4.19-rt.
> 
> I will be keeping my eye out for when that happens and will apply it to
> the next backport release at that point.
> 
> Thanks for making sure it wasn't missed in any case.

Steven, any plans to merge that patch into 4.19-rt?

In the meantime, I guess it's a question of whether the bug fixed by patch
18/23 is worse than the (probably quite hard to hit) deadlock addressed by
2dcd94b443c5dcbc.

-Scott
Steven Rostedt March 3, 2020, 9:54 p.m. UTC | #4
On Tue, 03 Mar 2020 15:19:23 -0600
Scott Wood <swood@redhat.com> wrote:

> > Thanks for making sure it wasn't missed in any case.  
> 
> Steven, any plans to merge that patch into 4.19-rt?
> 
> In the meantime, I guess it's a question of whether the bug fixed by patch
> 18/23 is worse than the (probably quite hard to hit) deadlock addressed by
> 2dcd94b443c5dcbc.

Yes, I plan on doing my backports thursday and friday.

-- Steve
David Laight March 5, 2020, 1:38 p.m. UTC | #5
From: zanussi@kernel.org
> Sent: 27 February 2020 14:34
> [ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]
> 
> If user task changes the CPU affinity mask of a running task it will
> dispatch migration request if the current CPU is no longer allowed. This
> might happen shortly before a task enters a migrate_disable() section.
> Upon leaving the migrate_disable() section, the task will notice that
> the current CPU is no longer allowed and will will dispatch its own
> migration request to move it off the current CPU.
> While invoking __schedule() the first migration request will be
> processed and the task returns on the "new" CPU with "arg.done = 0". Its
> own migration request will be processed shortly after and will result in
> memory corruption if the stack memory, designed for request, was used
> otherwise in the meantime.
> 
> Spin until the migration request has been processed if it was accepted.

What happens if the process changing the affinity mask is running
at a higher RT priority than that of the task being changed and
the new mask requires it run on the same cpu?

	David

Tom Zanussi March 5, 2020, 5:58 p.m. UTC | #6
Hi David,

On Thu, 2020-03-05 at 13:38 +0000, David Laight wrote:
> From: zanussi@kernel.org
> > Sent: 27 February 2020 14:34
> > [ Upstream commit 140d7f54a5fff02898d2ca9802b39548bf7455f1 ]
> > 
> > If user task changes the CPU affinity mask of a running task it
> > will
> > dispatch migration request if the current CPU is no longer allowed.
> > This
> > might happen shortly before a task enters a migrate_disable()
> > section.
> > Upon leaving the migrate_disable() section, the task will notice
> > that
> > the current CPU is no longer allowed and will will dispatch its own
> > migration request to move it off the current CPU.
> > While invoking __schedule() the first migration request will be
> > processed and the task returns on the "new" CPU with "arg.done =
> > 0". Its
> > own migration request will be processed shortly after and will
> > result in
> > memory corruption if the stack memory, designed for request, was
> > used
> > otherwise in the meantime.
> > 
> > Spin until the migration request has been processed if it was
> > accepted.
> 
> What happens if the process changing the affinity mask is running
> at a higher RT priority than that of the task being changed and
> the new mask requires it run on the same cpu?
> 
> 	David
> 

This patch, 'sched: migrate_enable: Use per-cpu cpu_stop_work', removes
the busy wait and is queued for the next update:

https://lore.kernel.org/linux-rt-users/20200203173732.ldbgbpwao7xm23mm@linutronix.de/T/#mf19a8af38ac4ea0cc01775835e9d715f175f0b7b
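
For context, the idea of that followup, sketched very roughly below, is to
keep the request in per-CPU storage instead of on the caller's stack, so
that migrate_enable() can return without waiting for the stopper to finish.
The per-CPU variable names here are illustrative and not necessarily the
ones used by the actual commit:

	/* Illustrative sketch only; see 2dcd94b443c5dcbc for the real code. */
	static DEFINE_PER_CPU(struct migration_arg, migrate_arg);
	static DEFINE_PER_CPU(struct cpu_stop_work, migrate_work);

	/* ... inside migrate_enable(), replacing the on-stack arg/work: */
	struct migration_arg *arg = this_cpu_ptr(&migrate_arg);
	struct cpu_stop_work *work = this_cpu_ptr(&migrate_work);

	arg->task = p;
	/* arg->dest_cpu is set to a CPU from the new allowed mask (elided). */

	/*
	 * Nothing here lives on this stack frame, so there is nothing for a
	 * late-running stopper to corrupt and no need to busy-wait before
	 * returning from migrate_enable().
	 */
	stop_one_cpu_nowait(cpu, migration_cpu_stop, arg, work);
	tlb_migrate_finish(p->mm);
	__schedule(true);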

Thanks,

Tom



Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e10e3956bb29..f30bb249123b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7002,7 +7002,7 @@  void migrate_enable(void)
 
 	WARN_ON(smp_processor_id() != cpu);
 	if (!is_cpu_allowed(p, cpu)) {
-		struct migration_arg arg = { p };
+		struct migration_arg arg = { .task = p };
 		struct cpu_stop_work work;
 		struct rq_flags rf;
 
@@ -7015,7 +7015,10 @@  void migrate_enable(void)
 				    &arg, &work);
 		tlb_migrate_finish(p->mm);
 		__schedule(true);
-		WARN_ON_ONCE(!arg.done && !work.disabled);
+		if (!work.disabled) {
+			while (!arg.done)
+				cpu_relax();
+		}
 	}
 
 out: