From: Valentin Schneider <valentin.schneider@arm.com>
To: Will Deacon <will@kernel.org>, linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Quentin Perret <qperret@google.com>, Tejun Heo <tj@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	kernel-team@android.com
Subject: Re: [PATCH v7 01/22] sched: Favour predetermined active CPU as migration destination
Date: Wed, 26 May 2021 12:14:20 +0100
Message-ID: <877djlhhmb.mognet@arm.com>
In-Reply-To: <20210525151432.16875-2-will@kernel.org>

Hi,

On 25/05/21 16:14, Will Deacon wrote:
> Since commit 6d337eab041d ("sched: Fix migrate_disable() vs
> set_cpus_allowed_ptr()"), the migration stopper thread is left to
> determine the destination CPU of the running task being migrated, even
> though set_cpus_allowed_ptr() already identified a candidate target
> earlier on.
>
> Unfortunately, the stopper doesn't check whether the new destination
> CPU is active, so __migrate_task() can leave the task sitting on a CPU
> that is outside of its affinity mask, even if the CPU originally chosen
> by SCA (set_cpus_allowed_ptr()) is still active.
>
> For example, with CONFIG_CPUSETS=n:
>
>  $ taskset -pc 0-2 $PID
>  # offline CPUs 3-4
>  $ taskset -pc 3-5 $PID
>
> Then $PID remains on its current CPU (one of 0-2) and does not get
> migrated to CPU 5.
>
> Rework 'struct migration_arg' so that an optional pointer to an affinity
> mask can be provided to the stopper, allowing us to respect the
> original choice of destination CPU when migrating. Note that there is
> still the potential to race with a concurrent CPU hot-unplug of the
> destination CPU if the caller does not hold the hotplug lock.
>
> Reported-by: Valentin Schneider <valentin.schneider@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  kernel/sched/core.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5226cc26a095..1702a60d178d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1869,6 +1869,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
>  struct migration_arg {
>       struct task_struct		*task;
>       int				dest_cpu;
> +	const struct cpumask		*dest_mask;
>       struct set_affinity_pending	*pending;
>  };
>
> @@ -1917,6 +1918,7 @@ static int migration_cpu_stop(void *data)
>       struct set_affinity_pending *pending = arg->pending;
>       struct task_struct *p = arg->task;
>       int dest_cpu = arg->dest_cpu;
> +	const struct cpumask *dest_mask = arg->dest_mask;
>       struct rq *rq = this_rq();
>       bool complete = false;
>       struct rq_flags rf;
> @@ -1956,12 +1958,8 @@ static int migration_cpu_stop(void *data)
>                       complete = true;
>               }
>
> -		if (dest_cpu < 0) {
> -			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
> -				goto out;
> -
> -			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
> -		}
> +		if (dest_mask && (cpumask_test_cpu(task_cpu(p), dest_mask)))
> +			goto out;
>

IIRC the reason we deferred the pick to migration_cpu_stop() was because of
those insane races involving multiple SCA calls, the likes of:

  p->cpus_mask = [0, 1]; p on CPU0

  CPUx                           CPUy                   CPU0

  SCA(p, [2])
    __do_set_cpus_allowed();
    queue migration_cpu_stop()
                                 SCA(p, [3])
                                   __do_set_cpus_allowed();
                                                        migration_cpu_stop()

The stopper needs to use the latest cpumask set by the second SCA despite
having an arg->pending set up by the first SCA. Doesn't this break here?

I'm not sure I've paged back in all of the subtleties lying in ambush
here, but what about the below?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5226cc26a095..cd447c9db61d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1916,7 +1916,6 @@ static int migration_cpu_stop(void *data)
 	struct migration_arg *arg = data;
 	struct set_affinity_pending *pending = arg->pending;
 	struct task_struct *p = arg->task;
-	int dest_cpu = arg->dest_cpu;
 	struct rq *rq = this_rq();
 	bool complete = false;
 	struct rq_flags rf;
@@ -1954,19 +1953,15 @@ static int migration_cpu_stop(void *data)
 		if (pending) {
 			p->migration_pending = NULL;
 			complete = true;
-		}
 
-		if (dest_cpu < 0) {
 			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
 				goto out;
-
-			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
 		}
 
 		if (task_on_rq_queued(p))
-			rq = __migrate_task(rq, &rf, p, dest_cpu);
+			rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
 		else
-			p->wake_cpu = dest_cpu;
+			p->wake_cpu = arg->dest_cpu;
 
 		/*
 		 * XXX __migrate_task() can fail, at which point we might end
@@ -2249,7 +2244,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			init_completion(&my_pending.done);
 			my_pending.arg = (struct migration_arg) {
 				.task = p,
-				.dest_cpu = -1,		/* any */
+				.dest_cpu = dest_cpu,
 				.pending = &my_pending,
 			};
 
@@ -2257,6 +2252,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		} else {
 			pending = p->migration_pending;
 			refcount_inc(&pending->refs);
+			pending->arg.dest_cpu = dest_cpu;
 		}
 	}
 	pending = p->migration_pending;

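For completeness, replaying the earlier race against this version (a
sketch of the intended flow, assuming the stopper reads pending->arg
when it finally runs):

  p->cpus_mask = [0, 1]; p on CPU0

  CPUx                           CPUy                   CPU0

  SCA(p, [2])
    __do_set_cpus_allowed();
    my_pending.arg.dest_cpu = 2;
    queue migration_cpu_stop()
                                 SCA(p, [3])
                                   __do_set_cpus_allowed();
                                   pending->arg.dest_cpu = 3;
                                                        migration_cpu_stop()
                                                          migrates p to CPU3

That is, a later SCA overwrites the queued argument's destination, so the
stopper always acts on the most recent choice rather than one captured at
queue time.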