linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient
@ 2023-02-02 14:31 Waiman Long
  2023-02-02 14:31 ` [PATCH v2 1/2] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Waiman Long @ 2023-02-02 14:31 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, Peter Zijlstra
  Cc: linux-kernel, cgroups, kernel-team, Waiman Long

 v2:
  - It turns out it works around the v1 cpuset missing offline cpu
    problem better than I originally thought. So the patch description
    is updated accordingly.

This small patch series makes the cpuset hotplug a bit more efficient
by eliminating unnecessary task iteration and cpu/node masks update
when a cpu hotplug event (online/offline) happens.

It can also largely work around the known problem of missing previously
offlined cpus in v1 cpuset with some exceptions.

Waiman Long (2):
  cgroup/cpuset: Skip task update if hotplug doesn't affect current
    cpuset
  cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events

 kernel/cgroup/cpuset.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
  2023-02-02 14:31 [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Waiman Long
@ 2023-02-02 14:31 ` Waiman Long
  2023-02-02 14:32 ` [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events Waiman Long
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2023-02-02 14:31 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, Peter Zijlstra
  Cc: linux-kernel, cgroups, kernel-team, Waiman Long

If a hotplug event doesn't affect the current cpuset, there is no point
to call hotplug_update_tasks() or hotplug_update_tasks_legacy(). So
just skip it.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 205dc9edcaa9..cbf749fc05d9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3506,6 +3506,8 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 update_tasks:
 	cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
 	mems_updated = !nodes_equal(new_mems, cs->effective_mems);
+	if (!cpus_updated && !mems_updated)
+		goto unlock;	/* Hotplug doesn't affect this cpuset */
 
 	if (mems_updated)
 		check_insane_mems_config(&new_mems);
@@ -3517,6 +3519,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 		hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
 					    cpus_updated, mems_updated);
 
+unlock:
 	percpu_up_write(&cpuset_rwsem);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events
  2023-02-02 14:31 [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Waiman Long
  2023-02-02 14:31 ` [PATCH v2 1/2] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
@ 2023-02-02 14:32 ` Waiman Long
  2023-02-04  9:40   ` Peter Zijlstra
  2023-02-02 21:52 ` [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Tejun Heo
  2023-02-03 12:07 ` Will Deacon
  3 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2023-02-02 14:32 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, Peter Zijlstra
  Cc: linux-kernel, cgroups, kernel-team, Waiman Long

It is a known issue that when a task is in a non-root v1 cpuset, a cpu
offline event will cause that cpu to be lost from the task's cpumask
permanently as the cpuset's cpus_allowed mask won't get back that cpu
when it becomes online again. A possible workaround for this type of
cpu offline/online sequence is to leave the offline cpu in the task's
cpumask and do the update only if new cpus are added. It also has the
benefit of reducing the overhead of a cpu offline event.

Note that the scheduler is able to ignore the offline cpus and so
leaving offline cpus in the cpumask won't do any harm.

Now with v2, only the cpu online events will cause a call to
hotplug_update_tasks() to update the tasks' cpumasks. For tasks
in a non-root v1 cpuset, the situation is a bit different. The cpu
offline event will not cause change to a task's cpumask. Neither does a
subsequent cpu online event because "cpuset.cpus" had that offline cpu
removed and its subsequent onlining won't be registered as a change
to the cpuset. An exception is when all the cpus in the original
"cpuset.cpus" have gone offline once. In that case, "cpuset.cpus" will
become empty which will force task migration to its parent. A task's
cpumask will also be changed if set_cpus_allowed_ptr() is somehow called
for whatever reason.

Of course, this patch can cause a discrepancy between v1's "cpuset.cpus"
and and its tasks' cpumasks. Howver, it can also largely work around
the offline cpu losing problem with v1 cpuset.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index cbf749fc05d9..207bafdb05e8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3332,7 +3332,7 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 static void
 hotplug_update_tasks_legacy(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
-			    bool cpus_updated, bool mems_updated)
+			    bool update_task_cpus, bool mems_updated)
 {
 	bool is_empty;
 
@@ -3347,7 +3347,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	 * Don't call update_tasks_cpumask() if the cpuset becomes empty,
 	 * as the tasks will be migrated to an ancestor.
 	 */
-	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
+	if (update_task_cpus && !cpumask_empty(cs->cpus_allowed))
 		update_tasks_cpumask(cs);
 	if (mems_updated && !nodes_empty(cs->mems_allowed))
 		update_tasks_nodemask(cs);
@@ -3371,11 +3371,14 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 static void
 hotplug_update_tasks(struct cpuset *cs,
 		     struct cpumask *new_cpus, nodemask_t *new_mems,
-		     bool cpus_updated, bool mems_updated)
+		     bool update_task_cpus, bool mems_updated)
 {
 	/* A partition root is allowed to have empty effective cpus */
-	if (cpumask_empty(new_cpus) && !is_partition_valid(cs))
+	if (cpumask_empty(new_cpus) && !is_partition_valid(cs)) {
 		cpumask_copy(new_cpus, parent_cs(cs)->effective_cpus);
+		update_task_cpus = true;
+	}
+
 	if (nodes_empty(*new_mems))
 		*new_mems = parent_cs(cs)->effective_mems;
 
@@ -3384,7 +3387,7 @@ hotplug_update_tasks(struct cpuset *cs,
 	cs->effective_mems = *new_mems;
 	spin_unlock_irq(&callback_lock);
 
-	if (cpus_updated)
+	if (update_task_cpus)
 		update_tasks_cpumask(cs);
 	if (mems_updated)
 		update_tasks_nodemask(cs);
@@ -3410,7 +3413,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 {
 	static cpumask_t new_cpus;
 	static nodemask_t new_mems;
-	bool cpus_updated;
+	bool cpus_updated, update_task_cpus;
 	bool mems_updated;
 	struct cpuset *parent;
 retry:
@@ -3512,12 +3515,21 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	if (mems_updated)
 		check_insane_mems_config(&new_mems);
 
+	/*
+	 * Update tasks' cpumasks only if new cpus are added. Some offline
+	 * cpus may be left, but the scheduler has no problem ignoring those.
+	 * The case of empty new_cpus will be handled inside
+	 * hotplug_update_tasks().
+	 */
+	update_task_cpus = cpus_updated &&
+			   !cpumask_subset(&new_cpus, cs->effective_cpus);
+
 	if (is_in_v2_mode())
 		hotplug_update_tasks(cs, &new_cpus, &new_mems,
-				     cpus_updated, mems_updated);
+				     update_task_cpus, mems_updated);
 	else
 		hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
-					    cpus_updated, mems_updated);
+					    update_task_cpus, mems_updated);
 
 unlock:
 	percpu_up_write(&cpuset_rwsem);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient
  2023-02-02 14:31 [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Waiman Long
  2023-02-02 14:31 ` [PATCH v2 1/2] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
  2023-02-02 14:32 ` [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events Waiman Long
@ 2023-02-02 21:52 ` Tejun Heo
  2023-02-03 12:07 ` Will Deacon
  3 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2023-02-02 21:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Zefan Li, Johannes Weiner, Will Deacon, Peter Zijlstra,
	linux-kernel, cgroups, kernel-team

On Thu, Feb 02, 2023 at 09:31:58AM -0500, Waiman Long wrote:
>  v2:
>   - It turns out it works around the v1 cpuset missing offline cpu
>     problem better than I originally thought. So the patch description
>     is updated accordingly.
> 
> This small patch series makes the cpuset hotplug a bit more efficient
> by eliminating unnecessary task iteration and cpu/node masks update
> when a cpu hotplug event (online/offline) happens.
> 
> It can also largely work around the known problem of missing previously
> offlined cpus in v1 cpuset with some exceptions.

Imma hold on these patches for now. Let's see how the regression fix goes
and revisit after the merge window.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient
  2023-02-02 14:31 [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Waiman Long
                   ` (2 preceding siblings ...)
  2023-02-02 21:52 ` [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Tejun Heo
@ 2023-02-03 12:07 ` Will Deacon
  3 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2023-02-03 12:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Peter Zijlstra,
	linux-kernel, cgroups, kernel-team

Hi Waiman,

On Thu, Feb 02, 2023 at 09:31:58AM -0500, Waiman Long wrote:
>  v2:
>   - It turns out it works around the v1 cpuset missing offline cpu
>     problem better than I originally thought. So the patch description
>     is updated accordingly.
> 
> This small patch series makes the cpuset hotplug a bit more efficient
> by eliminating unnecessary task iteration and cpu/node masks update
> when a cpu hotplug event (online/offline) happens.
> 
> It can also largely work around the known problem of missing previously
> offlined cpus in v1 cpuset with some exceptions.

I can't tell whether this is intended to address the issue I initially
reported at [1], so I gave it a spin anyway and it doesn't seem to help.

If it's unrelated, then sorry for the noise!

Will

[1] https://lore.kernel.org/r/20230117160825.GA17756@willie-the-truck

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events
  2023-02-02 14:32 ` [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events Waiman Long
@ 2023-02-04  9:40   ` Peter Zijlstra
  2023-02-05  4:40     ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2023-02-04  9:40 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, linux-kernel,
	cgroups, kernel-team

On Thu, Feb 02, 2023 at 09:32:00AM -0500, Waiman Long wrote:
> It is a known issue that when a task is in a non-root v1 cpuset, a cpu
> offline event will cause that cpu to be lost from the task's cpumask
> permanently as the cpuset's cpus_allowed mask won't get back that cpu
> when it becomes online again. A possible workaround for this type of
> cpu offline/online sequence is to leave the offline cpu in the task's
> cpumask and do the update only if new cpus are added. It also has the
> benefit of reducing the overhead of a cpu offline event.
> 
> Note that the scheduler is able to ignore the offline cpus and so
> leaving offline cpus in the cpumask won't do any harm.
> 
> Now with v2, only the cpu online events will cause a call to
> hotplug_update_tasks() to update the tasks' cpumasks. For tasks
> in a non-root v1 cpuset, the situation is a bit different. The cpu
> offline event will not cause change to a task's cpumask. Neither does a
> subsequent cpu online event because "cpuset.cpus" had that offline cpu
> removed and its subsequent onlining won't be registered as a change
> to the cpuset. An exception is when all the cpus in the original
> "cpuset.cpus" have gone offline once. In that case, "cpuset.cpus" will
> become empty which will force task migration to its parent. A task's
> cpumask will also be changed if set_cpus_allowed_ptr() is somehow called
> for whatever reason.
> 
> Of course, this patch can cause a discrepancy between v1's "cpuset.cpus"
> and and its tasks' cpumasks. Howver, it can also largely work around
> the offline cpu losing problem with v1 cpuset.

I don't thikn you can simply not update on offline, even if
effective_cpus doesn't go empty, because the intersection between
task_cpu_possible_mask() and effective_cpus might have gone empty.

Very fundamentally, the introduction of task_cpu_possible_mask() means
that you now *HAVE* to always consider affinity settings per-task, you
cannot group them anymore.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events
  2023-02-04  9:40   ` Peter Zijlstra
@ 2023-02-05  4:40     ` Waiman Long
  2023-02-05 16:34       ` Waiman Long
  0 siblings, 1 reply; 8+ messages in thread
From: Waiman Long @ 2023-02-05  4:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, linux-kernel,
	cgroups, kernel-team


On 2/4/23 04:40, Peter Zijlstra wrote:
> On Thu, Feb 02, 2023 at 09:32:00AM -0500, Waiman Long wrote:
>> It is a known issue that when a task is in a non-root v1 cpuset, a cpu
>> offline event will cause that cpu to be lost from the task's cpumask
>> permanently as the cpuset's cpus_allowed mask won't get back that cpu
>> when it becomes online again. A possible workaround for this type of
>> cpu offline/online sequence is to leave the offline cpu in the task's
>> cpumask and do the update only if new cpus are added. It also has the
>> benefit of reducing the overhead of a cpu offline event.
>>
>> Note that the scheduler is able to ignore the offline cpus and so
>> leaving offline cpus in the cpumask won't do any harm.
>>
>> Now with v2, only the cpu online events will cause a call to
>> hotplug_update_tasks() to update the tasks' cpumasks. For tasks
>> in a non-root v1 cpuset, the situation is a bit different. The cpu
>> offline event will not cause change to a task's cpumask. Neither does a
>> subsequent cpu online event because "cpuset.cpus" had that offline cpu
>> removed and its subsequent onlining won't be registered as a change
>> to the cpuset. An exception is when all the cpus in the original
>> "cpuset.cpus" have gone offline once. In that case, "cpuset.cpus" will
>> become empty which will force task migration to its parent. A task's
>> cpumask will also be changed if set_cpus_allowed_ptr() is somehow called
>> for whatever reason.
>>
>> Of course, this patch can cause a discrepancy between v1's "cpuset.cpus"
>> and and its tasks' cpumasks. Howver, it can also largely work around
>> the offline cpu losing problem with v1 cpuset.
> I don't thikn you can simply not update on offline, even if
> effective_cpus doesn't go empty, because the intersection between
> task_cpu_possible_mask() and effective_cpus might have gone empty.
>
> Very fundamentally, the introduction of task_cpu_possible_mask() means
> that you now *HAVE* to always consider affinity settings per-task, you
> cannot group them anymore.

Right, it makes sense to me. That is why I am thinking that we should 
have an API like may_have_task_cpu_possible_mask() that returns true for 
heterogeneous systems. That will allow us to apply some optimizations in 
systems with homogeneous cpus. So far, this is an arm64 only feature. We 
shouldn't penalize other arches because arm64 needs that. In the future, 
maybe more arches will have that.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events
  2023-02-05  4:40     ` Waiman Long
@ 2023-02-05 16:34       ` Waiman Long
  0 siblings, 0 replies; 8+ messages in thread
From: Waiman Long @ 2023-02-05 16:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Zefan Li, Johannes Weiner, Will Deacon, linux-kernel,
	cgroups, kernel-team

On 2/4/23 23:40, Waiman Long wrote:
>
> On 2/4/23 04:40, Peter Zijlstra wrote:
>> On Thu, Feb 02, 2023 at 09:32:00AM -0500, Waiman Long wrote:
>>> It is a known issue that when a task is in a non-root v1 cpuset, a cpu
>>> offline event will cause that cpu to be lost from the task's cpumask
>>> permanently as the cpuset's cpus_allowed mask won't get back that cpu
>>> when it becomes online again. A possible workaround for this type of
>>> cpu offline/online sequence is to leave the offline cpu in the task's
>>> cpumask and do the update only if new cpus are added. It also has the
>>> benefit of reducing the overhead of a cpu offline event.
>>>
>>> Note that the scheduler is able to ignore the offline cpus and so
>>> leaving offline cpus in the cpumask won't do any harm.
>>>
>>> Now with v2, only the cpu online events will cause a call to
>>> hotplug_update_tasks() to update the tasks' cpumasks. For tasks
>>> in a non-root v1 cpuset, the situation is a bit different. The cpu
>>> offline event will not cause change to a task's cpumask. Neither does a
>>> subsequent cpu online event because "cpuset.cpus" had that offline cpu
>>> removed and its subsequent onlining won't be registered as a change
>>> to the cpuset. An exception is when all the cpus in the original
>>> "cpuset.cpus" have gone offline once. In that case, "cpuset.cpus" will
>>> become empty which will force task migration to its parent. A task's
>>> cpumask will also be changed if set_cpus_allowed_ptr() is somehow 
>>> called
>>> for whatever reason.
>>>
>>> Of course, this patch can cause a discrepancy between v1's 
>>> "cpuset.cpus"
>>> and and its tasks' cpumasks. Howver, it can also largely work around
>>> the offline cpu losing problem with v1 cpuset.
>> I don't thikn you can simply not update on offline, even if
>> effective_cpus doesn't go empty, because the intersection between
>> task_cpu_possible_mask() and effective_cpus might have gone empty.
>>
>> Very fundamentally, the introduction of task_cpu_possible_mask() means
>> that you now *HAVE* to always consider affinity settings per-task, you
>> cannot group them anymore.
>
> Right, it makes sense to me. That is why I am thinking that we should 
> have an API like may_have_task_cpu_possible_mask() that returns true 
> for heterogeneous systems. That will allow us to apply some 
> optimizations in systems with homogeneous cpus. So far, this is an 
> arm64 only feature. We shouldn't penalize other arches because arm64 
> needs that. In the future, maybe more arches will have that.

It turns out there still gaps in the handling of 
task_cpu_possible_mask() in existing cpuset code. We currently do not 
check for the returned value of set_cpus_allowed_ptr() which may fails 
in case task_cpu_possible_mask() returns a different mask. I think it 
may be too late for the upcoming merge window. I am planning to have 
that addressed in the next one.

In the case of cpu offline, one possible solution may be to move the 
task to its parent like what is done for v1 when effective_cpus becomes 
empty. In the case of cpuset.cpus update, perhaps just ignore the 
failure and print an info message about that.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-02-05 16:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02 14:31 [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Waiman Long
2023-02-02 14:31 ` [PATCH v2 1/2] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset Waiman Long
2023-02-02 14:32 ` [PATCH v2 2/2] cgroup/cpuset: Don't update tasks' cpumasks for cpu offline events Waiman Long
2023-02-04  9:40   ` Peter Zijlstra
2023-02-05  4:40     ` Waiman Long
2023-02-05 16:34       ` Waiman Long
2023-02-02 21:52 ` [PATCH v2 0/2] cgroup/cpuset: Make hotplug code more efficient Tejun Heo
2023-02-03 12:07 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).