stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
       [not found] <cover.1607036601.git.reinette.chatre@intel.com>
@ 2020-12-03 23:25 ` Reinette Chatre
  2020-12-07 18:29   ` Borislav Petkov
  2020-12-09 16:47   ` James Morse
  2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
  2020-12-03 23:25 ` [PATCH 3/3] x86/resctrl: Don't move a task to the same " Reinette Chatre
  2 siblings, 2 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-03 23:25 UTC (permalink / raw)
  To: tglx, fenghua.yu, bp, tony.luck
  Cc: kuo-lang.tseng, shakeelb, valentin.schneider, mingo, babu.moger,
	james.morse, hpa, x86, linux-kernel, Reinette Chatre, stable

From: Fenghua Yu <fenghua.yu@intel.com>

The code of setting the CPU on which a task is running in a CPU mask is
moved into a couple of helpers. The new helper task_on_cpu() will be
reused shortly.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 +++++++++++++++++++-------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6f4ca4bea625..68db7d2dec8f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 	kfree(rdtgrp);
 }
 
+#ifdef CONFIG_SMP
+/* Get the CPU if the task is on it. */
+static bool task_on_cpu(struct task_struct *t, int *cpu)
+{
+	/*
+	 * This is safe on x86 w/o barriers as the ordering of writing to
+	 * task_cpu() and t->on_cpu is reverse to the reading here. The
+	 * detection is inaccurate as tasks might move or schedule before
+	 * the smp function call takes place. In such a case the function
+	 * call is pointless, but there is no other side effect.
+	 */
+	if (t->on_cpu) {
+		*cpu = task_cpu(t);
+
+		return true;
+	}
+
+	return false;
+}
+
+static void set_task_cpumask(struct task_struct *t, struct cpumask *mask)
+{
+	int cpu;
+
+	if (mask && task_on_cpu(t, &cpu))
+		cpumask_set_cpu(cpu, mask);
+}
+#else
+static inline void
+set_task_cpumask(struct task_struct *t, struct cpumask *mask) { }
+#endif
+
 struct task_move_callback {
 	struct callback_head	work;
 	struct rdtgroup		*rdtgrp;
@@ -2327,19 +2359,8 @@ static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
 			t->closid = to->closid;
 			t->rmid = to->mon.rmid;
 
-#ifdef CONFIG_SMP
-			/*
-			 * This is safe on x86 w/o barriers as the ordering
-			 * of writing to task_cpu() and t->on_cpu is
-			 * reverse to the reading here. The detection is
-			 * inaccurate as tasks might move or schedule
-			 * before the smp function call takes place. In
-			 * such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (mask && t->on_cpu)
-				cpumask_set_cpu(task_cpu(t), mask);
-#endif
+			/* If the task is on a CPU, set the CPU in the mask. */
+			set_task_cpumask(t, mask);
 		}
 	}
 	read_unlock(&tasklist_lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
       [not found] <cover.1607036601.git.reinette.chatre@intel.com>
  2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
@ 2020-12-03 23:25 ` Reinette Chatre
  2020-12-09 16:51   ` James Morse
  2020-12-11 20:46   ` Valentin Schneider
  2020-12-03 23:25 ` [PATCH 3/3] x86/resctrl: Don't move a task to the same " Reinette Chatre
  2 siblings, 2 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-03 23:25 UTC (permalink / raw)
  To: tglx, fenghua.yu, bp, tony.luck
  Cc: kuo-lang.tseng, shakeelb, valentin.schneider, mingo, babu.moger,
	james.morse, hpa, x86, linux-kernel, Reinette Chatre, stable

From: Fenghua Yu <fenghua.yu@intel.com>

Currently when moving a task to a resource group the PQR_ASSOC MSR
is updated with the new closid and rmid in an added task callback.
If the task is running the work is run as soon as possible. If the
task is not running the work is executed later in the kernel exit path
when the kernel returns to the task again.

Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when the
task is scheduled in) and causing system resource waste with the way in
which it is implemented: Work to update the PQR_ASSOC register is queued
every time the user writes a task id to the "tasks" file, even if the task
already belongs to the resource group. This could result in multiple pending
work items associated with a single task even if they are all identical and
even though only a single update with most recent values is needed.
Specifically, even if a task is moved between different resource groups
while it is sleeping then it is only the last move that is relevant but
yet a work item is queued during each move.
This unnecessary queueing of work items could result in significant system
resource waste, especially on tasks sleeping for a long time. For example,
as demonstrated by Shakeel Butt in [1] writing the same task id to the
"tasks" file can quickly consume significant memory. The same problem
(wasted system resources) occurs when moving a task between different
resource groups.

As pointed out by Valentin Schneider in [2] there is an additional issue with
the way in which the queueing of work is done in that the task_struct update
is currently done after the work is queued, resulting in a race with the
register update possibly done before the data needed by the update is available.

To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing is
done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
This is the same way used to update the register when tasks are moved as
part of resource group removal.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 123 ++++++++++---------------
 1 file changed, 50 insertions(+), 73 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 68db7d2dec8f..9d62f1fadcc3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 	kfree(rdtgrp);
 }
 
+static void _update_task_closid_rmid(void *task)
+{
+	/*
+	 * If the task is still current on this CPU, update PQR_ASSOC MSR.
+	 * Otherwise, the MSR is updated when the task is scheduled in.
+	 */
+	if (task == current)
+		resctrl_sched_in();
+}
+
 #ifdef CONFIG_SMP
 /* Get the CPU if the task is on it. */
 static bool task_on_cpu(struct task_struct *t, int *cpu)
@@ -552,94 +562,61 @@ static void set_task_cpumask(struct task_struct *t, struct cpumask *mask)
 	if (mask && task_on_cpu(t, &cpu))
 		cpumask_set_cpu(cpu, mask);
 }
-#else
-static inline void
-set_task_cpumask(struct task_struct *t, struct cpumask *mask) { }
-#endif
-
-struct task_move_callback {
-	struct callback_head	work;
-	struct rdtgroup		*rdtgrp;
-};
 
-static void move_myself(struct callback_head *head)
+static void update_task_closid_rmid(struct task_struct *t)
 {
-	struct task_move_callback *callback;
-	struct rdtgroup *rdtgrp;
-
-	callback = container_of(head, struct task_move_callback, work);
-	rdtgrp = callback->rdtgrp;
-
-	/*
-	 * If resource group was deleted before this task work callback
-	 * was invoked, then assign the task to root group and free the
-	 * resource group.
-	 */
-	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
-	    (rdtgrp->flags & RDT_DELETED)) {
-		current->closid = 0;
-		current->rmid = 0;
-		rdtgroup_remove(rdtgrp);
-	}
+	int cpu;
 
-	if (unlikely(current->flags & PF_EXITING))
-		goto out;
+	if (task_on_cpu(t, &cpu))
+		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
+}
 
-	preempt_disable();
-	/* update PQR_ASSOC MSR to make resource group go into effect */
-	resctrl_sched_in();
-	preempt_enable();
+#else
+static inline void
+set_task_cpumask(struct task_struct *t, struct cpumask *mask) { }
 
-out:
-	kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+	_update_task_closid_rmid(t);
 }
+#endif
 
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
-	struct task_move_callback *callback;
-	int ret;
-
-	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
-	if (!callback)
-		return -ENOMEM;
-	callback->work.func = move_myself;
-	callback->rdtgrp = rdtgrp;
-
 	/*
-	 * Take a refcount, so rdtgrp cannot be freed before the
-	 * callback has been invoked.
+	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+	 * updated by them.
+	 *
+	 * For ctrl_mon groups, move both closid and rmid.
+	 * For monitor groups, can move the tasks only from
+	 * their parent CTRL group.
 	 */
-	atomic_inc(&rdtgrp->waitcount);
-	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
-	if (ret) {
-		/*
-		 * Task is exiting. Drop the refcount and free the callback.
-		 * No need to check the refcount as the group cannot be
-		 * deleted before the write function unlocks rdtgroup_mutex.
-		 */
-		atomic_dec(&rdtgrp->waitcount);
-		kfree(callback);
-		rdt_last_cmd_puts("Task exited\n");
-	} else {
-		/*
-		 * For ctrl_mon groups move both closid and rmid.
-		 * For monitor groups, can move the tasks only from
-		 * their parent CTRL group.
-		 */
-		if (rdtgrp->type == RDTCTRL_GROUP) {
-			tsk->closid = rdtgrp->closid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		if (rdtgrp->mon.parent->closid == tsk->closid) {
 			tsk->rmid = rdtgrp->mon.rmid;
-		} else if (rdtgrp->type == RDTMON_GROUP) {
-			if (rdtgrp->mon.parent->closid == tsk->closid) {
-				tsk->rmid = rdtgrp->mon.rmid;
-			} else {
-				rdt_last_cmd_puts("Can't move task to different control group\n");
-				ret = -EINVAL;
-			}
+		} else {
+			rdt_last_cmd_puts("Can't move task to different control group\n");
+			return -EINVAL;
 		}
+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
 	}
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
+
+	return 0;
 }
 
 static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] x86/resctrl: Don't move a task to the same resource group
       [not found] <cover.1607036601.git.reinette.chatre@intel.com>
  2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
  2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
@ 2020-12-03 23:25 ` Reinette Chatre
  2 siblings, 0 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-03 23:25 UTC (permalink / raw)
  To: tglx, fenghua.yu, bp, tony.luck
  Cc: kuo-lang.tseng, shakeelb, valentin.schneider, mingo, babu.moger,
	james.morse, hpa, x86, linux-kernel, Reinette Chatre, stable

From: Fenghua Yu <fenghua.yu@intel.com>

Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.

Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9d62f1fadcc3..d3523032265c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -584,6 +584,13 @@ static void update_task_closid_rmid(struct task_struct *t)
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
+	/* If the task is already in rdtgrp, no need to move the task. */
+	if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
+	     tsk->rmid == rdtgrp->mon.rmid) ||
+	    (rdtgrp->type == RDTMON_GROUP && tsk->rmid == rdtgrp->mon.rmid &&
+	     tsk->closid == rdtgrp->mon.parent->closid))
+		return 0;
+
 	/*
 	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
 	 * updated by them.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
@ 2020-12-07 18:29   ` Borislav Petkov
  2020-12-07 21:24     ` Reinette Chatre
  2020-12-09 16:47   ` James Morse
  1 sibling, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2020-12-07 18:29 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, james.morse, hpa, x86,
	linux-kernel, stable

On Thu, Dec 03, 2020 at 03:25:48PM -0800, Reinette Chatre wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> The code of setting the CPU on which a task is running in a CPU mask is
> moved into a couple of helpers.

Pls read section "2) Describe your changes" in
Documentation/process/submitting-patches.rst for more details.

More specifically:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

> The new helper task_on_cpu() will be reused shortly.

"reused shortly"? I don't think so.

> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Cc: stable@vger.kernel.org

Fixes?

I guess the same commit from the other two:

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")

?

> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 +++++++++++++++++++-------
>  1 file changed, 34 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6f4ca4bea625..68db7d2dec8f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
>  	kfree(rdtgrp);
>  }
>  
> +#ifdef CONFIG_SMP
> +/* Get the CPU if the task is on it. */
> +static bool task_on_cpu(struct task_struct *t, int *cpu)
> +{
> +	/*
> +	 * This is safe on x86 w/o barriers as the ordering of writing to
> +	 * task_cpu() and t->on_cpu is reverse to the reading here. The
> +	 * detection is inaccurate as tasks might move or schedule before
> +	 * the smp function call takes place. In such a case the function
> +	 * call is pointless, but there is no other side effect.
> +	 */
> +	if (t->on_cpu) {
> +		*cpu = task_cpu(t);

Why have an I/O parameter when you can make it simply:

static int task_on_cpu(struct task_struct *t)
{
	if (t->on_cpu)
		return task_cpu(t);

	return -1;
}

> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static void set_task_cpumask(struct task_struct *t, struct cpumask *mask)
> +{
> +	int cpu;
> +
> +	if (mask && task_on_cpu(t, &cpu))
> +		cpumask_set_cpu(cpu, mask);

And that you can turn into:

	if (!mask)
		return;

	cpu = task_on_cpu(t);
	if (cpu < 0)
		return;

	cpumask_set_cpu(cpu, mask);

Readable and simple.

Hmm?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-07 18:29   ` Borislav Petkov
@ 2020-12-07 21:24     ` Reinette Chatre
  2020-12-08  9:49       ` Borislav Petkov
  0 siblings, 1 reply; 16+ messages in thread
From: Reinette Chatre @ 2020-12-07 21:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, fenghua.yu, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, james.morse, hpa, x86,
	linux-kernel, stable

Hi Borislav,

Thank you very much for your review.

On 12/7/2020 10:29 AM, Borislav Petkov wrote:
> On Thu, Dec 03, 2020 at 03:25:48PM -0800, Reinette Chatre wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>>
>> The code of setting the CPU on which a task is running in a CPU mask is
>> moved into a couple of helpers.
> 
> Pls read section "2) Describe your changes" in
> Documentation/process/submitting-patches.rst for more details.
> 
> More specifically:
> 
> "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour."
> 
>> The new helper task_on_cpu() will be reused shortly.
> 
> "reused shortly"? I don't think so.


How about:
"Move the setting of the CPU on which a task is running in a CPU mask 
into a couple of helpers.

There is no functional change. This is a preparatory change for the fix 
in the following patch from where the Fixes tag is copied."

> 
>>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Cc: stable@vger.kernel.org
> 
> Fixes?
> 
> I guess the same commit from the other two:
> 
> Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
> 
> ?

Correct. I will add it. The addition to the commit message above aims to 
explain a Fixes tag to a patch with no functional changes.


>> ---
>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 47 +++++++++++++++++++-------
>>   1 file changed, 34 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6f4ca4bea625..68db7d2dec8f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
>>   	kfree(rdtgrp);
>>   }
>>   
>> +#ifdef CONFIG_SMP
>> +/* Get the CPU if the task is on it. */
>> +static bool task_on_cpu(struct task_struct *t, int *cpu)
>> +{
>> +	/*
>> +	 * This is safe on x86 w/o barriers as the ordering of writing to
>> +	 * task_cpu() and t->on_cpu is reverse to the reading here. The
>> +	 * detection is inaccurate as tasks might move or schedule before
>> +	 * the smp function call takes place. In such a case the function
>> +	 * call is pointless, but there is no other side effect.
>> +	 */
>> +	if (t->on_cpu) {
>> +		*cpu = task_cpu(t);
> 
> Why have an I/O parameter when you can make it simply:
> 
> static int task_on_cpu(struct task_struct *t)
> {
> 	if (t->on_cpu)
> 		return task_cpu(t);
> 
> 	return -1;
> }
> 
>> +
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static void set_task_cpumask(struct task_struct *t, struct cpumask *mask)
>> +{
>> +	int cpu;
>> +
>> +	if (mask && task_on_cpu(t, &cpu))
>> +		cpumask_set_cpu(cpu, mask);
> 
> And that you can turn into:
> 
> 	if (!mask)
> 		return;
> 
> 	cpu = task_on_cpu(t);
> 	if (cpu < 0)
> 		return;
> 
> 	cpumask_set_cpu(cpu, mask);
> 
> Readable and simple.
> 
> Hmm?
> 

Will do. Thank you very much.

Reinette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-07 21:24     ` Reinette Chatre
@ 2020-12-08  9:49       ` Borislav Petkov
  2020-12-08 16:35         ` Reinette Chatre
  0 siblings, 1 reply; 16+ messages in thread
From: Borislav Petkov @ 2020-12-08  9:49 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, james.morse, hpa, x86,
	linux-kernel, stable

On Mon, Dec 07, 2020 at 01:24:51PM -0800, Reinette Chatre wrote:
> How about:
> "Move the setting of the CPU on which a task is running in a CPU mask into a
> couple of helpers.
> 
> There is no functional change. This is a preparatory change for the fix in
> the following patch from where the Fixes tag is copied."

Almost. Just not call it a "following patch" because once this is
applied, the following one might be a different one depending on the
ordering a git command has requested. So a "later patch" would be
probably better.

> Correct. I will add it. The addition to the commit message above aims to
> explain a Fixes tag to a patch with no functional changes.

Yes but you need to tell the stable people somehow that this one is a
prerequisite and that they should pick it up too.

Unless you can reorg your code this way that you don't need patch 1...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-08  9:49       ` Borislav Petkov
@ 2020-12-08 16:35         ` Reinette Chatre
  0 siblings, 0 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-08 16:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, fenghua.yu, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, james.morse, hpa, x86,
	linux-kernel, stable

Hi Borislav,

On 12/8/2020 1:49 AM, Borislav Petkov wrote:
> On Mon, Dec 07, 2020 at 01:24:51PM -0800, Reinette Chatre wrote:
>> How about:
>> "Move the setting of the CPU on which a task is running in a CPU mask into a
>> couple of helpers.
>>
>> There is no functional change. This is a preparatory change for the fix in
>> the following patch from where the Fixes tag is copied."
> 
> Almost. Just not call it a "following patch" because once this is
> applied, the following one might be a different one depending on the
> ordering a git command has requested. So a "later patch" would be
> probably better.

Indeed, will do. Thank you.

> 
>> Correct. I will add it. The addition to the commit message above aims to
>> explain a Fixes tag to a patch with no functional changes.
> 
> Yes but you need to tell the stable people somehow that this one is a
> prerequisite and that they should pick it up too.

Right. Thanks for guiding here.

> 
> Unless you can reorg your code this way that you don't need patch 1...

I think that the current organization, with patch 1 containing the 
preparatory work without functional changes, makes the fix in patch 2 
easier to review. I thus plan to keep the code organization as is while 
surely following your suggestion on how to support the stable team.

Thank you very much

Reinette


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
  2020-12-07 18:29   ` Borislav Petkov
@ 2020-12-09 16:47   ` James Morse
  2020-12-10  0:21     ` Reinette Chatre
  1 sibling, 1 reply; 16+ messages in thread
From: James Morse @ 2020-12-09 16:47 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu
  Cc: tglx, bp, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, hpa, x86, linux-kernel,
	stable

Hi Reinette, Fenghua,

On 03/12/2020 23:25, Reinette Chatre wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> The code of setting the CPU on which a task is running in a CPU mask is
> moved into a couple of helpers. The new helper task_on_cpu() will be
> reused shortly.

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6f4ca4bea625..68db7d2dec8f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)

> +#ifdef CONFIG_SMP

(using IS_ENABLED(CONFIG_SMP) lets the compiler check all the code in one go, then
dead-code-remove the stuff that will never happen... its also easier on the eye!)


> +/* Get the CPU if the task is on it. */
> +static bool task_on_cpu(struct task_struct *t, int *cpu)
> +{
> +	/*
> +	 * This is safe on x86 w/o barriers as the ordering of writing to
> +	 * task_cpu() and t->on_cpu is reverse to the reading here. The
> +	 * detection is inaccurate as tasks might move or schedule before
> +	 * the smp function call takes place. In such a case the function
> +	 * call is pointless, but there is no other side effect.
> +	 */

> +	if (t->on_cpu) {

kernel/sched/core.c calls out that there can be two tasks on one CPU with this set.
(grep astute)
I think that means this series will falsely match the old task for a CPU while the
scheduler is running, and IPI it unnecessarily.

task_curr() is the helper that knows not to do this.


> +		*cpu = task_cpu(t);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}


Thanks,

James

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
@ 2020-12-09 16:51   ` James Morse
  2020-12-10  0:22     ` Reinette Chatre
  2020-12-11 20:46   ` Valentin Schneider
  1 sibling, 1 reply; 16+ messages in thread
From: James Morse @ 2020-12-09 16:51 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu
  Cc: tglx, bp, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, hpa, x86, linux-kernel,
	stable

Hi Reinette, Fenghua,

Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the
guts of this change more quickly!

On 03/12/2020 23:25, Reinette Chatre wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Currently when moving a task to a resource group the PQR_ASSOC MSR
> is updated with the new closid and rmid in an added task callback.
> If the task is running the work is run as soon as possible. If the
> task is not running the work is executed later

> in the kernel exit path when the kernel returns to the task again.

kernel exit makes me thing of user-space... is it enough to just say:
"by __switch_to() when task is next run"?


> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
> is running is the right thing to do. Queueing work for a task that is
> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
> task is scheduled in) and causing system resource waste with the way in
> which it is implemented: Work to update the PQR_ASSOC register is queued
> every time the user writes a task id to the "tasks" file, even if the task
> already belongs to the resource group. This could result in multiple pending
> work items associated with a single task even if they are all identical and
> even though only a single update with most recent values is needed.
> Specifically, even if a task is moved between different resource groups
> while it is sleeping then it is only the last move that is relevant but
> yet a work item is queued during each move.
> This unnecessary queueing of work items could result in significant system
> resource waste, especially on tasks sleeping for a long time. For example,
> as demonstrated by Shakeel Butt in [1] writing the same task id to the
> "tasks" file can quickly consume significant memory. The same problem
> (wasted system resources) occurs when moving a task between different
> resource groups.
> 
> As pointed out by Valentin Schneider in [2] there is an additional issue with
> the way in which the queueing of work is done in that the task_struct update
> is currently done after the work is queued, resulting in a race with the
> register update possibly done before the data needed by the update is available.
> 
> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
> right after the new closid and rmid are ready during the task movement,
> only if the task is running. If a moved task is not running nothing is
> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
> This is the same way used to update the register when tasks are moved as
> part of resource group removal.

(as t->on_cpu is already used...)

Reviewed-by: James Morse <james.morse@arm.com>


> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 68db7d2dec8f..9d62f1fadcc3 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)


> +static void update_task_closid_rmid(struct task_struct *t)
>  {
> +	int cpu;
>  
> +	if (task_on_cpu(t, &cpu))
> +		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);


I think:
|	if (task_curr(t))
|		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);

here would make for an easier backport as it doesn't depend on the previous patch.


> +}

[...]

>  static int __rdtgroup_move_task(struct task_struct *tsk,
>  				struct rdtgroup *rdtgrp)
>  {

> +	if (rdtgrp->type == RDTCTRL_GROUP) {
> +		tsk->closid = rdtgrp->closid;
> +		tsk->rmid = rdtgrp->mon.rmid;
> +	} else if (rdtgrp->type == RDTMON_GROUP) {

[...]

> +	} else {

> +		rdt_last_cmd_puts("Invalid resource group type\n");
> +		return -EINVAL;

Wouldn't this be a kernel bug?
I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't
user-space's fault!


>  	}
> -	return ret;
> +
> +	/*
> +	 * By now, the task's closid and rmid are set. If the task is current
> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
> +	 * group go into effect. If the task is not current, the MSR will be
> +	 * updated when the task is scheduled in.
> +	 */
> +	update_task_closid_rmid(tsk);
> +
> +	return 0;
>  }


Thanks,

James

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers
  2020-12-09 16:47   ` James Morse
@ 2020-12-10  0:21     ` Reinette Chatre
  0 siblings, 0 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-10  0:21 UTC (permalink / raw)
  To: James Morse, fenghua.yu
  Cc: tglx, bp, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, hpa, x86, linux-kernel,
	stable

Hi James,

Thank you very much for your review.

On 12/9/2020 8:47 AM, James Morse wrote:
> Hi Reinette, Fenghua,
> 
> On 03/12/2020 23:25, Reinette Chatre wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>>
>> The code of setting the CPU on which a task is running in a CPU mask is
>> moved into a couple of helpers. The new helper task_on_cpu() will be
>> reused shortly.
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6f4ca4bea625..68db7d2dec8f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> 
>> +#ifdef CONFIG_SMP
> 
> (using IS_ENABLED(CONFIG_SMP) lets the compiler check all the code in one go, then
> dead-code-remove the stuff that will never happen... its also easier on the eye!)

Thank you for catching this. New fix (see below) uses this.

>> +/* Get the CPU if the task is on it. */
>> +static bool task_on_cpu(struct task_struct *t, int *cpu)
>> +{
>> +	/*
>> +	 * This is safe on x86 w/o barriers as the ordering of writing to
>> +	 * task_cpu() and t->on_cpu is reverse to the reading here. The
>> +	 * detection is inaccurate as tasks might move or schedule before
>> +	 * the smp function call takes place. In such a case the function
>> +	 * call is pointless, but there is no other side effect.
>> +	 */
> 
>> +	if (t->on_cpu) {
> 
> kernel/sched/core.c calls out that there can be two tasks on one CPU with this set.
> (grep astute)
> I think that means this series will falsely match the old task for a CPU while the
> scheduler is running, and IPI it unnecessarily.
> 
> task_curr() is the helper that knows not to do this.
> 

Thank you very much for catching this. I did not know this. This exposes 
an issue with the current implementation of moving tasks as part of 
directory removal. I now plan to replace this patch with a new fix to 
address this new issue you exposed: the fix will replace the current 
usage of t->on_cpu with task_curr(). Since I also follow your suggestion 
for patch #2 this original patch is no longer needed, which is something 
Borislav also suggested but I could not see a way to do it at the time.

This new fix does seem to fall into the "This could be a problem.." 
category of issues referred to in stable-kernel-rules.rst so while I 
plan on adding a Fixes tag I plan to not cc the stable team on this one. 
I am unsure about the right thing to do here so if you have an opinion I 
would appreciate it.

What do you think of this replacement patch (that will be moved to end 
of series)?

Reinette

----8<------
x86/resctrl: Replace t->on_cpu with task_curr() to prevent unnecessary IPI

James reported in [1] that there could be two tasks running on the same CPU
with t->on_cpu set. Using t->on_cpu as a test if a task is running on a
CPU may thus match the old task for a CPU while the scheduler is
running and IPI it unnecessarily.

task_curr() is the correct helper to use. While doing so move the #ifdef
check of the CONFIG_SMP symbol to be a C conditional used to determine
if this helper should be used so ensure the code is always checked for
correctness by the compiler.

[1] 
https://lore.kernel.org/lkml/a782d2f3-d2f6-795f-f4b1-9462205fd581@arm.com

Fixes: 0efc89be9471 ("x86/intel_rdt: Update task closid immediately on 
CPU in rmdir and unmount")
Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 14 ++------------
  1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5e5a49f38fa1..c64fb37f0aec 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2317,19 +2317,9 @@ static void rdt_move_group_tasks(struct rdtgroup 
*from, struct rdtgroup *to,
  			t->closid = to->closid;
  			t->rmid = to->mon.rmid;

-#ifdef CONFIG_SMP
-			/*
-			 * This is safe on x86 w/o barriers as the ordering
-			 * of writing to task_cpu() and t->on_cpu is
-			 * reverse to the reading here. The detection is
-			 * inaccurate as tasks might move or schedule
-			 * before the smp function call takes place. In
-			 * such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (mask && t->on_cpu)
+			/* If the task is on a CPU, set the CPU in the mask. */
+			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
  				cpumask_set_cpu(task_cpu(t), mask);
-#endif
  		}
  	}
  	read_unlock(&tasklist_lock);








^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-09 16:51   ` James Morse
@ 2020-12-10  0:22     ` Reinette Chatre
  0 siblings, 0 replies; 16+ messages in thread
From: Reinette Chatre @ 2020-12-10  0:22 UTC (permalink / raw)
  To: James Morse, fenghua.yu
  Cc: tglx, bp, tony.luck, kuo-lang.tseng, shakeelb,
	valentin.schneider, mingo, babu.moger, hpa, x86, linux-kernel,
	stable

Hi James,

On 12/9/2020 8:51 AM, James Morse wrote:
> Hi Reinette, Fenghua,
> 
> Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the
> guts of this change more quickly!

Sure. Thank you. A small change is that I plan to write "PQR_ASSOC MSR" 
instead to closer match the name.

> 
> On 03/12/2020 23:25, Reinette Chatre wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>>
>> Currently when moving a task to a resource group the PQR_ASSOC MSR
>> is updated with the new closid and rmid in an added task callback.
>> If the task is running the work is run as soon as possible. If the
>> task is not running the work is executed later
> 
>> in the kernel exit path when the kernel returns to the task again.
> 
> kernel exit makes me thing of user-space... is it enough to just say:
> "by __switch_to() when task is next run"?

I do not think that would be accurate. The paragraph to which you are 
proposing the change states the current context, before the fix, when 
task_work_add() is still used to update PQR_ASSOC. The "kernel exit" 
text you refer to is quite close to task_work_add()'s comments and 
indeed refers to the work that is run before returning to user space. If 
a function name would make things clearer perhaps adding 
exit_to_user_mode_loop() instead? Changing the text to __switch_to() 
does not reflect the context described here since __switch_to() is what 
is called when task is context switched back from where 
resctrl_sched_in() is called, not the queued work being described here.

>> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
>> is running is the right thing to do. Queueing work for a task that is
>> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
>> task is scheduled in) and causing system resource waste with the way in
>> which it is implemented: Work to update the PQR_ASSOC register is queued
>> every time the user writes a task id to the "tasks" file, even if the task
>> already belongs to the resource group. This could result in multiple pending
>> work items associated with a single task even if they are all identical and
>> even though only a single update with most recent values is needed.
>> Specifically, even if a task is moved between different resource groups
>> while it is sleeping then it is only the last move that is relevant but
>> yet a work item is queued during each move.
>> This unnecessary queueing of work items could result in significant system
>> resource waste, especially on tasks sleeping for a long time. For example,
>> as demonstrated by Shakeel Butt in [1] writing the same task id to the
>> "tasks" file can quickly consume significant memory. The same problem
>> (wasted system resources) occurs when moving a task between different
>> resource groups.
>>
>> As pointed out by Valentin Schneider in [2] there is an additional issue with
>> the way in which the queueing of work is done in that the task_struct update
>> is currently done after the work is queued, resulting in a race with the
>> register update possibly done before the data needed by the update is available.
>>
>> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
>> right after the new closid and rmid are ready during the task movement,
>> only if the task is running. If a moved task is not running nothing is
>> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
>> This is the same way used to update the register when tasks are moved as
>> part of resource group removal.
> 
> (as t->on_cpu is already used...)
> 
> Reviewed-by: James Morse <james.morse@arm.com>

Thank you very much. I do plan to follow your suggestion below though.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 68db7d2dec8f..9d62f1fadcc3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> 
> 
>> +static void update_task_closid_rmid(struct task_struct *t)
>>   {
>> +	int cpu;
>>   
>> +	if (task_on_cpu(t, &cpu))
>> +		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
> 
> 
> I think:
> |	if (task_curr(t))
> |		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
> 
> here would make for an easier backport as it doesn't depend on the previous patch.
> 

Will do, thank you very much.

The previous patch has now become a bugfix in its own right after you 
pointed out the issue with using t->on_cpu. In addressing that I plan to 
remove the helpers found in patch #1 so backporting should continue to 
be easier.

> 
>> +}
> 
> [...]
> 
>>   static int __rdtgroup_move_task(struct task_struct *tsk,
>>   				struct rdtgroup *rdtgrp)
>>   {
> 
>> +	if (rdtgrp->type == RDTCTRL_GROUP) {
>> +		tsk->closid = rdtgrp->closid;
>> +		tsk->rmid = rdtgrp->mon.rmid;
>> +	} else if (rdtgrp->type == RDTMON_GROUP) {
> 
> [...]
> 
>> +	} else {
> 
>> +		rdt_last_cmd_puts("Invalid resource group type\n");
>> +		return -EINVAL;
> 
> Wouldn't this be a kernel bug?
> I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't
> user-space's fault!

You are right, this would be a kernel bug. I'll add a WARN_ON_ONCE().

> 
> 
>>   	}
>> -	return ret;
>> +
>> +	/*
>> +	 * By now, the task's closid and rmid are set. If the task is current
>> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>> +	 * group go into effect. If the task is not current, the MSR will be
>> +	 * updated when the task is scheduled in.
>> +	 */
>> +	update_task_closid_rmid(tsk);
>> +
>> +	return 0;
>>   }
> 

Thank you very much.

Reinette


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
  2020-12-09 16:51   ` James Morse
@ 2020-12-11 20:46   ` Valentin Schneider
  2020-12-14 18:41     ` Reinette Chatre
  1 sibling, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2020-12-11 20:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, bp, tony.luck, kuo-lang.tseng, shakeelb, mingo,
	babu.moger, james.morse, hpa, x86, linux-kernel, stable


On 03/12/20 23:25, Reinette Chatre wrote:
> Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Reported-by: Valentin Schneider <valentin.schneider@arm.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Cc: stable@vger.kernel.org

Some pedantic comments below; with James' task_curr() + task_cpu()
suggestion:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

> ---
>  static int __rdtgroup_move_task(struct task_struct *tsk,
>                               struct rdtgroup *rdtgrp)
>  {
> -	struct task_move_callback *callback;
> -	int ret;
> -
> -	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
> -	if (!callback)
> -		return -ENOMEM;
> -	callback->work.func = move_myself;
> -	callback->rdtgrp = rdtgrp;
> -
>       /*
> -	 * Take a refcount, so rdtgrp cannot be freed before the
> -	 * callback has been invoked.
> +	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
> +	 * updated by them.
> +	 *
> +	 * For ctrl_mon groups, move both closid and rmid.
> +	 * For monitor groups, can move the tasks only from
> +	 * their parent CTRL group.
>        */
> -	atomic_inc(&rdtgrp->waitcount);
> -	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
> -	if (ret) {
> -		/*
> -		 * Task is exiting. Drop the refcount and free the callback.
> -		 * No need to check the refcount as the group cannot be
> -		 * deleted before the write function unlocks rdtgroup_mutex.
> -		 */
> -		atomic_dec(&rdtgrp->waitcount);
> -		kfree(callback);
> -		rdt_last_cmd_puts("Task exited\n");
> -	} else {
> -		/*
> -		 * For ctrl_mon groups move both closid and rmid.
> -		 * For monitor groups, can move the tasks only from
> -		 * their parent CTRL group.
> -		 */
> -		if (rdtgrp->type == RDTCTRL_GROUP) {
> -			tsk->closid = rdtgrp->closid;
> +
> +	if (rdtgrp->type == RDTCTRL_GROUP) {
> +		tsk->closid = rdtgrp->closid;
> +		tsk->rmid = rdtgrp->mon.rmid;
> +	} else if (rdtgrp->type == RDTMON_GROUP) {
> +		if (rdtgrp->mon.parent->closid == tsk->closid) {
>                       tsk->rmid = rdtgrp->mon.rmid;
> -		} else if (rdtgrp->type == RDTMON_GROUP) {
> -			if (rdtgrp->mon.parent->closid == tsk->closid) {
> -				tsk->rmid = rdtgrp->mon.rmid;
> -			} else {
> -				rdt_last_cmd_puts("Can't move task to different control group\n");
> -				ret = -EINVAL;
> -			}
> +		} else {
> +			rdt_last_cmd_puts("Can't move task to different control group\n");
> +			return -EINVAL;
>               }
> +	} else {
> +		rdt_last_cmd_puts("Invalid resource group type\n");
> +		return -EINVAL;
>       }

James already pointed out this should be a WARN_ON_ONCE(), but is that the
right place to assert rdtgrp->type validity?

I see only a single assignment to rdtgrp->type in mkdir_rdt_prepare();
could we fail the group creation there instead if the passed rtype is
invalid?

> -	return ret;
> +
> +	/*
> +	 * By now, the task's closid and rmid are set. If the task is current
> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
> +	 * group go into effect. If the task is not current, the MSR will be
> +	 * updated when the task is scheduled in.
> +	 */
> +	update_task_closid_rmid(tsk);

We need the above writes to be compile-ordered before the IPI is sent.
There *is* a preempt_disable() down in smp_call_function_single() that
gives us the required barrier(), can we deem that sufficient or would we
want one before update_task_closid_rmid() for the sake of clarity?

> +
> +	return 0;
>  }
>
>  static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-11 20:46   ` Valentin Schneider
@ 2020-12-14 18:41     ` Reinette Chatre
  2020-12-16 17:41       ` Valentin Schneider
  0 siblings, 1 reply; 16+ messages in thread
From: Reinette Chatre @ 2020-12-14 18:41 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, fenghua.yu, bp, tony.luck, kuo-lang.tseng, shakeelb, mingo,
	babu.moger, james.morse, hpa, x86, linux-kernel, stable

Hi Valentin,

On 12/11/2020 12:46 PM, Valentin Schneider wrote:
> 
> On 03/12/20 23:25, Reinette Chatre wrote:
>> Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
>> Reported-by: Shakeel Butt <shakeelb@google.com>
>> Reported-by: Valentin Schneider <valentin.schneider@arm.com>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Cc: stable@vger.kernel.org
> 
> Some pedantic comments below; with James' task_curr() + task_cpu()
> suggestion:
> 
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

Thank you very much.


>> ---
...

>> +	if (rdtgrp->type == RDTCTRL_GROUP) {
>> +		tsk->closid = rdtgrp->closid;
>> +		tsk->rmid = rdtgrp->mon.rmid;
>> +	} else if (rdtgrp->type == RDTMON_GROUP) {
>> +		if (rdtgrp->mon.parent->closid == tsk->closid) {
>>                        tsk->rmid = rdtgrp->mon.rmid;
>> -		} else if (rdtgrp->type == RDTMON_GROUP) {
>> -			if (rdtgrp->mon.parent->closid == tsk->closid) {
>> -				tsk->rmid = rdtgrp->mon.rmid;
>> -			} else {
>> -				rdt_last_cmd_puts("Can't move task to different control group\n");
>> -				ret = -EINVAL;
>> -			}
>> +		} else {
>> +			rdt_last_cmd_puts("Can't move task to different control group\n");
>> +			return -EINVAL;
>>                }
>> +	} else {
>> +		rdt_last_cmd_puts("Invalid resource group type\n");
>> +		return -EINVAL;
>>        }
> 
> James already pointed out this should be a WARN_ON_ONCE(), but is that the
> right place to assert rdtgrp->type validity?
> 
> I see only a single assignment to rdtgrp->type in mkdir_rdt_prepare();
> could we fail the group creation there instead if the passed rtype is
> invalid?

Yes, there is that single assignment in mkdir_rdt_prepare() and looking 
at how mkdir_rdt_prepare() is called it is only ever called with 
RDTMON_GROUP or RDTCTRL_GROUP. This additional error checking was added 
as part of this fix but unrelated to the fix itself. Since you and James 
both pointed out flaws with it and it is unnecessary I will remove it 
here and maintain the original checking that continues to be sufficient.

>> -	return ret;
>> +
>> +	/*
>> +	 * By now, the task's closid and rmid are set. If the task is current
>> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>> +	 * group go into effect. If the task is not current, the MSR will be
>> +	 * updated when the task is scheduled in.
>> +	 */
>> +	update_task_closid_rmid(tsk);
> 
> We need the above writes to be compile-ordered before the IPI is sent.
> There *is* a preempt_disable() down in smp_call_function_single() that
> gives us the required barrier(), can we deem that sufficient or would we
> want one before update_task_closid_rmid() for the sake of clarity?
> 

Apologies, it is not clear to me why the preempt_disable() would be 
insufficient. If it is not then there may be a few other areas (where 
resctrl calls smp_call_function_xxx()) that needs to be re-evaluated.

Thank you very much

Reinette


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-14 18:41     ` Reinette Chatre
@ 2020-12-16 17:41       ` Valentin Schneider
  2020-12-16 18:26         ` Reinette Chatre
  0 siblings, 1 reply; 16+ messages in thread
From: Valentin Schneider @ 2020-12-16 17:41 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, bp, tony.luck, kuo-lang.tseng, shakeelb, mingo,
	babu.moger, james.morse, hpa, x86, linux-kernel, stable


On 14/12/20 18:41, Reinette Chatre wrote:
>>> -	return ret;
>>> +
>>> +	/*
>>> +	 * By now, the task's closid and rmid are set. If the task is current
>>> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>>> +	 * group go into effect. If the task is not current, the MSR will be
>>> +	 * updated when the task is scheduled in.
>>> +	 */
>>> +	update_task_closid_rmid(tsk);
>>
>> We need the above writes to be compile-ordered before the IPI is sent.
>> There *is* a preempt_disable() down in smp_call_function_single() that
>> gives us the required barrier(), can we deem that sufficient or would we
>> want one before update_task_closid_rmid() for the sake of clarity?
>>
>
> Apologies, it is not clear to me why the preempt_disable() would be
> insufficient. If it is not then there may be a few other areas (where
> resctrl calls smp_call_function_xxx()) that needs to be re-evaluated.

So that's part paranoia and part nonsense from my end - the contents of
smp_call() shouldn't matter here.

If we distill the code to:

  tsk->closid = x;

  if (task_curr(tsk))
      smp_call(...);

It is somewhat far fetched, but AFAICT this can be compiled as:

  if (task_curr(tsk))
      tsk->closid = x;
      smp_call(...);
  else
      tsk->closid = x;

IOW, there could be a sequence where the closid write is ordered *after*
the task_curr() read. With

  tsk->closid = x;

  barrier();

  if (task_curr(tsk))
      smp_call(...);

that explicitely cannot happen.

>
> Thank you very much
>
> Reinette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-16 17:41       ` Valentin Schneider
@ 2020-12-16 18:26         ` Reinette Chatre
  2020-12-17 10:39           ` Valentin Schneider
  0 siblings, 1 reply; 16+ messages in thread
From: Reinette Chatre @ 2020-12-16 18:26 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: tglx, fenghua.yu, bp, tony.luck, kuo-lang.tseng, shakeelb, mingo,
	babu.moger, james.morse, hpa, x86, linux-kernel, stable

Hi Valentin,

On 12/16/2020 9:41 AM, Valentin Schneider wrote:
> 
> On 14/12/20 18:41, Reinette Chatre wrote:
>>>> -	return ret;
>>>> +
>>>> +	/*
>>>> +	 * By now, the task's closid and rmid are set. If the task is current
>>>> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>>>> +	 * group go into effect. If the task is not current, the MSR will be
>>>> +	 * updated when the task is scheduled in.
>>>> +	 */
>>>> +	update_task_closid_rmid(tsk);
>>>
>>> We need the above writes to be compile-ordered before the IPI is sent.
>>> There *is* a preempt_disable() down in smp_call_function_single() that
>>> gives us the required barrier(), can we deem that sufficient or would we
>>> want one before update_task_closid_rmid() for the sake of clarity?
>>>
>>
>> Apologies, it is not clear to me why the preempt_disable() would be
>> insufficient. If it is not then there may be a few other areas (where
>> resctrl calls smp_call_function_xxx()) that needs to be re-evaluated.
> 
> So that's part paranoia and part nonsense from my end - the contents of
> smp_call() shouldn't matter here.
> 
> If we distill the code to:
> 
>    tsk->closid = x;
> 
>    if (task_curr(tsk))
>        smp_call(...);
> 
> It is somewhat far fetched, but AFAICT this can be compiled as:
> 
>    if (task_curr(tsk))
>        tsk->closid = x;
>        smp_call(...);
>    else
>        tsk->closid = x;
> 
> IOW, there could be a sequence where the closid write is ordered *after*
> the task_curr() read.

Could you please elaborate why it would be an issue is the closid write 
is ordered after the task_curr() read? task_curr() does not depend on 
the closid.

> With
> 
>    tsk->closid = x;
> 
>    barrier();
> 
>    if (task_curr(tsk))
>        smp_call(...);
> 
> that explicitely cannot happen.
> 


Reinette

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
  2020-12-16 18:26         ` Reinette Chatre
@ 2020-12-17 10:39           ` Valentin Schneider
  0 siblings, 0 replies; 16+ messages in thread
From: Valentin Schneider @ 2020-12-17 10:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, bp, tony.luck, kuo-lang.tseng, shakeelb, mingo,
	babu.moger, james.morse, hpa, x86, linux-kernel, stable


On 16/12/20 18:26, Reinette Chatre wrote:
> Hi Valentin,
>> So that's part paranoia and part nonsense from my end - the contents of
>> smp_call() shouldn't matter here.
>>
>> If we distill the code to:
>>
>>    tsk->closid = x;
>>
>>    if (task_curr(tsk))
>>        smp_call(...);
>>
>> It is somewhat far fetched, but AFAICT this can be compiled as:
>>
>>    if (task_curr(tsk))
>>        tsk->closid = x;
>>        smp_call(...);
>>    else
>>        tsk->closid = x;
>>
>> IOW, there could be a sequence where the closid write is ordered *after*
>> the task_curr() read.
>
> Could you please elaborate why it would be an issue is the closid write
> is ordered after the task_curr() read? task_curr() does not depend on
> the closid.
>

IMO the 'task_curr()' check only makes sense if it happens *after* the
write, the logic being: 'closid/rmid has been written to, does the task now
need interrupting?'

In the above 'else' clause, task_curr() would need to be re-evaluated after
the write: the task wasn't current *before* the write, but nothing
guarantees this still holds *after* the write.

>> With
>>
>>    tsk->closid = x;
>>
>>    barrier();
>>
>>    if (task_curr(tsk))
>>        smp_call(...);
>>
>> that explicitely cannot happen.
>>
>
>
> Reinette

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-12-17 10:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <cover.1607036601.git.reinette.chatre@intel.com>
2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
2020-12-07 18:29   ` Borislav Petkov
2020-12-07 21:24     ` Reinette Chatre
2020-12-08  9:49       ` Borislav Petkov
2020-12-08 16:35         ` Reinette Chatre
2020-12-09 16:47   ` James Morse
2020-12-10  0:21     ` Reinette Chatre
2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
2020-12-09 16:51   ` James Morse
2020-12-10  0:22     ` Reinette Chatre
2020-12-11 20:46   ` Valentin Schneider
2020-12-14 18:41     ` Reinette Chatre
2020-12-16 17:41       ` Valentin Schneider
2020-12-16 18:26         ` Reinette Chatre
2020-12-17 10:39           ` Valentin Schneider
2020-12-03 23:25 ` [PATCH 3/3] x86/resctrl: Don't move a task to the same " Reinette Chatre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).