All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Shakeel Butt <shakeelb@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Borislav Petkov <bp@suse.de>, Tony Luck <tony.luck@intel.com>,
	James Morse <james.morse@arm.com>
Subject: [PATCH 4.19 16/43] x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
Date: Fri, 15 Jan 2021 13:27:46 +0100	[thread overview]
Message-ID: <20210115121957.832790615@linuxfoundation.org> (raw)
In-Reply-To: <20210115121957.037407908@linuxfoundation.org>

From: Fenghua Yu <fenghua.yu@intel.com>

commit ae28d1aae48a1258bd09a6f707ebb4231d79a761 upstream

Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.

Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.

This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.

This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.

As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.

To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com

 [ bp: Massage commit message and drop the two update_task_closid_rmid()
   variants. ]

Backporting notes:

Since upstream commit fa7d949337cc ("x86/resctrl: Rename and move rdt
files to a separate directory"), the file
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c has been renamed and moved to
arch/x86/kernel/cpu/resctrl/rdtgroup.c.
Apply the change against file arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
for older stable trees.

Since upstream commit 352940ececaca ("x86/resctrl: Rename the RDT
functions and definitions"), resctrl functions received more generic
names. Specifically related to this backport, intel_rdt_sched_in()
was renamed to rescrl_sched_in().

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  108 ++++++++++++-------------------
 1 file changed, 43 insertions(+), 65 deletions(-)

--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -533,85 +533,63 @@ static void rdtgroup_remove(struct rdtgr
 	kfree(rdtgrp);
 }
 
-struct task_move_callback {
-	struct callback_head	work;
-	struct rdtgroup		*rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
 {
-	struct task_move_callback *callback;
-	struct rdtgroup *rdtgrp;
-
-	callback = container_of(head, struct task_move_callback, work);
-	rdtgrp = callback->rdtgrp;
-
 	/*
-	 * If resource group was deleted before this task work callback
-	 * was invoked, then assign the task to root group and free the
-	 * resource group.
+	 * If the task is still current on this CPU, update PQR_ASSOC MSR.
+	 * Otherwise, the MSR is updated when the task is scheduled in.
 	 */
-	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
-	    (rdtgrp->flags & RDT_DELETED)) {
-		current->closid = 0;
-		current->rmid = 0;
-		rdtgroup_remove(rdtgrp);
-	}
-
-	preempt_disable();
-	/* update PQR_ASSOC MSR to make resource group go into effect */
-	intel_rdt_sched_in();
-	preempt_enable();
+	if (task == current)
+		intel_rdt_sched_in();
+}
 
-	kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+	else
+		_update_task_closid_rmid(t);
 }
 
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
-	struct task_move_callback *callback;
-	int ret;
-
-	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
-	if (!callback)
-		return -ENOMEM;
-	callback->work.func = move_myself;
-	callback->rdtgrp = rdtgrp;
-
 	/*
-	 * Take a refcount, so rdtgrp cannot be freed before the
-	 * callback has been invoked.
+	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+	 * updated by them.
+	 *
+	 * For ctrl_mon groups, move both closid and rmid.
+	 * For monitor groups, can move the tasks only from
+	 * their parent CTRL group.
 	 */
-	atomic_inc(&rdtgrp->waitcount);
-	ret = task_work_add(tsk, &callback->work, true);
-	if (ret) {
-		/*
-		 * Task is exiting. Drop the refcount and free the callback.
-		 * No need to check the refcount as the group cannot be
-		 * deleted before the write function unlocks rdtgroup_mutex.
-		 */
-		atomic_dec(&rdtgrp->waitcount);
-		kfree(callback);
-		rdt_last_cmd_puts("task exited\n");
-	} else {
-		/*
-		 * For ctrl_mon groups move both closid and rmid.
-		 * For monitor groups, can move the tasks only from
-		 * their parent CTRL group.
-		 */
-		if (rdtgrp->type == RDTCTRL_GROUP) {
-			tsk->closid = rdtgrp->closid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		if (rdtgrp->mon.parent->closid == tsk->closid) {
 			tsk->rmid = rdtgrp->mon.rmid;
-		} else if (rdtgrp->type == RDTMON_GROUP) {
-			if (rdtgrp->mon.parent->closid == tsk->closid) {
-				tsk->rmid = rdtgrp->mon.rmid;
-			} else {
-				rdt_last_cmd_puts("Can't move task to different control group\n");
-				ret = -EINVAL;
-			}
+		} else {
+			rdt_last_cmd_puts("Can't move task to different control group\n");
+			return -EINVAL;
 		}
 	}
-	return ret;
+
+	/*
+	 * Ensure the task's closid and rmid are written before determining if
+	 * the task is current that will decide if it will be interrupted.
+	 */
+	barrier();
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
+
+	return 0;
 }
 
 /**



  parent reply	other threads:[~2021-01-15 12:33 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-15 12:27 [PATCH 4.19 00/43] 4.19.168-rc1 review Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 01/43] net: cdc_ncm: correct overhead in delayed_ndp_size Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 02/43] net: hns3: fix the number of queues actually used by ARQ Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 03/43] net: stmmac: dwmac-sun8i: Balance internal PHY resource references Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 04/43] net: stmmac: dwmac-sun8i: Balance internal PHY power Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 05/43] net: vlan: avoid leaks on register_vlan_dev() failures Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 06/43] net/sonic: Fix some resource leaks in error handling paths Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 07/43] net: ip: always refragment ip defragmented packets Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 08/43] net: fix pmtu check in nopmtudisc mode Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 09/43] net: ipv6: fib: flush exceptions when purging route Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 10/43] chtls: Fix hardware tid leak Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 11/43] chtls: Remove invalid set_tcb call Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 12/43] chtls: Fix panic when route to peer not configured Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 13/43] chtls: Replace skb_dequeue with skb_peek Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 14/43] chtls: Added a check to avoid NULL pointer dereference Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 15/43] chtls: Fix chtls resources release sequence Greg Kroah-Hartman
2021-01-15 12:27 ` Greg Kroah-Hartman [this message]
2021-01-15 12:27 ` [PATCH 4.19 17/43] x86/resctrl: Dont move a task to the same resource group Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 18/43] vmlinux.lds.h: Add PGO and AutoFDO input sections Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 19/43] drm/i915: Fix mismatch between misplaced vma check and vma insert Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 20/43] spi: pxa2xx: Fix use-after-free on unbind Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 21/43] iio: imu: st_lsm6dsx: flip irq return logic Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 22/43] iio: imu: st_lsm6dsx: fix edge-trigger interrupts Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 23/43] HID: wacom: Fix memory leakage caused by kfifo_alloc Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 24/43] ARM: OMAP2+: omap_device: fix idling of devices during probe Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 25/43] i2c: sprd: use a specific timeout to avoid system hang up issue Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 26/43] cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get() Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 27/43] spi: stm32: FIFO threshold level - fix align packet size Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 28/43] dmaengine: mediatek: mtk-hsdma: Fix a resource leak in the error handling path of the probe function Greg Kroah-Hartman
2021-01-15 12:27 ` [PATCH 4.19 29/43] dmaengine: xilinx_dma: check dma_async_device_register return value Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 30/43] dmaengine: xilinx_dma: fix incompatible param warning in _child_probe() Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 31/43] dmaengine: xilinx_dma: fix mixed_enum_type coverity warning Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 32/43] wil6210: select CONFIG_CRC32 Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 33/43] block: rsxx: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 34/43] lightnvm: " Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 35/43] iommu/intel: Fix memleak in intel_irq_remapping_alloc Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 36/43] net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 37/43] net/mlx5e: Fix two double free cases Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 38/43] regmap: debugfs: Fix a memory leak when calling regmap_attach_dev Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 39/43] wan: ds26522: select CONFIG_BITREVERSE Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 40/43] KVM: arm64: Dont access PMCR_EL0 when no PMU is available Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 41/43] block: fix use-after-free in disk_part_iter_next Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 42/43] net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet Greg Kroah-Hartman
2021-01-15 12:28 ` [PATCH 4.19 43/43] regmap: debugfs: Fix a reversed if statement in regmap_debugfs_init() Greg Kroah-Hartman
2021-01-15 16:19 ` [PATCH 4.19 00/43] 4.19.168-rc1 review Jon Hunter
2021-01-15 21:14 ` Shuah Khan
2021-01-15 21:18 ` Guenter Roeck
2021-01-16  6:03 ` Naresh Kamboju
2021-01-16  7:58 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210115121957.832790615@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bp@suse.de \
    --cc=fenghua.yu@intel.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.