All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking
@ 2023-09-14 17:21 James Morse
  2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
                   ` (24 more replies)
  0 siblings, 25 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

This series does two things, it changes resctrl to call resctrl_arch_rmid_read()
in a way that works for MPAM, and it separates the locking so that the arch code
and filesystem code don't have to share a mutex. I tried to split this as two
series, but these touch similar call sites, so it would create more work.

(What's MPAM? See the cover letter of the first series. [1])

On x86 the RMID is an independent number. MPAMs equivalent is PMG, but this
isn't an independent number - it extends the PARTID (same as CLOSID) space
with bits that aren't used to select the configuration. The monitors can
then be told to match specific PMG values, allowing monitor-groups to be
created.

But, MPAM expects the monitors to always monitor by PARTID. The
Cache-storage-utilisation counters can only work this way.
(In the MPAM spec not setting the MATCH_PARTID bit is made CONSTRAINED
UNPREDICTABLE - which is Arm's term to mean portable software can't rely on
this)

It gets worse, as some SoCs may have very few PMG bits. I've seen the
datasheet for one that has a single bit of PMG space.

To be usable, MPAM's counters always need the PARTID and the PMG.
For resctrl, this means always making the CLOSID available when the RMID
is used.

To ensure RMID are always unique, this series combines the CLOSID and RMID
into an index, and manages RMID based on that. For x86, the index and RMID
would always be the same.


Currently the architecture specific code in the cpuhp callbacks takes the
rdtgroup_mutex. This means the filesystem code would have to export this
lock, resulting in an ill-defined interface between the two, and the possibility
of cross-architecture lock-ordering head aches.

The second part of this series adds a domain_list_lock to protect writes to the
domain list, and protects the domain list with RCU - or cpus_read_lock().

Use of RCU is to allow lockless readers of the domain list. To get MPAMs monitors
working, its very likely they'll need to be plumbed up to perf. An uncore PMU
driver would need to be a lockless reader of the domain list.

This series is based on v6.6-rc1, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v6

Bugs welcome,

Thanks,

James

[1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/
[v1] https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/
[v2] https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com/
[v3] https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com 
[v4] https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com
[v6] https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com/


James Morse (24):
  tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
  x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  x86/resctrl: Create helper for RMID allocation and mondata dir
    creation
  x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  x86/resctrl: Track the closid with the rmid
  x86/resctrl: Access per-rmid structures by index
  x86/resctrl: Allow RMID allocation to be scoped by CLOSID
  x86/resctrl: Track the number of dirty RMID a CLOSID has
  x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  x86/resctrl: Allocate the cleanest CLOSID by searching
    closid_num_dirty_rmid
  x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
  x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
  x86/resctrl: Queue mon_event_read() instead of sending an IPI
  x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  x86/resctrl: Allow arch to allocate memory needed in
    resctrl_arch_rmid_read()
  x86/resctrl: Make resctrl_mounted checks explicit
  x86/resctrl: Move alloc/mon static keys into helpers
  x86/resctrl: Make rdt_enable_key the arch's decision to switch
  x86/resctrl: Add helpers for system wide mon/alloc capable
  x86/resctrl: Add CPU online callback for resctrl work
  x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but
    cpu
  x86/resctrl: Add cpu offline callback for resctrl work
  x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
  x86/resctrl: Separate arch and fs resctrl locks

 arch/x86/include/asm/resctrl.h            |  90 +++++
 arch/x86/kernel/cpu/resctrl/core.c        |  78 ++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  47 ++-
 arch/x86/kernel/cpu/resctrl/internal.h    |  56 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 434 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  15 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 345 ++++++++++++-----
 include/linux/resctrl.h                   |  43 ++-
 include/linux/tick.h                      |   9 +-
 9 files changed, 857 insertions(+), 260 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-09-26 14:31   ` Fenghua Yu
  2023-10-03 21:05   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit() James Morse
                   ` (23 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

tick_nohz_full_mask lists the CPUs that are nohz_full. This is only
needed when CONFIG_NO_HZ_FULL is defined. tick_nohz_full_cpu() allows
a specific CPU to be tested against the mask, and evaluates to false
when CONFIG_NO_HZ_FULL is not defined.

The resctrl code needs to pick a CPU to run some work on, a new helper
prefers housekeeping CPUs by examining the tick_nohz_full_mask. Hiding
the declaration behind #ifdef CONFIG_NO_HZ_FULL forces all the users to
be behind an ifdef too.

Move the tick_nohz_full_mask declaration, this lets callers drop the
ifdef, and guard access to tick_nohz_full_mask with IS_ENABLED() or
something like tick_nohz_full_cpu().

The definition does not need to be moved as any callers should be
removed at compile time unless CONFIG_NO_HZ_FULL is defined.

CC: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 include/linux/tick.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9459fef5b857..65af90ca409a 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -174,9 +174,16 @@ static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
 static inline void tick_nohz_idle_stop_tick_protected(void) { }
 #endif /* !CONFIG_NO_HZ_COMMON */
 
+/*
+ * Mask of CPUs that are nohz_full.
+ *
+ * Users should be guarded by CONFIG_NO_HZ_FULL or a tick_nohz_full_cpu()
+ * check.
+ */
+extern cpumask_var_t tick_nohz_full_mask;
+
 #ifdef CONFIG_NO_HZ_FULL
 extern bool tick_nohz_full_running;
-extern cpumask_var_t tick_nohz_full_mask;
 
 static inline bool tick_nohz_full_enabled(void)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
  2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-02 17:00   ` Reinette Chatre
  2023-10-04 18:00   ` Moger, Babu
  2023-09-14 17:21 ` [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation James Morse
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

rmid_ptrs[] is allocated from dom_data_init() but never free()d.

While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.

Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
by rdt_get_mon_l3_config().

There is no reason to backport this to a stable kernel.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v5:
 * This patch is new
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 10 ++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  5 +++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..57cf1e6a57bd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -537,6 +537,7 @@ void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
+void resctrl_exit_mon_l3_config(struct rdt_resource *r);
 bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ded1fc7cb7cb..cfb3f632a4b2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
 	return 0;
 }
 
+void resctrl_exit_mon_l3_config(struct rdt_resource *r)
+{
+	mutex_lock(&rdtgroup_mutex);
+
+	kfree(rmid_ptrs);
+	rmid_ptrs = NULL;
+
+	mutex_unlock(&rdtgroup_mutex);
+}
+
 static struct mon_evt llc_occupancy_event = {
 	.name		= "llc_occupancy",
 	.evtid		= QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85..a2158c266e41 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
 
 void __exit rdtgroup_exit(void)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+	if (r->mon_capable)
+		resctrl_exit_mon_l3_config(r);
+
 	debugfs_remove_recursive(debugfs_resctrl);
 	unregister_filesystem(&rdt_fs_type);
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
  2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
  2023-09-14 17:21 ` [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit() James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:07   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

When monitoring is supported, each monitor and control group is allocated
an RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to
allocate the CLOSID.

MPAM's equivalent of RMID are not an independent number, so can't be
allocated until the CLOSID is known. An RMID allocation for one CLOSID
may fail, whereas another may succeed depending on how many monitor
groups a control group has.

The RMID allocation needs to move to be after the CLOSID has been
allocated.

Move the RMID allocation and mondata dir creation to a helper, this
makes a subsequent change easier to read.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v4:
 * Fixed typo in commit message, moved some words around.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 42 +++++++++++++++++---------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a2158c266e41..7a7369a323b5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3165,6 +3165,30 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 	return ret;
 }
 
+static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
+{
+	int ret;
+
+	if (!rdt_mon_capable)
+		return 0;
+
+	ret = alloc_rmid();
+	if (ret < 0) {
+		rdt_last_cmd_puts("Out of RMIDs\n");
+		return ret;
+	}
+	rdtgrp->mon.rmid = ret;
+
+	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
+	if (ret) {
+		rdt_last_cmd_puts("kernfs subdir error\n");
+		free_rmid(rdtgrp->mon.rmid);
+		return ret;
+	}
+
+	return 0;
+}
+
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 			     const char *name, umode_t mode,
 			     enum rdt_group_type rtype, struct rdtgroup **r)
@@ -3230,20 +3254,10 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 		goto out_destroy;
 	}
 
-	if (rdt_mon_capable) {
-		ret = alloc_rmid();
-		if (ret < 0) {
-			rdt_last_cmd_puts("Out of RMIDs\n");
-			goto out_destroy;
-		}
-		rdtgrp->mon.rmid = ret;
+	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
+	if (ret)
+		goto out_destroy;
 
-		ret = mkdir_mondata_all(kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
-		if (ret) {
-			rdt_last_cmd_puts("kernfs subdir error\n");
-			goto out_idfree;
-		}
-	}
 	kernfs_activate(kn);
 
 	/*
@@ -3251,8 +3265,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 	 */
 	return 0;
 
-out_idfree:
-	free_rmid(rdtgrp->mon.rmid);
 out_destroy:
 	kernfs_put(rdtgrp->kn);
 	kernfs_remove(rdtgrp->kn);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (2 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:07   ` Reinette Chatre
  2023-10-04 18:01   ` Moger, Babu
  2023-09-14 17:21 ` [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid James Morse
                   ` (20 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

RMID are allocated for each monitor or control group directory, because
each of these needs its own RMID. For control groups,
rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID.

MPAM's equivalent of RMID is not an independent number, so can't be
allocated until the CLOSID is known. An RMID allocation for one CLOSID
may fail, whereas another may succeed depending on how many monitor
groups a control group has.

The RMID allocation needs to move to be after the CLOSID has been
allocated.

Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller,
after the mkdir_rdt_prepare() call. This allows the RMID allocator to
know the CLOSID.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Moved kernfs_activate() later to preserve atomicity of files being visible

Changes since v5:
 * Renamed out_id_free as out_closid_free.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 +++++++++++++++++++-------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 7a7369a323b5..d25cb8c9a20e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3189,6 +3189,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	return 0;
 }
 
+static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
+{
+	if (rdt_mon_capable)
+		free_rmid(rgrp->mon.rmid);
+}
+
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 			     const char *name, umode_t mode,
 			     enum rdt_group_type rtype, struct rdtgroup **r)
@@ -3254,12 +3260,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 		goto out_destroy;
 	}
 
-	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
-	if (ret)
-		goto out_destroy;
-
-	kernfs_activate(kn);
-
 	/*
 	 * The caller unlocks the parent_kn upon success.
 	 */
@@ -3278,7 +3278,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp)
 {
 	kernfs_remove(rgrp->kn);
-	free_rmid(rgrp->mon.rmid);
 	rdtgroup_remove(rgrp);
 }
 
@@ -3300,12 +3299,21 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
 	prgrp = rdtgrp->mon.parent;
 	rdtgrp->closid = prgrp->closid;
 
+	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
+	if (ret) {
+		mkdir_rdt_prepare_clean(rdtgrp);
+		goto out_unlock;
+	}
+
+	kernfs_activate(rdtgrp->kn);
+
 	/*
 	 * Add the rdtgrp to the list of rdtgrps the parent
 	 * ctrl_mon group has to track.
 	 */
 	list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list);
 
+out_unlock:
 	rdtgroup_kn_unlock(parent_kn);
 	return ret;
 }
@@ -3336,9 +3344,16 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	ret = 0;
 
 	rdtgrp->closid = closid;
+
+	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
+	if (ret)
+		goto out_closid_free;
+
+	kernfs_activate(rdtgrp->kn);
+
 	ret = rdtgroup_init_alloc(rdtgrp);
 	if (ret < 0)
-		goto out_id_free;
+		goto out_rmid_free;
 
 	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
 
@@ -3358,7 +3373,9 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 
 out_del_list:
 	list_del(&rdtgrp->rdtgroup_list);
-out_id_free:
+out_rmid_free:
+	mkdir_rdt_prepare_rmid_free(rdtgrp);
+out_closid_free:
 	closid_free(closid);
 out_common_fail:
 	mkdir_rdt_prepare_clean(rdtgrp);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (3 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:11   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index James Morse
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

x86's RMID are independent of the CLOSID. An RMID can be allocated,
used and freed without considering the CLOSID.

MPAM's equivalent feature is PMG, which is not an independent number,
it extends the CLOSID/PARTID space. For MPAM, only PMG-bits worth of
'RMID' can be allocated for a single CLOSID.
i.e. if there is 1 bit of PMG space, then each CLOSID can have two
monitor groups.

To allow resctrl to disambiguate RMID values for different CLOSID,
everything in resctrl that keeps an RMID value needs to know the CLOSID
too. This will always be ignored on x86.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Xin Hao <xhao@linux.alibaba.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>

---
Is there a better term for 'the unique identifier for a monitor group'.
Using RMID for that here may be confusing...

Changes since v1:
 * Added comment in struct rmid_entry

Changes since v2:
 * Moved X86_RESCTRL_BAD_CLOSID from a subsequent patch

Chances since v3:
 * Renamed X86_RESCTRL_BAD_CLOSID to EMPTY
 * Clarified a few comments and kernel-doc

Changes since v5:
 * Use entry->closid from the iterator, instead of the parent control group.
 * Move the reserved defines into this patch to reduce the churn.
 * Added some kernel doc.
 * Renamed some arch closid parameters as 'unused'.
---
 arch/x86/include/asm/resctrl.h            |  7 +++
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 74 +++++++++++++++--------
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 12 ++--
 include/linux/resctrl.h                   | 16 ++++-
 6 files changed, 78 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 255a78d9d906..cc6e1bce7b1a 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -7,6 +7,13 @@
 #include <linux/sched.h>
 #include <linux/jump_label.h>
 
+/*
+ * This value can never be a valid CLOSID, and is used when mapping a
+ * (closid, rmid) pair to an index and back. On x86 only the RMID is
+ * needed. The index is a software defined value.
+ */
+#define X86_RESCTRL_EMPTY_CLOSID         ((u32)~0)
+
 /**
  * struct resctrl_pqr_state - State cache for the PQR MSR
  * @cur_rmid:		The cached Resource Monitoring ID
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 57cf1e6a57bd..91a6ea783200 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -535,7 +535,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int closids_supported(void);
 void closid_free(int closid);
 int alloc_rmid(void);
-void free_rmid(u32 rmid);
+void free_rmid(u32 closid, u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
 void resctrl_exit_mon_l3_config(struct rdt_resource *r);
 bool __init rdt_cpu_has(int flag);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index cfb3f632a4b2..42b9a694fe2f 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,7 +24,21 @@
 
 #include "internal.h"
 
+/**
+ * struct rmid_entry - dirty tracking for all RMID.
+ * @closid:	The CLOSID for this entry.
+ * @rmid:	The RMID for this entry.
+ * @busy:	The number of domains with cached data using this RMID.
+ * @list:	Member of the rmid_free_lru list when busy == 0.
+ *
+ * Some architectures's resctrl_arch_rmid_read() needs the CLOSID value
+ * in order to access the correct monitor. @closid provides the value to
+ * list walkers like __check_limbo(). On x86 this is ignored.
+ *
+ * Take the rdtgroup_mutex when accessing.
+ */
 struct rmid_entry {
+	u32				closid;
 	u32				rmid;
 	int				busy;
 	struct list_head		list;
@@ -136,7 +150,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
 	return val;
 }
 
-static inline struct rmid_entry *__rmid_entry(u32 rmid)
+static inline struct rmid_entry *__rmid_entry(u32 closid, u32 rmid)
 {
 	struct rmid_entry *entry;
 
@@ -190,7 +204,8 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
 }
 
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
-			     u32 rmid, enum resctrl_event_id eventid)
+			     u32 unused, u32 rmid,
+			     enum resctrl_event_id eventid)
 {
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	struct arch_mbm_state *am;
@@ -230,7 +245,8 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 }
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
-			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
+			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
+			   u64 *val)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
@@ -285,9 +301,9 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 		if (nrmid >= r->num_rmid)
 			break;
 
-		entry = __rmid_entry(nrmid);
+		entry = __rmid_entry(X86_RESCTRL_EMPTY_CLOSID, nrmid);// temporary
 
-		if (resctrl_arch_rmid_read(r, d, entry->rmid,
+		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
 					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
 		} else {
@@ -342,7 +358,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			err = resctrl_arch_rmid_read(r, d, entry->rmid,
+			err = resctrl_arch_rmid_read(r, d, entry->closid,
+						     entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
 			if (err || val <= resctrl_rmid_realloc_threshold)
@@ -366,7 +383,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-void free_rmid(u32 rmid)
+void free_rmid(u32 closid, u32 rmid)
 {
 	struct rmid_entry *entry;
 
@@ -375,7 +392,7 @@ void free_rmid(u32 rmid)
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
-	entry = __rmid_entry(rmid);
+	entry = __rmid_entry(closid, rmid);
 
 	if (is_llc_occupancy_enabled())
 		add_rmid_to_limbo(entry);
@@ -383,8 +400,8 @@ void free_rmid(u32 rmid)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
-				       enum resctrl_event_id evtid)
+static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 closid,
+				       u32 rmid, enum resctrl_event_id evtid)
 {
 	switch (evtid) {
 	case QOS_L3_MBM_TOTAL_EVENT_ID:
@@ -396,20 +413,21 @@ static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 rmid,
 	}
 }
 
-static int __mon_event_count(u32 rmid, struct rmid_read *rr)
+static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
 	struct mbm_state *m;
 	u64 tval = 0;
 
 	if (rr->first) {
-		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
-		m = get_mbm_state(rr->d, rmid, rr->evtid);
+		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
+		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
 	}
 
-	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
+	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
+					 &tval);
 	if (rr->err)
 		return rr->err;
 
@@ -421,6 +439,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 /*
  * mbm_bw_count() - Update bw count from values previously read by
  *		    __mon_event_count().
+ * @closid:	The closid used to identify the cached mbm_state.
  * @rmid:	The rmid used to identify the cached mbm_state.
  * @rr:		The struct rmid_read populated by __mon_event_count().
  *
@@ -429,7 +448,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
  * __mon_event_count() is compared with the chunks value from the previous
  * invocation. This must be called once per second to maintain values in MBps.
  */
-static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
+static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
 	u64 cur_bw, bytes, cur_bytes;
@@ -459,7 +478,7 @@ void mon_event_count(void *info)
 
 	rdtgrp = rr->rgrp;
 
-	ret = __mon_event_count(rdtgrp->mon.rmid, rr);
+	ret = __mon_event_count(rdtgrp->closid, rdtgrp->mon.rmid, rr);
 
 	/*
 	 * For Ctrl groups read data from child monitor groups and
@@ -470,7 +489,8 @@ void mon_event_count(void *info)
 
 	if (rdtgrp->type == RDTCTRL_GROUP) {
 		list_for_each_entry(entry, head, mon.crdtgrp_list) {
-			if (__mon_event_count(entry->mon.rmid, rr) == 0)
+			if (__mon_event_count(entry->closid, entry->mon.rmid,
+					      rr) == 0)
 				ret = 0;
 		}
 	}
@@ -600,7 +620,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	}
 }
 
-static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
+static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
+		       u32 closid, u32 rmid)
 {
 	struct rmid_read rr;
 
@@ -615,12 +636,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	if (is_mbm_total_enabled()) {
 		rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
 		rr.val = 0;
-		__mon_event_count(rmid, &rr);
+		__mon_event_count(closid, rmid, &rr);
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
 		rr.val = 0;
-		__mon_event_count(rmid, &rr);
+		__mon_event_count(closid, rmid, &rr);
 
 		/*
 		 * Call the MBA software controller only for the
@@ -628,7 +649,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 		 * the software controller explicitly.
 		 */
 		if (is_mba_sc(NULL))
-			mbm_bw_count(rmid, &rr);
+			mbm_bw_count(closid, rmid, &rr);
 	}
 }
 
@@ -685,11 +706,11 @@ void mbm_handle_overflow(struct work_struct *work)
 	d = container_of(work, struct rdt_domain, mbm_over.work);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
-		mbm_update(r, d, prgrp->mon.rmid);
+		mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
 
 		head = &prgrp->mon.crdtgrp_list;
 		list_for_each_entry(crgrp, head, mon.crdtgrp_list)
-			mbm_update(r, d, crgrp->mon.rmid);
+			mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);
 
 		if (is_mba_sc(NULL))
 			update_mba_bw(prgrp, d);
@@ -732,10 +753,11 @@ static int dom_data_init(struct rdt_resource *r)
 	}
 
 	/*
-	 * RMID 0 is special and is always allocated. It's used for all
-	 * tasks that are not monitored.
+	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
+	 * are always allocated. These are used for rdtgroup_default control
+	 * group, which will be setup later. See rdtgroup_setup_root().
 	 */
-	entry = __rmid_entry(0);
+	entry = __rmid_entry(RESCTRL_RESERVED_CLOSID, RESCTRL_RESERVED_RMID);
 	list_del(&entry->list);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8f559eeae08e..65bee6f11015 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -752,7 +752,7 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
 	 * anymore when this group would be used for pseudo-locking. This
 	 * is safe to call on platforms not capable of monitoring.
 	 */
-	free_rmid(rdtgrp->mon.rmid);
+	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	ret = 0;
 	goto out;
@@ -787,7 +787,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
 
 	ret = rdtgroup_locksetup_user_restore(rdtgrp);
 	if (ret) {
-		free_rmid(rdtgrp->mon.rmid);
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 		return ret;
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d25cb8c9a20e..970ba0531108 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2714,7 +2714,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
 
 	head = &rdtgrp->mon.crdtgrp_list;
 	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
-		free_rmid(sentry->mon.rmid);
+		free_rmid(sentry->closid, sentry->mon.rmid);
 		list_del(&sentry->mon.crdtgrp_list);
 
 		if (atomic_read(&sentry->waitcount) != 0)
@@ -2754,7 +2754,7 @@ static void rmdir_all_sub(void)
 		cpumask_or(&rdtgroup_default.cpu_mask,
 			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
 
-		free_rmid(rdtgrp->mon.rmid);
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 		kernfs_remove(rdtgrp->kn);
 		list_del(&rdtgrp->rdtgroup_list);
@@ -3182,7 +3182,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	ret = mkdir_mondata_all(rdtgrp->kn, rdtgrp, &rdtgrp->mon.mon_data_kn);
 	if (ret) {
 		rdt_last_cmd_puts("kernfs subdir error\n");
-		free_rmid(rdtgrp->mon.rmid);
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 		return ret;
 	}
 
@@ -3192,7 +3192,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
 {
 	if (rdt_mon_capable)
-		free_rmid(rgrp->mon.rmid);
+		free_rmid(rgrp->closid, rgrp->mon.rmid);
 }
 
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
@@ -3444,7 +3444,7 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	update_closid_rmid(tmpmask, NULL);
 
 	rdtgrp->flags = RDT_DELETED;
-	free_rmid(rdtgrp->mon.rmid);
+	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 
 	/*
 	 * Remove the rdtgrp from the parent ctrl_mon group's list
@@ -3490,8 +3490,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
+	free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
 	closid_free(rdtgrp->closid);
-	free_rmid(rdtgrp->mon.rmid);
 
 	rdtgroup_ctrl_remove(rdtgrp);
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8334eeacfec5..660752406174 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -6,6 +6,10 @@
 #include <linux/list.h>
 #include <linux/pid.h>
 
+/* CLOSID, RMID value used by the default control group */
+#define RESCTRL_RESERVED_CLOSID		0
+#define RESCTRL_RESERVED_RMID		0
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 int proc_resctrl_show(struct seq_file *m,
@@ -225,6 +229,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  *			      for this resource and domain.
  * @r:			resource that the counter should be read from.
  * @d:			domain that the counter should be read from.
+ * @closid:		closid that matches the rmid. Depending on the architecture, the
+ *			counter may match traffic of both @closid and @rmid, or @rmid
+ *			only.
  * @rmid:		rmid of the counter to read.
  * @eventid:		eventid to read, e.g. L3 occupancy.
  * @val:		result of the counter read in bytes.
@@ -235,20 +242,25 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
-			   u32 rmid, enum resctrl_event_id eventid, u64 *val);
+			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
+			   u64 *val);
+
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
  *			       and eventid.
  * @r:		The domain's resource.
  * @d:		The rmid's domain.
+ * @closid:	closid that matches the rmid. Depending on the architecture, the
+ *		counter may match traffic of both @closid and @rmid, or @rmid only.
  * @rmid:	The rmid whose counter values should be reset.
  * @eventid:	The eventid whose counter values should be reset.
  *
  * This can be called from any CPU.
  */
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
-			     u32 rmid, enum resctrl_event_id eventid);
+			     u32 closid, u32 rmid,
+			     enum resctrl_event_id eventid);
 
 /**
  * resctrl_arch_reset_rmid_all() - Reset all private state associated with
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (4 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:12   ` Reinette Chatre
  2023-10-24  9:28   ` Maciej Wieczór-Retman
  2023-09-14 17:21 ` [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID James Morse
                   ` (18 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

x86 systems identify traffic using the CLOSID and RMID. The CLOSID is
used to lookup the control policy, the RMID is used for monitoring. For
x86 these are independent numbers.
Arm's MPAM has equivalent features PARTID and PMG, where the PARTID is
used to lookup the control policy. The PMG in contrast is a small number
of bits that are used to subdivide PARTID when monitoring. The
cache-occupancy monitors require the PARTID to be specified when monitoring.

This means MPAM's PMG field is not unique. There are multiple PMG-0, one
per allocated CLOSID/PARTID. If PMG is treated as equivalent to RMID, it
cannot be allocated as an independent number. Bitmaps like rmid_busy_llc
need to be sized by the number of unique entries for this resource.

Treat the combined CLOSID and RMID as an index, and provide architecture
helpers to pack and unpack an index. This makes the MPAM values unique.
The domain's rmid_busy_llc and rmid_ptrs[] are then sized by index, as
are domain mbm_local[] and mbm_total[].

x86 can ignore the CLOSID field when packing and unpacking an index, and
report as many indexes as RMID.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Added X86_BAD_CLOSID macro to make it clear what this value means
 * Added second WARN_ON() for closid checking, and made both _ONCE()

Changes since v2:
 * Added RESCTRL_RESERVED_CLOSID
 * Removed a newline
 * Repharsed some comments
 * Renamed a variable 'ignore'd
 * Moved X86_RESCTRL_BAD_CLOSID to a previous patch

Changes since v3:
 * Changed a variable name
 * Fixed various typos

Changes since v4:
 * Removed resource paramter from has_busy_rmid()
 * Rewrote commit message

Changes since v5:
 * Used RESCTRL_RESERVED_RMID in clear_closid_rmid().
 * Added comment againt free_rmid()s index comparison tricks.
---
 arch/x86/include/asm/resctrl.h         | 17 +++++
 arch/x86/kernel/cpu/resctrl/core.c     |  5 +-
 arch/x86/kernel/cpu/resctrl/internal.h |  3 +-
 arch/x86/kernel/cpu/resctrl/monitor.c  | 96 ++++++++++++++++++--------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 +--
 5 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index cc6e1bce7b1a..db4c84dde2d5 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -101,6 +101,23 @@ static inline void resctrl_sched_in(struct task_struct *tsk)
 		__resctrl_sched_in(tsk);
 }
 
+static inline u32 resctrl_arch_system_num_rmid_idx(void)
+{
+	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
+	return boot_cpu_data.x86_cache_max_rmid + 1;
+}
+
+static inline void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+	*rmid = idx;
+	*closid = X86_RESCTRL_EMPTY_CLOSID;
+}
+
+static inline u32 resctrl_arch_rmid_idx_encode(u32 ignored, u32 rmid)
+{
+	return rmid;
+}
+
 void resctrl_cpu_detect(struct cpuinfo_x86 *c);
 
 #else
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 030d3b409768..eaadf6f20900 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -585,7 +585,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			mbm_setup_overflow_handler(d, 0);
 		}
 		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
-		    has_busy_rmid(r, d)) {
+		    has_busy_rmid(d)) {
 			cancel_delayed_work(&d->cqm_limbo);
 			cqm_setup_limbo_handler(d, 0);
 		}
@@ -600,7 +600,8 @@ static void clear_closid_rmid(int cpu)
 	state->default_rmid = 0;
 	state->cur_closid = 0;
 	state->cur_rmid = 0;
-	wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
+	wrmsr(MSR_IA32_PQR_ASSOC, RESCTRL_RESERVED_RMID,
+	      RESCTRL_RESERVED_CLOSID);
 }
 
 static int resctrl_online_cpu(unsigned int cpu)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 91a6ea783200..ab96af8d9953 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -7,6 +7,7 @@
 #include <linux/kernfs.h>
 #include <linux/fs_context.h>
 #include <linux/jump_label.h>
+#include <asm/resctrl.h>
 
 #define L3_QOS_CDP_ENABLE		0x01ULL
 
@@ -551,7 +552,7 @@ void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
+bool has_busy_rmid(struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 42b9a694fe2f..be0b7cb6e1f5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -150,12 +150,29 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
 	return val;
 }
 
-static inline struct rmid_entry *__rmid_entry(u32 closid, u32 rmid)
+/*
+ * x86 and arm64 differ in their handling of monitoring.
+ * x86's RMID are an independent number, there is only one source of traffic
+ * with an RMID value of '1'.
+ * arm64's PMG extend the PARTID/CLOSID space, there are multiple sources of
+ * traffic with a PMG value of '1', one for each CLOSID, meaning the RMID
+ * value is no longer unique.
+ * To account for this, resctrl uses an index. On x86 this is just the RMID,
+ * on arm64 it encodes the CLOSID and RMID. This gives a unique number.
+ *
+ * The domain's rmid_busy_llc and rmid_ptrs[] are sized by index. The arch code
+ * must accept an attempt to read every index.
+ */
+static inline struct rmid_entry *__rmid_entry(u32 idx)
 {
 	struct rmid_entry *entry;
+	u32 closid, rmid;
 
-	entry = &rmid_ptrs[rmid];
-	WARN_ON(entry->rmid != rmid);
+	entry = &rmid_ptrs[idx];
+	resctrl_arch_rmid_idx_decode(idx, &closid, &rmid);
+
+	WARN_ON_ONCE(entry->closid != closid);
+	WARN_ON_ONCE(entry->rmid != rmid);
 
 	return entry;
 }
@@ -285,8 +302,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 void __check_limbo(struct rdt_domain *d, bool force_free)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry;
-	u32 crmid = 1, nrmid;
+	u32 idx, cur_idx = 1;
 	bool rmid_dirty;
 	u64 val = 0;
 
@@ -297,12 +315,11 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 	 * RMID and move it to the free list when the counter reaches 0.
 	 */
 	for (;;) {
-		nrmid = find_next_bit(d->rmid_busy_llc, r->num_rmid, crmid);
-		if (nrmid >= r->num_rmid)
+		idx = find_next_bit(d->rmid_busy_llc, idx_limit, cur_idx);
+		if (idx >= idx_limit)
 			break;
 
-		entry = __rmid_entry(X86_RESCTRL_EMPTY_CLOSID, nrmid);// temporary
-
+		entry = __rmid_entry(idx);
 		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
 					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
@@ -311,19 +328,21 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 		}
 
 		if (force_free || !rmid_dirty) {
-			clear_bit(entry->rmid, d->rmid_busy_llc);
+			clear_bit(idx, d->rmid_busy_llc);
 			if (!--entry->busy) {
 				rmid_limbo_count--;
 				list_add_tail(&entry->list, &rmid_free_lru);
 			}
 		}
-		crmid = nrmid + 1;
+		cur_idx = idx + 1;
 	}
 }
 
-bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d)
+bool has_busy_rmid(struct rdt_domain *d)
 {
-	return find_first_bit(d->rmid_busy_llc, r->num_rmid) != r->num_rmid;
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+
+	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
 }
 
 /*
@@ -353,6 +372,9 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	struct rdt_domain *d;
 	int cpu, err;
 	u64 val = 0;
+	u32 idx;
+
+	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
 
 	entry->busy = 0;
 	cpu = get_cpu();
@@ -370,9 +392,9 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 		 * For the first limbo RMID in the domain,
 		 * setup up the limbo worker.
 		 */
-		if (!has_busy_rmid(r, d))
+		if (!has_busy_rmid(d))
 			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL);
-		set_bit(entry->rmid, d->rmid_busy_llc);
+		set_bit(idx, d->rmid_busy_llc);
 		entry->busy++;
 	}
 	put_cpu();
@@ -385,14 +407,21 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 
 void free_rmid(u32 closid, u32 rmid)
 {
+	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
 	struct rmid_entry *entry;
 
-	if (!rmid)
-		return;
-
 	lockdep_assert_held(&rdtgroup_mutex);
 
-	entry = __rmid_entry(closid, rmid);
+	/*
+	 * Do not allow the default rmid to be free'd. Comparing by index
+	 * allows architectures that ignore the closid parameter to avoid an
+	 * unnecessary check.
+	 */
+	if (idx == resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
+						RESCTRL_RESERVED_RMID))
+		return;
+
+	entry = __rmid_entry(idx);
 
 	if (is_llc_occupancy_enabled())
 		add_rmid_to_limbo(entry);
@@ -403,11 +432,13 @@ void free_rmid(u32 closid, u32 rmid)
 static struct mbm_state *get_mbm_state(struct rdt_domain *d, u32 closid,
 				       u32 rmid, enum resctrl_event_id evtid)
 {
+	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+
 	switch (evtid) {
 	case QOS_L3_MBM_TOTAL_EVENT_ID:
-		return &d->mbm_total[rmid];
+		return &d->mbm_total[idx];
 	case QOS_L3_MBM_LOCAL_EVENT_ID:
-		return &d->mbm_local[rmid];
+		return &d->mbm_local[idx];
 	default:
 		return NULL;
 	}
@@ -450,7 +481,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
  */
 static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
-	struct mbm_state *m = &rr->d->mbm_local[rmid];
+	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+	struct mbm_state *m = &rr->d->mbm_local[idx];
 	u64 cur_bw, bytes, cur_bytes;
 
 	cur_bytes = rr->val;
@@ -540,7 +572,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 {
 	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
-	u32 cur_bw, delta_bw, user_bw;
+	u32 cur_bw, delta_bw, user_bw, idx;
 	struct rdt_resource *r_mba;
 	struct rdt_domain *dom_mba;
 	struct list_head *head;
@@ -553,7 +585,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 
 	closid = rgrp->closid;
 	rmid = rgrp->mon.rmid;
-	pmbm_data = &dom_mbm->mbm_local[rmid];
+	idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+	pmbm_data = &dom_mbm->mbm_local[idx];
 
 	dom_mba = get_domain_from_cpu(smp_processor_id(), r_mba);
 	if (!dom_mba) {
@@ -671,7 +704,7 @@ void cqm_handle_limbo(struct work_struct *work)
 
 	__check_limbo(d, false);
 
-	if (has_busy_rmid(r, d))
+	if (has_busy_rmid(d))
 		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
 
 	mutex_unlock(&rdtgroup_mutex);
@@ -736,19 +769,20 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 
 static int dom_data_init(struct rdt_resource *r)
 {
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry = NULL;
-	int i, nr_rmids;
+	u32 idx;
+	int i;
 
-	nr_rmids = r->num_rmid;
-	rmid_ptrs = kcalloc(nr_rmids, sizeof(struct rmid_entry), GFP_KERNEL);
+	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
 	if (!rmid_ptrs)
 		return -ENOMEM;
 
-	for (i = 0; i < nr_rmids; i++) {
+	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
 		INIT_LIST_HEAD(&entry->list);
 
-		entry->rmid = i;
+		resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
 		list_add_tail(&entry->list, &rmid_free_lru);
 	}
 
@@ -757,7 +791,9 @@ static int dom_data_init(struct rdt_resource *r)
 	 * are always allocated. These are used for rdtgroup_default control
 	 * group, which will be setup later. See rdtgroup_setup_root().
 	 */
-	entry = __rmid_entry(RESCTRL_RESERVED_CLOSID, RESCTRL_RESERVED_RMID);
+	idx = resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
+					   RESCTRL_RESERVED_RMID);
+	entry = __rmid_entry(idx);
 	list_del(&entry->list);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 970ba0531108..61f338d96906 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3756,7 +3756,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 
 	if (is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
-	if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+	if (is_llc_occupancy_enabled() && has_busy_rmid(d)) {
 		/*
 		 * When a package is going down, forcefully
 		 * decrement rmid->ebusy. There is no way to know
@@ -3774,16 +3774,17 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 {
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize;
 
 	if (is_llc_occupancy_enabled()) {
-		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
+		d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
 		if (!d->rmid_busy_llc)
 			return -ENOMEM;
 	}
 	if (is_mbm_total_enabled()) {
 		tsize = sizeof(*d->mbm_total);
-		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
 		if (!d->mbm_total) {
 			bitmap_free(d->rmid_busy_llc);
 			return -ENOMEM;
@@ -3791,7 +3792,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 	}
 	if (is_mbm_local_enabled()) {
 		tsize = sizeof(*d->mbm_local);
-		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
 		if (!d->mbm_local) {
 			bitmap_free(d->rmid_busy_llc);
 			kfree(d->mbm_total);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (5 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:12   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has James Morse
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

MPAMs RMID values are not unique unless the CLOSID is considered as well.

alloc_rmid() expects the RMID to be an independent number.

Pass the CLOSID in to alloc_rmid(). Use this to compare indexes when
allocating. If the CLOSID is not relevant to the index, this ends up
comparing the free RMID with itself, and the first free entry will be
used. With MPAM the CLOSID is included in the index, so this becomes a
walk of the free RMID entries, until one that matches the supplied
CLOSID is found.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2;
 * Rephrased comment in resctrl_find_free_rmid() to describe this in terms of
   list_entry_first()
 * Rephrased comment above alloc_rmid()

Changes since v3:
 * Flipped conditions in alloc_rmid()

Changes since v4:
 * Typo in comment

Changes since v5:
 * Reworded two comments.
---
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 52 +++++++++++++++++------
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  2 +-
 4 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ab96af8d9953..ad6e874d9ed2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -535,7 +535,7 @@ void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int closids_supported(void);
 void closid_free(int closid);
-int alloc_rmid(void);
+int alloc_rmid(u32 closid);
 void free_rmid(u32 closid, u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
 void resctrl_exit_mon_l3_config(struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index be0b7cb6e1f5..d286aba1ee63 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -345,24 +345,50 @@ bool has_busy_rmid(struct rdt_domain *d)
 	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
 }
 
-/*
- * As of now the RMIDs allocation is global.
- * However we keep track of which packages the RMIDs
- * are used to optimize the limbo list management.
- */
-int alloc_rmid(void)
+static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
 {
-	struct rmid_entry *entry;
-
-	lockdep_assert_held(&rdtgroup_mutex);
+	struct rmid_entry *itr;
+	u32 itr_idx, cmp_idx;
 
 	if (list_empty(&rmid_free_lru))
-		return rmid_limbo_count ? -EBUSY : -ENOSPC;
+		return rmid_limbo_count ? ERR_PTR(-EBUSY) : ERR_PTR(-ENOSPC);
+
+	list_for_each_entry(itr, &rmid_free_lru, list) {
+		/*
+		 * Get the index of this free RMID, and the index it would need
+		 * to be if it were used with this CLOSID.
+		 * If the CLOSID is irrelevant on this architecture, the two
+		 * index values are always same on every entry and thus the
+		 * very first entry will be returned.
+
+		 */
+		itr_idx = resctrl_arch_rmid_idx_encode(itr->closid, itr->rmid);
+		cmp_idx = resctrl_arch_rmid_idx_encode(closid, itr->rmid);
+
+		if (itr_idx == cmp_idx)
+			return itr;
+	}
+
+	return ERR_PTR(-ENOSPC);
+}
+
+/*
+ * For MPAM the RMID value is not unique, and has to be considered with
+ * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
+ * allows all domains to be managed by a single free list.
+ * Each domain also has a rmid_busy_llc to reduce the work of the limbo handler.
+ */
+int alloc_rmid(u32 closid)
+{
+	struct rmid_entry *entry;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	entry = resctrl_find_free_rmid(closid);
+	if (IS_ERR(entry))
+		return PTR_ERR(entry);
 
-	entry = list_first_entry(&rmid_free_lru,
-				 struct rmid_entry, list);
 	list_del(&entry->list);
-
 	return entry->rmid;
 }
 
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 65bee6f11015..d8f44113ed1f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -777,7 +777,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
 	int ret;
 
 	if (rdt_mon_capable) {
-		ret = alloc_rmid();
+		ret = alloc_rmid(rdtgrp->closid);
 		if (ret < 0) {
 			rdt_last_cmd_puts("Out of RMIDs\n");
 			return ret;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 61f338d96906..ac1a6437469f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3172,7 +3172,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 	if (!rdt_mon_capable)
 		return 0;
 
-	ret = alloc_rmid();
+	ret = alloc_rmid(rdtgrp->closid);
 	if (ret < 0) {
 		rdt_last_cmd_puts("Out of RMIDs\n");
 		return ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (6 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:13   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
used for different control groups.

This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.

Keep track of the number of RMID held in limbo each CLOSID has. This will
allow a future helper to find the 'cleanest' CLOSID when allocating.

The array is only needed when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is
defined. This will never be the case on x86.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v4:
 * Moved closid_num_dirty_rmid[] update under entry->busy check
 * Take the mutex in dom_data_init() as the caller doesn't.

Changes since v5:
 * Added braces after an else.
 * Made closid_num_dirty_rmid an unsigned int.
 * Moved mutex_lock() in dom_data_init() to cover the whole function.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 66 +++++++++++++++++++++++----
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d286aba1ee63..0c783301d106 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -51,6 +51,13 @@ struct rmid_entry {
  */
 static LIST_HEAD(rmid_free_lru);
 
+/**
+ * @closid_num_dirty_rmid    The number of dirty RMID each CLOSID has.
+ *     Only allocated when CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID is defined.
+ *     Indexed by CLOSID. Protected by rdtgroup_mutex.
+ */
+static unsigned int *closid_num_dirty_rmid;
+
 /**
  * @rmid_limbo_count     count of currently unused but (potentially)
  *     dirty RMIDs.
@@ -293,6 +300,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 	return 0;
 }
 
+static void limbo_release_entry(struct rmid_entry *entry)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	rmid_limbo_count--;
+	list_add_tail(&entry->list, &rmid_free_lru);
+
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+		closid_num_dirty_rmid[entry->closid]--;
+}
+
 /*
  * Check the RMIDs that are marked as busy for this domain. If the
  * reported LLC occupancy is below the threshold clear the busy bit and
@@ -329,10 +347,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 
 		if (force_free || !rmid_dirty) {
 			clear_bit(idx, d->rmid_busy_llc);
-			if (!--entry->busy) {
-				rmid_limbo_count--;
-				list_add_tail(&entry->list, &rmid_free_lru);
-			}
+			if (!--entry->busy)
+				limbo_release_entry(entry);
 		}
 		cur_idx = idx + 1;
 	}
@@ -400,6 +416,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	u64 val = 0;
 	u32 idx;
 
+	lockdep_assert_held(&rdtgroup_mutex);
+
 	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
 
 	entry->busy = 0;
@@ -425,10 +443,13 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	}
 	put_cpu();
 
-	if (entry->busy)
+	if (entry->busy) {
 		rmid_limbo_count++;
-	else
+		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+			closid_num_dirty_rmid[entry->closid]++;
+	} else {
 		list_add_tail(&entry->list, &rmid_free_lru);
+	}
 }
 
 void free_rmid(u32 closid, u32 rmid)
@@ -796,13 +817,30 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 static int dom_data_init(struct rdt_resource *r)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+	u32 num_closid = resctrl_arch_get_num_closid(r);
 	struct rmid_entry *entry = NULL;
+	int err = 0, i;
 	u32 idx;
-	int i;
+
+	mutex_lock(&rdtgroup_mutex);
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+		int *tmp;
+
+		tmp = kcalloc(num_closid, sizeof(int), GFP_KERNEL);
+		if (!tmp) {
+			err = -ENOMEM;
+			goto out_unlock;
+		}
+
+		closid_num_dirty_rmid = tmp;
+	}
 
 	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
-	if (!rmid_ptrs)
-		return -ENOMEM;
+	if (!rmid_ptrs) {
+		kfree(closid_num_dirty_rmid);
+		err = -ENOMEM;
+		goto out_unlock;
+	}
 
 	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
@@ -822,13 +860,21 @@ static int dom_data_init(struct rdt_resource *r)
 	entry = __rmid_entry(idx);
 	list_del(&entry->list);
 
-	return 0;
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
+
+	return err;
 }
 
 void resctrl_exit_mon_l3_config(struct rdt_resource *r)
 {
 	mutex_lock(&rdtgroup_mutex);
 
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+		kfree(closid_num_dirty_rmid);
+		closid_num_dirty_rmid = NULL;
+	}
+
 	kfree(rmid_ptrs);
 	rmid_ptrs = NULL;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (7 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-09-17 21:00   ` David Laight
                     ` (2 more replies)
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
                   ` (15 subsequent siblings)
  24 siblings, 3 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

The resctrl CLOSID allocator uses a single 32bit word to track which
CLOSID are free. The setting and clearing of bits is open coded.

A subsequent patch adds resctrl_closid_is_free(), which adds more open
coded bitmaps operations. These will eventually need changing to use
the bitops helpers so that a CLOSID bitmap of the correct size can be
allocated dynamically.

Convert the existing open coded bit manipulations of closid_free_map
to use set_bit() and friends.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ac1a6437469f..fa449ee0d1a7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -106,7 +106,7 @@ void rdt_staged_configs_clear(void)
  * - Our choices on how to configure each resource become progressively more
  *   limited as the number of resources grows.
  */
-static int closid_free_map;
+static unsigned long closid_free_map;
 static int closid_free_map_len;
 
 int closids_supported(void)
@@ -126,7 +126,7 @@ static void closid_init(void)
 	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
 
 	/* CLOSID 0 is always reserved for the default group */
-	closid_free_map &= ~1;
+	clear_bit(0, &closid_free_map);
 	closid_free_map_len = rdt_min_closid;
 }
 
@@ -137,14 +137,14 @@ static int closid_alloc(void)
 	if (closid == 0)
 		return -ENOSPC;
 	closid--;
-	closid_free_map &= ~(1 << closid);
+	clear_bit(closid, &closid_free_map);
 
 	return closid;
 }
 
 void closid_free(int closid)
 {
-	closid_free_map |= 1 << closid;
+	set_bit(closid, &closid_free_map);
 }
 
 /**
@@ -156,7 +156,7 @@ void closid_free(int closid)
  */
 static bool closid_allocated(unsigned int closid)
 {
-	return (closid_free_map & (1 << closid)) == 0;
+	return !test_bit(closid, &closid_free_map);
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (8 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:14   ` Reinette Chatre
                     ` (3 more replies)
  2023-09-14 17:21 ` [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers James Morse
                   ` (14 subsequent siblings)
  24 siblings, 4 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
used for different control groups.

This means once a CLOSID is allocated, all its monitoring ids may still be
dirty, and held in limbo.

Instead of allocating the first free CLOSID, on architectures where
CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
closid_num_dirty_rmid[] to find the cleanest CLOSID.

The CLOSID found is returned to closid_alloc() for the free list
to be updated.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v4:
 * Dropped stale section from comment
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 42 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
 3 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ad6e874d9ed2..f06d3d3e0808 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
 void __init mbm_config_rftype_init(const char *config);
 void rdt_staged_configs_clear(void);
+bool closid_allocated(unsigned int closid);
+int resctrl_find_cleanest_closid(void);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0c783301d106..0bbed8c62d42 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
 	return ERR_PTR(-ENOSPC);
 }
 
+/**
+ * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
+ *                                  RMID are clean, or the CLOSID that has
+ *                                  the most clean RMID.
+ *
+ * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
+ * may not be able to allocate clean RMID. To avoid this the allocator will
+ * choose the CLOSID with the most clean RMID.
+ *
+ * When the CLOSID and RMID are independent numbers, the first free CLOSID will
+ * be returned.
+ */
+int resctrl_find_cleanest_closid(void)
+{
+	u32 cleanest_closid = ~0, iter_num_dirty;
+	int i = 0;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+		return -EIO;
+
+	for (i = 0; i < closids_supported(); i++) {
+		if (closid_allocated(i))
+			continue;
+
+		iter_num_dirty = closid_num_dirty_rmid[i];
+		if (iter_num_dirty == 0)
+			return i;
+
+		if (cleanest_closid == ~0)
+			cleanest_closid = i;
+
+		if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
+			cleanest_closid = i;
+	}
+
+	if (cleanest_closid == ~0)
+		return -ENOSPC;
+	return cleanest_closid;
+}
+
 /*
  * For MPAM the RMID value is not unique, and has to be considered with
  * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index fa449ee0d1a7..1f8f1c417a4b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -132,11 +132,20 @@ static void closid_init(void)
 
 static int closid_alloc(void)
 {
-	u32 closid = ffs(closid_free_map);
+	u32 closid;
+	int err;
 
-	if (closid == 0)
-		return -ENOSPC;
-	closid--;
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+		err = resctrl_find_cleanest_closid();
+		if (err < 0)
+			return err;
+		closid = err;
+	} else {
+		closid = ffs(closid_free_map);
+		if (closid == 0)
+			return -ENOSPC;
+		closid--;
+	}
 	clear_bit(closid, &closid_free_map);
 
 	return closid;
@@ -154,7 +163,7 @@ void closid_free(int closid)
  * Return: true if @closid is currently associated with a resource group,
  * false if @closid is free
  */
-static bool closid_allocated(unsigned int closid)
+bool closid_allocated(unsigned int closid)
 {
 	return !test_bit(closid, &closid_free_map);
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (9 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:15   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow James Morse
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

When switching tasks, the CLOSID and RMID that the new task should
use are stored in struct task_struct. For x86 the CLOSID known by resctrl,
the value in task_struct, and the value written to the CPU register are
all the same thing.

MPAM's CPU interface has two different PARTID's one for data accesses
the other for instruction fetch. Storing resctrl's CLOSID value in
struct task_struct implies the arch code knows whether resctrl is using
CDP.

Move the matching and setting of the struct task_struct properties
to use helpers. This allows arm64 to store the hardware format of
the register, instead of having to convert it each time.

__rdtgroup_move_task()s use of READ_ONCE()/WRITE_ONCE() ensures torn
values aren't seen as another CPU may schedule the task being moved
while the value is being changed. MPAM has an additional corner-case
here as the PMG bits extend the PARTID space. If the scheduler sees a
new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code
is not watching causing an inaccurate count. x86's RMID are independent
values, so the limbo code will still be watching the old-RMID in this
circumstance.
To avoid this, arm64 needs both the CLOSID/RMID WRITE_ONCE()d together.
Both values must be provided together.

Because MPAM's RMID values are not unique, the CLOSID must be provided
when matching the RMID.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * __rdtgroup_move_task() changed to set CLOSID from different CLOSID place
   depending on group type
---
 arch/x86/include/asm/resctrl.h         | 18 ++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++++++++++----------
 2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index db4c84dde2d5..1d274dbabc44 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -95,6 +95,24 @@ static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
 	return val * scale;
 }
 
+static inline void resctrl_arch_set_closid_rmid(struct task_struct *tsk,
+						u32 closid, u32 rmid)
+{
+	WRITE_ONCE(tsk->closid, closid);
+	WRITE_ONCE(tsk->rmid, rmid);
+}
+
+static inline bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+	return READ_ONCE(tsk->closid) == closid;
+}
+
+static inline bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 ignored,
+					   u32 rmid)
+{
+	return READ_ONCE(tsk->rmid) == rmid;
+}
+
 static inline void resctrl_sched_in(struct task_struct *tsk)
 {
 	if (static_branch_likely(&rdt_enable_key))
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1f8f1c417a4b..64a0f71f6a5d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -97,7 +97,7 @@ void rdt_staged_configs_clear(void)
  *
  * Using a global CLOSID across all resources has some advantages and
  * some drawbacks:
- * + We can simply set "current->closid" to assign a task to a resource
+ * + We can simply set current's closid to assign a task to a resource
  *   group.
  * + Context switch code can avoid extra memory references deciding which
  *   CLOSID to load into the PQR_ASSOC MSR
@@ -563,14 +563,26 @@ static void update_task_closid_rmid(struct task_struct *t)
 		_update_task_closid_rmid(t);
 }
 
+static bool task_in_rdtgroup(struct task_struct *tsk, struct rdtgroup *rdtgrp)
+{
+	u32 closid, rmid = rdtgrp->mon.rmid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP)
+		closid = rdtgrp->closid;
+	else if (rdtgrp->type == RDTMON_GROUP)
+		closid = rdtgrp->mon.parent->closid;
+	else
+		return false;
+
+	return resctrl_arch_match_closid(tsk, closid) &&
+	       resctrl_arch_match_rmid(tsk, closid, rmid);
+}
+
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
 	/* If the task is already in rdtgrp, no need to move the task. */
-	if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
-	     tsk->rmid == rdtgrp->mon.rmid) ||
-	    (rdtgrp->type == RDTMON_GROUP && tsk->rmid == rdtgrp->mon.rmid &&
-	     tsk->closid == rdtgrp->mon.parent->closid))
+	if (task_in_rdtgroup(tsk, rdtgrp))
 		return 0;
 
 	/*
@@ -581,19 +593,19 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 	 * For monitor groups, can move the tasks only from
 	 * their parent CTRL group.
 	 */
-
-	if (rdtgrp->type == RDTCTRL_GROUP) {
-		WRITE_ONCE(tsk->closid, rdtgrp->closid);
-		WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
-	} else if (rdtgrp->type == RDTMON_GROUP) {
-		if (rdtgrp->mon.parent->closid == tsk->closid) {
-			WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
-		} else {
-			rdt_last_cmd_puts("Can't move task to different control group\n");
-			return -EINVAL;
-		}
+	if (rdtgrp->type == RDTMON_GROUP &&
+	    !resctrl_arch_match_closid(tsk, rdtgrp->mon.parent->closid)) {
+		rdt_last_cmd_puts("Can't move task to different control group\n");
+		return -EINVAL;
 	}
 
+	if (rdtgrp->type == RDTMON_GROUP)
+		resctrl_arch_set_closid_rmid(tsk, rdtgrp->mon.parent->closid,
+					     rdtgrp->mon.rmid);
+	else
+		resctrl_arch_set_closid_rmid(tsk, rdtgrp->closid,
+					     rdtgrp->mon.rmid);
+
 	/*
 	 * Ensure the task's closid and rmid are written before determining if
 	 * the task is current that will decide if it will be interrupted.
@@ -615,14 +627,15 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 
 static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
 {
-	return (rdt_alloc_capable &&
-	       (r->type == RDTCTRL_GROUP) && (t->closid == r->closid));
+	return (rdt_alloc_capable && (r->type == RDTCTRL_GROUP) &&
+		resctrl_arch_match_closid(t, r->closid));
 }
 
 static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
 {
-	return (rdt_mon_capable &&
-	       (r->type == RDTMON_GROUP) && (t->rmid == r->mon.rmid));
+	return (rdt_mon_capable && (r->type == RDTMON_GROUP) &&
+		resctrl_arch_match_rmid(t, r->mon.parent->closid,
+					r->mon.rmid));
 }
 
 /**
@@ -822,7 +835,7 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
 		    rdtg->mode != RDT_MODE_EXCLUSIVE)
 			continue;
 
-		if (rdtg->closid != tsk->closid)
+		if (!resctrl_arch_match_closid(tsk, rdtg->closid))
 			continue;
 
 		seq_printf(s, "res:%s%s\n", (rdtg == &rdtgroup_default) ? "/" : "",
@@ -830,7 +843,8 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
 		seq_puts(s, "mon:");
 		list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
 				    mon.crdtgrp_list) {
-			if (tsk->rmid != crg->mon.rmid)
+			if (!resctrl_arch_match_rmid(tsk, crg->mon.parent->closid,
+						     crg->mon.rmid))
 				continue;
 			seq_printf(s, "%s", crg->kn->name);
 			break;
@@ -2691,8 +2705,8 @@ static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
 	for_each_process_thread(p, t) {
 		if (!from || is_closid_match(t, from) ||
 		    is_rmid_match(t, from)) {
-			WRITE_ONCE(t->closid, to->closid);
-			WRITE_ONCE(t->rmid, to->mon.rmid);
+			resctrl_arch_set_closid_rmid(t, to->closid,
+						     to->mon.rmid);
 
 			/*
 			 * Order the closid/rmid stores above before the loads
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (10 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:15   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI James Morse
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

The limbo and overflow code picks a CPU to use from the domain's list
of online CPUs. Work is then scheduled on these CPUs to maintain
the limbo list and any counters that may overflow.

cpumask_any() may pick a CPU that is marked nohz_full, which will
either penalise the work that CPU was dedicated to, or delay the
processing of limbo list or counters that may overflow. Perhaps
indefinitely. Delaying the overflow handling will skew the bandwidth
values calculated by mba_sc, which expects to be called once a second.

Add cpumask_any_housekeeping() as a replacement for cpumask_any()
that prefers housekeeping CPUs. This helper will still return
a nohz_full CPU if that is the only option. The CPU to use is
re-evaluated each time the limbo/overflow work runs. This ensures
the work will move off a nohz_full CPU once a housekeeping CPU is
available.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v3:
 * typos fixed

Changes since v4:
 * Made temporary variables unsigned

Changes since v5:
 * Restructured cpumask_any_housekeeping() to avoid later churn.
---
 arch/x86/kernel/cpu/resctrl/internal.h | 24 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 17 ++++++++++++-----
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f06d3d3e0808..37bb3de37a4a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -7,6 +7,7 @@
 #include <linux/kernfs.h>
 #include <linux/fs_context.h>
 #include <linux/jump_label.h>
+#include <linux/tick.h>
 #include <asm/resctrl.h>
 
 #define L3_QOS_CDP_ENABLE		0x01ULL
@@ -55,6 +56,29 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
+/**
+ * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
+ *			        aren't marked nohz_full
+ * @mask:	The mask to pick a CPU from.
+ *
+ * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
+ * nohz_full, these are preferred.
+ */
+static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
+{
+	unsigned int cpu, hk_cpu;
+
+	cpu = cpumask_any(mask);
+	if (!tick_nohz_full_cpu(cpu))
+		return cpu;
+
+	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
+	if (hk_cpu < nr_cpu_ids)
+		cpu = hk_cpu;
+
+	return cpu;
+}
+
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
 	bool				enable_cdpl2;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 0bbed8c62d42..993837e46db1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -782,9 +782,9 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
 void cqm_handle_limbo(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
-	int cpu = smp_processor_id();
 	struct rdt_resource *r;
 	struct rdt_domain *d;
+	int cpu;
 
 	mutex_lock(&rdtgroup_mutex);
 
@@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)
 
 	__check_limbo(d, false);
 
-	if (has_busy_rmid(d))
+	if (has_busy_rmid(d)) {
+		cpu = cpumask_any_housekeeping(&d->cpu_mask);
 		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
+	}
 
 	mutex_unlock(&rdtgroup_mutex);
 }
@@ -804,7 +806,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
 
-	cpu = cpumask_any(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
 	dom->cqm_work_cpu = cpu;
 
 	schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
@@ -814,10 +816,10 @@ void mbm_handle_overflow(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
 	struct rdtgroup *prgrp, *crgrp;
-	int cpu = smp_processor_id();
 	struct list_head *head;
 	struct rdt_resource *r;
 	struct rdt_domain *d;
+	int cpu;
 
 	mutex_lock(&rdtgroup_mutex);
 
@@ -838,6 +840,11 @@ void mbm_handle_overflow(struct work_struct *work)
 			update_mba_bw(prgrp, d);
 	}
 
+	/*
+	 * Re-check for housekeeping CPUs. This allows the overflow handler to
+	 * move off a nohz_full CPU quickly.
+	 */
+	cpu = cpumask_any_housekeeping(&d->cpu_mask);
 	schedule_delayed_work_on(cpu, &d->mbm_over, delay);
 
 out_unlock:
@@ -851,7 +858,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 
 	if (!static_branch_likely(&rdt_mon_enable_key))
 		return;
-	cpu = cpumask_any(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
 	dom->mbm_work_cpu = cpu;
 	schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (11 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:17   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Intel is blessed with an abundance of monitors, one per RMID, that can be
read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
the number implemented is up to the manufacturer. This means when there are
fewer monitors than needed, they need to be allocated and freed.

MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
CSU counter is allowed to return 'not ready' for a small number of
micro-seconds after programming. To allow one CSU hardware monitor to be
used for multiple control or monitor groups, the CPU accessing the
monitor needs to be able to block when configuring and reading the
counter.

Worse, the domain may be broken up into slices, and the MMIO accesses
for each slice may need performing from different CPUs.

These two details mean MPAMs monitor code needs to be able to sleep, and
IPI another CPU in the domain to read from a resource that has been sliced.

mon_event_read() already invokes mon_event_count() via IPI, which means
this isn't possible. On systems using nohz-full, some CPUs need to be
interrupted to run kernel work as they otherwise stay in user-space
running realtime workloads. Interrupting these CPUs should be avoided,
and scheduling work on them may never complete.

Change mon_event_read() to pick a housekeeping CPU, (one that is not using
nohz_full) and schedule mon_event_count() and wait. If all the CPUs
in a domain are using nohz-full, then an IPI is used as the fallback.

This function is only used in response to a user-space filesystem request
(not the timing sensitive overflow code).

This allows MPAM to hide the slice behaviour from resctrl, and to keep
the monitor-allocation in monitor.c. When the IPI fallback is used on
machines where MPAM needs to make an access on multiple CPUs, the counter
read will always fail.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Peter Newman <peternewman@google.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Use cpumask_any_housekeeping() and fallback to an IPI if needed.

Changes since v3:
 * Actually include the IPI fallback code.

Changes since v4:
 * Tinkered with existing capitalisation.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 28 +++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  2 +-
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b44c487727d4..bd263b9a0abd 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -19,6 +19,7 @@
 #include <linux/kernfs.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
+#include <linux/tick.h>
 #include "internal.h"
 
 /*
@@ -520,12 +521,24 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int smp_mon_event_count(void *arg)
+{
+	mon_event_count(arg);
+
+	return 0;
+}
+
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first)
 {
+	int cpu;
+
+	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
+	lockdep_assert_held(&rdtgroup_mutex);
+
 	/*
-	 * setup the parameters to send to the IPI to read the data.
+	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
 	rr->evtid = evtid;
@@ -534,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+
+	/*
+	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
+	 * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
+	 * MPAM's resctrl_arch_rmid_read() is unable to read the
+	 * counters on some platforms if its called in irq context.
+	 */
+	if (tick_nohz_full_cpu(cpu))
+		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	else
+		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 993837e46db1..7749e6569a4a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -587,7 +587,7 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 }
 
 /*
- * This is called via IPI to read the CQM/MBM counters
+ * This is scheduled by mon_event_read() to read the CQM/MBM counters
  * on a domain.
  */
 void mon_event_count(void *info)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (12 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:18   ` Reinette Chatre
  2023-10-05 21:33   ` Moger, Babu
  2023-09-14 17:21 ` [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
                   ` (10 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

MPAM's cache occupancy counters can take a little while to settle once
the monitor has been configured. The maximum settling time is described
to the driver via a firmware table. The value could be large enough
that it makes sense to sleep. To avoid exposing this to resctrl, it
should be hidden behind MPAM's resctrl_arch_rmid_read().

resctrl_arch_rmid_read() may be called via IPI meaning it is unable
to sleep. In this case resctrl_arch_rmid_read() should return an error
if it needs to sleep. This will only affect MPAM platforms where
the cache occupancy counter isn't available immediately, nohz_full is
in use, and there are there are no housekeeping CPUs in the necessary
domain.

There are three callers of resctrl_arch_rmid_read():
__mon_event_count() and __check_limbo() are both called from a
non-migrateable context. mon_event_read() invokes __mon_event_count()
using smp_call_on_cpu(), which adds work to the target CPUs workqueue.
rdtgroup_mutex() is held, meaning this cannot race with the resctrl
cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on()
also adds work to a per-cpu workqueue.

The remaining call is add_rmid_to_limbo() which is called in response
to a user-space syscall that frees an RMID. This opportunistically
reads the LLC occupancy counter on the current domain to see if the
RMID is over the dirty threshold. This has to disable preemption to
avoid reading the wrong domain's value. Disabling pre-emption here
prevents resctrl_arch_rmid_read() from sleeping.

add_rmid_to_limbo() walks each domain, but only reads the counter
on one domain. If the system has more than one domain, the RMID will
always be added to the limbo list. If the RMIDs usage was not over the
threshold, it will be removed from the list when __check_limbo() runs.
Make this the default behaviour. Free RMIDs are always added to the
limbo list for each domain.

The user visible effect of this is that a clean RMID is not available
for re-allocation immediately after 'rmdir()' completes, this behaviour
was never portable as it never happened on a machine with multiple
domains.

Removing this path allows resctrl_arch_rmid_read() to sleep if its called
with interrupts unmasked. Document this is the expected behaviour, and
add a might_sleep() annotation to catch changes that won't work on arm64.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
The previous version allowed resctrl_arch_rmid_read() to be called on the
wrong CPUs, but now that this needs to take nohz_full and housekeeping into
account, its too complex.

Changes since v3:
 * Removed error handling for smp_call_function_any(), this can't race
   with the cpuhp callbacks as both hold rdtgroup_mutex.
 * Switched to the alternative of removing the counter read, this simplifies
   things dramatically.

Changes since v4:
 * Messed with capitalisation.
 * Removed some dead code now that entry->busy will never be zero in
   add_rmid_to_limbo().
 * Rephrased the comment above resctrl_arch_rmid_read_context_check().
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 25 +++++--------------------
 include/linux/resctrl.h               | 18 +++++++++++++++++-
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7749e6569a4a..05d949ec94f1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -278,6 +278,8 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 	u64 msr_val, chunks;
 	int ret;
 
+	resctrl_arch_rmid_read_context_check();
+
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
 		return -EINVAL;
 
@@ -454,8 +456,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	struct rdt_domain *d;
-	int cpu, err;
-	u64 val = 0;
 	u32 idx;
 
 	lockdep_assert_held(&rdtgroup_mutex);
@@ -463,17 +463,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
 
 	entry->busy = 0;
-	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
-		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			err = resctrl_arch_rmid_read(r, d, entry->closid,
-						     entry->rmid,
-						     QOS_L3_OCCUP_EVENT_ID,
-						     &val);
-			if (err || val <= resctrl_rmid_realloc_threshold)
-				continue;
-		}
-
 		/*
 		 * For the first limbo RMID in the domain,
 		 * setup up the limbo worker.
@@ -483,15 +473,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 		set_bit(idx, d->rmid_busy_llc);
 		entry->busy++;
 	}
-	put_cpu();
 
-	if (entry->busy) {
-		rmid_limbo_count++;
-		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
-			closid_num_dirty_rmid[entry->closid]++;
-	} else {
-		list_add_tail(&entry->list, &rmid_free_lru);
-	}
+	rmid_limbo_count++;
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
+		closid_num_dirty_rmid[entry->closid]++;
 }
 
 void free_rmid(u32 closid, u32 rmid)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 660752406174..f7311102e94c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -236,7 +236,12 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * @eventid:		eventid to read, e.g. L3 occupancy.
  * @val:		result of the counter read in bytes.
  *
- * Call from process context on a CPU that belongs to domain @d.
+ * Some architectures need to sleep when first programming some of the counters.
+ * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
+ *  for a short period of time). Call from a non-migrateable process context on
+ * a CPU that belongs to domain @d. e.g. use smp_call_on_cpu() or
+ * schedule_work_on(). This function can be called with interrupts masked,
+ * e.g. using smp_call_function_any(), but may consistently return an error.
  *
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
@@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val);
 
+/**
+ * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
+ *
+ * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
+ * resctrl_arch_rmid_read() is called with preemption disabled.
+ */
+static inline void resctrl_arch_rmid_read_context_check(void)
+{
+	if (!irqs_disabled())
+		might_sleep();
+}
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (13 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:18   ` Reinette Chatre
  2023-10-05 21:46   ` Moger, Babu
  2023-09-14 17:21 ` [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit James Morse
                   ` (9 subsequent siblings)
  24 siblings, 2 replies; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Depending on the number of monitors available, Arm's MPAM may need to
allocate a monitor prior to reading the counter value. Allocating a
contended resource may involve sleeping.

add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
the allocation should be valid for all domains.

__check_limbo() and mon_event_count() each make multiple calls to
resctrl_arch_rmid_read(), to avoid extra work on contended systems,
the allocation should be valid for multiple invocations of
resctrl_arch_rmid_read().

Add arch hooks for this allocation, which need calling before
resctrl_arch_rmid_read(). The allocated monitor is passed to
resctrl_arch_rmid_read(), then freed again afterwards. The helper
can be called on any CPU, and can sleep.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v3:
 * Expanded comment.
 * Removed stray header include.
 * Reworded commit message.
 * Made ctx a void * instead of an int.

Changes since v4:
 * Used IS_ERR() in more places.

Changes since v5:
 * Pass the error back from mon_event_read() as -EINVAL/Unavailable.
 * Add some ratelimited warnings when failing to allocate a mon context
---
 arch/x86/include/asm/resctrl.h            | 11 ++++++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  7 +++++
 arch/x86/kernel/cpu/resctrl/internal.h    |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c     | 34 +++++++++++++++++++++--
 include/linux/resctrl.h                   |  5 +++-
 5 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 1d274dbabc44..29c4cc343787 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -136,6 +136,17 @@ static inline u32 resctrl_arch_rmid_idx_encode(u32 ignored, u32 rmid)
 	return rmid;
 }
 
+/* x86 can always read an rmid, nothing needs allocating */
+struct rdt_resource;
+static inline void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, int evtid)
+{
+	might_sleep();
+	return NULL;
+};
+
+static inline void resctrl_arch_mon_ctx_free(struct rdt_resource *r, int evtid,
+					     void *ctx) { };
+
 void resctrl_cpu_detect(struct cpuinfo_x86 *c);
 
 #else
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index bd263b9a0abd..ce4821ea111b 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -546,6 +546,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->d = d;
 	rr->val = 0;
 	rr->first = first;
+	rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
+	if (IS_ERR(rr->arch_mon_ctx)) {
+		rr->err = -EINVAL;
+		return;
+	}
 
 	cpu = cpumask_any_housekeeping(&d->cpu_mask);
 
@@ -559,6 +564,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
 	else
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
+
+	resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 37bb3de37a4a..66d9ebb5e03a 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -136,6 +136,7 @@ struct rmid_read {
 	bool			first;
 	int			err;
 	u64			val;
+	void			*arch_mon_ctx;
 };
 
 extern bool rdt_alloc_capable;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 05d949ec94f1..28a2c8765faf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -270,7 +270,7 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val)
+			   u64 *val, void *ignored)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
@@ -325,9 +325,17 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry;
 	u32 idx, cur_idx = 1;
+	void *arch_mon_ctx;
 	bool rmid_dirty;
 	u64 val = 0;
 
+	arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
+	if (IS_ERR(arch_mon_ctx)) {
+		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
+				    PTR_ERR(arch_mon_ctx));
+		return;
+	}
+
 	/*
 	 * Skip RMID 0 and start from RMID 1 and check all the RMIDs that
 	 * are marked as busy for occupancy < threshold. If the occupancy
@@ -341,7 +349,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 
 		entry = __rmid_entry(idx);
 		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val)) {
+					   QOS_L3_OCCUP_EVENT_ID, &val,
+					   arch_mon_ctx)) {
 			rmid_dirty = true;
 		} else {
 			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
@@ -354,6 +363,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 		}
 		cur_idx = idx + 1;
 	}
+
+	resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
 }
 
 bool has_busy_rmid(struct rdt_domain *d)
@@ -532,7 +543,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	}
 
 	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
-					 &tval);
+					 &tval, rr->arch_mon_ctx);
 	if (rr->err)
 		return rr->err;
 
@@ -743,11 +754,27 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
 	if (is_mbm_total_enabled()) {
 		rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
 		rr.val = 0;
+		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+		if (IS_ERR(rr.arch_mon_ctx)) {
+			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
+					    PTR_ERR(rr.arch_mon_ctx));
+			return;
+		}
+
 		__mon_event_count(closid, rmid, &rr);
+
+		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
 		rr.val = 0;
+		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+		if (IS_ERR(rr.arch_mon_ctx)) {
+			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
+					    PTR_ERR(rr.arch_mon_ctx));
+			return;
+		}
+
 		__mon_event_count(closid, rmid, &rr);
 
 		/*
@@ -757,6 +784,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
 		 */
 		if (is_mba_sc(NULL))
 			mbm_bw_count(closid, rmid, &rr);
+		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
 	}
 }
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f7311102e94c..5e4b4df9610b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -235,6 +235,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * @rmid:		rmid of the counter to read.
  * @eventid:		eventid to read, e.g. L3 occupancy.
  * @val:		result of the counter read in bytes.
+ * @arch_mon_ctx:	An architecture specific value from
+ *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
+ *			the hardware monitor allocated for this read request.
  *
  * Some architectures need to sleep when first programming some of the counters.
  * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
@@ -248,7 +251,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  */
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val);
+			   u64 *val, void *arch_mon_ctx);
 
 /**
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (14 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:19   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers James Morse
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

The rdt_enable_key is switched when resctrl is mounted, and used to
prevent a second mount of the filesystem. It also enables the
architecture's context switch code.

This requires another architecture to have the same set of static-keys,
as resctrl depends on them too. The existing users of these static-keys
are implicitly also checking if the filesystem is mounted.

Make the resctrl_mounted checks explicit: resctrl can keep track of
whether it has been mounted once. This doesn't need to be combined with
whether the arch code is context switching the CLOSID.

rdt_mon_enable_key is never used just to test that resctrl is mounted,
but does also have this implication. Add a resctrl_mounted to all uses
of rdt_mon_enable_key. This will allow rdt_mon_enable_key to be swapped
with a helper in a subsequent patch.

This will allow the static-key changing to be moved behind resctrl_arch_
calls.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v3:
 * Removed a newline.
 * Rephrased commit message

Changes since v4:
 * Rephrased comment.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 12 ++++++++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 23 +++++++++++++++++------
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 66d9ebb5e03a..0bcfb2da9109 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -143,6 +143,7 @@ extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
 extern struct list_head resctrl_schema_all;
+extern bool resctrl_mounted;
 
 enum rdt_group_type {
 	RDTCTRL_GROUP = 0,
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 28a2c8765faf..7bbe3d91b1f1 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -836,7 +836,11 @@ void mbm_handle_overflow(struct work_struct *work)
 
 	mutex_lock(&rdtgroup_mutex);
 
-	if (!static_branch_likely(&rdt_mon_enable_key))
+	/*
+	 * If the filesystem has been unmounted this work no longer needs to
+	 * run.
+	 */
+	if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
 		goto out_unlock;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -869,7 +873,11 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
 
-	if (!static_branch_likely(&rdt_mon_enable_key))
+	/*
+	 * When a domain comes online there is no guarantee the filesystem is
+	 * mounted. If not, there is no need to catch counter overflow.
+	 */
+	if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
 		return;
 	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
 	dom->mbm_work_cpu = cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 64a0f71f6a5d..5a7d6f6b5018 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -42,6 +42,9 @@ LIST_HEAD(rdt_all_groups);
 /* list of entries for the schemata file */
 LIST_HEAD(resctrl_schema_all);
 
+/* The filesystem can only be mounted once. */
+bool resctrl_mounted;
+
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
@@ -819,7 +822,7 @@ int proc_resctrl_show(struct seq_file *s, struct pid_namespace *ns,
 	mutex_lock(&rdtgroup_mutex);
 
 	/* Return empty if resctrl has not been mounted. */
-	if (!static_branch_unlikely(&rdt_enable_key)) {
+	if (!resctrl_mounted) {
 		seq_puts(s, "res:\nmon:\n");
 		goto unlock;
 	}
@@ -2495,7 +2498,7 @@ static int rdt_get_tree(struct fs_context *fc)
 	/*
 	 * resctrl file system can only be mounted once.
 	 */
-	if (static_branch_unlikely(&rdt_enable_key)) {
+	if (resctrl_mounted) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -2543,8 +2546,10 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (rdt_mon_capable)
 		static_branch_enable_cpuslocked(&rdt_mon_enable_key);
 
-	if (rdt_alloc_capable || rdt_mon_capable)
+	if (rdt_alloc_capable || rdt_mon_capable) {
 		static_branch_enable_cpuslocked(&rdt_enable_key);
+		resctrl_mounted = true;
+	}
 
 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -2815,6 +2820,7 @@ static void rdt_kill_sb(struct super_block *sb)
 	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
 	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
 	static_branch_disable_cpuslocked(&rdt_enable_key);
+	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
@@ -3774,7 +3780,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 	 * If resctrl is mounted, remove all the
 	 * per domain monitor data directories.
 	 */
-	if (static_branch_unlikely(&rdt_mon_enable_key))
+	if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
 		rmdir_mondata_subdir_allrdtgrp(r, d->id);
 
 	if (is_mbm_enabled())
@@ -3851,8 +3857,13 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 	if (is_llc_occupancy_enabled())
 		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
 
-	/* If resctrl is mounted, add per domain monitor data directories. */
-	if (static_branch_unlikely(&rdt_mon_enable_key))
+	/*
+	 * If the filesystem is not mounted then only the default resource group
+	 * exists. Creation of its directories is deferred until mount time
+	 * by rdt_get_tree() calling mkdir_mondata_all().
+	 * If resctrl is mounted, add per domain monitor data directories.
+	 */
+	if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
 		mkdir_mondata_subdir_allrdtgrp(r, d);
 
 	return 0;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (15 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:19   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch James Morse
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

resctrl enables three static keys depending on the features it has enabled.
Another architecture's context switch code may look different, any
static keys that control it should be buried behind helpers.

Move the alloc/mon logic into arch-specific helpers as a preparatory step
for making the rdt_enable_key's status something the arch code decides.

This means other architectures don't have to mirror the static keys.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/include/asm/resctrl.h         | 20 ++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |  5 -----
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  8 ++++----
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 29c4cc343787..3c9137b6ad4f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -42,6 +42,26 @@ DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
 
+static inline void resctrl_arch_enable_alloc(void)
+{
+	static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_disable_alloc(void)
+{
+	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
+}
+
+static inline void resctrl_arch_enable_mon(void)
+{
+	static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+}
+
+static inline void resctrl_arch_disable_mon(void)
+{
+	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+}
+
 /*
  * __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
  *
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0bcfb2da9109..ef50789e2b44 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -93,9 +93,6 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
 	return container_of(kfc, struct rdt_fs_context, kfc);
 }
 
-DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
-DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
-
 /**
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
@@ -453,8 +450,6 @@ extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_hw_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
-DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
-
 extern struct dentry *debugfs_resctrl;
 
 enum resctrl_res_level {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5a7d6f6b5018..4c0e012142e2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2542,9 +2542,9 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out_psl;
 
 	if (rdt_alloc_capable)
-		static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+		resctrl_arch_enable_alloc();
 	if (rdt_mon_capable)
-		static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+		resctrl_arch_enable_mon();
 
 	if (rdt_alloc_capable || rdt_mon_capable) {
 		static_branch_enable_cpuslocked(&rdt_enable_key);
@@ -2817,8 +2817,8 @@ static void rdt_kill_sb(struct super_block *sb)
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	schemata_list_destroy();
-	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
-	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+	resctrl_arch_disable_alloc();
+	resctrl_arch_disable_mon();
 	static_branch_disable_cpuslocked(&rdt_enable_key);
 	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (16 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:19   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable James Morse
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

rdt_enable_key is switched when resctrl is mounted. It was also previously
used to prevent a second mount of the filesystem.

Any other architecture that wants to support resctrl has to provide
identical static keys.

Now that there are helpers for enabling and disabling the alloc/mon keys,
resctrl doesn't need to switch this extra key, it can be done by the arch
code. Use the static-key increment and decrement helpers, and change
resctrl to ensure the calls are balanced.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/include/asm/resctrl.h         |  4 ++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 3c9137b6ad4f..b74aa34dc9e8 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -45,21 +45,25 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
 static inline void resctrl_arch_enable_alloc(void)
 {
 	static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
+	static_branch_inc_cpuslocked(&rdt_enable_key);
 }
 
 static inline void resctrl_arch_disable_alloc(void)
 {
 	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
+	static_branch_dec_cpuslocked(&rdt_enable_key);
 }
 
 static inline void resctrl_arch_enable_mon(void)
 {
 	static_branch_enable_cpuslocked(&rdt_mon_enable_key);
+	static_branch_inc_cpuslocked(&rdt_enable_key);
 }
 
 static inline void resctrl_arch_disable_mon(void)
 {
 	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
+	static_branch_dec_cpuslocked(&rdt_enable_key);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4c0e012142e2..a2391cc05f20 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2546,10 +2546,8 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (rdt_mon_capable)
 		resctrl_arch_enable_mon();
 
-	if (rdt_alloc_capable || rdt_mon_capable) {
-		static_branch_enable_cpuslocked(&rdt_enable_key);
+	if (rdt_alloc_capable || rdt_mon_capable)
 		resctrl_mounted = true;
-	}
 
 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -2817,9 +2815,10 @@ static void rdt_kill_sb(struct super_block *sb)
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	schemata_list_destroy();
-	resctrl_arch_disable_alloc();
-	resctrl_arch_disable_mon();
-	static_branch_disable_cpuslocked(&rdt_enable_key);
+	if (rdt_alloc_capable)
+		resctrl_arch_disable_alloc();
+	if (rdt_mon_capable)
+		resctrl_arch_disable_mon();
 	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
 	mutex_unlock(&rdtgroup_mutex);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (17 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:19   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work James Morse
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

resctrl reads rdt_alloc_capable or rdt_mon_capable to determine
whether any of the resources support the corresponding features.
resctrl also uses the static-keys that affect the architecture's
context-switch code to determine the same thing.

This forces another architecture to have the same static-keys.

As the static-key is enabled based on the capable flag, and none of
the filesystem uses of these are in the scheduler path, move the
capable flags behind helpers, and use these in the filesystem
code instead of the static-key.

After this change, only the architecture code manages and uses
the static-keys to ensure __resctrl_sched_in() does not need
runtime checks.

This avoids multiple architectures having to define the same
static-keys.

Cases where the static-key implicitly tested if the resctrl
filesystem was mounted all have an explicit check added by a
previous patch.

Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Added missing conversion in mkdir_rdt_prepare_rmid_free()

Changes since v3:
 * Expanded the commit message.
---
 arch/x86/include/asm/resctrl.h            | 13 +++++++++
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 --
 arch/x86/kernel/cpu/resctrl/monitor.c     |  4 +--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  6 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 34 +++++++++++------------
 5 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b74aa34dc9e8..12dbd2588ca7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -38,10 +38,18 @@ struct resctrl_pqr_state {
 
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
+extern bool rdt_alloc_capable;
+extern bool rdt_mon_capable;
+
 DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
 
+static inline bool resctrl_arch_alloc_capable(void)
+{
+	return rdt_alloc_capable;
+}
+
 static inline void resctrl_arch_enable_alloc(void)
 {
 	static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
@@ -54,6 +62,11 @@ static inline void resctrl_arch_disable_alloc(void)
 	static_branch_dec_cpuslocked(&rdt_enable_key);
 }
 
+static inline bool resctrl_arch_mon_capable(void)
+{
+	return rdt_mon_capable;
+}
+
 static inline void resctrl_arch_enable_mon(void)
 {
 	static_branch_enable_cpuslocked(&rdt_mon_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ef50789e2b44..c54fa86e4ef9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -136,8 +136,6 @@ struct rmid_read {
 	void			*arch_mon_ctx;
 };
 
-extern bool rdt_alloc_capable;
-extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
 extern struct list_head resctrl_schema_all;
 extern bool resctrl_mounted;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7bbe3d91b1f1..9c6d4b0970e2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -840,7 +840,7 @@ void mbm_handle_overflow(struct work_struct *work)
 	 * If the filesystem has been unmounted this work no longer needs to
 	 * run.
 	 */
-	if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		goto out_unlock;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
@@ -877,7 +877,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 	 * When a domain comes online there is no guarantee the filesystem is
 	 * mounted. If not, there is no need to catch counter overflow.
 	 */
-	if (!resctrl_mounted || !static_branch_likely(&rdt_mon_enable_key))
+	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		return;
 	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
 	dom->mbm_work_cpu = cpu;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d8f44113ed1f..8056bed033cc 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -581,7 +581,7 @@ static int rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
 	if (ret)
 		goto err_cpus;
 
-	if (rdt_mon_capable) {
+	if (resctrl_arch_mon_capable()) {
 		ret = rdtgroup_kn_mode_restrict(rdtgrp, "mon_groups");
 		if (ret)
 			goto err_cpus_list;
@@ -628,7 +628,7 @@ static int rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
 	if (ret)
 		goto err_cpus;
 
-	if (rdt_mon_capable) {
+	if (resctrl_arch_mon_capable()) {
 		ret = rdtgroup_kn_mode_restore(rdtgrp, "mon_groups", 0777);
 		if (ret)
 			goto err_cpus_list;
@@ -776,7 +776,7 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
 {
 	int ret;
 
-	if (rdt_mon_capable) {
+	if (resctrl_arch_mon_capable()) {
 		ret = alloc_rmid(rdtgrp->closid);
 		if (ret < 0) {
 			rdt_last_cmd_puts("Out of RMIDs\n");
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a2391cc05f20..dfaef047dead 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -630,13 +630,13 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 
 static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
 {
-	return (rdt_alloc_capable && (r->type == RDTCTRL_GROUP) &&
+	return (resctrl_arch_alloc_capable() && (r->type == RDTCTRL_GROUP) &&
 		resctrl_arch_match_closid(t, r->closid));
 }
 
 static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
 {
-	return (rdt_mon_capable && (r->type == RDTMON_GROUP) &&
+	return (resctrl_arch_mon_capable() && (r->type == RDTMON_GROUP) &&
 		resctrl_arch_match_rmid(t, r->mon.parent->closid,
 					r->mon.rmid));
 }
@@ -2519,7 +2519,7 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret < 0)
 		goto out_schemata_free;
 
-	if (rdt_mon_capable) {
+	if (resctrl_arch_mon_capable()) {
 		ret = mongroup_create_dir(rdtgroup_default.kn,
 					  &rdtgroup_default, "mon_groups",
 					  &kn_mongrp);
@@ -2541,12 +2541,12 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret < 0)
 		goto out_psl;
 
-	if (rdt_alloc_capable)
+	if (resctrl_arch_alloc_capable())
 		resctrl_arch_enable_alloc();
-	if (rdt_mon_capable)
+	if (resctrl_arch_mon_capable())
 		resctrl_arch_enable_mon();
 
-	if (rdt_alloc_capable || rdt_mon_capable)
+	if (resctrl_arch_alloc_capable() || resctrl_arch_mon_capable())
 		resctrl_mounted = true;
 
 	if (is_mbm_enabled()) {
@@ -2560,10 +2560,10 @@ static int rdt_get_tree(struct fs_context *fc)
 out_psl:
 	rdt_pseudo_lock_release();
 out_mondata:
-	if (rdt_mon_capable)
+	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
 out_mongrp:
-	if (rdt_mon_capable)
+	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mongrp);
 out_info:
 	kernfs_remove(kn_info);
@@ -2815,9 +2815,9 @@ static void rdt_kill_sb(struct super_block *sb)
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	schemata_list_destroy();
-	if (rdt_alloc_capable)
+	if (resctrl_arch_alloc_capable())
 		resctrl_arch_disable_alloc();
-	if (rdt_mon_capable)
+	if (resctrl_arch_mon_capable())
 		resctrl_arch_disable_mon();
 	resctrl_mounted = false;
 	kernfs_kill_sb(sb);
@@ -3197,7 +3197,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 {
 	int ret;
 
-	if (!rdt_mon_capable)
+	if (!resctrl_arch_mon_capable())
 		return 0;
 
 	ret = alloc_rmid(rdtgrp->closid);
@@ -3219,7 +3219,7 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
 
 static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
 {
-	if (rdt_mon_capable)
+	if (resctrl_arch_mon_capable())
 		free_rmid(rgrp->closid, rgrp->mon.rmid);
 }
 
@@ -3385,7 +3385,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 
 	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
 
-	if (rdt_mon_capable) {
+	if (resctrl_arch_mon_capable()) {
 		/*
 		 * Create an empty mon_groups directory to hold the subset
 		 * of tasks and cpus to monitor.
@@ -3440,14 +3440,14 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	 * allocation is supported, add a control and monitoring
 	 * subdirectory
 	 */
-	if (rdt_alloc_capable && parent_kn == rdtgroup_default.kn)
+	if (resctrl_arch_alloc_capable() && parent_kn == rdtgroup_default.kn)
 		return rdtgroup_mkdir_ctrl_mon(parent_kn, name, mode);
 
 	/*
 	 * If RDT monitoring is supported and the parent directory is a valid
 	 * "mon_groups" directory, add a monitoring subdirectory.
 	 */
-	if (rdt_mon_capable && is_mon_groups(parent_kn, name))
+	if (resctrl_arch_mon_capable() && is_mon_groups(parent_kn, name))
 		return rdtgroup_mkdir_mon(parent_kn, name, mode);
 
 	return -EPERM;
@@ -3779,7 +3779,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 	 * If resctrl is mounted, remove all the
 	 * per domain monitor data directories.
 	 */
-	if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+	if (resctrl_mounted && resctrl_arch_mon_capable())
 		rmdir_mondata_subdir_allrdtgrp(r, d->id);
 
 	if (is_mbm_enabled())
@@ -3862,7 +3862,7 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 	 * by rdt_get_tree() calling mkdir_mondata_all().
 	 * If resctrl is mounted, add per domain monitor data directories.
 	 */
-	if (resctrl_mounted && static_branch_unlikely(&rdt_mon_enable_key))
+	if (resctrl_mounted && resctrl_arch_mon_capable())
 		mkdir_mondata_subdir_allrdtgrp(r, d);
 
 	return 0;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (18 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:20   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu James Morse
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

The resctrl architecture specific code may need to create a domain when
a CPU comes online, it also needs to reset the CPUs PQR_ASSOC register.
The resctrl filesystem code needs to update the rdtgroup_default CPU
mask when CPUs are brought online.

Currently this is all done in one function, resctrl_online_cpu().
This will need to be split into architecture and filesystem parts
before resctrl can be moved to /fs/.

Pull the rdtgroup_default update work out as a filesystem specific
cpu_online helper. resctrl_online_cpu() is the obvious name for this,
which means the version in core.c needs renaming.

resctrl_online_cpu() is called by the arch code once it has done the
work to add the new CPU to any domains.

In future patches, resctrl_online_cpu() will take the rdtgroup_mutex
itself.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v3:
 * Renamed err to ret

Changes since v4:
 * Changes in capitalisation.

Changes since v5:
 * More changes in capitalisation.
 * Made resctrl_online_cpu() return void.
---
 arch/x86/kernel/cpu/resctrl/core.c     | 8 ++++----
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++++
 include/linux/resctrl.h                | 1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index eaadf6f20900..5b4c719ac129 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -604,16 +604,16 @@ static void clear_closid_rmid(int cpu)
 	      RESCTRL_RESERVED_CLOSID);
 }
 
-static int resctrl_online_cpu(unsigned int cpu)
+static int resctrl_arch_online_cpu(unsigned int cpu)
 {
 	struct rdt_resource *r;
 
 	mutex_lock(&rdtgroup_mutex);
 	for_each_capable_rdt_resource(r)
 		domain_add_cpu(cpu, r);
-	/* The cpu is set in default rdtgroup after online. */
-	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
 	clear_closid_rmid(cpu);
+
+	resctrl_online_cpu(cpu);
 	mutex_unlock(&rdtgroup_mutex);
 
 	return 0;
@@ -966,7 +966,7 @@ static int __init resctrl_late_init(void)
 
 	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				  "x86/resctrl/cat:online:",
-				  resctrl_online_cpu, resctrl_offline_cpu);
+				  resctrl_arch_online_cpu, resctrl_offline_cpu);
 	if (state < 0)
 		return state;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index dfaef047dead..0c609cdfe7e5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3868,6 +3868,14 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
+void resctrl_online_cpu(unsigned int cpu)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* The CPU is set in default rdtgroup after online. */
+	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+}
+
 /*
  * rdtgroup_init - rdtgroup initialization
  *
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5e4b4df9610b..9d5f75a4e192 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -223,6 +223,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_online_cpu(unsigned int cpu);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (19 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:22   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work James Morse
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

When a CPU is taken offline resctrl may need to move the overflow or
limbo handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler()
will be called while the CPU that is going offline is still present
in the cpu_mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
need excluding.

A subsequent patch moves these calls to be before CPUs have been removed,
so this exclude_cpus behaviour is temporary.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Rephrased a comment to avoid a two letter bad-word. (we)
 * Avoid assigning mbm_work_cpu if the domain is going to be free()d
 * Added cpumask_any_housekeeping_but(), I dislike the name

Changes since v3:
 * Marked an explanatory comment as temporary as the subsequent patch is
   no longer adjacent.

Changes since v4:
 * Check against RESCTRL_PICK_ANY_CPU instead of -1.
 * Leave cqm_work_cpu as nr_cpu_ids when no CPU is available.
 * Made cpumask_any_housekeeping_but() more readable.

Changes since v5:
 * Changes in captialisation, and a typo.
 * Merged cpumask helpers.
---
 arch/x86/kernel/cpu/resctrl/core.c        |  8 +++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    | 19 +++++++++---
 arch/x86/kernel/cpu/resctrl/monitor.c     | 38 +++++++++++++++++------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  6 ++--
 include/linux/resctrl.h                   |  2 ++
 6 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5b4c719ac129..37aa124f1e4c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -582,12 +582,16 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
 		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
 			cancel_delayed_work(&d->mbm_over);
-			mbm_setup_overflow_handler(d, 0);
+			/*
+			 * temporary: exclude_cpu=-1 as this CPU has already
+			 * been removed by cpumask_clear_cpu()d
+			 */
+			mbm_setup_overflow_handler(d, 0, RESCTRL_PICK_ANY_CPU);
 		}
 		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
 		    has_busy_rmid(d)) {
 			cancel_delayed_work(&d->cqm_limbo);
-			cqm_setup_limbo_handler(d, 0);
+			cqm_setup_limbo_handler(d, 0, RESCTRL_PICK_ANY_CPU);
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index ce4821ea111b..b4ed4e1b4938 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -552,7 +552,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		return;
 	}
 
-	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
 
 	/*
 	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c54fa86e4ef9..bd7f60bf49fe 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -60,11 +60,15 @@
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
  * @mask:	The mask to pick a CPU from.
+ * @exclude_cpu:The CPU to avoid picking.
  *
- * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
- * nohz_full, these are preferred.
+ * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
+ * CPUs that don't use nohz_full, these are preferred. Pass
+ * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
+ * Returns >= nr_cpu_ids if no CPUs are available.
  */
-static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
+static inline unsigned int
+cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 {
 	unsigned int cpu, hk_cpu;
 
@@ -73,6 +77,9 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
 		return cpu;
 
 	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
+	if (hk_cpu == exclude_cpu)
+		hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
+
 	if (hk_cpu < nr_cpu_ids)
 		cpu = hk_cpu;
 
@@ -565,11 +572,13 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
 void mbm_setup_overflow_handler(struct rdt_domain *dom,
-				unsigned long delay_ms);
+				unsigned long delay_ms,
+				int exclude_cpu);
 void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
+			     int exclude_cpu);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9c6d4b0970e2..208e46ba7368 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -480,7 +480,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 		 * setup up the limbo worker.
 		 */
 		if (!has_busy_rmid(d))
-			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL);
+			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL,
+						RESCTRL_PICK_ANY_CPU);
 		set_bit(idx, d->rmid_busy_llc);
 		entry->busy++;
 	}
@@ -807,22 +808,31 @@ void cqm_handle_limbo(struct work_struct *work)
 	__check_limbo(d, false);
 
 	if (has_busy_rmid(d)) {
-		cpu = cpumask_any_housekeeping(&d->cpu_mask);
+		cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
 		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
 	}
 
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+/**
+ * cqm_setup_limbo_handler() - Schedule the limbo handler to run for this
+ *                             domain.
+ * @delay_ms:      How far in the future the handler should run.
+ * @exclude_cpu:   Which CPU the handler should not run on,
+ *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
+ */
+void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
+			     int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
 
-	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
 	dom->cqm_work_cpu = cpu;
 
-	schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
+	if (cpu < nr_cpu_ids)
+		schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
 }
 
 void mbm_handle_overflow(struct work_struct *work)
@@ -861,14 +871,22 @@ void mbm_handle_overflow(struct work_struct *work)
 	 * Re-check for housekeeping CPUs. This allows the overflow handler to
 	 * move off a nohz_full CPU quickly.
 	 */
-	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
 	schedule_delayed_work_on(cpu, &d->mbm_over, delay);
 
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+/**
+ * mbm_setup_overflow_handler() - Schedule the overflow handler to run for this
+ *                                domain.
+ * @delay_ms:      How far in the future the handler should run.
+ * @exclude_cpu:   Which CPU the handler should not run on,
+ *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
+ */
+void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms,
+				int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
@@ -879,9 +897,11 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 	 */
 	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		return;
-	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
 	dom->mbm_work_cpu = cpu;
-	schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
+
+	if (cpu < nr_cpu_ids)
+		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
 static int dom_data_init(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0c609cdfe7e5..49f100c73838 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2552,7 +2552,8 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 		list_for_each_entry(dom, &r->domains, list)
-			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
+			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
+						   RESCTRL_PICK_ANY_CPU);
 	}
 
 	goto out;
@@ -3850,7 +3851,8 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 
 	if (is_mbm_enabled()) {
 		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
-		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL,
+					   RESCTRL_PICK_ANY_CPU);
 	}
 
 	if (is_llc_occupancy_enabled())
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9d5f75a4e192..0888d1975161 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -10,6 +10,8 @@
 #define RESCTRL_RESERVED_CLOSID		0
 #define RESCTRL_RESERVED_RMID		0
 
+#define RESCTRL_PICK_ANY_CPU		-1
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 int proc_resctrl_show(struct seq_file *m,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (20 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:23   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu() James Morse
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

The resctrl architecture specific code may need to free a domain when
a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register.
Amongst other things, the resctrl filesystem code needs to clear this
CPU from the cpu_mask of any control and monitor groups.

Currently this is all done in core.c and called from
resctrl_offline_cpu(), making the split between architecture and
filesystem code unclear.

Move the filesystem work to remove the CPU from the control and monitor
groups into a filesystem helper called resctrl_offline_cpu(), and rename
the one in core.c resctrl_arch_offline_cpu().

The rdtgroup_mutex is unlocked and locked again in the call in
preparation for changing the locking rules for the architecture
code.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 25 +++++--------------------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
 include/linux/resctrl.h                |  1 +
 3 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 37aa124f1e4c..00b1592fd059 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -623,31 +623,15 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+static int resctrl_arch_offline_cpu(unsigned int cpu)
 {
-	struct rdtgroup *cr;
-
-	list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
-		if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask)) {
-			break;
-		}
-	}
-}
-
-static int resctrl_offline_cpu(unsigned int cpu)
-{
-	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 
 	mutex_lock(&rdtgroup_mutex);
+	resctrl_offline_cpu(cpu);
+
 	for_each_capable_rdt_resource(r)
 		domain_remove_cpu(cpu, r);
-	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
-		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
-			clear_childcpus(rdtgrp, cpu);
-			break;
-		}
-	}
 	clear_closid_rmid(cpu);
 	mutex_unlock(&rdtgroup_mutex);
 
@@ -970,7 +954,8 @@ static int __init resctrl_late_init(void)
 
 	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				  "x86/resctrl/cat:online:",
-				  resctrl_arch_online_cpu, resctrl_offline_cpu);
+				  resctrl_arch_online_cpu,
+				  resctrl_arch_offline_cpu);
 	if (state < 0)
 		return state;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 49f100c73838..f06a80d8fa3b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3878,6 +3878,30 @@ void resctrl_online_cpu(unsigned int cpu)
 	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
 }
 
+static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+{
+	struct rdtgroup *cr;
+
+	list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
+		if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask))
+			break;
+	}
+}
+
+void resctrl_offline_cpu(unsigned int cpu)
+{
+	struct rdtgroup *rdtgrp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
+			clear_childcpus(rdtgrp, cpu);
+			break;
+		}
+	}
+}
+
 /*
  * rdtgroup_init - rdtgroup initialization
  *
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 0888d1975161..74886cda5f66 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -226,6 +226,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_online_cpu(unsigned int cpu);
+void resctrl_offline_cpu(unsigned int cpu);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (21 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:23   ` Reinette Chatre
  2023-09-14 17:21 ` [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks James Morse
  2023-09-27  7:38 ` [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Shaopeng Tan (Fujitsu)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

When a CPU is taken offline the resctrl filesystem code needs to check
if it was the CPU nominated to perform the periodic overflow and limbo
work. If so, another CPU needs to be chosen to do this work.

This is currently done in core.c, mixed in with the code that removes
the CPU from the domain's mask, and potentially free()s the domain.

Move the migration of the overflow and limbo helpers into the filesystem
code, into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before
the architecture code has removed the CPU from the domain mask, the
callers need to be told which CPU is being removed, to avoid picking
it as the new CPU. This uses the exclude_cpu feature previously
added.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v5:
 * Changed fir tree order of variables.
 * Added mon-capable check for cpu offline.
---
 arch/x86/kernel/cpu/resctrl/core.c     | 16 ----------------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 18 ++++++++++++++++++
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 00b1592fd059..1a10f567bbe5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -578,22 +578,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 		return;
 	}
-
-	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
-		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
-			cancel_delayed_work(&d->mbm_over);
-			/*
-			 * temporary: exclude_cpu=-1 as this CPU has already
-			 * been removed by cpumask_clear_cpu()d
-			 */
-			mbm_setup_overflow_handler(d, 0, RESCTRL_PICK_ANY_CPU);
-		}
-		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
-		    has_busy_rmid(d)) {
-			cancel_delayed_work(&d->cqm_limbo);
-			cqm_setup_limbo_handler(d, 0, RESCTRL_PICK_ANY_CPU);
-		}
-	}
 }
 
 static void clear_closid_rmid(int cpu)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f06a80d8fa3b..1eb1c9b4aec7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3890,7 +3890,9 @@ static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
 
 void resctrl_offline_cpu(unsigned int cpu)
 {
+	struct rdt_resource *l3 = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	struct rdtgroup *rdtgrp;
+	struct rdt_domain *d;
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
@@ -3900,6 +3902,22 @@ void resctrl_offline_cpu(unsigned int cpu)
 			break;
 		}
 	}
+
+	if (!l3->mon_capable)
+		return;
+
+	d = get_domain_from_cpu(cpu, l3);
+	if (d) {
+		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
+			cancel_delayed_work(&d->mbm_over);
+			mbm_setup_overflow_handler(d, 0, cpu);
+		}
+		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
+		    has_busy_rmid(d)) {
+			cancel_delayed_work(&d->cqm_limbo);
+			cqm_setup_limbo_handler(d, 0, cpu);
+		}
+	}
 }
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (22 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu() James Morse
@ 2023-09-14 17:21 ` James Morse
  2023-10-03 21:28   ` Reinette Chatre
  2023-09-27  7:38 ` [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Shaopeng Tan (Fujitsu)
  24 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-09-14 17:21 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

resctrl has one mutex that is taken by the architecture specific code,
and the filesystem parts. The two interact via cpuhp, where the
architecture code updates the domain list. Filesystem handlers that
walk the domains list should not run concurrently with the cpuhp
callback modifying the list.

Exposing a lock from the filesystem code means the interface is not
cleanly defined, and creates the possibility of cross-architecture
lock ordering headaches. The interaction only exists so that certain
filesystem paths are serialised against CPU hotplug. The CPU hotplug
code already has a mechanism to do this using cpus_read_lock().

MPAM's monitors have an overflow interrupt, so it needs to be possible
to walk the domains list in irq context. RCU is ideal for this,
but some paths need to be able to sleep to allocate memory.

Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
of a cpuhp callback, cpus_read_lock() must always be taken first.
rdtgroup_schemata_write() already does this.

Most of the filesystem code's domain list walkers are currently
protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
The exceptions are rdt_bit_usage_show() and the mon_config helpers
which take the lock directly.

Make the domain list protected by RCU. An architecture-specific
lock prevents concurrent writers. rdt_bit_usage_show() could
walk the domain list using RCU, but to keep all the filesystem
operations the same, this is changed to call cpus_read_lock().
The mon_config helpers send multiple IPIs, take the cpus_read_lock()
in these cases.

The other filesystem list walkers need to be able to sleep.
Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
cpuhp callbacks can't be invoked when file system operations are
occurring.

Add lockdep_assert_cpus_held() in the cases where the
rdtgroup_kn_lock_live() call isn't obvious.

Resctrl's domain online/offline calls now need to take the
rdtgroup_mutex themselves.

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-By: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Reworded a comment,
 * Added a lockdep assertion
 * Moved clear_closid_rmid() outside the locked region of cpu
   online/offline

Changes since v3:
 * Added a header include

Changes since v5:
 * Made rdt_bit_usage_show() take the cpus_read_lock() instead of using
   RCU.
---
 arch/x86/kernel/cpu/resctrl/core.c        | 34 ++++++++----
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 14 +++--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  4 ++
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  3 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 65 ++++++++++++++++++++---
 include/linux/resctrl.h                   |  2 +-
 6 files changed, 100 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a10f567bbe5..8fd0510d767b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -25,8 +25,15 @@
 #include <asm/resctrl.h>
 #include "internal.h"
 
-/* Mutex to protect rdtgroup access. */
-DEFINE_MUTEX(rdtgroup_mutex);
+/*
+ * rdt_domain structures are kfree()d when their last CPU goes offline,
+ * and allocated when the first CPU in a new domain comes online.
+ * The rdt_resource's domain list is updated when this happens. Readers of
+ * the domain list must either take cpus_read_lock(), or rely on an RCU
+ * read-side critical section, to avoid observing concurrent modification.
+ * All writers take this mutex:
+ */
+static DEFINE_MUTEX(domain_list_lock);
 
 /*
  * The cached resctrl_pqr_state is strictly per CPU and can never be
@@ -508,6 +515,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	struct rdt_domain *d;
 	int err;
 
+	lockdep_assert_held(&domain_list_lock);
+
 	d = rdt_find_domain(r, id, &add_pos);
 	if (IS_ERR(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -541,11 +550,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	list_add_tail(&d->list, add_pos);
+	list_add_tail_rcu(&d->list, add_pos);
 
 	err = resctrl_online_domain(r, d);
 	if (err) {
-		list_del(&d->list);
+		list_del_rcu(&d->list);
+		synchronize_rcu();
 		domain_free(hw_dom);
 	}
 }
@@ -556,6 +566,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	struct rdt_hw_domain *hw_dom;
 	struct rdt_domain *d;
 
+	lockdep_assert_held(&domain_list_lock);
+
 	d = rdt_find_domain(r, id, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
@@ -566,7 +578,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
 		resctrl_offline_domain(r, d);
-		list_del(&d->list);
+		list_del_rcu(&d->list);
+		synchronize_rcu();
 
 		/*
 		 * rdt_domain "d" is going to be freed below, so clear
@@ -596,13 +609,13 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
 {
 	struct rdt_resource *r;
 
-	mutex_lock(&rdtgroup_mutex);
+	mutex_lock(&domain_list_lock);
 	for_each_capable_rdt_resource(r)
 		domain_add_cpu(cpu, r);
-	clear_closid_rmid(cpu);
+	mutex_unlock(&domain_list_lock);
 
+	clear_closid_rmid(cpu);
 	resctrl_online_cpu(cpu);
-	mutex_unlock(&rdtgroup_mutex);
 
 	return 0;
 }
@@ -611,13 +624,14 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 {
 	struct rdt_resource *r;
 
-	mutex_lock(&rdtgroup_mutex);
 	resctrl_offline_cpu(cpu);
 
+	mutex_lock(&domain_list_lock);
 	for_each_capable_rdt_resource(r)
 		domain_remove_cpu(cpu, r);
+	mutex_unlock(&domain_list_lock);
+
 	clear_closid_rmid(cpu);
-	mutex_unlock(&rdtgroup_mutex);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b4ed4e1b4938..0620dfc72036 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -209,6 +209,9 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	struct rdt_domain *d;
 	unsigned long dom_id;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
 	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
@@ -313,6 +316,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	struct rdt_domain *d;
 	u32 idx;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
 
@@ -378,11 +384,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 		return -EINVAL;
 	buf[nbytes - 1] = '\0';
 
-	cpus_read_lock();
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
 	if (!rdtgrp) {
 		rdtgroup_kn_unlock(of->kn);
-		cpus_read_unlock();
 		return -ENOENT;
 	}
 	rdt_last_cmd_clear();
@@ -444,7 +448,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 out:
 	rdt_staged_configs_clear();
 	rdtgroup_kn_unlock(of->kn);
-	cpus_read_unlock();
 	return ret ?: nbytes;
 }
 
@@ -464,6 +467,9 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
 	bool sep = false;
 	u32 ctrl_val;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	seq_printf(s, "%*s:", max_name_width, schema->name);
 	list_for_each_entry(dom, &r->domains, list) {
 		if (sep)
@@ -535,7 +541,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	int cpu;
 
 	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
-	lockdep_assert_held(&rdtgroup_mutex);
+	lockdep_assert_cpus_held();
 
 	/*
 	 * Setup the parameters to pass to mon_event_count() to read the data.
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 208e46ba7368..e869372cc35a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -15,6 +15,7 @@
  * Software Developer Manual June 2016, volume 3, section 17.17.
  */
 
+#include <linux/cpu.h>
 #include <linux/module.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -471,6 +472,9 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
 
 	entry->busy = 0;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 8056bed033cc..884b88e25141 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -844,6 +844,9 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
 	struct rdt_domain *d_i;
 	bool ret = false;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	if (!zalloc_cpumask_var(&cpu_with_psl, GFP_KERNEL))
 		return true;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1eb1c9b4aec7..a1257fec2a83 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -35,6 +35,10 @@
 DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
 DEFINE_STATIC_KEY_FALSE(rdt_mon_enable_key);
 DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
+
+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
 static struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
@@ -952,6 +956,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 	bool sep = false;
 	u32 ctrl_val;
 
+	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 	hw_shareable = r->cache.shareable_bits;
 	list_for_each_entry(dom, &r->domains, list) {
@@ -1012,6 +1017,7 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 	}
 	seq_putc(seq, '\n');
 	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
 	return 0;
 }
 
@@ -1254,6 +1260,9 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 	struct rdt_domain *d;
 	u32 ctrl;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
 		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
@@ -1520,6 +1529,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 	struct rdt_domain *dom;
 	bool sep = false;
 
+	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
 	list_for_each_entry(dom, &r->domains, list) {
@@ -1536,6 +1546,7 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 	seq_puts(s, "\n");
 
 	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
 
 	return 0;
 }
@@ -1627,6 +1638,9 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 	struct rdt_domain *d;
 	int ret = 0;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 next:
 	if (!tok || tok[0] == '\0')
 		return 0;
@@ -1668,6 +1682,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
 	if (nbytes == 0 || buf[nbytes - 1] != '\n')
 		return -EINVAL;
 
+	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
 	rdt_last_cmd_clear();
@@ -1677,6 +1692,7 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
 	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
 
 	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
 
 	return ret ?: nbytes;
 }
@@ -1692,6 +1708,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	if (nbytes == 0 || buf[nbytes - 1] != '\n')
 		return -EINVAL;
 
+	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
 	rdt_last_cmd_clear();
@@ -1701,6 +1718,7 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
 	ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
 
 	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
 
 	return ret ?: nbytes;
 }
@@ -2153,6 +2171,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 	struct rdt_domain *d;
 	int cpu;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	if (level == RDT_RESOURCE_L3)
 		update = l3_qos_cfg_update;
 	else if (level == RDT_RESOURCE_L2)
@@ -2360,6 +2381,7 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
 
 	rdtgroup_kn_get(rdtgrp, kn);
 
+	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
 	/* Was this group deleted while we waited? */
@@ -2377,6 +2399,8 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
 		return;
 
 	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
 	rdtgroup_kn_put(rdtgrp, kn);
 }
 
@@ -2664,6 +2688,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	struct rdt_domain *d;
 	int i;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
 
@@ -2948,6 +2975,9 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
 	struct rdt_domain *dom;
 	int ret;
 
+	/* Walking r->domains, ensure it can't race with cpuhp */
+	lockdep_assert_cpus_held();
+
 	list_for_each_entry(dom, &r->domains, list) {
 		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
 		if (ret)
@@ -3766,7 +3796,8 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
 	kfree(d->mbm_local);
 }
 
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+static void _resctrl_offline_domain(struct rdt_resource *r,
+				    struct rdt_domain *d)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
 
@@ -3801,6 +3832,13 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 	domain_destroy_mon_state(d);
 }
 
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	mutex_lock(&rdtgroup_mutex);
+	_resctrl_offline_domain(r, d);
+	mutex_unlock(&rdtgroup_mutex);
+}
+
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -3832,7 +3870,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+static int _resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 {
 	int err;
 
@@ -3870,12 +3908,23 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int err;
+
+	mutex_lock(&rdtgroup_mutex);
+	err = _resctrl_online_domain(r, d);
+	mutex_unlock(&rdtgroup_mutex);
+
+	return err;
+}
+
 void resctrl_online_cpu(unsigned int cpu)
 {
-	lockdep_assert_held(&rdtgroup_mutex);
-
+	mutex_lock(&rdtgroup_mutex);
 	/* The CPU is set in default rdtgroup after online. */
 	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
+	mutex_unlock(&rdtgroup_mutex);
 }
 
 static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
@@ -3894,8 +3943,7 @@ void resctrl_offline_cpu(unsigned int cpu)
 	struct rdtgroup *rdtgrp;
 	struct rdt_domain *d;
 
-	lockdep_assert_held(&rdtgroup_mutex);
-
+	mutex_lock(&rdtgroup_mutex);
 	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
 		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
 			clear_childcpus(rdtgrp, cpu);
@@ -3904,7 +3952,7 @@ void resctrl_offline_cpu(unsigned int cpu)
 	}
 
 	if (!l3->mon_capable)
-		return;
+		goto out_unlock;
 
 	d = get_domain_from_cpu(cpu, l3);
 	if (d) {
@@ -3918,6 +3966,9 @@ void resctrl_offline_cpu(unsigned int cpu)
 			cqm_setup_limbo_handler(d, 0, cpu);
 		}
 	}
+
+out_unlock:
+	mutex_unlock(&rdtgroup_mutex);
 }
 
 /*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 74886cda5f66..0bccb86ba38b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -159,7 +159,7 @@ struct resctrl_schema;
  * @cache_level:	Which cache level defines scope of this resource
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
- * @domains:		All domains for this resource
+ * @domains:		RCU list of all domains for this resource
  * @name:		Name to use in "schemata" file.
  * @data_width:		Character width of data when displaying
  * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* RE: [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
@ 2023-09-17 21:00   ` David Laight
  2023-09-29 16:13     ` James Morse
  2023-10-03 21:14   ` Reinette Chatre
  2023-10-04 20:38   ` Moger, Babu
  2 siblings, 1 reply; 80+ messages in thread
From: David Laight @ 2023-09-17 21:00 UTC (permalink / raw)
  To: 'James Morse', x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

From: James Morse
> Sent: 14 September 2023 18:21
> 
> The resctrl CLOSID allocator uses a single 32bit word to track which
> CLOSID are free. The setting and clearing of bits is open coded.
> 
> A subsequent patch adds resctrl_closid_is_free(), which adds more open
> coded bitmaps operations. These will eventually need changing to use
> the bitops helpers so that a CLOSID bitmap of the correct size can be
> allocated dynamically.
> 
> Convert the existing open coded bit manipulations of closid_free_map
> to use set_bit() and friends.
> 
>  int closids_supported(void)
> @@ -126,7 +126,7 @@ static void closid_init(void)
>  	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
> 
>  	/* CLOSID 0 is always reserved for the default group */
> -	closid_free_map &= ~1;
> +	clear_bit(0, &closid_free_map);

Don't the clear_bit() etc functions use locked accesses?
These are always measurably more expensive than the C operators.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
  2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
@ 2023-09-26 14:31   ` Fenghua Yu
  2023-10-03 21:05   ` Reinette Chatre
  1 sibling, 0 replies; 80+ messages in thread
From: Fenghua Yu @ 2023-09-26 14:31 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Reinette Chatre, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi, Peter and James,

On 9/14/23 10:21, James Morse wrote:
> tick_nohz_full_mask lists the CPUs that are nohz_full. This is only
> needed when CONFIG_NO_HZ_FULL is defined. tick_nohz_full_cpu() allows
> a specific CPU to be tested against the mask, and evaluates to false
> when CONFIG_NO_HZ_FULL is not defined.
> 
> The resctrl code needs to pick a CPU to run some work on, a new helper
> prefers housekeeping CPUs by examining the tick_nohz_full_mask. Hiding
> the declaration behind #ifdef CONFIG_NO_HZ_FULL forces all the users to
> be behind an ifdef too.
> 
> Move the tick_nohz_full_mask declaration, this lets callers drop the
> ifdef, and guard access to tick_nohz_full_mask with IS_ENABLED() or
> something like tick_nohz_full_cpu().
> 
> The definition does not need to be moved as any callers should be
> removed at compile time unless CONFIG_NO_HZ_FULL is defined.
> 
> CC: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>

checkpatch.pl reports warning:
WARNING: 'Tested-by:' is the preferred signature form
#27:
Tested-By: Peter Newman <peternewman@google.com>

The same warning is reported on all following patches in this series.

According to Documentation/process/submitting-patches.rst, "Tested-by" 
(instead of "Tested-By") is used.

Could you please fix the warnings?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking
  2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
                   ` (23 preceding siblings ...)
  2023-09-14 17:21 ` [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks James Morse
@ 2023-09-27  7:38 ` Shaopeng Tan (Fujitsu)
  2023-09-29 16:13   ` James Morse
  24 siblings, 1 reply; 80+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2023-09-27  7:38 UTC (permalink / raw)
  To: 'James Morse', x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hello James,

I reviewed this patch series(v6) and ran resctrl selftest on 
Intel(R) Xeon(R) Gold 6254 CPU with nohz_full enabled/disabled, 
there is no problem.

<reviewed-by:tan.shaopeng@jp.fujitsu.com>
<tested-by:tan.shaopeng@jp.fujitsu.com>


> This series does two things, it changes resctrl to call resctrl_arch_rmid_read()
> in a way that works for MPAM, and it separates the locking so that the arch
> code and filesystem code don't have to share a mutex. I tried to split this as two
> series, but these touch similar call sites, so it would create more work.
> 
> (What's MPAM? See the cover letter of the first series. [1])
> 
> On x86 the RMID is an independent number. MPAMs equivalent is PMG, but
> this isn't an independent number - it extends the PARTID (same as CLOSID)
> space with bits that aren't used to select the configuration. The monitors can
> then be told to match specific PMG values, allowing monitor-groups to be
> created.
> 
> But, MPAM expects the monitors to always monitor by PARTID. The
> Cache-storage-utilisation counters can only work this way.
> (In the MPAM spec not setting the MATCH_PARTID bit is made
> CONSTRAINED UNPREDICTABLE - which is Arm's term to mean portable
> software can't rely on
> this)
> 
> It gets worse, as some SoCs may have very few PMG bits. I've seen the
> datasheet for one that has a single bit of PMG space.
> 
> To be usable, MPAM's counters always need the PARTID and the PMG.
> For resctrl, this means always making the CLOSID available when the RMID is
> used.
> 
> To ensure RMID are always unique, this series combines the CLOSID and
> RMID into an index, and manages RMID based on that. For x86, the index and
> RMID would always be the same.
> 
> 
> Currently the architecture specific code in the cpuhp callbacks takes the
> rdtgroup_mutex. This means the filesystem code would have to export this lock,
> resulting in an ill-defined interface between the two, and the possibility of
> cross-architecture lock-ordering head aches.
> 
> The second part of this series adds a domain_list_lock to protect writes to the
> domain list, and protects the domain list with RCU - or cpus_read_lock().
> 
> Use of RCU is to allow lockless readers of the domain list. To get MPAMs
> monitors working, its very likely they'll need to be plumbed up to perf. An
> uncore PMU driver would need to be a lockless reader of the domain list.
> 
> This series is based on v6.6-rc1, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> mpam/monitors_and_locking/v6
> 
> Bugs welcome,
> 
> Thanks,
> 
> James
> 
> [1]
> https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com
> /
> [v1]
> https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/
> [v2]
> https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com
> /
> [v3]
> https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com
> [v4]
> https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com
> [v6]
> https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com
> /
> 
> 
> James Morse (24):
>   tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
>   x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
>   x86/resctrl: Create helper for RMID allocation and mondata dir
>     creation
>   x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
>   x86/resctrl: Track the closid with the rmid
>   x86/resctrl: Access per-rmid structures by index
>   x86/resctrl: Allow RMID allocation to be scoped by CLOSID
>   x86/resctrl: Track the number of dirty RMID a CLOSID has
>   x86/resctrl: Use set_bit()/clear_bit() instead of open coding
>   x86/resctrl: Allocate the cleanest CLOSID by searching
>     closid_num_dirty_rmid
>   x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
>   x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
>   x86/resctrl: Queue mon_event_read() instead of sending an IPI
>   x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
>   x86/resctrl: Allow arch to allocate memory needed in
>     resctrl_arch_rmid_read()
>   x86/resctrl: Make resctrl_mounted checks explicit
>   x86/resctrl: Move alloc/mon static keys into helpers
>   x86/resctrl: Make rdt_enable_key the arch's decision to switch
>   x86/resctrl: Add helpers for system wide mon/alloc capable
>   x86/resctrl: Add CPU online callback for resctrl work
>   x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but
>     cpu
>   x86/resctrl: Add cpu offline callback for resctrl work
>   x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
>   x86/resctrl: Separate arch and fs resctrl locks
> 
>  arch/x86/include/asm/resctrl.h            |  90 +++++
>  arch/x86/kernel/cpu/resctrl/core.c        |  78 ++--
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  47 ++-
>  arch/x86/kernel/cpu/resctrl/internal.h    |  56 ++-
>  arch/x86/kernel/cpu/resctrl/monitor.c     | 434
> +++++++++++++++++-----
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  15 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 345 ++++++++++++-----
>  include/linux/resctrl.h                   |  43 ++-
>  include/linux/tick.h                      |   9 +-
>  9 files changed, 857 insertions(+), 260 deletions(-)
> 
> --
> 2.39.2


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-09-17 21:00   ` David Laight
@ 2023-09-29 16:13     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-09-29 16:13 UTC (permalink / raw)
  To: David Laight, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Hi David,

On 17/09/2023 22:00, David Laight wrote:
> From: James Morse
>> Sent: 14 September 2023 18:21
>>
>> The resctrl CLOSID allocator uses a single 32bit word to track which
>> CLOSID are free. The setting and clearing of bits is open coded.
>>
>> A subsequent patch adds resctrl_closid_is_free(), which adds more open
>> coded bitmaps operations. These will eventually need changing to use
>> the bitops helpers so that a CLOSID bitmap of the correct size can be
>> allocated dynamically.
>>
>> Convert the existing open coded bit manipulations of closid_free_map
>> to use set_bit() and friends.
>>
>>  int closids_supported(void)
>> @@ -126,7 +126,7 @@ static void closid_init(void)
>>  	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
>>
>>  	/* CLOSID 0 is always reserved for the default group */
>> -	closid_free_map &= ~1;
>> +	clear_bit(0, &closid_free_map);

> Don't the clear_bit() etc functions use locked accesses?

Yes. In this case there is no need for it to be atomic, just to use the bitmap API so this
can be made bigger in the future. It's currently protected by the rdtgroup_mutex (I'll add
some lockdep annotations to document that).


> These are always measurably more expensive than the C operators.

I'll switch this to use the double-underscore version which are non-atomic,
double-underscore is usually a warning not to use this function!

I doubt the performance matters as this is only ever called from a mkdir() syscall when
the configuration is changed, which we anticipate only really happens once at boot.



Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking
  2023-09-27  7:38 ` [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Shaopeng Tan (Fujitsu)
@ 2023-09-29 16:13   ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-09-29 16:13 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hello!

On 27/09/2023 08:38, Shaopeng Tan (Fujitsu) wrote:
> I reviewed this patch series(v6) and ran resctrl selftest on 
> Intel(R) Xeon(R) Gold 6254 CPU with nohz_full enabled/disabled, 
> there is no problem.
> 
> <reviewed-by:tan.shaopeng@jp.fujitsu.com>
> <tested-by:tan.shaopeng@jp.fujitsu.com>

Thanks for your testing and review!

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-09-14 17:21 ` [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit() James Morse
@ 2023-10-02 17:00   ` Reinette Chatre
  2023-10-05 17:05     ` James Morse
  2023-10-04 18:00   ` Moger, Babu
  1 sibling, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-02 17:00 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 725344048f85..a2158c266e41 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>  
>  void __exit rdtgroup_exit(void)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> +	if (r->mon_capable)
> +		resctrl_exit_mon_l3_config(r);
> +
>  	debugfs_remove_recursive(debugfs_resctrl);
>  	unregister_filesystem(&rdt_fs_type);
>  	sysfs_remove_mount_point(fs_kobj, "resctrl");

You did not respond to me when I requested that this be done differently [1].
Without a response letting me know the faults of my proposal or following the
recommendation I conclude that my feedback was ignored. 

Reinette 

[1] https://lore.kernel.org/lkml/1ccd6be5-1dbd-c4a5-659f-ae20761dcce7@intel.com/

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
  2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
  2023-09-26 14:31   ` Fenghua Yu
@ 2023-10-03 21:05   ` Reinette Chatre
  1 sibling, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:05 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

On 9/14/2023 10:21 AM, James Morse wrote:
> tick_nohz_full_mask lists the CPUs that are nohz_full. This is only
> needed when CONFIG_NO_HZ_FULL is defined. tick_nohz_full_cpu() allows
> a specific CPU to be tested against the mask, and evaluates to false
> when CONFIG_NO_HZ_FULL is not defined.
> 
> The resctrl code needs to pick a CPU to run some work on, a new helper
> prefers housekeeping CPUs by examining the tick_nohz_full_mask. Hiding
> the declaration behind #ifdef CONFIG_NO_HZ_FULL forces all the users to
> be behind an ifdef too.
> 
> Move the tick_nohz_full_mask declaration, this lets callers drop the
> ifdef, and guard access to tick_nohz_full_mask with IS_ENABLED() or
> something like tick_nohz_full_cpu().
> 
> The definition does not need to be moved as any callers should be
> removed at compile time unless CONFIG_NO_HZ_FULL is defined.
> 
> CC: Frederic Weisbecker <frederic@kernel.org>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Acked-by: Reinette Chatre <reinette.chatre@intel.com> # for resctrl dependency

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation
  2023-09-14 17:21 ` [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation James Morse
@ 2023-10-03 21:07   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:07 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> When monitoring is supported, each monitor and control group is allocated
> an RMID. For control groups, rdtgroup_mkdir_ctrl_mon() later goes on to
> allocate the CLOSID.
> 
> MPAM's equivalent of RMID are not an independent number, so can't be
> allocated until the CLOSID is known. An RMID allocation for one CLOSID
> may fail, whereas another may succeed depending on how many monitor
> groups a control group has.
> 
> The RMID allocation needs to move to be after the CLOSID has been
> allocated.
> 
> Move the RMID allocation and mondata dir creation to a helper, this
> makes a subsequent change easier to read.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Please follow the x86 custom for ordering of tags. You can
find this documented in section "Ordering of commit tags"
in Documentation/process/maintainer-tip.rst. Please do
so for all the x86 patches in this series. I believe this
also applies to the tick.h patch.

For this and following patches please consider when a review tag
is provided it is done with expectation that the commit tag ordering
is fixed. This is the only scenario I am doing it for.

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  2023-09-14 17:21 ` [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
@ 2023-10-03 21:07   ` Reinette Chatre
  2023-10-04 18:01   ` Moger, Babu
  1 sibling, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:07 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> RMID are allocated for each monitor or control group directory, because
> each of these needs its own RMID. For control groups,
> rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID.
> 
> MPAM's equivalent of RMID is not an independent number, so can't be
> allocated until the CLOSID is known. An RMID allocation for one CLOSID
> may fail, whereas another may succeed depending on how many monitor
> groups a control group has.
> 
> The RMID allocation needs to move to be after the CLOSID has been
> allocated.
> 
> Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller,
> after the mkdir_rdt_prepare() call. This allows the RMID allocator to
> know the CLOSID.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid
  2023-09-14 17:21 ` [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid James Morse
@ 2023-10-03 21:11   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:11 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index cfb3f632a4b2..42b9a694fe2f 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -24,7 +24,21 @@
>  
>  #include "internal.h"
>  
> +/**
> + * struct rmid_entry - dirty tracking for all RMID.
> + * @closid:	The CLOSID for this entry.
> + * @rmid:	The RMID for this entry.
> + * @busy:	The number of domains with cached data using this RMID.
> + * @list:	Member of the rmid_free_lru list when busy == 0.
> + *
> + * Some architectures's resctrl_arch_rmid_read() needs the CLOSID value
> + * in order to access the correct monitor. @closid provides the value to
> + * list walkers like __check_limbo(). On x86 this is ignored.

I do not think this is correct. At this point in the series
__check_limbo() uses @rmid as index, at end of series it uses the
(@closid, @rmid) index. Never does the list walker use @closid.

Perhaps something like below that matches your later similar comments:

	Depending on the architecture the correct monitor is accessed
	using both @closid and @rmid, or @rmid only.


...

  
> @@ -685,11 +706,11 @@ void mbm_handle_overflow(struct work_struct *work)
>  	d = container_of(work, struct rdt_domain, mbm_over.work);
>  
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> -		mbm_update(r, d, prgrp->mon.rmid);
> +		mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
>  
>  		head = &prgrp->mon.crdtgrp_list;
>  		list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> -			mbm_update(r, d, crgrp->mon.rmid);
> +			mbm_update(r, d, crgrp->closid, crgrp->mon.rmid);
>  
>  		if (is_mba_sc(NULL))
>  			update_mba_bw(prgrp, d);
> @@ -732,10 +753,11 @@ static int dom_data_init(struct rdt_resource *r)
>  	}
>  
>  	/*
> -	 * RMID 0 is special and is always allocated. It's used for all
> -	 * tasks that are not monitored.
> +	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
> +	 * are always allocated. These are used for rdtgroup_default control
> +	 * group, which will be setup later. See rdtgroup_setup_root().
>  	 */

This comment will not be accurate after Babu's changes are merged (the function will
be rdtgroup_setup_default()). To avoid that conflict you could perhaps change
last two sentences to something like below that will be accurate no matter the
order of merging between your and Babu's work:

	These are used for rdtgroup_default control group, which will be
	setup later in rdtgroup_init().

> -	entry = __rmid_entry(0);
> +	entry = __rmid_entry(RESCTRL_RESERVED_CLOSID, RESCTRL_RESERVED_RMID);
>  	list_del(&entry->list);
>  
>  	return 0;


My feedback only relates to the comments. The rest of the patch looks good to
me. I could give a review tag with expectation that comments be addressed in next
version but since some review feedback fell through the cracks in this version I
feel that I need to confirm first before providing review tag.

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index
  2023-09-14 17:21 ` [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index James Morse
@ 2023-10-03 21:12   ` Reinette Chatre
  2023-10-24  9:28   ` Maciej Wieczór-Retman
  1 sibling, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:12 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 91a6ea783200..ab96af8d9953 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -7,6 +7,7 @@
>  #include <linux/kernfs.h>
>  #include <linux/fs_context.h>
>  #include <linux/jump_label.h>
> +#include <asm/resctrl.h>
>  

Please use empty line between the different groups of headers.

The rest of the patch looks good to me.

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID
  2023-09-14 17:21 ` [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID James Morse
@ 2023-10-03 21:12   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:12 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index be0b7cb6e1f5..d286aba1ee63 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -345,24 +345,50 @@ bool has_busy_rmid(struct rdt_domain *d)
>  	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
>  }
>  
> -/*
> - * As of now the RMIDs allocation is global.
> - * However we keep track of which packages the RMIDs
> - * are used to optimize the limbo list management.
> - */
> -int alloc_rmid(void)
> +static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
>  {
> -	struct rmid_entry *entry;
> -
> -	lockdep_assert_held(&rdtgroup_mutex);
> +	struct rmid_entry *itr;
> +	u32 itr_idx, cmp_idx;
>  
>  	if (list_empty(&rmid_free_lru))
> -		return rmid_limbo_count ? -EBUSY : -ENOSPC;
> +		return rmid_limbo_count ? ERR_PTR(-EBUSY) : ERR_PTR(-ENOSPC);
> +
> +	list_for_each_entry(itr, &rmid_free_lru, list) {
> +		/*
> +		 * Get the index of this free RMID, and the index it would need
> +		 * to be if it were used with this CLOSID.
> +		 * If the CLOSID is irrelevant on this architecture, the two
> +		 * index values are always same on every entry and thus the

"are always same" -> "are always the same"?

> +		 * very first entry will be returned.
> +

Stray empty line.

> +		 */
> +		itr_idx = resctrl_arch_rmid_idx_encode(itr->closid, itr->rmid);
> +		cmp_idx = resctrl_arch_rmid_idx_encode(closid, itr->rmid);
> +
> +		if (itr_idx == cmp_idx)
> +			return itr;
> +	}
> +
> +	return ERR_PTR(-ENOSPC);
> +}
> +
> +/*

Rest of the patch looks good.

Reinette


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has
  2023-09-14 17:21 ` [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has James Morse
@ 2023-10-03 21:13   ` Reinette Chatre
  2023-10-05 17:07     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:13 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> @@ -796,13 +817,30 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>  static int dom_data_init(struct rdt_resource *r)
>  {
>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> +	u32 num_closid = resctrl_arch_get_num_closid(r);
>  	struct rmid_entry *entry = NULL;
> +	int err = 0, i;
>  	u32 idx;
> -	int i;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> +		int *tmp;
> +
> +		tmp = kcalloc(num_closid, sizeof(int), GFP_KERNEL);

Shouldn't this rather be sizeof(unsigned int) to match the type it will store?

> +		if (!tmp) {
> +			err = -ENOMEM;
> +			goto out_unlock;
> +		}
> +
> +		closid_num_dirty_rmid = tmp;
> +	}
>  
>  	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
> -	if (!rmid_ptrs)
> -		return -ENOMEM;
> +	if (!rmid_ptrs) {
> +		kfree(closid_num_dirty_rmid);
> +		err = -ENOMEM;
> +		goto out_unlock;
> +	}
>  
>  	for (i = 0; i < idx_limit; i++) {
>  		entry = &rmid_ptrs[i];
> @@ -822,13 +860,21 @@ static int dom_data_init(struct rdt_resource *r)
>  	entry = __rmid_entry(idx);
>  	list_del(&entry->list);
>  
> -	return 0;
> +out_unlock:
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return err;
>  }
>  
>  void resctrl_exit_mon_l3_config(struct rdt_resource *r)
>  {
>  	mutex_lock(&rdtgroup_mutex);
>  
> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> +		kfree(closid_num_dirty_rmid);
> +		closid_num_dirty_rmid = NULL;
> +	}
> +
>  	kfree(rmid_ptrs);
>  	rmid_ptrs = NULL;
>  

Awaiting response on patch #2 related to above hunk.

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
  2023-09-17 21:00   ` David Laight
@ 2023-10-03 21:14   ` Reinette Chatre
  2023-10-04 20:38   ` Moger, Babu
  2 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:14 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The resctrl CLOSID allocator uses a single 32bit word to track which
> CLOSID are free. The setting and clearing of bits is open coded.
> 
> A subsequent patch adds resctrl_closid_is_free(), which adds more open

resctrl_closid_is_free() is not added in this series.

> coded bitmaps operations. These will eventually need changing to use
> the bitops helpers so that a CLOSID bitmap of the correct size can be
> allocated dynamically.
> 
> Convert the existing open coded bit manipulations of closid_free_map
> to use set_bit() and friends.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ac1a6437469f..fa449ee0d1a7 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -106,7 +106,7 @@ void rdt_staged_configs_clear(void)
>   * - Our choices on how to configure each resource become progressively more
>   *   limited as the number of resources grows.
>   */
> -static int closid_free_map;
> +static unsigned long closid_free_map;
>  static int closid_free_map_len;
>  
>  int closids_supported(void)
> @@ -126,7 +126,7 @@ static void closid_init(void)
>  	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
>  
>  	/* CLOSID 0 is always reserved for the default group */
> -	closid_free_map &= ~1;
> +	clear_bit(0, &closid_free_map);
>  	closid_free_map_len = rdt_min_closid;
>  }
>  
> @@ -137,14 +137,14 @@ static int closid_alloc(void)
>  	if (closid == 0)
>  		return -ENOSPC;
>  	closid--;
> -	closid_free_map &= ~(1 << closid);
> +	clear_bit(closid, &closid_free_map);
>  
>  	return closid;
>  }
>  
>  void closid_free(int closid)
>  {
> -	closid_free_map |= 1 << closid;
> +	set_bit(closid, &closid_free_map);
>  }
>  
>  /**
> @@ -156,7 +156,7 @@ void closid_free(int closid)
>   */
>  static bool closid_allocated(unsigned int closid)
>  {
> -	return (closid_free_map & (1 << closid)) == 0;
> +	return !test_bit(closid, &closid_free_map);
>  }
>  
>  /**

The patch looks good to me.

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
@ 2023-10-03 21:14   ` Reinette Chatre
  2023-10-05 20:13   ` Moger, Babu
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:14 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
> 
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
> 
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
> 
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
  2023-09-14 17:21 ` [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers James Morse
@ 2023-10-03 21:15   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:15 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> When switching tasks, the CLOSID and RMID that the new task should
> use are stored in struct task_struct. For x86 the CLOSID known by resctrl,
> the value in task_struct, and the value written to the CPU register are
> all the same thing.
> 
> MPAM's CPU interface has two different PARTID's one for data accesses
> the other for instruction fetch. Storing resctrl's CLOSID value in
> struct task_struct implies the arch code knows whether resctrl is using
> CDP.
> 
> Move the matching and setting of the struct task_struct properties
> to use helpers. This allows arm64 to store the hardware format of
> the register, instead of having to convert it each time.
> 
> __rdtgroup_move_task()s use of READ_ONCE()/WRITE_ONCE() ensures torn
> values aren't seen as another CPU may schedule the task being moved
> while the value is being changed. MPAM has an additional corner-case
> here as the PMG bits extend the PARTID space. If the scheduler sees a
> new-CLOSID but old-RMID, the task will dirty an RMID that the limbo code
> is not watching causing an inaccurate count. x86's RMID are independent
> values, so the limbo code will still be watching the old-RMID in this
> circumstance.
> To avoid this, arm64 needs both the CLOSID/RMID WRITE_ONCE()d together.
> Both values must be provided together.
> 
> Because MPAM's RMID values are not unique, the CLOSID must be provided
> when matching the RMID.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
  2023-09-14 17:21 ` [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow James Morse
@ 2023-10-03 21:15   ` Reinette Chatre
  2023-10-05 17:07     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:15 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The limbo and overflow code picks a CPU to use from the domain's list
> of online CPUs. Work is then scheduled on these CPUs to maintain
> the limbo list and any counters that may overflow.
> 
> cpumask_any() may pick a CPU that is marked nohz_full, which will
> either penalise the work that CPU was dedicated to, or delay the
> processing of limbo list or counters that may overflow. Perhaps
> indefinitely. Delaying the overflow handling will skew the bandwidth
> values calculated by mba_sc, which expects to be called once a second.
> 
> Add cpumask_any_housekeeping() as a replacement for cpumask_any()
> that prefers housekeeping CPUs. This helper will still return
> a nohz_full CPU if that is the only option. The CPU to use is
> re-evaluated each time the limbo/overflow work runs. This ensures
> the work will move off a nohz_full CPU once a housekeeping CPU is
> available.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v3:
>  * typos fixed
> 
> Changes since v4:
>  * Made temporary variables unsigned
> 
> Changes since v5:
>  * Restructured cpumask_any_housekeeping() to avoid later churn.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 24 ++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 17 ++++++++++++-----
>  2 files changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index f06d3d3e0808..37bb3de37a4a 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -7,6 +7,7 @@
>  #include <linux/kernfs.h>
>  #include <linux/fs_context.h>
>  #include <linux/jump_label.h>
> +#include <linux/tick.h>
>  #include <asm/resctrl.h>
>  

Please maintain the empty line between groups of headers.


...

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0bbed8c62d42..993837e46db1 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -782,9 +782,9 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
>  void cqm_handle_limbo(struct work_struct *work)
>  {
>  	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
> -	int cpu = smp_processor_id();
>  	struct rdt_resource *r;
>  	struct rdt_domain *d;
> +	int cpu;
>  
>  	mutex_lock(&rdtgroup_mutex);
>  
> @@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)
>  
>  	__check_limbo(d, false);
>  
> -	if (has_busy_rmid(d))
> +	if (has_busy_rmid(d)) {
> +		cpu = cpumask_any_housekeeping(&d->cpu_mask);
>  		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
> +	}
>  

ok - but if you do change the CPU the worker is running on then
I also expect d->cqm_work_cpu to be updated. Otherwise the offline
code will not be able to determine if the worker needs to move.

>  	mutex_unlock(&rdtgroup_mutex);
>  }
> @@ -804,7 +806,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
>  	unsigned long delay = msecs_to_jiffies(delay_ms);
>  	int cpu;
>  
> -	cpu = cpumask_any(&dom->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
>  	dom->cqm_work_cpu = cpu;
>  
>  	schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
> @@ -814,10 +816,10 @@ void mbm_handle_overflow(struct work_struct *work)
>  {
>  	unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
>  	struct rdtgroup *prgrp, *crgrp;
> -	int cpu = smp_processor_id();
>  	struct list_head *head;
>  	struct rdt_resource *r;
>  	struct rdt_domain *d;
> +	int cpu;
>  
>  	mutex_lock(&rdtgroup_mutex);
>  
> @@ -838,6 +840,11 @@ void mbm_handle_overflow(struct work_struct *work)
>  			update_mba_bw(prgrp, d);
>  	}
>  
> +	/*
> +	 * Re-check for housekeeping CPUs. This allows the overflow handler to
> +	 * move off a nohz_full CPU quickly.
> +	 */
> +	cpu = cpumask_any_housekeeping(&d->cpu_mask);
>  	schedule_delayed_work_on(cpu, &d->mbm_over, delay);
>  

Similar to above I expect a change like this to
be accompanied by an update to d->mbm_work_cpu.

>  out_unlock:
> @@ -851,7 +858,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>  
>  	if (!static_branch_likely(&rdt_mon_enable_key))
>  		return;
> -	cpu = cpumask_any(&dom->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
>  	dom->mbm_work_cpu = cpu;
>  	schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
>  }


Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI
  2023-09-14 17:21 ` [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI James Morse
@ 2023-10-03 21:17   ` Reinette Chatre
  2023-10-25 17:56     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:17 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...
 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index b44c487727d4..bd263b9a0abd 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -19,6 +19,7 @@
>  #include <linux/kernfs.h>
>  #include <linux/seq_file.h>
>  #include <linux/slab.h>
> +#include <linux/tick.h>
>  #include "internal.h"
>  

Please keep the empty line between groups of header files.

>  /*
> @@ -520,12 +521,24 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +static int smp_mon_event_count(void *arg)
> +{
> +	mon_event_count(arg);
> +
> +	return 0;
> +}
> +
>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>  		    int evtid, int first)
>  {
> +	int cpu;
> +
> +	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */

This comment is not accurate at this point. It should accompany the code it applies to.

> +	lockdep_assert_held(&rdtgroup_mutex);
> +
>  	/*
> -	 * setup the parameters to send to the IPI to read the data.
> +	 * Setup the parameters to pass to mon_event_count() to read the data.
>  	 */
>  	rr->rgrp = rdtgrp;
>  	rr->evtid = evtid;


Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  2023-09-14 17:21 ` [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
@ 2023-10-03 21:18   ` Reinette Chatre
  2023-10-25 17:57     ` James Morse
  2023-10-05 21:33   ` Moger, Babu
  1 sibling, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:18 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

..

> @@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>  			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>  			   u64 *val);
>  
> +/**
> + * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
> + *
> + * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
> + * resctrl_arch_rmid_read() is called with preemption disabled.
> + */
> +static inline void resctrl_arch_rmid_read_context_check(void)
> +{
> +	if (!irqs_disabled())
> +		might_sleep();
> +}
>  
>  /**
>   * resctrl_arch_reset_rmid() - Reset any private state associated with rmid

I was expecting the above to look like you said it would look [1].

Reinette

[1] https://lore.kernel.org/lkml/9d69d0ca-212d-9b1b-3001-9f56731e48fd@arm.com/

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
  2023-09-14 17:21 ` [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
@ 2023-10-03 21:18   ` Reinette Chatre
  2023-10-05 21:46   ` Moger, Babu
  1 sibling, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:18 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> Depending on the number of monitors available, Arm's MPAM may need to
> allocate a monitor prior to reading the counter value. Allocating a
> contended resource may involve sleeping.
> 
> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
> the allocation should be valid for all domains.
> 
> __check_limbo() and mon_event_count() each make multiple calls to
> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
> the allocation should be valid for multiple invocations of
> resctrl_arch_rmid_read().
> 
> Add arch hooks for this allocation, which need calling before
> resctrl_arch_rmid_read(). The allocated monitor is passed to
> resctrl_arch_rmid_read(), then freed again afterwards. The helper
> can be called on any CPU, and can sleep.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit
  2023-09-14 17:21 ` [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit James Morse
@ 2023-10-03 21:19   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The rdt_enable_key is switched when resctrl is mounted, and used to
> prevent a second mount of the filesystem. It also enables the
> architecture's context switch code.
> 
> This requires another architecture to have the same set of static-keys,
> as resctrl depends on them too. The existing users of these static-keys
> are implicitly also checking if the filesystem is mounted.
> 
> Make the resctrl_mounted checks explicit: resctrl can keep track of
> whether it has been mounted once. This doesn't need to be combined with
> whether the arch code is context switching the CLOSID.
> 
> rdt_mon_enable_key is never used just to test that resctrl is mounted,
> but does also have this implication. Add a resctrl_mounted to all uses
> of rdt_mon_enable_key. This will allow rdt_mon_enable_key to be swapped
> with a helper in a subsequent patch.
> 
> This will allow the static-key changing to be moved behind resctrl_arch_
> calls.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers
  2023-09-14 17:21 ` [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers James Morse
@ 2023-10-03 21:19   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> resctrl enables three static keys depending on the features it has enabled.
> Another architecture's context switch code may look different, any
> static keys that control it should be buried behind helpers.
> 
> Move the alloc/mon logic into arch-specific helpers as a preparatory step
> for making the rdt_enable_key's status something the arch code decides.
> 
> This means other architectures don't have to mirror the static keys.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch
  2023-09-14 17:21 ` [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch James Morse
@ 2023-10-03 21:19   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> rdt_enable_key is switched when resctrl is mounted. It was also previously
> used to prevent a second mount of the filesystem.
> 
> Any other architecture that wants to support resctrl has to provide
> identical static keys.
> 
> Now that there are helpers for enabling and disabling the alloc/mon keys,
> resctrl doesn't need to switch this extra key, it can be done by the arch
> code. Use the static-key increment and decrement helpers, and change
> resctrl to ensure the calls are balanced.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable
  2023-09-14 17:21 ` [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable James Morse
@ 2023-10-03 21:19   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> resctrl reads rdt_alloc_capable or rdt_mon_capable to determine
> whether any of the resources support the corresponding features.
> resctrl also uses the static-keys that affect the architecture's
> context-switch code to determine the same thing.
> 
> This forces another architecture to have the same static-keys.
> 
> As the static-key is enabled based on the capable flag, and none of
> the filesystem uses of these are in the scheduler path, move the
> capable flags behind helpers, and use these in the filesystem
> code instead of the static-key.
> 
> After this change, only the architecture code manages and uses
> the static-keys to ensure __resctrl_sched_in() does not need
> runtime checks.
> 
> This avoids multiple architectures having to define the same
> static-keys.
> 
> Cases where the static-key implicitly tested if the resctrl
> filesystem was mounted all have an explicit check added by a
> previous patch.
> 
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work
  2023-09-14 17:21 ` [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work James Morse
@ 2023-10-03 21:20   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:20 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The resctrl architecture specific code may need to create a domain when
> a CPU comes online, it also needs to reset the CPUs PQR_ASSOC register.
> The resctrl filesystem code needs to update the rdtgroup_default CPU
> mask when CPUs are brought online.
> 
> Currently this is all done in one function, resctrl_online_cpu().
> This will need to be split into architecture and filesystem parts
> before resctrl can be moved to /fs/.
> 
> Pull the rdtgroup_default update work out as a filesystem specific
> cpu_online helper. resctrl_online_cpu() is the obvious name for this,
> which means the version in core.c needs renaming.
> 
> resctrl_online_cpu() is called by the arch code once it has done the
> work to add the new CPU to any domains.
> 
> In future patches, resctrl_online_cpu() will take the rdtgroup_mutex
> itself.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu
  2023-09-14 17:21 ` [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu James Morse
@ 2023-10-03 21:22   ` Reinette Chatre
  2023-10-25 17:57     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:22 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index c54fa86e4ef9..bd7f60bf49fe 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -60,11 +60,15 @@
>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>   *			        aren't marked nohz_full
>   * @mask:	The mask to pick a CPU from.
> + * @exclude_cpu:The CPU to avoid picking.
>   *
> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
> - * nohz_full, these are preferred.
> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
> + * CPUs that don't use nohz_full, these are preferred. Pass
> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
> + * Returns >= nr_cpu_ids if no CPUs are available.

It may be helpful to add that the function can only fail if exclude_cpu is
*not* RESCTRL_PICK_ANY_CPU. That helps to understand the sparse error checking.

>   */
> -static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
> +static inline unsigned int
> +cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>  {
>  	unsigned int cpu, hk_cpu;
>  
> @@ -73,6 +77,9 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>  		return cpu;
>  

It is not obvious from this hunk but I cannot see how this would work
on a system without any nohz_full CPUs.

At this point the function looks like:

	cpu = cpumask_any(mask);
	if (!tick_nohz_full_cpu(cpu))
		return cpu;

I expected exclude_cpu to be taken into account. If I understand correctly
exclude_cpu can be picked by cpumask_any() and as long as it is not
a nohz_full CPU it would be returned.


>  	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
> +	if (hk_cpu == exclude_cpu)
> +		hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
> +
>  	if (hk_cpu < nr_cpu_ids)
>  		cpu = hk_cpu;
>  
> @@ -565,11 +572,13 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>  		    int evtid, int first);
>  void mbm_setup_overflow_handler(struct rdt_domain *dom,
> -				unsigned long delay_ms);
> +				unsigned long delay_ms,
> +				int exclude_cpu);
>  void mbm_handle_overflow(struct work_struct *work);
>  void __init intel_rdt_mbm_apply_quirk(void);
>  bool is_mba_sc(struct rdt_resource *r);
> -void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
> +void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
> +			     int exclude_cpu);
>  void cqm_handle_limbo(struct work_struct *work);
>  bool has_busy_rmid(struct rdt_domain *d);
>  void __check_limbo(struct rdt_domain *d, bool force_free);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 9c6d4b0970e2..208e46ba7368 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -480,7 +480,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>  		 * setup up the limbo worker.
>  		 */
>  		if (!has_busy_rmid(d))
> -			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL);
> +			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL,
> +						RESCTRL_PICK_ANY_CPU);
>  		set_bit(idx, d->rmid_busy_llc);
>  		entry->busy++;
>  	}
> @@ -807,22 +808,31 @@ void cqm_handle_limbo(struct work_struct *work)
>  	__check_limbo(d, false);
>  
>  	if (has_busy_rmid(d)) {
> -		cpu = cpumask_any_housekeeping(&d->cpu_mask);
> +		cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
>  		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
>  	}
>  
>  	mutex_unlock(&rdtgroup_mutex);
>  }
>  
> -void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
> +/**
> + * cqm_setup_limbo_handler() - Schedule the limbo handler to run for this
> + *                             domain.
> + * @delay_ms:      How far in the future the handler should run.
> + * @exclude_cpu:   Which CPU the handler should not run on,
> + *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
> + */

arch/x86/kernel/cpu/resctrl/monitor.c:824: info: Scanning doc for function cqm_setup_limbo_handler
arch/x86/kernel/cpu/resctrl/monitor.c:832: warning: Function parameter or member 'dom' not described in 'cqm_setup_limbo_handler'


> +void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
> +			     int exclude_cpu)
>  {
>  	unsigned long delay = msecs_to_jiffies(delay_ms);
>  	int cpu;
>  
> -	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
>  	dom->cqm_work_cpu = cpu;
>  
> -	schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
> +	if (cpu < nr_cpu_ids)
> +		schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
>  }
>  
>  void mbm_handle_overflow(struct work_struct *work)
> @@ -861,14 +871,22 @@ void mbm_handle_overflow(struct work_struct *work)
>  	 * Re-check for housekeeping CPUs. This allows the overflow handler to
>  	 * move off a nohz_full CPU quickly.
>  	 */
> -	cpu = cpumask_any_housekeeping(&d->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
>  	schedule_delayed_work_on(cpu, &d->mbm_over, delay);
>  
>  out_unlock:
>  	mutex_unlock(&rdtgroup_mutex);
>  }
>  
> -void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
> +/**
> + * mbm_setup_overflow_handler() - Schedule the overflow handler to run for this
> + *                                domain.
> + * @delay_ms:      How far in the future the handler should run.
> + * @exclude_cpu:   Which CPU the handler should not run on,
> + *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
> + */

arch/x86/kernel/cpu/resctrl/monitor.c:887: info: Scanning doc for function mbm_setup_overflow_handler
arch/x86/kernel/cpu/resctrl/monitor.c:895: warning: Function parameter or member 'dom' not described in 'mbm_setup_overflow_handler'


> +void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms,
> +				int exclude_cpu)
>  {
>  	unsigned long delay = msecs_to_jiffies(delay_ms);
>  	int cpu;
> @@ -879,9 +897,11 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>  	 */
>  	if (!resctrl_mounted || !resctrl_arch_mon_capable())
>  		return;
> -	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
>  	dom->mbm_work_cpu = cpu;
> -	schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
> +
> +	if (cpu < nr_cpu_ids)
> +		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
>  }
>  
>  static int dom_data_init(struct rdt_resource *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 0c609cdfe7e5..49f100c73838 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2552,7 +2552,8 @@ static int rdt_get_tree(struct fs_context *fc)
>  	if (is_mbm_enabled()) {
>  		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>  		list_for_each_entry(dom, &r->domains, list)
> -			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
> +			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
> +						   RESCTRL_PICK_ANY_CPU);
>  	}
>  
>  	goto out;
> @@ -3850,7 +3851,8 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>  
>  	if (is_mbm_enabled()) {
>  		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
> -		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
> +		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL,
> +					   RESCTRL_PICK_ANY_CPU);
>  	}
>  
>  	if (is_llc_occupancy_enabled())
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 9d5f75a4e192..0888d1975161 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -10,6 +10,8 @@
>  #define RESCTRL_RESERVED_CLOSID		0
>  #define RESCTRL_RESERVED_RMID		0
>  
> +#define RESCTRL_PICK_ANY_CPU		-1
> +
>  #ifdef CONFIG_PROC_CPU_RESCTRL
>  
>  int proc_resctrl_show(struct seq_file *m,

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work
  2023-09-14 17:21 ` [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work James Morse
@ 2023-10-03 21:23   ` Reinette Chatre
  2023-10-25 17:57     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:23 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> The resctrl architecture specific code may need to free a domain when
> a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register.
> Amongst other things, the resctrl filesystem code needs to clear this
> CPU from the cpu_mask of any control and monitor groups.
> 
> Currently this is all done in core.c and called from
> resctrl_offline_cpu(), making the split between architecture and
> filesystem code unclear.
> 
> Move the filesystem work to remove the CPU from the control and monitor
> groups into a filesystem helper called resctrl_offline_cpu(), and rename
> the one in core.c resctrl_arch_offline_cpu().
> 
> The rdtgroup_mutex is unlocked and locked again in the call in
> preparation for changing the locking rules for the architecture
> code.

This last paragraph may cause some confusion since this refactoring
is not changing any current locking. I'll defer to you if you prefer
to keep it.

> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
  2023-09-14 17:21 ` [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu() James Morse
@ 2023-10-03 21:23   ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:23 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> When a CPU is taken offline the resctrl filesystem code needs to check
> if it was the CPU nominated to perform the periodic overflow and limbo
> work. If so, another CPU needs to be chosen to do this work.
> 
> This is currently done in core.c, mixed in with the code that removes
> the CPU from the domain's mask, and potentially free()s the domain.
> 
> Move the migration of the overflow and limbo helpers into the filesystem
> code, into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before
> the architecture code has removed the CPU from the domain mask, the
> callers need to be told which CPU is being removed, to avoid picking
> it as the new CPU. This uses the exclude_cpu feature previously
> added.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks
  2023-09-14 17:21 ` [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks James Morse
@ 2023-10-03 21:28   ` Reinette Chatre
  2023-10-25 17:55     ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-03 21:28 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 10:21 AM, James Morse wrote:
> resctrl has one mutex that is taken by the architecture specific code,
> and the filesystem parts. The two interact via cpuhp, where the
> architecture code updates the domain list. Filesystem handlers that
> walk the domains list should not run concurrently with the cpuhp
> callback modifying the list.
> 
> Exposing a lock from the filesystem code means the interface is not
> cleanly defined, and creates the possibility of cross-architecture
> lock ordering headaches. The interaction only exists so that certain
> filesystem paths are serialised against CPU hotplug. The CPU hotplug
> code already has a mechanism to do this using cpus_read_lock().
> 
> MPAM's monitors have an overflow interrupt, so it needs to be possible
> to walk the domains list in irq context. RCU is ideal for this,
> but some paths need to be able to sleep to allocate memory.
> 
> Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
> of a cpuhp callback, cpus_read_lock() must always be taken first.
> rdtgroup_schemata_write() already does this.
> 
> Most of the filesystem code's domain list walkers are currently
> protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
> The exceptions are rdt_bit_usage_show() and the mon_config helpers
> which take the lock directly.
> 
> Make the domain list protected by RCU. An architecture-specific
> lock prevents concurrent writers. rdt_bit_usage_show() could
> walk the domain list using RCU, but to keep all the filesystem
> operations the same, this is changed to call cpus_read_lock().
> The mon_config helpers send multiple IPIs, take the cpus_read_lock()
> in these cases.
> 
> The other filesystem list walkers need to be able to sleep.
> Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
> cpuhp callbacks can't be invoked when file system operations are
> occurring.
> 
> Add lockdep_assert_cpus_held() in the cases where the
> rdtgroup_kn_lock_live() call isn't obvious.

One place that does not seem to have this annotation that
I think is needed is within get_domain_from_cpu(). Starting
with this series it is called from resctrl_offline_cpu()
called via CPU hotplug code. From now on extra care needs to be
taken when trying to call it from anywhere else.

> 
> Resctrl's domain online/offline calls now need to take the
> rdtgroup_mutex themselves.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

...

> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 1a10f567bbe5..8fd0510d767b 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -25,8 +25,15 @@
>  #include <asm/resctrl.h>
>  #include "internal.h"
>  
> -/* Mutex to protect rdtgroup access. */
> -DEFINE_MUTEX(rdtgroup_mutex);
> +/*
> + * rdt_domain structures are kfree()d when their last CPU goes offline,
> + * and allocated when the first CPU in a new domain comes online.
> + * The rdt_resource's domain list is updated when this happens. Readers of
> + * the domain list must either take cpus_read_lock(), or rely on an RCU
> + * read-side critical section, to avoid observing concurrent modification.
> + * All writers take this mutex:
> + */
> +static DEFINE_MUTEX(domain_list_lock);
>  

I assume that you have not followed the SNC work. Please note that in 
that work the domain list is split between a monitoring domain list and
control domain list. I expect this lock would cover both and both would
be rcu lists?


...

> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index b4ed4e1b4938..0620dfc72036 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -209,6 +209,9 @@ static int parse_line(char *line, struct resctrl_schema *s,
>  	struct rdt_domain *d;
>  	unsigned long dom_id;
>  
> +	/* Walking r->domains, ensure it can't race with cpuhp */
> +	lockdep_assert_cpus_held();
> +
>  	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
>  	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
>  		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
> @@ -313,6 +316,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
>  	struct rdt_domain *d;
>  	u32 idx;
>  
> +	/* Walking r->domains, ensure it can't race with cpuhp */
> +	lockdep_assert_cpus_held();
> +
>  	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
>  		return -ENOMEM;
>  
> @@ -378,11 +384,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>  		return -EINVAL;
>  	buf[nbytes - 1] = '\0';
>  
> -	cpus_read_lock();
>  	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>  	if (!rdtgrp) {
>  		rdtgroup_kn_unlock(of->kn);
> -		cpus_read_unlock();
>  		return -ENOENT;
>  	}
>  	rdt_last_cmd_clear();
> @@ -444,7 +448,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>  out:
>  	rdt_staged_configs_clear();
>  	rdtgroup_kn_unlock(of->kn);
> -	cpus_read_unlock();
>  	return ret ?: nbytes;
>  }
>  
> @@ -464,6 +467,9 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
>  	bool sep = false;
>  	u32 ctrl_val;
>  
> +	/* Walking r->domains, ensure it can't race with cpuhp */
> +	lockdep_assert_cpus_held();
> +
>  	seq_printf(s, "%*s:", max_name_width, schema->name);
>  	list_for_each_entry(dom, &r->domains, list) {
>  		if (sep)
> @@ -535,7 +541,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  	int cpu;
>  
>  	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> -	lockdep_assert_held(&rdtgroup_mutex);
> +	lockdep_assert_cpus_held();
>  

Only now is that comment accurate. Could it be moved to this patch?

...

> @@ -2948,6 +2975,9 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
>  	struct rdt_domain *dom;
>  	int ret;
>  
> +	/* Walking r->domains, ensure it can't race with cpuhp */
> +	lockdep_assert_cpus_held();
> +
>  	list_for_each_entry(dom, &r->domains, list) {
>  		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
>  		if (ret)
> @@ -3766,7 +3796,8 @@ static void domain_destroy_mon_state(struct rdt_domain *d)
>  	kfree(d->mbm_local);
>  }
>  
> -void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +static void _resctrl_offline_domain(struct rdt_resource *r,
> +				    struct rdt_domain *d)
>  {
>  	lockdep_assert_held(&rdtgroup_mutex);
>  
> @@ -3801,6 +3832,13 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>  	domain_destroy_mon_state(d);
>  }
>  
> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	mutex_lock(&rdtgroup_mutex);
> +	_resctrl_offline_domain(r, d);
> +	mutex_unlock(&rdtgroup_mutex);
> +}
> +

This seems unnecessary. Why not keep resctrl_offline_domain() as-is and just
take the lock within it?

>  static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>  {
>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> @@ -3832,7 +3870,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>  	return 0;
>  }
>  
> -int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> +static int _resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>  {
>  	int err;
>  
> @@ -3870,12 +3908,23 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>  	return 0;
>  }
>  
> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	int err;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	err = _resctrl_online_domain(r, d);
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return err;
> +}
> +

Same here.

>  void resctrl_online_cpu(unsigned int cpu)
>  {
> -	lockdep_assert_held(&rdtgroup_mutex);
> -
> +	mutex_lock(&rdtgroup_mutex);
>  	/* The CPU is set in default rdtgroup after online. */
>  	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
> +	mutex_unlock(&rdtgroup_mutex);
>  }
>  
>  static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)

...

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-09-14 17:21 ` [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit() James Morse
  2023-10-02 17:00   ` Reinette Chatre
@ 2023-10-04 18:00   ` Moger, Babu
  2023-10-05 17:06     ` James Morse
  1 sibling, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-04 18:00 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/23 12:21, James Morse wrote:
> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
> 
> While the exit text ends up in the linker script's DISCARD section,
> the direction of travel is for resctrl to be/have loadable modules.
> 
> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
> by rdt_get_mon_l3_config().
> 
> There is no reason to backport this to a stable kernel.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v5:
>  * This patch is new
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  1 +
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 10 ++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  5 +++++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 85ceaf9a31ac..57cf1e6a57bd 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -537,6 +537,7 @@ void closid_free(int closid);
>  int alloc_rmid(void);
>  void free_rmid(u32 rmid);
>  int rdt_get_mon_l3_config(struct rdt_resource *r);
> +void resctrl_exit_mon_l3_config(struct rdt_resource *r);
>  bool __init rdt_cpu_has(int flag);
>  void mon_event_count(void *info);
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index ded1fc7cb7cb..cfb3f632a4b2 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
>  	return 0;
>  }
>  
> +void resctrl_exit_mon_l3_config(struct rdt_resource *r)
> +{
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	kfree(rmid_ptrs);
> +	rmid_ptrs = NULL;
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +}

What is the need for passing "rdt_resource *r" here?
Is mutex_lock required?

Thanks
Babu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  2023-09-14 17:21 ` [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
  2023-10-03 21:07   ` Reinette Chatre
@ 2023-10-04 18:01   ` Moger, Babu
  2023-10-05 17:06     ` James Morse
  1 sibling, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-04 18:01 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/23 12:21, James Morse wrote:
> RMID are allocated for each monitor or control group directory, because
> each of these needs its own RMID. For control groups,
> rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID.
> 
> MPAM's equivalent of RMID is not an independent number, so can't be
> allocated until the CLOSID is known. An RMID allocation for one CLOSID
> may fail, whereas another may succeed depending on how many monitor
> groups a control group has.
> 
> The RMID allocation needs to move to be after the CLOSID has been
> allocated.
> 
> Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller,
> after the mkdir_rdt_prepare() call. This allows the RMID allocator to
> know the CLOSID.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v2:
>  * Moved kernfs_activate() later to preserve atomicity of files being visible
> 
> Changes since v5:
>  * Renamed out_id_free as out_closid_free.
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 35 +++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 7a7369a323b5..d25cb8c9a20e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3189,6 +3189,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>  	return 0;
>  }
>  
> +static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
> +{
> +	if (rdt_mon_capable)
> +		free_rmid(rgrp->mon.rmid);
> +}
> +
>  static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>  			     const char *name, umode_t mode,
>  			     enum rdt_group_type rtype, struct rdtgroup **r)
> @@ -3254,12 +3260,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>  		goto out_destroy;
>  	}
>  
> -	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
> -	if (ret)
> -		goto out_destroy;
> -
> -	kernfs_activate(kn);

You should not remove "kernfs_activate(kn); from here (only the last line).

kernfs_create_dir is called in this function.

/* kernfs creates the directory for rdtgrp */
 kn = kernfs_create_dir(parent_kn, name, mode, rdtgrp);


There should be matching kernfs_activate.

> -
>  	/*
>  	 * The caller unlocks the parent_kn upon success.
>  	 */
> @@ -3278,7 +3278,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>  static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp)
>  {
>  	kernfs_remove(rgrp->kn);
> -	free_rmid(rgrp->mon.rmid);
>  	rdtgroup_remove(rgrp);
>  }
>  
> @@ -3300,12 +3299,21 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>  	prgrp = rdtgrp->mon.parent;
>  	rdtgrp->closid = prgrp->closid;
>  
> +	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
> +	if (ret) {
> +		mkdir_rdt_prepare_clean(rdtgrp);
> +		goto out_unlock;
> +	}
> +
> +	kernfs_activate(rdtgrp->kn);

I dont see the need for this. There is kernfs_activate  inside
mkdir_rdt_prepare_rmid_alloc (mkdir_rdt_prepare_rmid_alloc
->mkdir_mondata_all)  for all the files created.
Also mkdir_rdt_prepare already has kernfs_activate for the files it created.


> +
>  	/*
>  	 * Add the rdtgrp to the list of rdtgrps the parent
>  	 * ctrl_mon group has to track.
>  	 */
>  	list_add_tail(&rdtgrp->mon.crdtgrp_list, &prgrp->mon.crdtgrp_list);
>  
> +out_unlock:
>  	rdtgroup_kn_unlock(parent_kn);
>  	return ret;
>  }
> @@ -3336,9 +3344,16 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  	ret = 0;
>  
>  	rdtgrp->closid = closid;
> +
> +	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
> +	if (ret)
> +		goto out_closid_free;
> +
> +	kernfs_activate(rdtgrp->kn);
> +

Same as above.

>  	ret = rdtgroup_init_alloc(rdtgrp);
>  	if (ret < 0)
> -		goto out_id_free;
> +		goto out_rmid_free;
>  
>  	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
>  
> @@ -3358,7 +3373,9 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
>  
>  out_del_list:
>  	list_del(&rdtgrp->rdtgroup_list);
> -out_id_free:
> +out_rmid_free:
> +	mkdir_rdt_prepare_rmid_free(rdtgrp);
> +out_closid_free:
>  	closid_free(closid);
>  out_common_fail:
>  	mkdir_rdt_prepare_clean(rdtgrp);

-- 
Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
  2023-09-17 21:00   ` David Laight
  2023-10-03 21:14   ` Reinette Chatre
@ 2023-10-04 20:38   ` Moger, Babu
  2023-10-05 17:07     ` James Morse
  2 siblings, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-04 20:38 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 9/14/23 12:21, James Morse wrote:
> The resctrl CLOSID allocator uses a single 32bit word to track which
> CLOSID are free. The setting and clearing of bits is open coded.
> 
> A subsequent patch adds resctrl_closid_is_free(), which adds more open
> coded bitmaps operations. These will eventually need changing to use
> the bitops helpers so that a CLOSID bitmap of the correct size can be
> allocated dynamically.
> 
> Convert the existing open coded bit manipulations of closid_free_map
> to use set_bit() and friends.
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index ac1a6437469f..fa449ee0d1a7 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -106,7 +106,7 @@ void rdt_staged_configs_clear(void)
>   * - Our choices on how to configure each resource become progressively more
>   *   limited as the number of resources grows.
>   */
> -static int closid_free_map;
> +static unsigned long closid_free_map;
>  static int closid_free_map_len;
>  
>  int closids_supported(void)
> @@ -126,7 +126,7 @@ static void closid_init(void)
>  	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
>  
>  	/* CLOSID 0 is always reserved for the default group */
> -	closid_free_map &= ~1;
> +	clear_bit(0, &closid_free_map);

How about using RESCTRL_RESERVED_CLOSID instead of 0 here?

Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-10-02 17:00   ` Reinette Chatre
@ 2023-10-05 17:05     ` James Morse
  2023-10-05 18:04       ` Reinette Chatre
  0 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-10-05 17:05 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 02/10/2023 18:00, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 725344048f85..a2158c266e41 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>  
>>  void __exit rdtgroup_exit(void)
>>  {
>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> +	if (r->mon_capable)
>> +		resctrl_exit_mon_l3_config(r);
>> +
>>  	debugfs_remove_recursive(debugfs_resctrl);
>>  	unregister_filesystem(&rdt_fs_type);
>>  	sysfs_remove_mount_point(fs_kobj, "resctrl");
> 
> You did not respond to me when I requested that this be done differently [1].
> Without a response letting me know the faults of my proposal or following the
> recommendation I conclude that my feedback was ignored. 

Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
one if you prefer, but I find that adds more noise than signal.

This is my attempt at 'doing the cleanup properly', which is what you said your preference
was. (no machine on the planet can ever run this code, the __exit section is always
discarded by the linker).

Reading through again, I missed that you wanted this called from resctrl_exit(). (The
naming suggests I did this originally, but it didn't work out).
I don't think this works as the code in resctrl_exit() remains part of the arch code after
the move, but allocating rmid_ptrs[] stays part of the fs code.

resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
the name resctrl_exit() as its part of the exposed interface.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-10-04 18:00   ` Moger, Babu
@ 2023-10-05 17:06     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-05 17:06 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 04/10/2023 19:00, Moger, Babu wrote:
> On 9/14/23 12:21, James Morse wrote:
>> rmid_ptrs[] is allocated from dom_data_init() but never free()d.
>>
>> While the exit text ends up in the linker script's DISCARD section,
>> the direction of travel is for resctrl to be/have loadable modules.
>>
>> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
>> by rdt_get_mon_l3_config().
>>
>> There is no reason to backport this to a stable kernel.

>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 85ceaf9a31ac..57cf1e6a57bd 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -537,6 +537,7 @@ void closid_free(int closid);
>>  int alloc_rmid(void);
>>  void free_rmid(u32 rmid);
>>  int rdt_get_mon_l3_config(struct rdt_resource *r);
>> +void resctrl_exit_mon_l3_config(struct rdt_resource *r);
>>  bool __init rdt_cpu_has(int flag);
>>  void mon_event_count(void *info);
>>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index ded1fc7cb7cb..cfb3f632a4b2 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
>>  	return 0;
>>  }
>>  
>> +void resctrl_exit_mon_l3_config(struct rdt_resource *r)
>> +{
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	kfree(rmid_ptrs);
>> +	rmid_ptrs = NULL;
>> +
>> +	mutex_unlock(&rdtgroup_mutex);
>> +}

> What is the need for passing "rdt_resource *r" here?

My vain belief that monitors should be supported on something other than L3, but I agree
that isn't what resctrl does today. I'll remove it.


> Is mutex_lock required?

Reads and writes to rmid_ptrs[] are protected by that lock. This ensures no-one reads the
value while its being free()d, and after this function releases the lock, anyone trying
sees NULL.

(This is all moot as its only caller is marked __exit, so gets discarded by the linker)



Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  2023-10-04 18:01   ` Moger, Babu
@ 2023-10-05 17:06     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-05 17:06 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 04/10/2023 19:01, Moger, Babu wrote:
> On 9/14/23 12:21, James Morse wrote:
>> RMID are allocated for each monitor or control group directory, because
>> each of these needs its own RMID. For control groups,
>> rdtgroup_mkdir_ctrl_mon() later goes on to allocate the CLOSID.
>>
>> MPAM's equivalent of RMID is not an independent number, so can't be
>> allocated until the CLOSID is known. An RMID allocation for one CLOSID
>> may fail, whereas another may succeed depending on how many monitor
>> groups a control group has.
>>
>> The RMID allocation needs to move to be after the CLOSID has been
>> allocated.
>>
>> Move the RMID allocation out of mkdir_rdt_prepare() to occur in its caller,
>> after the mkdir_rdt_prepare() call. This allows the RMID allocator to
>> know the CLOSID.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 7a7369a323b5..d25cb8c9a20e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3189,6 +3189,12 @@ static int mkdir_rdt_prepare_rmid_alloc(struct rdtgroup *rdtgrp)
>>  	return 0;
>>  }
>>  
>> +static void mkdir_rdt_prepare_rmid_free(struct rdtgroup *rgrp)
>> +{
>> +	if (rdt_mon_capable)
>> +		free_rmid(rgrp->mon.rmid);
>> +}
>> +
>>  static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>>  			     const char *name, umode_t mode,
>>  			     enum rdt_group_type rtype, struct rdtgroup **r)
>> @@ -3254,12 +3260,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>>  		goto out_destroy;
>>  	}
>>  
>> -	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
>> -	if (ret)
>> -		goto out_destroy;
>> -
>> -	kernfs_activate(kn);
> 
> You should not remove "kernfs_activate(kn); from here (only the last line).
> 
> kernfs_create_dir is called in this function.
> 
> /* kernfs creates the directory for rdtgrp */
>  kn = kernfs_create_dir(parent_kn, name, mode, rdtgrp);
> 
> 
> There should be matching kernfs_activate.

I think your point is kernfs_activate() should have been called by the time
mkdir_rdt_prepare() returns because it creates other directories. I don't think this
matters because kernfs_activate() is a tree operation. Sure, the control/monitor group
directory isn't visible once mkdir_rdt_prepare() returns, but by the time either of its
two callers return, changes to the directory tree have been activated.

Moving these lines is the to ensure user-space doesn't see the control/monitor group as
existing without the mon_data directory that is created by mkdir_rdt_prepare_rmid_alloc().


>> -
>>  	/*
>>  	 * The caller unlocks the parent_kn upon success.
>>  	 */
>> @@ -3278,7 +3278,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
>>  static void mkdir_rdt_prepare_clean(struct rdtgroup *rgrp)
>>  {
>>  	kernfs_remove(rgrp->kn);
>> -	free_rmid(rgrp->mon.rmid);
>>  	rdtgroup_remove(rgrp);
>>  }
>>  
>> @@ -3300,12 +3299,21 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
>>  	prgrp = rdtgrp->mon.parent;
>>  	rdtgrp->closid = prgrp->closid;
>>  
>> +	ret = mkdir_rdt_prepare_rmid_alloc(rdtgrp);
>> +	if (ret) {
>> +		mkdir_rdt_prepare_clean(rdtgrp);
>> +		goto out_unlock;
>> +	}
>> +
>> +	kernfs_activate(rdtgrp->kn);
> 
> I dont see the need for this. There is kernfs_activate  inside
> mkdir_rdt_prepare_rmid_alloc (mkdir_rdt_prepare_rmid_alloc
> ->mkdir_mondata_all)  for all the files created.

> Also mkdir_rdt_prepare already has kernfs_activate for the files it created.

It does, and this makes the mon_data directory visible in the parent control/monitor group
- but that control/monitor group isn't visible until this kernfs_activate(rdtgrp->kn)
makes it visible. The scope of these tree operations is different.

Looking at this again, there is an existing problem with the mon_groups directory not
being visible until after the control/monitor group is visible, worse is that if the
mon_group directory creation fails, the control/monitor group is removed. Chances are
no-one is depending on this.

I do think ultimately these kernfs_activate() calls should be moved to the end of the
syscall helpers that change the directory structure. This would stop things being briefly
visible.



Thanks!

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has
  2023-10-03 21:13   ` Reinette Chatre
@ 2023-10-05 17:07     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-05 17:07 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:13, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> @@ -796,13 +817,30 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
>>  static int dom_data_init(struct rdt_resource *r)
>>  {
>>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>> +	u32 num_closid = resctrl_arch_get_num_closid(r);
>>  	struct rmid_entry *entry = NULL;
>> +	int err = 0, i;
>>  	u32 idx;
>> -	int i;
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
>> +		int *tmp;
>> +
>> +		tmp = kcalloc(num_closid, sizeof(int), GFP_KERNEL);
> 
> Shouldn't this rather be sizeof(unsigned int) to match the type it will store?

It matches the type of tmp... I'll change both closid_num_dirty_rmid and tmp to a u32 *,
and this sizeof() to be sizeof(*tmp).


>> +		if (!tmp) {
>> +			err = -ENOMEM;
>> +			goto out_unlock;
>> +		}
>> +
>> +		closid_num_dirty_rmid = tmp;
>> +	}
>>  
>>  	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
>> -	if (!rmid_ptrs)
>> -		return -ENOMEM;
>> +	if (!rmid_ptrs) {
>> +		kfree(closid_num_dirty_rmid);
>> +		err = -ENOMEM;
>> +		goto out_unlock;
>> +	}
>>  
>>  	for (i = 0; i < idx_limit; i++) {
>>  		entry = &rmid_ptrs[i];
>> @@ -822,13 +860,21 @@ static int dom_data_init(struct rdt_resource *r)
>>  	entry = __rmid_entry(idx);
>>  	list_del(&entry->list);
>>  
>> -	return 0;
>> +out_unlock:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>> +	return err;
>>  }
>>  
>>  void resctrl_exit_mon_l3_config(struct rdt_resource *r)
>>  {
>>  	mutex_lock(&rdtgroup_mutex);
>>  
>> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
>> +		kfree(closid_num_dirty_rmid);
>> +		closid_num_dirty_rmid = NULL;
>> +	}
>> +
>>  	kfree(rmid_ptrs);
>>  	rmid_ptrs = NULL;
>>  
> 
> Awaiting response on patch #2 related to above hunk.

It's the same story here. CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID makes this behaviour
visible to the filesystem code, which means the filesystem code can do the alloc/free of
this array. All this eventually moves out to /fs/.

This is all because the RMID allocation is dependent on the limbo list that resctrl
manages, and for MPAM the CLOSID is too. I'm sure its simpler to expose this MPAM
behaviour to resctrl - and in a way that the compiler can remove if its not needed. The
alternative would be to duplicate the allocators on each architecture. I don't think MPAM
is different enough to justify this.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
  2023-10-03 21:15   ` Reinette Chatre
@ 2023-10-05 17:07     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-05 17:07 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:15, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> The limbo and overflow code picks a CPU to use from the domain's list
>> of online CPUs. Work is then scheduled on these CPUs to maintain
>> the limbo list and any counters that may overflow.
>>
>> cpumask_any() may pick a CPU that is marked nohz_full, which will
>> either penalise the work that CPU was dedicated to, or delay the
>> processing of limbo list or counters that may overflow. Perhaps
>> indefinitely. Delaying the overflow handling will skew the bandwidth
>> values calculated by mba_sc, which expects to be called once a second.
>>
>> Add cpumask_any_housekeeping() as a replacement for cpumask_any()
>> that prefers housekeeping CPUs. This helper will still return
>> a nohz_full CPU if that is the only option. The CPU to use is
>> re-evaluated each time the limbo/overflow work runs. This ensures
>> the work will move off a nohz_full CPU once a housekeeping CPU is
>> available.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 0bbed8c62d42..993837e46db1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c

>> @@ -793,8 +793,10 @@ void cqm_handle_limbo(struct work_struct *work)
>>  
>>  	__check_limbo(d, false);
>>  
>> -	if (has_busy_rmid(d))
>> +	if (has_busy_rmid(d)) {
>> +		cpu = cpumask_any_housekeeping(&d->cpu_mask);
>>  		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
>> +	}
>>  
> 
> ok - but if you do change the CPU the worker is running on then
> I also expect d->cqm_work_cpu to be updated. Otherwise the offline
> code will not be able to determine if the worker needs to move.

Good point - I missed this.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding
  2023-10-04 20:38   ` Moger, Babu
@ 2023-10-05 17:07     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-05 17:07 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 04/10/2023 21:38, Moger, Babu wrote:
> On 9/14/23 12:21, James Morse wrote:
>> The resctrl CLOSID allocator uses a single 32bit word to track which
>> CLOSID are free. The setting and clearing of bits is open coded.
>>
>> A subsequent patch adds resctrl_closid_is_free(), which adds more open
>> coded bitmaps operations. These will eventually need changing to use
>> the bitops helpers so that a CLOSID bitmap of the correct size can be
>> allocated dynamically.
>>
>> Convert the existing open coded bit manipulations of closid_free_map
>> to use set_bit() and friends.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ac1a6437469f..fa449ee0d1a7 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -106,7 +106,7 @@ void rdt_staged_configs_clear(void)
>>   * - Our choices on how to configure each resource become progressively more
>>   *   limited as the number of resources grows.
>>   */
>> -static int closid_free_map;
>> +static unsigned long closid_free_map;
>>  static int closid_free_map_len;
>>  
>>  int closids_supported(void)
>> @@ -126,7 +126,7 @@ static void closid_init(void)
>>  	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
>>  
>>  	/* CLOSID 0 is always reserved for the default group */
>> -	closid_free_map &= ~1;
>> +	clear_bit(0, &closid_free_map);
> 
> How about using RESCTRL_RESERVED_CLOSID instead of 0 here?

Great idea - even more readable.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-10-05 17:05     ` James Morse
@ 2023-10-05 18:04       ` Reinette Chatre
  2023-10-25 17:56         ` James Morse
  0 siblings, 1 reply; 80+ messages in thread
From: Reinette Chatre @ 2023-10-05 18:04 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi James,

On 10/5/2023 10:05 AM, James Morse wrote:
> On 02/10/2023 18:00, Reinette Chatre wrote:
>> On 9/14/2023 10:21 AM, James Morse wrote:
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 725344048f85..a2158c266e41 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>>  
>>>  void __exit rdtgroup_exit(void)
>>>  {
>>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>> +
>>> +	if (r->mon_capable)
>>> +		resctrl_exit_mon_l3_config(r);
>>> +
>>>  	debugfs_remove_recursive(debugfs_resctrl);
>>>  	unregister_filesystem(&rdt_fs_type);
>>>  	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>
>> You did not respond to me when I requested that this be done differently [1].
>> Without a response letting me know the faults of my proposal or following the
>> recommendation I conclude that my feedback was ignored. 
> 
> Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
> one if you prefer, but I find that adds more noise than signal.

I do not expect a response to every review feedback but no response
is assumed to mean that you agree with the feedback.

> 
> This is my attempt at 'doing the cleanup properly', which is what you said your preference
> was. (no machine on the planet can ever run this code, the __exit section is always
> discarded by the linker).
> 
> Reading through again, I missed that you wanted this called from resctrl_exit(). (The

Right. And not responding to that created expectation that you agreed with the
request.

> naming suggests I did this originally, but it didn't work out).
> I don't think this works as the code in resctrl_exit() remains part of the arch code after
> the move, but allocating rmid_ptrs[] stays part of the fs code.
> 
> resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
> the name resctrl_exit() as its part of the exposed interface.

I expect memory allocation/free to be symmetrical. Doing otherwise
complicates the code. Having this memory freed in rdtgroup_exit() only
seems appropriate if it is allocated from rdtgroup_init().
Neither rmid_ptrs[] nor closid_num_dirty_rmid are allocated in
rdtgroup_init() so freeing it in rdtgroup_exit() is not appropriate.

If you are planning to move resctrl_exit() to be arch code then I expect
resctrl_late_init() to be split with the rmid_ptrs[]/closid_num_dirty_rmid
allocation moving to fs code. Freeing that memory can follow at that time.

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
  2023-10-03 21:14   ` Reinette Chatre
@ 2023-10-05 20:13   ` Moger, Babu
  2023-10-25 17:56     ` James Morse
  2023-10-05 20:26   ` Moger, Babu
  2023-10-24 12:06   ` Maciej Wieczór-Retman
  3 siblings, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-05 20:13 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 12:21 PM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
>
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
>
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v4:
>   * Dropped stale section from comment
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 42 ++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
>   3 files changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ad6e874d9ed2..f06d3d3e0808 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>   void __init thread_throttle_mode_init(void);
>   void __init mbm_config_rftype_init(const char *config);
>   void rdt_staged_configs_clear(void);
> +bool closid_allocated(unsigned int closid);
> +int resctrl_find_cleanest_closid(void);
>   
>   #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0c783301d106..0bbed8c62d42 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
>   	return ERR_PTR(-ENOSPC);
>   }
>   
> +/**
> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
> + *                                  RMID are clean, or the CLOSID that has
> + *                                  the most clean RMID.
> + *
> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
> + * may not be able to allocate clean RMID. To avoid this the allocator will
> + * choose the CLOSID with the most clean RMID.
> + *
> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
> + * be returned.
> + */
> +int resctrl_find_cleanest_closid(void)
> +{
> +	u32 cleanest_closid = ~0, iter_num_dirty;

Just naming num_dirty should have been fine.  I will leave it you.

> +	int i = 0;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> +		return -EIO;
> +
> +	for (i = 0; i < closids_supported(); i++) {
> +		if (closid_allocated(i))
> +			continue;
> +
> +		iter_num_dirty = closid_num_dirty_rmid[i];
> +		if (iter_num_dirty == 0)
> +			return i;
> +
> +		if (cleanest_closid == ~0)
> +			cleanest_closid = i;
> +
> +		if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
> +			cleanest_closid = i;
> +	}
> +
> +	if (cleanest_closid == ~0)
> +		return -ENOSPC;
> +	return cleanest_closid;

Line before the return looks clean.

*       if (cleanest_closid == ~0)
+		return -ENOSPC;
+
+	return cleanest_closid;

Thanks
Babu

> +}
> +
>   /*
>    * For MPAM the RMID value is not unique, and has to be considered with
>    * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fa449ee0d1a7..1f8f1c417a4b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -132,11 +132,20 @@ static void closid_init(void)
>   
>   static int closid_alloc(void)
>   {
> -	u32 closid = ffs(closid_free_map);
> +	u32 closid;
> +	int err;
>   
> -	if (closid == 0)
> -		return -ENOSPC;
> -	closid--;
> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> +		err = resctrl_find_cleanest_closid();
> +		if (err < 0)
> +			return err;
> +		closid = err;
> +	} else {
> +		closid = ffs(closid_free_map);
> +		if (closid == 0)
> +			return -ENOSPC;
> +		closid--;
> +	}
>   	clear_bit(closid, &closid_free_map);
>   
>   	return closid;
> @@ -154,7 +163,7 @@ void closid_free(int closid)
>    * Return: true if @closid is currently associated with a resource group,
>    * false if @closid is free
>    */
> -static bool closid_allocated(unsigned int closid)
> +bool closid_allocated(unsigned int closid)
>   {
>   	return !test_bit(closid, &closid_free_map);
>   }

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
  2023-10-03 21:14   ` Reinette Chatre
  2023-10-05 20:13   ` Moger, Babu
@ 2023-10-05 20:26   ` Moger, Babu
  2023-10-25 17:56     ` James Morse
  2023-10-24 12:06   ` Maciej Wieczór-Retman
  3 siblings, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-05 20:26 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Hi James,

One more comment.

On 9/14/2023 12:21 PM, James Morse wrote:
> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
> used for different control groups.
>
> This means once a CLOSID is allocated, all its monitoring ids may still be
> dirty, and held in limbo.
>
> Instead of allocating the first free CLOSID, on architectures where
> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
> The CLOSID found is returned to closid_alloc() for the free list
> to be updated.
>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v4:
>   * Dropped stale section from comment
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 42 ++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 +++++++++---
>   3 files changed, 58 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ad6e874d9ed2..f06d3d3e0808 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>   void __init thread_throttle_mode_init(void);
>   void __init mbm_config_rftype_init(const char *config);
>   void rdt_staged_configs_clear(void);
> +bool closid_allocated(unsigned int closid);
> +int resctrl_find_cleanest_closid(void);
>   
>   #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 0c783301d106..0bbed8c62d42 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
>   	return ERR_PTR(-ENOSPC);
>   }
>   
> +/**
> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
> + *                                  RMID are clean, or the CLOSID that has
> + *                                  the most clean RMID.
> + *
> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
> + * may not be able to allocate clean RMID. To avoid this the allocator will
> + * choose the CLOSID with the most clean RMID.
> + *
> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
> + * be returned.
> + */
> +int resctrl_find_cleanest_closid(void)
> +{
> +	u32 cleanest_closid = ~0, iter_num_dirty;
> +	int i = 0;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> +		return -EIO;
> +
> +	for (i = 0; i < closids_supported(); i++) {
> +		if (closid_allocated(i))
> +			continue;
> +
> +		iter_num_dirty = closid_num_dirty_rmid[i];
> +		if (iter_num_dirty == 0)
> +			return i;
> +
> +		if (cleanest_closid == ~0)
> +			cleanest_closid = i;
> +
> +		if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
> +			cleanest_closid = i;
> +	}
> +
> +	if (cleanest_closid == ~0)
> +		return -ENOSPC;
> +	return cleanest_closid;
> +}
> +
>   /*
>    * For MPAM the RMID value is not unique, and has to be considered with
>    * the CLOSID. The (CLOSID, RMID) pair is allocated on all domains, which
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fa449ee0d1a7..1f8f1c417a4b 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -132,11 +132,20 @@ static void closid_init(void)
>   
>   static int closid_alloc(void)
>   {
> -	u32 closid = ffs(closid_free_map);
> +	u32 closid;
> +	int err;

Naming "err" seems odd here.  How about cleanest_closid ?

Thanks

Babu

>   
> -	if (closid == 0)
> -		return -ENOSPC;
> -	closid--;
> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> +		err = resctrl_find_cleanest_closid();
> +		if (err < 0)
> +			return err;
> +		closid = err;
> +	} else {
> +		closid = ffs(closid_free_map);
> +		if (closid == 0)
> +			return -ENOSPC;
> +		closid--;
> +	}
>   	clear_bit(closid, &closid_free_map);
>   
>   	return closid;
> @@ -154,7 +163,7 @@ void closid_free(int closid)
>    * Return: true if @closid is currently associated with a resource group,
>    * false if @closid is free
>    */
> -static bool closid_allocated(unsigned int closid)
> +bool closid_allocated(unsigned int closid)
>   {
>   	return !test_bit(closid, &closid_free_map);
>   }

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  2023-09-14 17:21 ` [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
  2023-10-03 21:18   ` Reinette Chatre
@ 2023-10-05 21:33   ` Moger, Babu
  1 sibling, 0 replies; 80+ messages in thread
From: Moger, Babu @ 2023-10-05 21:33 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 12:21 PM, James Morse wrote:
> MPAM's cache occupancy counters can take a little while to settle once
> the monitor has been configured. The maximum settling time is described
> to the driver via a firmware table. The value could be large enough
> that it makes sense to sleep. To avoid exposing this to resctrl, it
> should be hidden behind MPAM's resctrl_arch_rmid_read().
>
> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
> to sleep. In this case resctrl_arch_rmid_read() should return an error
> if it needs to sleep. This will only affect MPAM platforms where
> the cache occupancy counter isn't available immediately, nohz_full is
> in use, and there are there are no housekeeping CPUs in the necessary

:%s/there are there are/there are/

Thanks

Babu

> domain.
>
> There are three callers of resctrl_arch_rmid_read():
> __mon_event_count() and __check_limbo() are both called from a
> non-migrateable context. mon_event_read() invokes __mon_event_count()
> using smp_call_on_cpu(), which adds work to the target CPUs workqueue.
> rdtgroup_mutex() is held, meaning this cannot race with the resctrl
> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on()
> also adds work to a per-cpu workqueue.
>
> The remaining call is add_rmid_to_limbo() which is called in response
> to a user-space syscall that frees an RMID. This opportunistically
> reads the LLC occupancy counter on the current domain to see if the
> RMID is over the dirty threshold. This has to disable preemption to
> avoid reading the wrong domain's value. Disabling pre-emption here
> prevents resctrl_arch_rmid_read() from sleeping.
>
> add_rmid_to_limbo() walks each domain, but only reads the counter
> on one domain. If the system has more than one domain, the RMID will
> always be added to the limbo list. If the RMIDs usage was not over the
> threshold, it will be removed from the list when __check_limbo() runs.
> Make this the default behaviour. Free RMIDs are always added to the
> limbo list for each domain.
>
> The user visible effect of this is that a clean RMID is not available
> for re-allocation immediately after 'rmdir()' completes, this behaviour
> was never portable as it never happened on a machine with multiple
> domains.
>
> Removing this path allows resctrl_arch_rmid_read() to sleep if its called
> with interrupts unmasked. Document this is the expected behaviour, and
> add a might_sleep() annotation to catch changes that won't work on arm64.
>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> The previous version allowed resctrl_arch_rmid_read() to be called on the
> wrong CPUs, but now that this needs to take nohz_full and housekeeping into
> account, its too complex.
>
> Changes since v3:
>   * Removed error handling for smp_call_function_any(), this can't race
>     with the cpuhp callbacks as both hold rdtgroup_mutex.
>   * Switched to the alternative of removing the counter read, this simplifies
>     things dramatically.
>
> Changes since v4:
>   * Messed with capitalisation.
>   * Removed some dead code now that entry->busy will never be zero in
>     add_rmid_to_limbo().
>   * Rephrased the comment above resctrl_arch_rmid_read_context_check().
> ---
>   arch/x86/kernel/cpu/resctrl/monitor.c | 25 +++++--------------------
>   include/linux/resctrl.h               | 18 +++++++++++++++++-
>   2 files changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7749e6569a4a..05d949ec94f1 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -278,6 +278,8 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>   	u64 msr_val, chunks;
>   	int ret;
>   
> +	resctrl_arch_rmid_read_context_check();
> +
>   	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
>   		return -EINVAL;
>   
> @@ -454,8 +456,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   {
>   	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>   	struct rdt_domain *d;
> -	int cpu, err;
> -	u64 val = 0;
>   	u32 idx;
>   
>   	lockdep_assert_held(&rdtgroup_mutex);
> @@ -463,17 +463,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
>   
>   	entry->busy = 0;
> -	cpu = get_cpu();
>   	list_for_each_entry(d, &r->domains, list) {
> -		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
> -			err = resctrl_arch_rmid_read(r, d, entry->closid,
> -						     entry->rmid,
> -						     QOS_L3_OCCUP_EVENT_ID,
> -						     &val);
> -			if (err || val <= resctrl_rmid_realloc_threshold)
> -				continue;
> -		}
> -
>   		/*
>   		 * For the first limbo RMID in the domain,
>   		 * setup up the limbo worker.
> @@ -483,15 +473,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   		set_bit(idx, d->rmid_busy_llc);
>   		entry->busy++;
>   	}
> -	put_cpu();
>   
> -	if (entry->busy) {
> -		rmid_limbo_count++;
> -		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> -			closid_num_dirty_rmid[entry->closid]++;
> -	} else {
> -		list_add_tail(&entry->list, &rmid_free_lru);
> -	}
> +	rmid_limbo_count++;
> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> +		closid_num_dirty_rmid[entry->closid]++;
>   }
>   
>   void free_rmid(u32 closid, u32 rmid)
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 660752406174..f7311102e94c 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -236,7 +236,12 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
>    * @eventid:		eventid to read, e.g. L3 occupancy.
>    * @val:		result of the counter read in bytes.
>    *
> - * Call from process context on a CPU that belongs to domain @d.
> + * Some architectures need to sleep when first programming some of the counters.
> + * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
> + *  for a short period of time). Call from a non-migrateable process context on
> + * a CPU that belongs to domain @d. e.g. use smp_call_on_cpu() or
> + * schedule_work_on(). This function can be called with interrupts masked,
> + * e.g. using smp_call_function_any(), but may consistently return an error.
>    *
>    * Return:
>    * 0 on success, or -EIO, -EINVAL etc on error.
> @@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>   			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>   			   u64 *val);
>   
> +/**
> + * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
> + *
> + * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
> + * resctrl_arch_rmid_read() is called with preemption disabled.
> + */
> +static inline void resctrl_arch_rmid_read_context_check(void)
> +{
> +	if (!irqs_disabled())
> +		might_sleep();
> +}
>   
>   /**
>    * resctrl_arch_reset_rmid() - Reset any private state associated with rmid

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
  2023-09-14 17:21 ` [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
  2023-10-03 21:18   ` Reinette Chatre
@ 2023-10-05 21:46   ` Moger, Babu
  2023-10-25 17:58     ` James Morse
  1 sibling, 1 reply; 80+ messages in thread
From: Moger, Babu @ 2023-10-05 21:46 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

Hi James,

On 9/14/2023 12:21 PM, James Morse wrote:
> Depending on the number of monitors available, Arm's MPAM may need to
> allocate a monitor prior to reading the counter value. Allocating a
> contended resource may involve sleeping.
>
> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
> the allocation should be valid for all domains.
>
> __check_limbo() and mon_event_count() each make multiple calls to
> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
> the allocation should be valid for multiple invocations of
> resctrl_arch_rmid_read().
>
> Add arch hooks for this allocation, which need calling before
> resctrl_arch_rmid_read(). The allocated monitor is passed to
> resctrl_arch_rmid_read(), then freed again afterwards. The helper
> can be called on any CPU, and can sleep.
>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v3:
>   * Expanded comment.
>   * Removed stray header include.
>   * Reworded commit message.
>   * Made ctx a void * instead of an int.
>
> Changes since v4:
>   * Used IS_ERR() in more places.
>
> Changes since v5:
>   * Pass the error back from mon_event_read() as -EINVAL/Unavailable.
>   * Add some ratelimited warnings when failing to allocate a mon context
> ---
>   arch/x86/include/asm/resctrl.h            | 11 ++++++++
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  7 +++++
>   arch/x86/kernel/cpu/resctrl/internal.h    |  1 +
>   arch/x86/kernel/cpu/resctrl/monitor.c     | 34 +++++++++++++++++++++--
>   include/linux/resctrl.h                   |  5 +++-
>   5 files changed, 54 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> index 1d274dbabc44..29c4cc343787 100644
> --- a/arch/x86/include/asm/resctrl.h
> +++ b/arch/x86/include/asm/resctrl.h
> @@ -136,6 +136,17 @@ static inline u32 resctrl_arch_rmid_idx_encode(u32 ignored, u32 rmid)
>   	return rmid;
>   }
>   
> +/* x86 can always read an rmid, nothing needs allocating */
> +struct rdt_resource;
> +static inline void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, int evtid)
> +{
> +	might_sleep();
> +	return NULL;
> +};
> +
> +static inline void resctrl_arch_mon_ctx_free(struct rdt_resource *r, int evtid,
> +					     void *ctx) { };
> +
>   void resctrl_cpu_detect(struct cpuinfo_x86 *c);
>   
>   #else
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index bd263b9a0abd..ce4821ea111b 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -546,6 +546,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   	rr->d = d;
>   	rr->val = 0;
>   	rr->first = first;
> +	rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
> +	if (IS_ERR(rr->arch_mon_ctx)) {
> +		rr->err = -EINVAL;
> +		return;
> +	}
>   
>   	cpu = cpumask_any_housekeeping(&d->cpu_mask);
>   
> @@ -559,6 +564,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>   	else
>   		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
> +
> +	resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
>   }
>   
>   int rdtgroup_mondata_show(struct seq_file *m, void *arg)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 37bb3de37a4a..66d9ebb5e03a 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -136,6 +136,7 @@ struct rmid_read {
>   	bool			first;
>   	int			err;
>   	u64			val;
> +	void			*arch_mon_ctx;
>   };
>   
>   extern bool rdt_alloc_capable;
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 05d949ec94f1..28a2c8765faf 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -270,7 +270,7 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>   
>   int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>   			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val)
> +			   u64 *val, void *ignored)
>   {
>   	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>   	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> @@ -325,9 +325,17 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>   	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>   	struct rmid_entry *entry;
>   	u32 idx, cur_idx = 1;
> +	void *arch_mon_ctx;
>   	bool rmid_dirty;
>   	u64 val = 0;
>   
> +	arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
> +	if (IS_ERR(arch_mon_ctx)) {
> +		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> +				    PTR_ERR(arch_mon_ctx));
> +		return;
> +	}
> +
>   	/*
>   	 * Skip RMID 0 and start from RMID 1 and check all the RMIDs that
>   	 * are marked as busy for occupancy < threshold. If the occupancy
> @@ -341,7 +349,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>   
>   		entry = __rmid_entry(idx);
>   		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
> -					   QOS_L3_OCCUP_EVENT_ID, &val)) {
> +					   QOS_L3_OCCUP_EVENT_ID, &val,
> +					   arch_mon_ctx)) {
>   			rmid_dirty = true;
>   		} else {
>   			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
> @@ -354,6 +363,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>   		}
>   		cur_idx = idx + 1;
>   	}
> +
> +	resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
>   }
>   
>   bool has_busy_rmid(struct rdt_domain *d)
> @@ -532,7 +543,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>   	}
>   
>   	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
> -					 &tval);
> +					 &tval, rr->arch_mon_ctx);
>   	if (rr->err)
>   		return rr->err;
>   
> @@ -743,11 +754,27 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
>   	if (is_mbm_total_enabled()) {
>   		rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
>   		rr.val = 0;
> +		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> +		if (IS_ERR(rr.arch_mon_ctx)) {
> +			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> +					    PTR_ERR(rr.arch_mon_ctx));
> +			return;
> +		}
> +
>   		__mon_event_count(closid, rmid, &rr);
> +
> +		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
>   	}
>   	if (is_mbm_local_enabled()) {
>   		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
>   		rr.val = 0;
> +		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> +		if (IS_ERR(rr.arch_mon_ctx)) {
> +			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> +					    PTR_ERR(rr.arch_mon_ctx));
> +			return;
> +		}
> +
>   		__mon_event_count(closid, rmid, &rr);
>   
>   		/*
> @@ -757,6 +784,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d,
>   		 */
>   		if (is_mba_sc(NULL))
>   			mbm_bw_count(closid, rmid, &rr);
> +		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);

Space before the last line would be cleaner.

                 if (is_mba_sc(NULL))
  			mbm_bw_count(closid, rmid, &rr);
+
+		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);


>   	}
>   }
>   
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index f7311102e94c..5e4b4df9610b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -235,6 +235,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
>    * @rmid:		rmid of the counter to read.
>    * @eventid:		eventid to read, e.g. L3 occupancy.
>    * @val:		result of the counter read in bytes.
> + * @arch_mon_ctx:	An architecture specific value from
> + *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
> + *			the hardware monitor allocated for this read request.
>    *
>    * Some architectures need to sleep when first programming some of the counters.
>    * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
> @@ -248,7 +251,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
>    */
>   int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>   			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val);
> +			   u64 *val, void *arch_mon_ctx);

Just wondering.. Have you thought about passing rmid_read structure to 
this function?

Because most of the information is already inside the rmid_read 
structure. We can avoid passing 7 parameters.

Thanks

Babu

>   
>   /**
>    * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index
  2023-09-14 17:21 ` [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index James Morse
  2023-10-03 21:12   ` Reinette Chatre
@ 2023-10-24  9:28   ` Maciej Wieczór-Retman
  1 sibling, 0 replies; 80+ messages in thread
From: Maciej Wieczór-Retman @ 2023-10-24  9:28 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

On 2023-09-14 at 17:21:20 +0000, James Morse wrote:
>diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>index 42b9a694fe2f..be0b7cb6e1f5 100644
>--- a/arch/x86/kernel/cpu/resctrl/monitor.c
>+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>@@ -150,12 +150,29 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
> 	return val;
> }
> 
>-static inline struct rmid_entry *__rmid_entry(u32 closid, u32 rmid)
>+/*
>+ * x86 and arm64 differ in their handling of monitoring.
>+ * x86's RMID are an independent number, there is only one source of traffic

"are an independent number" -> "is an independent number"
or
"are an independent number" -> "are independent numbers"?

>+ * with an RMID value of '1'.
>+ * arm64's PMG extend the PARTID/CLOSID space, there are multiple sources of

"extend" -> "extends"?

>+ * traffic with a PMG value of '1', one for each CLOSID, meaning the RMID
>+ * value is no longer unique.
>+ * To account for this, resctrl uses an index. On x86 this is just the RMID,
>+ * on arm64 it encodes the CLOSID and RMID. This gives a unique number.
>+ *
>+ * The domain's rmid_busy_llc and rmid_ptrs[] are sized by index. The arch code
>+ * must accept an attempt to read every index.
>+ */
>+static inline struct rmid_entry *__rmid_entry(u32 idx)
> {
> 	struct rmid_entry *entry;

-- 
Kind regards
Maciej Wieczór-Retman

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
                     ` (2 preceding siblings ...)
  2023-10-05 20:26   ` Moger, Babu
@ 2023-10-24 12:06   ` Maciej Wieczór-Retman
  3 siblings, 0 replies; 80+ messages in thread
From: Maciej Wieczór-Retman @ 2023-10-24 12:06 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, xingxin.hx, baolin.wang,
	Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght

On 2023-09-14 at 17:21:24 +0000, James Morse wrote:
>MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>used for different control groups.
>
>This means once a CLOSID is allocated, all its monitoring ids may still be
>dirty, and held in limbo.
>
>Instead of allocating the first free CLOSID, on architectures where
>CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search

"CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID" >
"CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID"?

>closid_num_dirty_rmid[] to find the cleanest CLOSID.
>
>The CLOSID found is returned to closid_alloc() for the free list
>to be updated.

-- 
Kind regards
Maciej Wieczór-Retman

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks
  2023-10-03 21:28   ` Reinette Chatre
@ 2023-10-25 17:55     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:55 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:28, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> resctrl has one mutex that is taken by the architecture specific code,
>> and the filesystem parts. The two interact via cpuhp, where the
>> architecture code updates the domain list. Filesystem handlers that
>> walk the domains list should not run concurrently with the cpuhp
>> callback modifying the list.
>>
>> Exposing a lock from the filesystem code means the interface is not
>> cleanly defined, and creates the possibility of cross-architecture
>> lock ordering headaches. The interaction only exists so that certain
>> filesystem paths are serialised against CPU hotplug. The CPU hotplug
>> code already has a mechanism to do this using cpus_read_lock().
>>
>> MPAM's monitors have an overflow interrupt, so it needs to be possible
>> to walk the domains list in irq context. RCU is ideal for this,
>> but some paths need to be able to sleep to allocate memory.
>>
>> Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part
>> of a cpuhp callback, cpus_read_lock() must always be taken first.
>> rdtgroup_schemata_write() already does this.
>>
>> Most of the filesystem code's domain list walkers are currently
>> protected by the rdtgroup_mutex taken in rdtgroup_kn_lock_live().
>> The exceptions are rdt_bit_usage_show() and the mon_config helpers
>> which take the lock directly.
>>
>> Make the domain list protected by RCU. An architecture-specific
>> lock prevents concurrent writers. rdt_bit_usage_show() could
>> walk the domain list using RCU, but to keep all the filesystem
>> operations the same, this is changed to call cpus_read_lock().
>> The mon_config helpers send multiple IPIs, take the cpus_read_lock()
>> in these cases.
>>
>> The other filesystem list walkers need to be able to sleep.
>> Add cpus_read_lock() to rdtgroup_kn_lock_live() so that the
>> cpuhp callbacks can't be invoked when file system operations are
>> occurring.
>>
>> Add lockdep_assert_cpus_held() in the cases where the
>> rdtgroup_kn_lock_live() call isn't obvious.

> One place that does not seem to have this annotation that
> I think is needed is within get_domain_from_cpu(). Starting
> with this series it is called from resctrl_offline_cpu()
> called via CPU hotplug code. From now on extra care needs to be
> taken when trying to call it from anywhere else.

Excellent! This shows that the overflow/limbo threads are now exposed to CPUs going
offline while they run - I'll fix that.

But, this gets called via IPI from rdt_ctrl_update(), and lockdep can't know who the IPI
came from to check the lock was held, so it triggers false positives. This one will look a
bit funny:
|       /*
|        * Walking r->domains, ensure it can't race with cpuhp.
|        * Because this is called via IPI by rdt_ctrl_update(), assertions
|        * about locks this thread holds will lead to false positives. Check
|        * someone is holding the CPUs lock.
|        */
|       if (IS_ENABLED(CONFIG_LOCKDEP))
|               lockdep_is_cpus_held();


>> Resctrl's domain online/offline calls now need to take the
>> rdtgroup_mutex themselves.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1a10f567bbe5..8fd0510d767b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -25,8 +25,15 @@
>>  #include <asm/resctrl.h>
>>  #include "internal.h"
>>  
>> -/* Mutex to protect rdtgroup access. */
>> -DEFINE_MUTEX(rdtgroup_mutex);
>> +/*
>> + * rdt_domain structures are kfree()d when their last CPU goes offline,
>> + * and allocated when the first CPU in a new domain comes online.
>> + * The rdt_resource's domain list is updated when this happens. Readers of
>> + * the domain list must either take cpus_read_lock(), or rely on an RCU
>> + * read-side critical section, to avoid observing concurrent modification.
>> + * All writers take this mutex:
>> + */
>> +static DEFINE_MUTEX(domain_list_lock);
>>  
> 
> I assume that you have not followed the SNC work. Please note that in 
> that work the domain list is split between a monitoring domain list and
> control domain list. I expect this lock would cover both and both would
> be rcu lists?

It's on my list to read through, but too much arm stuff comes up for me to get to it.
I agree that one write-lock to protect both RCU lists makes sense, those would only ever
be modified together. The case I have for needing to walk the list without taking a lock
only applies to the monitors - but keeping the rules the same makes it easier to think about.


>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index b4ed4e1b4938..0620dfc72036 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c

>> @@ -535,7 +541,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>  	int cpu;
>>  
>>  	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
>> -	lockdep_assert_held(&rdtgroup_mutex);
>> +	lockdep_assert_cpus_held();
>>  
> 
> Only now is that comment accurate. Could it be moved to this patch?

Before this patch resctrl_arch_offline_cpu() took the mutex, if this thread held the
mutex, then cpuhp would get blocked in resctrl_arch_offline_cpu() until it was released.
What has changed is how that mutual-exclusion is provided, but the comment describes why
mutual-exclusion is needed.



>> @@ -3801,6 +3832,13 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  	domain_destroy_mon_state(d);
>>  }
>>  
>> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	mutex_lock(&rdtgroup_mutex);
>> +	_resctrl_offline_domain(r, d);
>> +	mutex_unlock(&rdtgroup_mutex);
>> +}
>> +

> This seems unnecessary. Why not keep resctrl_offline_domain() as-is and just
> take the lock within it?

For offline there is nothing in it, but ....


>> @@ -3870,12 +3908,23 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  	return 0;
>>  }
>>  
>> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	int err;
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +	err = _resctrl_online_domain(r, d);
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>> +	return err;
>> +}
>> +
> 
> Same here.

resctrl_online_domain() has four exit paths, like this they can just return an error, and
the locking is taken care of here to keep the churn down.
But it's just preference - I've changed it to do this with a handful of gotos.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-10-05 20:13   ` Moger, Babu
@ 2023-10-25 17:56     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:56 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 05/10/2023 21:13, Moger, Babu wrote:
> On 9/14/2023 12:21 PM, James Morse wrote:
>> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>> used for different control groups.
>>
>> This means once a CLOSID is allocated, all its monitoring ids may still be
>> dirty, and held in limbo.
>>
>> Instead of allocating the first free CLOSID, on architectures where
>> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
>> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>>
>> The CLOSID found is returned to closid_alloc() for the free list
>> to be updated.

>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index ad6e874d9ed2..f06d3d3e0808 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -558,5 +558,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>   void __init thread_throttle_mode_init(void);
>>   void __init mbm_config_rftype_init(const char *config);
>>   void rdt_staged_configs_clear(void);
>> +bool closid_allocated(unsigned int closid);
>> +int resctrl_find_cleanest_closid(void);
>>     #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 0c783301d106..0bbed8c62d42 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -388,6 +388,48 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
>>       return ERR_PTR(-ENOSPC);
>>   }
>>   +/**
>> + * resctrl_find_cleanest_closid() - Find a CLOSID where all the associated
>> + *                                  RMID are clean, or the CLOSID that has
>> + *                                  the most clean RMID.
>> + *
>> + * MPAM's equivalent of RMID are per-CLOSID, meaning a freshly allocated CLOSID
>> + * may not be able to allocate clean RMID. To avoid this the allocator will
>> + * choose the CLOSID with the most clean RMID.
>> + *
>> + * When the CLOSID and RMID are independent numbers, the first free CLOSID will
>> + * be returned.
>> + */
>> +int resctrl_find_cleanest_closid(void)
>> +{
>> +    u32 cleanest_closid = ~0, iter_num_dirty;
> 
> Just naming num_dirty should have been fine.  I will leave it you.

That was to make it obvious its something to do with the loop, so the value can't be
relied on outside that. I'll rename it and move the declaration into the loop, that way
its out of scope if someone tries to use it later.


>> +    int i = 0;
>> +
>> +    lockdep_assert_held(&rdtgroup_mutex);
>> +
>> +    if (!IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> +        return -EIO;
>> +
>> +    for (i = 0; i < closids_supported(); i++) {
>> +        if (closid_allocated(i))
>> +            continue;
>> +
>> +        iter_num_dirty = closid_num_dirty_rmid[i];
>> +        if (iter_num_dirty == 0)
>> +            return i;
>> +
>> +        if (cleanest_closid == ~0)
>> +            cleanest_closid = i;
>> +
>> +        if (iter_num_dirty < closid_num_dirty_rmid[cleanest_closid])
>> +            cleanest_closid = i;
>> +    }
>> +
>> +    if (cleanest_closid == ~0)
>> +        return -ENOSPC;
>> +    return cleanest_closid;
> 
> Line before the return looks clean.
> 
> *       if (cleanest_closid == ~0)
> +        return -ENOSPC;
> +
> +    return cleanest_closid;

Sure,


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid
  2023-10-05 20:26   ` Moger, Babu
@ 2023-10-25 17:56     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:56 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 05/10/2023 21:26, Moger, Babu wrote:
> On 9/14/2023 12:21 PM, James Morse wrote:
>> MPAM's PMG bits extend its PARTID space, meaning the same PMG value can be
>> used for different control groups.
>>
>> This means once a CLOSID is allocated, all its monitoring ids may still be
>> dirty, and held in limbo.
>>
>> Instead of allocating the first free CLOSID, on architectures where
>> CONFIG_RESCTRL_RMID_DEPENDS_ON_COSID is enabled, search
>> closid_num_dirty_rmid[] to find the cleanest CLOSID.
>>
>> The CLOSID found is returned to closid_alloc() for the free list
>> to be updated.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index fa449ee0d1a7..1f8f1c417a4b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -132,11 +132,20 @@ static void closid_init(void)
>>     static int closid_alloc(void)
>>   {
>> -    u32 closid = ffs(closid_free_map);
>> +    u32 closid;
>> +    int err;

> Naming "err" seems odd here.  How about cleanest_closid ?

That's just habit because the value might be an error code until its been checked, and
once it has, its called 'closid'. But sure, if you think that is clearer.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI
  2023-10-03 21:17   ` Reinette Chatre
@ 2023-10-25 17:56     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:56 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:17, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index b44c487727d4..bd263b9a0abd 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/kernfs.h>
>>  #include <linux/seq_file.h>
>>  #include <linux/slab.h>
>> +#include <linux/tick.h>
>>  #include "internal.h"
>>  
> 
> Please keep the empty line between groups of header files.

(in this case, adding one, but sure)


>> @@ -520,12 +521,24 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +static int smp_mon_event_count(void *arg)
>> +{
>> +	mon_event_count(arg);
>> +
>> +	return 0;
>> +}
>> +
>>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>>  		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>>  		    int evtid, int first)
>>  {
>> +	int cpu;
>> +
>> +	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> 
> This comment is not accurate at this point. It should accompany the code it applies to.
> 
>> +	lockdep_assert_held(&rdtgroup_mutex);

This refers to the d->cpu_mask calls further down this function. These are written to by
the cpuhp callbacks, rdtgroup_mutex is what prevents the cpuhp callback from running at
the same time as mon_event_read(). If that mutex weren't held, you could pick an offline CPU.

Patch 24 changes this to be lockdep_asser_cpus_held(), as the mutex is no longer used for
this purpose.

This got added here instead of patch-24 because I've added additional use of d->cpu_mask,
these things serve to document how that is safe. If you prefer I'll leave it unsaid here,
and add it with all the others in patch24.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit()
  2023-10-05 18:04       ` Reinette Chatre
@ 2023-10-25 17:56         ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:56 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 05/10/2023 19:04, Reinette Chatre wrote:
> On 10/5/2023 10:05 AM, James Morse wrote:
>> On 02/10/2023 18:00, Reinette Chatre wrote:
>>> On 9/14/2023 10:21 AM, James Morse wrote:
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 725344048f85..a2158c266e41 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -3867,6 +3867,11 @@ int __init rdtgroup_init(void)
>>>>  
>>>>  void __exit rdtgroup_exit(void)
>>>>  {
>>>> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>>>> +
>>>> +	if (r->mon_capable)
>>>> +		resctrl_exit_mon_l3_config(r);
>>>> +
>>>>  	debugfs_remove_recursive(debugfs_resctrl);
>>>>  	unregister_filesystem(&rdt_fs_type);
>>>>  	sysfs_remove_mount_point(fs_kobj, "resctrl");
>>>
>>> You did not respond to me when I requested that this be done differently [1].
>>> Without a response letting me know the faults of my proposal or following the
>>> recommendation I conclude that my feedback was ignored. 
>>
>> Not so - I just trimmed the bits that didn't need a response. I can respond 'Yes' to each
>> one if you prefer, but I find that adds more noise than signal.
> 
> I do not expect a response to every review feedback but no response
> is assumed to mean that you agree with the feedback.
> 
>>
>> This is my attempt at 'doing the cleanup properly', which is what you said your preference
>> was. (no machine on the planet can ever run this code, the __exit section is always
>> discarded by the linker).
>>
>> Reading through again, I missed that you wanted this called from resctrl_exit(). (The
> 
> Right. And not responding to that created expectation that you agreed with the
> request.
> 
>> naming suggests I did this originally, but it didn't work out).
>> I don't think this works as the code in resctrl_exit() remains part of the arch code after
>> the move, but allocating rmid_ptrs[] stays part of the fs code.
>>
>> resctrl_exit() in core.c gets renamed as resctrl_arch_exit(), and rdtgroup_exit() takes on
>> the name resctrl_exit() as its part of the exposed interface.
> 
> I expect memory allocation/free to be symmetrical. Doing otherwise
> complicates the code. Having this memory freed in rdtgroup_exit() only
> seems appropriate if it is allocated from rdtgroup_init().
> Neither rmid_ptrs[] nor closid_num_dirty_rmid are allocated in
> rdtgroup_init() so freeing it in rdtgroup_exit() is not appropriate.

It probably makes more sense when you see how things get split up. I was trying to reduce
the churn of adding something in one place, then moving it later.

For now I've added all the functions to make this thing symmetric.



James


> If you are planning to move resctrl_exit() to be arch code then I expect
> resctrl_late_init() to be split with the rmid_ptrs[]/closid_num_dirty_rmid
> allocation moving to fs code. Freeing that memory can follow at that time.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu
  2023-10-03 21:22   ` Reinette Chatre
@ 2023-10-25 17:57     ` James Morse
  2023-10-27 21:20       ` Reinette Chatre
  0 siblings, 1 reply; 80+ messages in thread
From: James Morse @ 2023-10-25 17:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:22, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index c54fa86e4ef9..bd7f60bf49fe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -60,11 +60,15 @@
>>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>   *			        aren't marked nohz_full
>>   * @mask:	The mask to pick a CPU from.
>> + * @exclude_cpu:The CPU to avoid picking.
>>   *
>> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
>> - * nohz_full, these are preferred.
>> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
>> + * CPUs that don't use nohz_full, these are preferred. Pass
>> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
>> + * Returns >= nr_cpu_ids if no CPUs are available.

> It may be helpful to add that the function can only fail if exclude_cpu is
> *not* RESCTRL_PICK_ANY_CPU. That helps to understand the sparse error checking.

Assuming you don't give it an empty mask, that should be true ... but I've missed a
difference between the two helpers use of cpumask_any() when combining them....

It now looks like this:
|/**
| * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
| *                              aren't marked nohz_full
| * @mask:       The mask to pick a CPU from.
| * @exclude_cpu:The CPU to avoid picking.
| *
| * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
| * CPUs that don't use nohz_full, these are preferred. Pass
| * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
| *
| * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
| */
|static inline unsigned int
|cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
|{
|        unsigned int cpu, hk_cpu;
|
|        if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
|                cpu = cpumask_any(mask);
|        else
|                cpu = cpumask_any_but(mask, exclude_cpu);
|
|        /* If the CPU picked isn't marked nohz_full, we're done */
|        if (cpu <= nr_cpu_ids && !tick_nohz_full_cpu(cpu))
|                return cpu;
|
|        /* Try to find a CPU that isn't nohz_full to use in preference */
|        hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
|        if (hk_cpu == exclude_cpu)
|                hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
|
|        if (hk_cpu < nr_cpu_ids)
|                cpu = hk_cpu;
|
|        return cpu;
|}

This also has to check cpu is in range before passing it to tick_nohz_full_cpu().


>> @@ -73,6 +77,9 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>>  		return cpu;
>>  
> 
> It is not obvious from this hunk but I cannot see how this would work
> on a system without any nohz_full CPUs.
> 
> At this point the function looks like:
> 
> 	cpu = cpumask_any(mask);
> 	if (!tick_nohz_full_cpu(cpu))
> 		return cpu;
> 
> I expected exclude_cpu to be taken into account. If I understand correctly
> exclude_cpu can be picked by cpumask_any() and as long as it is not
> a nohz_full CPU it would be returned.

Yup, I missed this when combining the functions, fixed as above...



>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 9c6d4b0970e2..208e46ba7368 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -807,22 +808,31 @@ void cqm_handle_limbo(struct work_struct *work)
>>  	__check_limbo(d, false);
>>  
>>  	if (has_busy_rmid(d)) {
>> -		cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> +		cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
>>  		schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
>>  	}
>>  
>>  	mutex_unlock(&rdtgroup_mutex);
>>  }
>>  
>> -void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
>> +/**
>> + * cqm_setup_limbo_handler() - Schedule the limbo handler to run for this
>> + *                             domain.
>> + * @delay_ms:      How far in the future the handler should run.
>> + * @exclude_cpu:   Which CPU the handler should not run on,
>> + *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
>> + */

> arch/x86/kernel/cpu/resctrl/monitor.c:824: info: Scanning doc for function cqm_setup_limbo_handler
> arch/x86/kernel/cpu/resctrl/monitor.c:832: warning: Function parameter or member 'dom' not described in 'cqm_setup_limbo_handler'

What tool outputs this? I've run 'make ARCH=x86 htmldocs', which outputs a tonne of stuff,
but I've never found lines about resctrl in there:
| morse@eglon:~/kernel/mpam/build_x86_64$ make ARCH=x86 htmldocs &> tee  output.log
| morse@eglon:~/kernel/mpam/build_x86_64$ cat output.log | grep resctrl
| morse@eglon:~/kernel/mpam/build_x86_64$


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  2023-10-03 21:18   ` Reinette Chatre
@ 2023-10-25 17:57     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:18, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> @@ -245,6 +250,17 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>>  			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
>>  			   u64 *val);
>>  
>> +/**
>> + * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
>> + *
>> + * When built with CONFIG_DEBUG_ATOMIC_SLEEP generate a warning when
>> + * resctrl_arch_rmid_read() is called with preemption disabled.
>> + */
>> +static inline void resctrl_arch_rmid_read_context_check(void)
>> +{
>> +	if (!irqs_disabled())
>> +		might_sleep();
>> +}
>>  
>>  /**
>>   * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> 
> I was expecting the above to look like you said it would look [1].

Hmm, not sure what happened there - it even made it into the changelog for the patch.
Presumably an earlier change conflicted and I messed up the resolution.

Fixed now.


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work
  2023-10-03 21:23   ` Reinette Chatre
@ 2023-10-25 17:57     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Reinette,

On 03/10/2023 22:23, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> The resctrl architecture specific code may need to free a domain when
>> a CPU goes offline, it also needs to reset the CPUs PQR_ASSOC register.
>> Amongst other things, the resctrl filesystem code needs to clear this
>> CPU from the cpu_mask of any control and monitor groups.
>>
>> Currently this is all done in core.c and called from
>> resctrl_offline_cpu(), making the split between architecture and
>> filesystem code unclear.
>>
>> Move the filesystem work to remove the CPU from the control and monitor
>> groups into a filesystem helper called resctrl_offline_cpu(), and rename
>> the one in core.c resctrl_arch_offline_cpu().
>>
>> The rdtgroup_mutex is unlocked and locked again in the call in
>> preparation for changing the locking rules for the architecture
>> code.
> 
> This last paragraph may cause some confusion since this refactoring
> is not changing any current locking. I'll defer to you if you prefer
> to keep it.

Hmm, that is referring to an earlier version that looked funny and I felt needed
explanation. I've remove that paragraph.


>> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
>> Tested-By: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>


Thanks!

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read()
  2023-10-05 21:46   ` Moger, Babu
@ 2023-10-25 17:58     ` James Morse
  0 siblings, 0 replies; 80+ messages in thread
From: James Morse @ 2023-10-25 17:58 UTC (permalink / raw)
  To: babu.moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght

Hi Babu,

On 05/10/2023 22:46, Moger, Babu wrote:
> On 9/14/2023 12:21 PM, James Morse wrote:
>> Depending on the number of monitors available, Arm's MPAM may need to
>> allocate a monitor prior to reading the counter value. Allocating a
>> contended resource may involve sleeping.
>>
>> add_rmid_to_limbo() calls resctrl_arch_rmid_read() for multiple domains,
>> the allocation should be valid for all domains.
>>
>> __check_limbo() and mon_event_count() each make multiple calls to
>> resctrl_arch_rmid_read(), to avoid extra work on contended systems,
>> the allocation should be valid for multiple invocations of
>> resctrl_arch_rmid_read().
>>
>> Add arch hooks for this allocation, which need calling before
>> resctrl_arch_rmid_read(). The allocated monitor is passed to
>> resctrl_arch_rmid_read(), then freed again afterwards. The helper
>> can be called on any CPU, and can sleep.

>>   diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f7311102e94c..5e4b4df9610b 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -235,6 +235,9 @@ void resctrl_offline_domain(struct rdt_resource *r, struct
>> rdt_domain *d);
>>    * @rmid:        rmid of the counter to read.
>>    * @eventid:        eventid to read, e.g. L3 occupancy.
>>    * @val:        result of the counter read in bytes.
>> + * @arch_mon_ctx:    An architecture specific value from
>> + *            resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
>> + *            the hardware monitor allocated for this read request.
>>    *
>>    * Some architectures need to sleep when first programming some of the counters.
>>    * (specifically: arm64's MPAM cache occupancy counters can return 'not ready'
>> @@ -248,7 +251,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct
>> rdt_domain *d);
>>    */
>>   int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>>                  u32 closid, u32 rmid, enum resctrl_event_id eventid,
>> -               u64 *val);
>> +               u64 *val, void *arch_mon_ctx);

> Just wondering.. Have you thought about passing rmid_read structure to this function?

I did, but I'd prefer to leave that as private to resctrl as the proposed PMU driver ends
up using this API too.


> Because most of the information is already inside the rmid_read structure. We can avoid
> passing 7 parameters.

We'd end up passing all these parameters via memory ... but the compiler knows when it has
to do this, and when it doesn't. For example on aarch64 the compiler knows it can pass all
seven of these arguments in registers. On x86_64 it looks like 6 arguments can be passed,
and the last one is never used on x86 so it never needs reading off the stack. (all this
feels like micro-optimisation!)


Thanks,

James

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu
  2023-10-25 17:57     ` James Morse
@ 2023-10-27 21:20       ` Reinette Chatre
  0 siblings, 0 replies; 80+ messages in thread
From: Reinette Chatre @ 2023-10-27 21:20 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, xingxin.hx, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght



On 10/25/2023 10:57 AM, James Morse wrote:
> On 03/10/2023 22:22, Reinette Chatre wrote:

...

> 
>> arch/x86/kernel/cpu/resctrl/monitor.c:824: info: Scanning doc for function cqm_setup_limbo_handler
>> arch/x86/kernel/cpu/resctrl/monitor.c:832: warning: Function parameter or member 'dom' not described in 'cqm_setup_limbo_handler'
> 
> What tool outputs this? I've run 'make ARCH=x86 htmldocs', which outputs a tonne of stuff,
> but I've never found lines about resctrl in there:
> | morse@eglon:~/kernel/mpam/build_x86_64$ make ARCH=x86 htmldocs &> tee  output.log
> | morse@eglon:~/kernel/mpam/build_x86_64$ cat output.log | grep resctrl
> | morse@eglon:~/kernel/mpam/build_x86_64$
> 

(copied from Documentation/doc-guide/kernel-doc.rst:)
	scripts/kernel-doc -v -none drivers/foo/bar.c

Reinette

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2023-10-27 21:21 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-14 17:21 [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking James Morse
2023-09-14 17:21 ` [PATCH v6 01/24] tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef James Morse
2023-09-26 14:31   ` Fenghua Yu
2023-10-03 21:05   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 02/24] x86/resctrl: kfree() rmid_ptrs from rdtgroup_exit() James Morse
2023-10-02 17:00   ` Reinette Chatre
2023-10-05 17:05     ` James Morse
2023-10-05 18:04       ` Reinette Chatre
2023-10-25 17:56         ` James Morse
2023-10-04 18:00   ` Moger, Babu
2023-10-05 17:06     ` James Morse
2023-09-14 17:21 ` [PATCH v6 03/24] x86/resctrl: Create helper for RMID allocation and mondata dir creation James Morse
2023-10-03 21:07   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 04/24] x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() James Morse
2023-10-03 21:07   ` Reinette Chatre
2023-10-04 18:01   ` Moger, Babu
2023-10-05 17:06     ` James Morse
2023-09-14 17:21 ` [PATCH v6 05/24] x86/resctrl: Track the closid with the rmid James Morse
2023-10-03 21:11   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 06/24] x86/resctrl: Access per-rmid structures by index James Morse
2023-10-03 21:12   ` Reinette Chatre
2023-10-24  9:28   ` Maciej Wieczór-Retman
2023-09-14 17:21 ` [PATCH v6 07/24] x86/resctrl: Allow RMID allocation to be scoped by CLOSID James Morse
2023-10-03 21:12   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 08/24] x86/resctrl: Track the number of dirty RMID a CLOSID has James Morse
2023-10-03 21:13   ` Reinette Chatre
2023-10-05 17:07     ` James Morse
2023-09-14 17:21 ` [PATCH v6 09/24] x86/resctrl: Use set_bit()/clear_bit() instead of open coding James Morse
2023-09-17 21:00   ` David Laight
2023-09-29 16:13     ` James Morse
2023-10-03 21:14   ` Reinette Chatre
2023-10-04 20:38   ` Moger, Babu
2023-10-05 17:07     ` James Morse
2023-09-14 17:21 ` [PATCH v6 10/24] x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid James Morse
2023-10-03 21:14   ` Reinette Chatre
2023-10-05 20:13   ` Moger, Babu
2023-10-25 17:56     ` James Morse
2023-10-05 20:26   ` Moger, Babu
2023-10-25 17:56     ` James Morse
2023-10-24 12:06   ` Maciej Wieczór-Retman
2023-09-14 17:21 ` [PATCH v6 11/24] x86/resctrl: Move CLOSID/RMID matching and setting to use helpers James Morse
2023-10-03 21:15   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 12/24] x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow James Morse
2023-10-03 21:15   ` Reinette Chatre
2023-10-05 17:07     ` James Morse
2023-09-14 17:21 ` [PATCH v6 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI James Morse
2023-10-03 21:17   ` Reinette Chatre
2023-10-25 17:56     ` James Morse
2023-09-14 17:21 ` [PATCH v6 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to sleep James Morse
2023-10-03 21:18   ` Reinette Chatre
2023-10-25 17:57     ` James Morse
2023-10-05 21:33   ` Moger, Babu
2023-09-14 17:21 ` [PATCH v6 15/24] x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() James Morse
2023-10-03 21:18   ` Reinette Chatre
2023-10-05 21:46   ` Moger, Babu
2023-10-25 17:58     ` James Morse
2023-09-14 17:21 ` [PATCH v6 16/24] x86/resctrl: Make resctrl_mounted checks explicit James Morse
2023-10-03 21:19   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 17/24] x86/resctrl: Move alloc/mon static keys into helpers James Morse
2023-10-03 21:19   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 18/24] x86/resctrl: Make rdt_enable_key the arch's decision to switch James Morse
2023-10-03 21:19   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 19/24] x86/resctrl: Add helpers for system wide mon/alloc capable James Morse
2023-10-03 21:19   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 20/24] x86/resctrl: Add CPU online callback for resctrl work James Morse
2023-10-03 21:20   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu James Morse
2023-10-03 21:22   ` Reinette Chatre
2023-10-25 17:57     ` James Morse
2023-10-27 21:20       ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 22/24] x86/resctrl: Add cpu offline callback for resctrl work James Morse
2023-10-03 21:23   ` Reinette Chatre
2023-10-25 17:57     ` James Morse
2023-09-14 17:21 ` [PATCH v6 23/24] x86/resctrl: Move domain helper migration into resctrl_offline_cpu() James Morse
2023-10-03 21:23   ` Reinette Chatre
2023-09-14 17:21 ` [PATCH v6 24/24] x86/resctrl: Separate arch and fs resctrl locks James Morse
2023-10-03 21:28   ` Reinette Chatre
2023-10-25 17:55     ` James Morse
2023-09-27  7:38 ` [PATCH v6 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Shaopeng Tan (Fujitsu)
2023-09-29 16:13   ` James Morse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.