* [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
@ 2021-10-01 16:02 James Morse
  2021-10-01 16:02 ` [PATCH v2 01/23] x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails James Morse
                   ` (24 more replies)
  0 siblings, 25 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

Hello!

Patches 1&2 have been posted independently in case they are wanted as fixes.

The major change in this version is when the mba_mbps[] array is allocated.

---
The aim of this series is to insert a split between the parts of the monitor
code that the architecture must implement, and those that are part of the
resctrl filesystem. The eventual aim is to move all filesystem parts out
to live in /fs/resctrl, so that resctrl can be wired up for MPAM.

What's MPAM? See the cover letter of a previous series. [1]

The series adds domain online/offline callbacks to allow the filesystem to
manage some of its structures itself, then moves all the 'mba_sc' behaviour
to be part of the filesystem.
This means another architecture doesn't need to provide an mbps_val array.
As it's all software, the resctrl filesystem should be able to do this without
any help from the architecture code.

Finally, __rmid_read() is refactored to be the API call that the architecture
provides to read a counter value. All the hardware-specific overflow detection,
scaling and value correction should occur behind this helper.
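
As a rough user-space sketch of what "overflow detection and scaling behind
the helper" could look like (this is not code from the series;
counter_delta(), chunks_to_bytes(), and the counter width and scale used with
them are illustrative assumptions):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: difference between two reads of a hardware counter
 * that is only 'width' bits wide and wraps silently.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;

	/* Shift the unused high bits out so the subtraction wraps at 'width'. */
	return ((cur << shift) - (prev << shift)) >> shift;
}

/* Hypothetical helper: convert hardware chunks to bytes. */
static uint64_t chunks_to_bytes(uint64_t chunks, uint32_t bytes_per_chunk)
{
	return chunks * bytes_per_chunk;
}
```

With an assumed 24-bit counter, a previous read of 0xfffff0 and a current
read of 0x10 give a delta of 0x20 chunks; at an assumed 64 bytes per chunk
that is 2048 bytes. A caller of the arch helper would only ever see the
byte value.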


This series is based on v5.15-rc3, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v2

[0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
[1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/

[v1] https://lore.kernel.org/lkml/20210729223610.29373-1-james.morse@arm.com/


Thanks,

James Morse (23):
  x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state()
    fails
  x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
  x86/resctrl: Kill off alloc_enabled
  x86/resctrl: Merge mon_capable and mon_enabled
  x86/resctrl: Add domain online callback for resctrl work
  x86/resctrl: Group struct rdt_hw_domain cleanup
  x86/resctrl: Add domain offline callback for resctrl work
  x86/resctrl: Create mba_sc configuration in the rdt_domain
  x86/resctrl: Switch over to the resctrl mbps_val list
  x86/resctrl: Remove architecture copy of mbps_val
  x86/resctrl: Remove set_mba_sc()s control array re-initialisation
  x86/resctrl: Abstract and use supports_mba_mbps()
  x86/resctrl: Allow update_mba_bw() to update controls directly
  x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
    chunks
  x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  x86/recstrl: Allow per-rmid arch private storage to be reset
  x86/resctrl: Abstract __rmid_read()
  x86/resctrl: Pass the required parameters into
    resctrl_arch_rmid_read()
  x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
  x86/resctrl: Move get_corrected_mbm_count() into
    resctrl_arch_rmid_read()
  x86/resctrl: Rename and change the units of resctrl_cqm_threshold
  x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
    boot_cpu_data
  x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

 arch/x86/kernel/cpu/resctrl/core.c        | 116 ++++--------
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  75 +++++---
 arch/x86/kernel/cpu/resctrl/internal.h    |  62 +++----
 arch/x86/kernel/cpu/resctrl/monitor.c     | 200 ++++++++++++--------
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 217 ++++++++++++++++++----
 include/linux/resctrl.h                   |  60 +++++-
 7 files changed, 480 insertions(+), 252 deletions(-)

-- 
2.30.2



* [PATCH v2 01/23] x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:02 ` [PATCH v2 02/23] x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu() James Morse
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

domain_add_cpu() is called whenever a CPU is brought online. The
earlier call to domain_setup_ctrlval() allocates the control value
arrays.

If domain_setup_mon_state() fails, the control value arrays are not
freed.

Add the missing kfree() calls.

Fixes: 1bd2a63b4f0de ("x86/intel_rdt/mba_sc: Add initialization support")
Fixes: edf6fa1c4a951 ("x86/intel_rdt/cqm: Add RMID (Resource monitoring ID) management")
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
This will not apply prior to v5.15-rc1; I'll provide a backport.
~s/hw_dom/d/;
---
 arch/x86/kernel/cpu/resctrl/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4b8813bafffd..b5de5a6c115c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -532,6 +532,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	}
 
 	if (r->mon_capable && domain_setup_mon_state(r, d)) {
+		kfree(hw_dom->ctrl_val);
+		kfree(hw_dom->mbps_val);
 		kfree(d);
 		return;
 	}
-- 
2.30.2



* [PATCH v2 02/23] x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  2021-10-01 16:02 ` [PATCH v2 01/23] x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:02 ` [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled James Morse
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

Commit 792e0f6f789b ("x86/resctrl: Split struct rdt_domain") separated
the architecture specific and filesystem parts of the resctrl domain
structures.

This left the error paths in domain_add_cpu() kfree()ing the memory
with the wrong type.

This will cause a problem if someone adds a new member to struct
rdt_hw_domain, meaning d_resctrl is no longer the first member.

Fixes: 792e0f6f789b ("x86/resctrl: Split struct rdt_domain")
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
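For anyone wondering why the old code only worked by accident, here is a
small user-space sketch (not kernel code; struct fs_domain, struct hw_domain
and to_hw_domain() are simplified stand-ins invented for the example).
Freeing the embedded member's address is only valid while that member sits
at offset 0; going back to the containing structure is always valid:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structures; names are illustrative. */
struct fs_domain {
	int id;
};

struct hw_domain {
	struct fs_domain d_resctrl;	/* embedded filesystem part */
	unsigned int *ctrl_val;
};

/* Recover the containing allocation from a pointer to the embedded member. */
static struct hw_domain *to_hw_domain(struct fs_domain *d)
{
	assert(d != NULL);
	return (struct hw_domain *)((char *)d -
				    offsetof(struct hw_domain, d_resctrl));
}

/* Returns 1 if the pointer handed to free() is the one calloc() returned. */
static int alloc_and_free_by_member(void)
{
	struct hw_domain *hw_dom = calloc(1, sizeof(*hw_dom));
	struct fs_domain *d;
	int same;

	if (!hw_dom)
		return 0;

	d = &hw_dom->d_resctrl;
	same = (to_hw_domain(d) == hw_dom);
	free(to_hw_domain(d));	/* always the start of the allocation */
	return same;
}
```

If a member were ever added ahead of d_resctrl, free(d) would be handed an
address in the middle of the allocation, whereas the to_hw_domain() form
keeps working.
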
 arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b5de5a6c115c..bb1c3f5f60c8 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -527,14 +527,14 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	rdt_domain_reconfigure_cdp(r);
 
 	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
-		kfree(d);
+		kfree(hw_dom);
 		return;
 	}
 
 	if (r->mon_capable && domain_setup_mon_state(r, d)) {
 		kfree(hw_dom->ctrl_val);
 		kfree(hw_dom->mbps_val);
-		kfree(d);
+		kfree(hw_dom);
 		return;
 	}
 
-- 
2.30.2



* [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  2021-10-01 16:02 ` [PATCH v2 01/23] x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails James Morse
  2021-10-01 16:02 ` [PATCH v2 02/23] x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu() James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:19   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled James Morse
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

rdt_resources_all[] used to have extra entries for L2CODE/L2DATA.
These were hidden from resctrl by the alloc_enabled value.

Now that the L2/L2CODE/L2DATA resources have been merged together,
alloc_enabled doesn't mean anything: it always has the same value as
alloc_capable, which indicates CAT is supported by this cache.

Remove alloc_enabled and its helpers.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Fixed comment in rdtgroup_create_info_dir()
---
 arch/x86/kernel/cpu/resctrl/core.c        | 4 ----
 arch/x86/kernel/cpu/resctrl/internal.h    | 4 ----
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 6 +++---
 include/linux/resctrl.h                   | 2 --
 5 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bb1c3f5f60c8..2f87177f1f69 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -147,7 +147,6 @@ static inline void cache_alloc_hsw_probe(void)
 	r->cache.shareable_bits = 0xc0000;
 	r->cache.min_cbm_bits = 2;
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	rdt_alloc_capable = true;
 }
@@ -211,7 +210,6 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 	thread_throttle_mode_init();
 
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	return true;
 }
@@ -242,7 +240,6 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	r->data_width = 4;
 
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	return true;
 }
@@ -261,7 +258,6 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
 	r->cache.shareable_bits = ebx & r->default_ctrl;
 	r->data_width = (r->cache.cbm_len + 3) / 4;
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 }
 
 static void rdt_get_cdp_config(int level)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d647188a43b..53f3d275a98f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 	for_each_rdt_resource(r)					      \
 		if (r->mon_capable)
 
-#define for_each_alloc_enabled_rdt_resource(r)				      \
-	for_each_rdt_resource(r)					      \
-		if (r->alloc_enabled)
-
 #define for_each_mon_enabled_rdt_resource(r)				      \
 	for_each_rdt_resource(r)					      \
 		if (r->mon_enabled)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index db813f819ad6..f810969ced4b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -835,7 +835,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
 	 * First determine which cpus have pseudo-locked regions
 	 * associated with them.
 	 */
-	for_each_alloc_enabled_rdt_resource(r) {
+	for_each_alloc_capable_rdt_resource(r) {
 		list_for_each_entry(d_i, &r->domains, list) {
 			if (d_i->plr)
 				cpumask_or(cpu_with_psl, cpu_with_psl,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b57b3db9a6a7..e327f8d1c8a3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1756,7 +1756,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 	if (ret)
 		goto out_destroy;
 
-	/* loop over enabled controls, these are all alloc_enabled */
+	/* loop over enabled controls, these are all alloc_capable */
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
 		fflags =  r->fflags | RF_CTRL_INFO;
@@ -2106,7 +2106,7 @@ static int schemata_list_create(void)
 	struct rdt_resource *r;
 	int ret = 0;
 
-	for_each_alloc_enabled_rdt_resource(r) {
+	for_each_alloc_capable_rdt_resource(r) {
 		if (resctrl_arch_get_cdp_enabled(r->rid)) {
 			ret = schemata_list_add(r, CDP_CODE);
 			if (ret)
@@ -2452,7 +2452,7 @@ static void rdt_kill_sb(struct super_block *sb)
 	set_mba_sc(false);
 
 	/*Put everything back to default values. */
-	for_each_alloc_enabled_rdt_resource(r)
+	for_each_alloc_capable_rdt_resource(r)
 		reset_all_ctrls(r);
 	cdp_disable_all();
 	rmdir_all_sub();
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 21deb5212bbd..386ab3a41500 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
- * @alloc_enabled:	Is allocation enabled on this machine
  * @mon_enabled:	Is monitoring enabled for this feature
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
@@ -150,7 +149,6 @@ struct resctrl_schema;
  */
 struct rdt_resource {
 	int			rid;
-	bool			alloc_enabled;
 	bool			mon_enabled;
 	bool			alloc_capable;
 	bool			mon_capable;
-- 
2.30.2



* [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (2 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-19 23:18   ` Babu Moger
  2021-10-01 16:02 ` [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work James Morse
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Removed stray cdp_capable changes.
---
 arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
 arch/x86/kernel/cpu/resctrl/monitor.c  | 1 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
 include/linux/resctrl.h                | 2 --
 4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 53f3d275a98f..8828b5c1b6d2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 	for_each_rdt_resource(r)					      \
 		if (r->mon_capable)
 
-#define for_each_mon_enabled_rdt_resource(r)				      \
-	for_each_rdt_resource(r)					      \
-		if (r->mon_enabled)
-
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
 	struct {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c9f0f3d63f75..37af1790337f 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -717,7 +717,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
-	r->mon_enabled = true;
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e327f8d1c8a3..e243c7d15b81 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1765,7 +1765,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 			goto out_destroy;
 	}
 
-	for_each_mon_enabled_rdt_resource(r) {
+	for_each_mon_capable_rdt_resource(r) {
 		fflags =  r->fflags | RF_MON_INFO;
 		sprintf(name, "%s_MON", r->name);
 		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
@@ -2504,7 +2504,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
 	struct rdtgroup *prgrp, *crgrp;
 	char name[32];
 
-	if (!r->mon_enabled)
+	if (!r->mon_capable)
 		return;
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2572,7 +2572,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	struct rdtgroup *prgrp, *crgrp;
 	struct list_head *head;
 
-	if (!r->mon_enabled)
+	if (!r->mon_capable)
 		return;
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2642,7 +2642,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 	 * Create the subdirectories for each domain. Note that all events
 	 * in a domain like L3 are grouped into a resource whose domain is L3
 	 */
-	for_each_mon_enabled_rdt_resource(r) {
+	for_each_mon_capable_rdt_resource(r) {
 		ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
 		if (ret)
 			goto out_destroy;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 386ab3a41500..8180c539800d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
- * @mon_enabled:	Is monitoring enabled for this feature
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
  * @num_rmid:		Number of RMIDs available
@@ -149,7 +148,6 @@ struct resctrl_schema;
  */
 struct rdt_resource {
 	int			rid;
-	bool			mon_enabled;
 	bool			alloc_capable;
 	bool			mon_capable;
 	int			num_rmid;
-- 
2.30.2



* [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (3 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:19   ` Reinette Chatre
  2021-10-19 23:19   ` Babu Moger
  2021-10-01 16:02 ` [PATCH v2 06/23] x86/resctrl: Group struct rdt_hw_domain cleanup James Morse
                   ` (19 subsequent siblings)
  24 siblings, 2 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
allocate the memory.

Move domain_setup_mon_state(), the monitor subdir creation call and the
mbm/limbo workers into a new resctrl_online_domain() call. These bits
are not specific to the architecture. Grouping them in one function
allows that code to be moved to /fs/ and re-used by another architecture.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Capitalisation
 * Removed inline comment
 * Added to the commit message
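
The intended split can be pictured with the following user-space sketch.
It is only an illustration under assumed names (fs_online_domain(),
fs_offline_domain(), arch_domain_add() and arch_domain_remove() are not
functions from this series): the architecture side owns the outer
allocation and the control values, and the filesystem side allocates and
frees its own monitor state through the callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fs_domain {
	unsigned long *mbm_total;	/* filesystem-owned monitor state */
	int listed;			/* stands in for list membership */
};

struct hw_domain {
	struct fs_domain d_resctrl;
	unsigned int *ctrl_val;		/* architecture-owned control values */
};

/* Filesystem side: allocate whatever it needs for a new domain. */
static int fs_online_domain(struct fs_domain *d, size_t num_rmid)
{
	if (!num_rmid)
		return -EINVAL;	/* stand-in failure to exercise the unwind */

	d->mbm_total = calloc(num_rmid, sizeof(*d->mbm_total));
	return d->mbm_total ? 0 : -ENOMEM;
}

/* Filesystem side: release what fs_online_domain() allocated. */
static void fs_offline_domain(struct fs_domain *d)
{
	free(d->mbm_total);
	d->mbm_total = NULL;
}

/* Architecture side: allocate, publish, then let the filesystem react. */
static struct hw_domain *arch_domain_add(size_t num_closid, size_t num_rmid)
{
	struct hw_domain *hw_dom = calloc(1, sizeof(*hw_dom));

	if (!hw_dom)
		return NULL;

	hw_dom->ctrl_val = calloc(num_closid, sizeof(*hw_dom->ctrl_val));
	if (!hw_dom->ctrl_val) {
		free(hw_dom);
		return NULL;
	}

	hw_dom->d_resctrl.listed = 1;
	if (fs_online_domain(&hw_dom->d_resctrl, num_rmid)) {
		/* Unwind in reverse order, freeing the outer allocation. */
		hw_dom->d_resctrl.listed = 0;
		free(hw_dom->ctrl_val);
		free(hw_dom);
		return NULL;
	}

	return hw_dom;
}

/* Architecture side: tell the filesystem first, then free our own parts. */
static void arch_domain_remove(struct hw_domain *hw_dom)
{
	assert(hw_dom != NULL);
	fs_offline_domain(&hw_dom->d_resctrl);
	hw_dom->d_resctrl.listed = 0;
	free(hw_dom->ctrl_val);
	free(hw_dom);
}
```

Neither side needs to know what the other allocated, which is the property
that would let the filesystem half move out of the architecture directory.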
---
 arch/x86/kernel/cpu/resctrl/core.c     | 57 ++++------------------
 arch/x86/kernel/cpu/resctrl/internal.h |  2 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 ++++++++++++++++++++++++--
 include/linux/resctrl.h                |  1 +
 4 files changed, 69 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2f87177f1f69..f1fa54de8136 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -443,42 +443,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
-{
-	size_t tsize;
-
-	if (is_llc_occupancy_enabled()) {
-		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
-		if (!d->rmid_busy_llc)
-			return -ENOMEM;
-		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
-	}
-	if (is_mbm_total_enabled()) {
-		tsize = sizeof(*d->mbm_total);
-		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-		if (!d->mbm_total) {
-			bitmap_free(d->rmid_busy_llc);
-			return -ENOMEM;
-		}
-	}
-	if (is_mbm_local_enabled()) {
-		tsize = sizeof(*d->mbm_local);
-		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-		if (!d->mbm_local) {
-			bitmap_free(d->rmid_busy_llc);
-			kfree(d->mbm_total);
-			return -ENOMEM;
-		}
-	}
-
-	if (is_mbm_enabled()) {
-		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
-		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
-	}
-
-	return 0;
-}
-
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -498,6 +462,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	struct list_head *add_pos = NULL;
 	struct rdt_hw_domain *hw_dom;
 	struct rdt_domain *d;
+	int err;
 
 	d = rdt_find_domain(r, id, &add_pos);
 	if (IS_ERR(d)) {
@@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (r->mon_capable && domain_setup_mon_state(r, d)) {
-		kfree(hw_dom->ctrl_val);
-		kfree(hw_dom->mbps_val);
-		kfree(hw_dom);
-		return;
-	}
-
 	list_add_tail(&d->list, add_pos);
 
-	/*
-	 * If resctrl is mounted, add
-	 * per domain monitor data directories.
-	 */
-	if (static_branch_unlikely(&rdt_mon_enable_key))
-		mkdir_mondata_subdir_allrdtgrp(r, d);
+	err = resctrl_online_domain(r, d);
+	if (err) {
+		list_del(&d->list);
+		kfree(hw_dom->ctrl_val);
+		kfree(hw_dom->mbps_val);
+		kfree(d);
+	}
 }
 
 static void domain_remove_cpu(int cpu, struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8828b5c1b6d2..be48a682dbdb 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -524,8 +524,6 @@ void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 				    unsigned int dom_id);
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    struct rdt_domain *d);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e243c7d15b81..19691f9ab061 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2565,16 +2565,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
  * Add all subdirectories of mon_data for "ctrl_mon" groups
  * and "monitor" groups with given domain id.
  */
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    struct rdt_domain *d)
+static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+					   struct rdt_domain *d)
 {
 	struct kernfs_node *parent_kn;
 	struct rdtgroup *prgrp, *crgrp;
 	struct list_head *head;
 
-	if (!r->mon_capable)
-		return;
-
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		parent_kn = prgrp->mon.mon_data_kn;
 		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
@@ -3236,6 +3233,64 @@ static int __init rdtgroup_setup_root(void)
 	return ret;
 }
 
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+{
+	size_t tsize;
+
+	if (is_llc_occupancy_enabled()) {
+		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
+		if (!d->rmid_busy_llc)
+			return -ENOMEM;
+	}
+	if (is_mbm_total_enabled()) {
+		tsize = sizeof(*d->mbm_total);
+		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		if (!d->mbm_total) {
+			bitmap_free(d->rmid_busy_llc);
+			return -ENOMEM;
+		}
+	}
+	if (is_mbm_local_enabled()) {
+		tsize = sizeof(*d->mbm_local);
+		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		if (!d->mbm_local) {
+			bitmap_free(d->rmid_busy_llc);
+			kfree(d->mbm_total);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int err;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (!r->mon_capable)
+		return 0;
+
+	err = domain_setup_mon_state(r, d);
+	if (err)
+		return err;
+
+	if (is_mbm_enabled()) {
+		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
+		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+	}
+
+	if (is_llc_occupancy_enabled())
+		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+
+	/* If resctrl is mounted, add per domain monitor data directories. */
+	if (static_branch_unlikely(&rdt_mon_enable_key))
+		mkdir_mondata_subdir_allrdtgrp(r, d);
+
+	return 0;
+}
+
 /*
  * rdtgroup_init - rdtgroup initialization
  *
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8180c539800d..d512455b4c3a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -192,5 +192,6 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 
 #endif /* _RESCTRL_H */
-- 
2.30.2



* [PATCH v2 06/23] x86/resctrl: Group struct rdt_hw_domain cleanup
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (4 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:02 ` [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work James Morse
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

domain_add_cpu() and domain_remove_cpu() need to kfree() the child
arrays that were allocated by domain_setup_ctrlval().

As this memory is moved around, and new arrays are created, adjusting
the error handling cleanup code becomes noisier.

To simplify this, move all the kfree() calls into a domain_free() helper.
This depends on struct rdt_hw_domain being kzalloc()d, allowing it to
unconditionally kfree() all the child arrays.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * This patch is new
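
To show why the zeroed allocation matters, here is a user-space sketch of
the same pattern (calloc()/free() standing in for kzalloc()/kfree();
hw_domain_free() and hw_domain_setup() are names made up for the example).
Because every child pointer starts out NULL and freeing NULL is a no-op,
one helper can serve every exit path regardless of how far setup got:

```c
#include <assert.h>
#include <stdlib.h>

struct hw_domain {
	unsigned int *ctrl_val;
	unsigned int *mbps_val;
};

/* One helper for every exit path: safe because unset members are NULL. */
static void hw_domain_free(struct hw_domain *hw_dom)
{
	assert(hw_dom != NULL);
	free(hw_dom->ctrl_val);	/* free(NULL) is a no-op */
	free(hw_dom->mbps_val);
	free(hw_dom);
}

/*
 * Returns 0 on success and -1 on failure. 'fail_second' skips the second
 * allocation to simulate it failing.
 */
static int hw_domain_setup(size_t n, int fail_second)
{
	struct hw_domain *hw_dom = calloc(1, sizeof(*hw_dom));	/* zeroed */

	if (!hw_dom)
		return -1;

	hw_dom->ctrl_val = calloc(n, sizeof(*hw_dom->ctrl_val));
	if (!hw_dom->ctrl_val)
		goto err;

	if (!fail_second)
		hw_dom->mbps_val = calloc(n, sizeof(*hw_dom->mbps_val));
	if (!hw_dom->mbps_val)
		goto err;

	hw_domain_free(hw_dom);
	return 0;
err:
	hw_domain_free(hw_dom);	/* mbps_val may still be NULL here */
	return -1;
}
```

Had the outer structure come from a non-zeroing allocator, the error path
would be freeing whatever garbage happened to be in mbps_val.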
---
 arch/x86/kernel/cpu/resctrl/core.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f1fa54de8136..7a2c24c5652c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -414,6 +414,13 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 	}
 }
 
+void domain_free(struct rdt_hw_domain *hw_dom)
+{
+	kfree(hw_dom->ctrl_val);
+	kfree(hw_dom->mbps_val);
+	kfree(hw_dom);
+}
+
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -488,7 +495,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	rdt_domain_reconfigure_cdp(r);
 
 	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
-		kfree(hw_dom);
+		domain_free(hw_dom);
 		return;
 	}
 
@@ -497,9 +504,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	err = resctrl_online_domain(r, d);
 	if (err) {
 		list_del(&d->list);
-		kfree(hw_dom->ctrl_val);
-		kfree(hw_dom->mbps_val);
-		kfree(d);
+		domain_free(hw_dom);
 	}
 }
 
@@ -547,12 +552,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		if (d->plr)
 			d->plr->d = NULL;
 
-		kfree(hw_dom->ctrl_val);
-		kfree(hw_dom->mbps_val);
 		bitmap_free(d->rmid_busy_llc);
 		kfree(d->mbm_total);
 		kfree(d->mbm_local);
-		kfree(hw_dom);
+		domain_free(hw_dom);
 		return;
 	}
 
-- 
2.30.2



* [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (5 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 06/23] x86/resctrl: Group struct rdt_hw_domain cleanup James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-19 23:19   ` Babu Moger
  2021-10-01 16:02 ` [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when CPU hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
works into a new resctrl_offline_domain() call. These bits are not
specific to the architecture. Grouping them in one function allows
that code to be moved to /fs/ and re-used by another architecture.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Removed a redundant mon_capable check
 * Capitalisation
 * Removed inline comment
 * Added to the commit message
---
 arch/x86/kernel/cpu/resctrl/core.c     | 26 ++---------------
 arch/x86/kernel/cpu/resctrl/internal.h |  2 --
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++---
 include/linux/resctrl.h                |  1 +
 4 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7a2c24c5652c..1dd8428df008 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
-		/*
-		 * If resctrl is mounted, remove all the
-		 * per domain monitor data directories.
-		 */
-		if (static_branch_unlikely(&rdt_mon_enable_key))
-			rmdir_mondata_subdir_allrdtgrp(r, d->id);
+		resctrl_offline_domain(r, d);
 		list_del(&d->list);
-		if (r->mon_capable && is_mbm_enabled())
-			cancel_delayed_work(&d->mbm_over);
-		if (is_llc_occupancy_enabled() &&  has_busy_rmid(r, d)) {
-			/*
-			 * When a package is going down, forcefully
-			 * decrement rmid->ebusy. There is no way to know
-			 * that the L3 was flushed and hence may lead to
-			 * incorrect counts in rare scenarios, but leaving
-			 * the RMID as busy creates RMID leaks if the
-			 * package never comes back.
-			 */
-			__check_limbo(d, true);
-			cancel_delayed_work(&d->cqm_limbo);
-		}
 
 		/*
 		 * rdt_domain "d" is going to be freed below, so clear
@@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (d->plr)
 			d->plr->d = NULL;
-
-		bitmap_free(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		domain_free(hw_dom);
+
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index be48a682dbdb..e12b55f815bf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    unsigned int dom_id);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 19691f9ab061..38670bb810cb 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
  * Remove all subdirectories of mon_data of ctrl_mon groups
  * and monitor groups with given domain id.
  */
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+					   unsigned int dom_id)
 {
 	struct rdtgroup *prgrp, *crgrp;
 	char name[32];
 
-	if (!r->mon_capable)
-		return;
-
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		sprintf(name, "mon_%s_%02d", r->name, dom_id);
 		kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
@@ -3233,6 +3231,39 @@ static int __init rdtgroup_setup_root(void)
 	return ret;
 }
 
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (!r->mon_capable)
+		return;
+
+	/*
+	 * If resctrl is mounted, remove all the
+	 * per domain monitor data directories.
+	 */
+	if (static_branch_unlikely(&rdt_mon_enable_key))
+		rmdir_mondata_subdir_allrdtgrp(r, d->id);
+
+	if (is_mbm_enabled())
+		cancel_delayed_work(&d->mbm_over);
+	if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+		/*
+		 * When a package is going down, forcefully
+		 * decrement rmid->ebusy. There is no way to know
+		 * that the L3 was flushed and hence may lead to
+		 * incorrect counts in rare scenarios, but leaving
+		 * the RMID as busy creates RMID leaks if the
+		 * package never comes back.
+		 */
+		__check_limbo(d, true);
+		cancel_delayed_work(&d->cqm_limbo);
+	}
+	bitmap_free(d->rmid_busy_llc);
+	kfree(d->mbm_total);
+	kfree(d->mbm_local);
+}
+
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 {
 	size_t tsize;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d512455b4c3a..5d283bdd6162 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 
 #endif /* _RESCTRL_H */
-- 
2.30.2



* [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (6 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:26   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val from user-space.

This complicates the interface between the architecture code and the
filesystem parts of resctrl.

Make the filesystem parts of resctrl create an array for the mba_sc
values when is_mba_sc() returns true. The software controller
can be changed to use this, allowing the architecture code to only
consider the values configured in hardware.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Added missing error handling to mba_sc_domain_allocate() in
   domain_setup_mon_state()
 * Added comment about mba_sc_domain_allocate() races
 * Squashed out struct resctrl_mba_sc
 * Moved mount time alloc/free calls to set_mba_sc().
 * Removed mount check in resctrl_offline_domain()
 * Reword commit message
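
To illustrate the lifetime of the new array, here is a small user-space
sketch under simplifying assumptions. calloc() stands in for
kcalloc_node(), the domain list is a plain array, and every model_* name
is invented for this example. It shows each closid starting at the
"no limit" default, and the pointer being cleared on destroy so a later
allocate starts from a clean state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MBA_MAX_MBPS	UINT32_MAX	/* "no limit" default */

struct model_domain {
	uint32_t *mbps_val;	/* indexed by closid, NULL when mba_sc is off */
};

int model_mba_sc_domain_allocate(struct model_domain *d, uint32_t num_closid)
{
	uint32_t i;

	assert(num_closid > 0);

	d->mbps_val = calloc(num_closid, sizeof(*d->mbps_val));
	if (!d->mbps_val)
		return -1;

	/* Every closid starts unthrottled until user-space writes a value. */
	for (i = 0; i < num_closid; i++)
		d->mbps_val[i] = MODEL_MBA_MAX_MBPS;

	return 0;
}

void model_mba_sc_domain_destroy(struct model_domain *d)
{
	free(d->mbps_val);
	d->mbps_val = NULL;
}

/* Mount-time path: allocate for every domain that is already online. */
int model_mba_sc_allocate(struct model_domain *doms, size_t nr_doms,
			  uint32_t num_closid)
{
	int ret = 0;	/* an empty domain list is not an error in this model */
	size_t i;

	for (i = 0; i < nr_doms; i++) {
		ret = model_mba_sc_domain_allocate(&doms[i], num_closid);
		if (ret)
			break;
	}

	return ret;
}
```

The sketch does not unwind earlier allocations when a later one fails.
A caller would have to destroy the domains that were already set up.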
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 67 ++++++++++++++++++++++++++
 include/linux/resctrl.h                |  6 +++
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e12b55f815bf..a7e2cbce29d5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -36,7 +36,6 @@
 #define MBM_OVERFLOW_INTERVAL		1000
 #define MAX_MBA_BW			100u
 #define MBA_IS_LINEAR			0x4
-#define MBA_MAX_MBPS			U32_MAX
 #define MAX_MBA_BW_AMD			0x800
 #define MBM_CNTR_WIDTH_OFFSET_AMD	20
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 38670bb810cb..9d402bc8bdff 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1889,6 +1889,64 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
 		l3_qos_cfg_update(&hw_res->cdp_enabled);
 }
 
+static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
+{
+	u32 num_closid = resctrl_arch_get_num_closid(r);
+	int cpu = cpumask_any(&d->cpu_mask);
+	int i;
+
+	/*
+	 * d->mbps_val is allocated by a call to this function in set_mba_sc(),
+	 * and domain_setup_mon_state(). Both calls are guarded by is_mba_sc(),
+	 * which can only return true while the filesystem is mounted. The
+	 * two calls are prevented from racing as rdt_get_tree() takes the
+	 * cpuhp read lock before calling rdt_enable_ctx(ctx), which prevents
+	 * it running concurrently with resctrl_online_domain().
+	 */
+	lockdep_assert_cpus_held();
+
+	d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
+				   GFP_KERNEL, cpu_to_node(cpu));
+	if (!d->mbps_val)
+		return -ENOMEM;
+
+	for (i = 0; i < num_closid; i++)
+		d->mbps_val[i] = MBA_MAX_MBPS;
+
+	return 0;
+}
+
+static int mba_sc_allocate(struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+	int ret = 0;
+
+	list_for_each_entry(d, &r->domains, list) {
+		ret = mba_sc_domain_allocate(r, d);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static void mba_sc_domain_destroy(struct rdt_resource *r,
+				  struct rdt_domain *d)
+{
+	kfree(d->mbps_val);
+	d->mbps_val = NULL;
+}
+
+static void mba_sc_destroy(struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+
+	lockdep_assert_cpus_held();
+
+	list_for_each_entry(d, &r->domains, list)
+		mba_sc_domain_destroy(r, d);
+}
+
 /*
  * Enable or disable the MBA software controller
  * which helps user specify bandwidth in MBps.
@@ -1911,6 +1969,10 @@ static int set_mba_sc(bool mba_sc)
 		setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
 	}
 
+	if (is_mba_sc(r))
+		return mba_sc_allocate(r);
+
+	mba_sc_destroy(r);
 	return 0;
 }
 
@@ -3259,6 +3321,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 		__check_limbo(d, true);
 		cancel_delayed_work(&d->cqm_limbo);
 	}
+	if (is_mba_sc(r))
+		mba_sc_domain_destroy(r, d);
 	bitmap_free(d->rmid_busy_llc);
 	kfree(d->mbm_total);
 	kfree(d->mbm_local);
@@ -3291,6 +3355,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 		}
 	}
 
+	if (is_mba_sc(r))
+		return mba_sc_domain_allocate(r, d);
+
 	return 0;
 }
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5d283bdd6162..355660d46612 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,
 
 #endif
 
+/* max value for struct rdt_domain's mbps_val */
+#define MBA_MAX_MBPS   U32_MAX
+
 /**
  * enum resctrl_conf_type - The type of configuration.
  * @CDP_NONE:	No prioritisation, both code and data are controlled or monitored.
@@ -53,6 +56,8 @@ struct resctrl_staged_config {
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
  * @plr:		pseudo-locked region (if any) associated with domain
  * @staged_config:	parsed configuration to be applied
+ * @mbps_val:		Array of user specified control values for mba_sc,
+ *			indexed by closid
  */
 struct rdt_domain {
 	struct list_head		list;
@@ -67,6 +72,7 @@ struct rdt_domain {
 	int				cqm_work_cpu;
 	struct pseudo_lock_region	*plr;
 	struct resctrl_staged_config	staged_config[CDP_NUM_TYPES];
+	u32				*mbps_val;
 };
 
 /**
-- 
2.30.2



* [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (7 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:26   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val James Morse
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
ctrlval_parse function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead of ctrl_val[], and skips the rdt_ctrl_update() call that
would update hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.

Change parse_bw() to write the configuration value directly to the
mbps_val[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains() for mba_sc resources,
meaning all the mba_sc specific code in resctrl_arch_update_domains()
can be removed. On the read-side, show_doms() and update_mba_bw() are
changed to read the mbps_val[] array from the domain structure. With
this, resctrl_arch_get_config() no longer needs to consider mba_sc
resources.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Squashed out struct resctrl_mba_sc
 * Removed stray paragraphs from commit message
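
As a rough picture of the new write and read paths, here is a user-space
sketch. The model_* names, the fixed-size arrays and the single staged
slot are simplifications made up for this example, and validation of the
written value is omitted. With the software controller enabled, a write
lands in mbps_val[] and nothing is staged for the hardware. Without it,
the value is staged as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_CLOSID	4

struct model_staged_config {
	uint32_t new_ctrl;
	bool have_new_ctrl;
};

struct model_domain {
	uint32_t mbps_val[MODEL_NUM_CLOSID];	/* software controller targets */
	uint32_t ctrl_val[MODEL_NUM_CLOSID];	/* values configured in hardware */
	struct model_staged_config staged;
};

/* Write side: mba_sc values bypass staging entirely. */
void model_parse_bw(bool mba_sc, struct model_domain *d, uint32_t closid,
		    uint32_t bw_val)
{
	assert(closid < MODEL_NUM_CLOSID);

	if (mba_sc) {
		d->mbps_val[closid] = bw_val;
		return;
	}

	d->staged.new_ctrl = bw_val;
	d->staged.have_new_ctrl = true;
}

/* Read side: pick the array that matches the current mode. */
uint32_t model_show_value(bool mba_sc, const struct model_domain *d,
			  uint32_t closid)
{
	assert(closid < MODEL_NUM_CLOSID);

	return mba_sc ? d->mbps_val[closid] : d->ctrl_val[closid];
}
```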
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 44 ++++++++++++++---------
 arch/x86/kernel/cpu/resctrl/monitor.c     | 10 +++---
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..9f45207a6c74 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -61,6 +61,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 	     struct rdt_domain *d)
 {
 	struct resctrl_staged_config *cfg;
+	u32 closid = data->rdtgrp->closid;
 	struct rdt_resource *r = s->res;
 	unsigned long bw_val;
 
@@ -72,6 +73,12 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 
 	if (!bw_validate(data->buf, &bw_val, r))
 		return -EINVAL;
+
+	if (is_mba_sc(r)) {
+		d->mbps_val[closid] = bw_val;
+		return 0;
+	}
+
 	cfg->new_ctrl = bw_val;
 	cfg->have_new_ctrl = true;
 
@@ -261,14 +268,13 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
 
 static bool apply_config(struct rdt_hw_domain *hw_dom,
 			 struct resctrl_staged_config *cfg, u32 idx,
-			 cpumask_var_t cpu_mask, bool mba_sc)
+			 cpumask_var_t cpu_mask)
 {
 	struct rdt_domain *dom = &hw_dom->d_resctrl;
-	u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;
 
-	if (cfg->new_ctrl != dc[idx]) {
+	if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
 		cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
-		dc[idx] = cfg->new_ctrl;
+		hw_dom->ctrl_val[idx] = cfg->new_ctrl;
 
 		return true;
 	}
@@ -284,14 +290,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	enum resctrl_conf_type t;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	bool mba_sc;
 	int cpu;
 	u32 idx;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
 
-	mba_sc = is_mba_sc(r);
 	msr_param.res = NULL;
 	list_for_each_entry(d, &r->domains, list) {
 		hw_dom = resctrl_to_arch_dom(d);
@@ -301,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 				continue;
 
 			idx = get_config_index(closid, t);
-			if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
+			if (!apply_config(hw_dom, cfg, idx, cpu_mask))
 				continue;
 
 			if (!msr_param.res) {
@@ -315,11 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 		}
 	}
 
-	/*
-	 * Avoid writing the control msr with control values when
-	 * MBA software controller is enabled
-	 */
-	if (cpumask_empty(cpu_mask) || mba_sc)
+	if (cpumask_empty(cpu_mask))
 		goto done;
 	cpu = get_cpu();
 	/* Update resource control msr on this CPU if it's in cpu_mask. */
@@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
+
+		/*
+		 * Writes to mba_sc resources update the software controller,
+		 * not the control msr.
+		 */
+		if (is_mba_sc(r))
+			continue;
+
 		ret = resctrl_arch_update_domains(r, rdtgrp->closid);
 		if (ret)
 			goto out;
@@ -433,9 +441,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	u32 idx = get_config_index(closid, type);
 
-	if (!is_mba_sc(r))
-		return hw_dom->ctrl_val[idx];
-	return hw_dom->mbps_val[idx];
+	return hw_dom->ctrl_val[idx];
 }
 
 static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
@@ -450,8 +456,12 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
 		if (sep)
 			seq_puts(s, ";");
 
-		ctrl_val = resctrl_arch_get_config(r, dom, closid,
-						   schema->conf_type);
+		if (is_mba_sc(r))
+			ctrl_val = dom->mbps_val[closid];
+		else
+			ctrl_val = resctrl_arch_get_config(r, dom, closid,
+							   schema->conf_type);
+
 		seq_printf(s, r->format_str, dom->id, max_data_width,
 			   ctrl_val);
 		sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 37af1790337f..66c2667584dc 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -447,13 +447,11 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
 	cur_bw = pmbm_data->prev_bw;
-	user_bw = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
+	user_bw = dom_mba->mbps_val[closid];
 	delta_bw = pmbm_data->delta_bw;
-	/*
-	 * resctrl_arch_get_config() chooses the mbps/ctrl value to return
-	 * based on is_mba_sc(). For now, reach into the hw_dom.
-	 */
-	cur_msr_val = hw_dom_mba->ctrl_val[closid];
+
+	/* MBA monitor resource doesn't support CDP */
+	cur_msr_val = resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE);
 
 	/*
 	 * For Ctrl groups read data from child monitor groups.
-- 
2.30.2



* [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (8 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:27   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 11/23] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

The resctrl arch code provides a second configuration array mbps_val[]
for the MBA software controller.

Since resctrl switched over to allocating and freeing its own array
when needed, nothing uses the arch code version.

Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Fixed spelling mistake
 * Capitalisation
---
 arch/x86/kernel/cpu/resctrl/core.c     | 20 ++++----------------
 arch/x86/kernel/cpu/resctrl/internal.h |  4 +---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1dd8428df008..feaf2fafa3c6 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -397,7 +397,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	return NULL;
 }
 
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
+void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	int i;
@@ -406,18 +406,14 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 	 * Initialize the Control MSRs to having no control.
 	 * For Cache Allocation: Set all bits in cbm
 	 * For Memory Allocation: Set b/w requested to 100%
-	 * and the bandwidth in MBps to U32_MAX
 	 */
-	for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
+	for (i = 0; i < hw_res->num_closid; i++, dc++)
 		*dc = r->default_ctrl;
-		*dm = MBA_MAX_MBPS;
-	}
 }
 
 void domain_free(struct rdt_hw_domain *hw_dom)
 {
 	kfree(hw_dom->ctrl_val);
-	kfree(hw_dom->mbps_val);
 	kfree(hw_dom);
 }
 
@@ -426,23 +422,15 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	struct msr_param m;
-	u32 *dc, *dm;
+	u32 *dc;
 
 	dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
 			   GFP_KERNEL);
 	if (!dc)
 		return -ENOMEM;
 
-	dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
-			   GFP_KERNEL);
-	if (!dm) {
-		kfree(dc);
-		return -ENOMEM;
-	}
-
 	hw_dom->ctrl_val = dc;
-	hw_dom->mbps_val = dm;
-	setup_default_ctrlval(r, dc, dm);
+	setup_default_ctrlval(r, dc);
 
 	m.low = 0;
 	m.high = hw_res->num_closid;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a7e2cbce29d5..796e13a0e8dc 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -308,14 +308,12 @@ struct mbm_state {
  *			  a resource
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
- * @mbps_val:	When mba_sc is enabled, this holds the bandwidth in MBps
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
 struct rdt_hw_domain {
 	struct rdt_domain		d_resctrl;
 	u32				*ctrl_val;
-	u32				*mbps_val;
 };
 
 static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
@@ -529,7 +527,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
 void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
+void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
 u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 9d402bc8bdff..52a7accbff8b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1966,7 +1966,7 @@ static int set_mba_sc(bool mba_sc)
 	r->membw.mba_sc = mba_sc;
 	list_for_each_entry(d, &r->domains, list) {
 		hw_dom = resctrl_to_arch_dom(d);
-		setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
+		setup_default_ctrlval(r, hw_dom->ctrl_val);
 	}
 
 	if (is_mba_sc(r))
-- 
2.30.2



* [PATCH v2 11/23] x86/resctrl: Remove set_mba_sc()s control array re-initialisation
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (9 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:02 ` [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

set_mba_sc() is called by rdt_enable_ctx() during mount()
and rdt_kill_sb(). It currently re-initialises the arch code's control
value array.

These values are already at their defaults when the domain is created,
and are reset to them by reset_all_ctrls() when rdt_kill_sb() is called.
set_mba_sc()'s extra call to setup_default_ctrlval() isn't needed.

Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 52a7accbff8b..069c209be1d5 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1956,18 +1956,12 @@ static void mba_sc_destroy(struct rdt_resource *r)
 static int set_mba_sc(bool mba_sc)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
-	struct rdt_hw_domain *hw_dom;
-	struct rdt_domain *d;
 
 	if (!is_mbm_enabled() || !is_mba_linear() ||
 	    mba_sc == is_mba_sc(r))
 		return -EINVAL;
 
 	r->membw.mba_sc = mba_sc;
-	list_for_each_entry(d, &r->domains, list) {
-		hw_dom = resctrl_to_arch_dom(d);
-		setup_default_ctrlval(r, hw_dom->ctrl_val);
-	}
 
 	if (is_mba_sc(r))
 		return mba_sc_allocate(r);
-- 
2.30.2



* [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (10 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 11/23] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-07  6:13   ` tan.shaopeng
  2021-10-15 22:28   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
                   ` (12 subsequent siblings)
  24 siblings, 2 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

To determine whether the mba_MBps option to resctrl should be supported,
resctrl tests the boot CPU's x86_vendor.

This isn't portable, and needs abstracting behind a helper so this check
can be part of the filesystem code that moves to /fs/.

Re-use the tests set_mba_sc() does to determine whether mba_sc is supported
on this system. An 'alloc_capable' test is added so that support for the
controls isn't implied by the 'delay_linear' property, which is always
true for MPAM.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Capitalisation
 * Added MPAM example in commit message
 * Fixed supports_mba_mbps() logic error in rdt_parse_param()
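
The check can be pictured with the user-space sketch below. struct
model_system and the model_* functions are invented for illustration,
and the three flags stand in for is_mbm_enabled(), r->alloc_capable and
is_mba_linear(). The last test case is the MPAM-like one from the commit
message: linear delay is reported, but there are no controls to drive.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_system {
	bool mbm_enabled;	/* bandwidth monitoring is available */
	bool mba_alloc_capable;	/* MBA controls exist on this system */
	bool mba_delay_linear;	/* MBA uses a linear scale */
	bool mba_sc;		/* software controller currently enabled */
};

/* All three properties are needed for the mba_MBps option to make sense. */
bool model_supports_mba_mbps(const struct model_system *s)
{
	assert(s != NULL);

	return s->mbm_enabled && s->mba_alloc_capable && s->mba_delay_linear;
}

/* Refuse unsupported systems and no-op transitions, as set_mba_sc() does. */
int model_set_mba_sc(struct model_system *s, bool enable)
{
	if (!model_supports_mba_mbps(s) || enable == s->mba_sc)
		return -EINVAL;

	s->mba_sc = enable;
	return 0;
}
```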
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 069c209be1d5..1207271cce23 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1948,17 +1948,26 @@ static void mba_sc_destroy(struct rdt_resource *r)
 }
 
 /*
- * Enable or disable the MBA software controller
- * which helps user specify bandwidth in MBps.
  * MBA software controller is supported only if
  * MBM is supported and MBA is in linear scale.
  */
+static bool supports_mba_mbps(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
+	return (is_mbm_enabled() &&
+		r->alloc_capable && is_mba_linear());
+}
+
+/*
+ * Enable or disable the MBA software controller
+ * which helps user specify bandwidth in MBps.
+ */
 static int set_mba_sc(bool mba_sc)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
 
-	if (!is_mbm_enabled() || !is_mba_linear() ||
-	    mba_sc == is_mba_sc(r))
+	if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
 		return -EINVAL;
 
 	r->membw.mba_sc = mba_sc;
@@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		ctx->enable_cdpl2 = true;
 		return 0;
 	case Opt_mba_mbps:
-		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		if (!supports_mba_mbps())
 			return -EINVAL;
 		ctx->enable_mba_mbps = true;
 		return 0;
-- 
2.30.2



* [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (11 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:28   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks James Morse
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

update_mba_bw() calculates a new control value for the MBA resource
based on the user provided mbps_val and the current measured
bandwidth. Some control values need remapping by delay_bw_map().

It applies the new value by calling wrmsrl() directly. This needs
splitting out into an architecture specific helper, so that the
remainder can eventually be moved to /fs/.

Add resctrl_arch_update_one() to apply one configuration value
to the provided resource and domain. This avoids the staging
and cross-calling that is only needed with changes made by
user-space. delay_bw_map() moves to be part of the arch code,
to maintain the 'percentage control' view of MBA resources
in resctrl.

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Capitalisation
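
A user-space sketch of the helper's contract follows. It is a model, not
the kernel code. The model_* names, the 64-bit cpu mask, the msr[] array
standing in for the hardware registers and the explicit this_cpu argument
(in place of smp_processor_id()) are all assumptions made for this
example. It also assumes the linear delay mapping only, and that the
mapping is applied when the register is written. The caller passes a
percentage, the cached control value keeps that percentage, and only the
register write sees the remapped delay value.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_NUM_CLOSID	8
#define MODEL_MAX_MBA_BW	100u

struct model_domain {
	uint64_t cpu_mask;			/* bit n set: CPU n is in the domain */
	uint32_t ctrl_val[MODEL_NUM_CLOSID];	/* percentage view kept for resctrl */
	uint32_t msr[MODEL_NUM_CLOSID];		/* stand-in for the hardware registers */
};

/* Linear scale only: the register holds the delay, not the bandwidth. */
uint32_t model_delay_bw_map(uint32_t bw)
{
	assert(bw <= MODEL_MAX_MBA_BW);

	return MODEL_MAX_MBA_BW - bw;
}

/*
 * Apply one value on the calling CPU. There is no staging and no
 * cross-call, so the caller must already be running inside the domain.
 */
int model_arch_update_one(struct model_domain *d, unsigned int this_cpu,
			  uint32_t idx, uint32_t cfg_val)
{
	uint32_t i;

	assert(idx < MODEL_NUM_CLOSID);

	if (this_cpu >= 64 || !(d->cpu_mask & (UINT64_C(1) << this_cpu)))
		return -EINVAL;

	d->ctrl_val[idx] = cfg_val;

	/* Mirrors the [low, high) window of idx .. idx + 1 in the patch. */
	for (i = idx; i < idx + 1; i++)
		d->msr[i] = model_delay_bw_map(d->ctrl_val[i]);

	return 0;
}
```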
---
 arch/x86/kernel/cpu/resctrl/core.c        |  2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 21 +++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h    |  1 -
 arch/x86/kernel/cpu/resctrl/monitor.c     | 13 ++++---------
 include/linux/resctrl.h                   |  8 ++++++++
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index feaf2fafa3c6..583fb41db06d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -296,7 +296,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
  * that can be written to QOS_MSRs.
  * There are currently no SKUs which support non linear delay values.
  */
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
 {
 	if (r->membw.delay_linear)
 		return MAX_MBA_BW - bw;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 9f45207a6c74..25baacd331e0 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
 	return false;
 }
 
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	u32 idx = get_config_index(closid, t);
+	struct msr_param msr_param;
+
+	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+		return -EINVAL;
+
+	hw_dom->ctrl_val[idx] = cfg_val;
+
+	msr_param.res = r;
+	msr_param.low = idx;
+	msr_param.high = idx + 1;
+
+	rdt_ctrl_update(&msr_param);
+
+	return 0;
+}
+
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 {
 	struct resctrl_staged_config *cfg;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 796e13a0e8dc..1b07e49564cf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -528,7 +528,6 @@ void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
 void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 66c2667584dc..6c8226987dd6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -420,10 +420,8 @@ void mon_event_count(void *info)
  */
 static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 {
-	u32 closid, rmid, cur_msr, cur_msr_val, new_msr_val;
+	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
-	struct rdt_hw_resource *hw_r_mba;
-	struct rdt_hw_domain *hw_dom_mba;
 	u32 cur_bw, delta_bw, user_bw;
 	struct rdt_resource *r_mba;
 	struct rdt_domain *dom_mba;
@@ -433,8 +431,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	if (!is_mbm_local_enabled())
 		return;
 
-	hw_r_mba = &rdt_resources_all[RDT_RESOURCE_MBA];
-	r_mba = &hw_r_mba->r_resctrl;
+	r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
 	closid = rgrp->closid;
 	rmid = rgrp->mon.rmid;
 	pmbm_data = &dom_mbm->mbm_local[rmid];
@@ -444,7 +442,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 		pr_warn_once("Failure to get domain for MBA update\n");
 		return;
 	}
-	hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
 	cur_bw = pmbm_data->prev_bw;
 	user_bw = dom_mba->mbps_val[closid];
@@ -486,9 +483,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 		return;
 	}
 
-	cur_msr = hw_r_mba->msr_base + closid;
-	wrmsrl(cur_msr, delay_bw_map(new_msr_val, r_mba));
-	hw_dom_mba->ctrl_val[closid] = new_msr_val;
+	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
 
 	/*
 	 * Delta values are updated dynamically package wise for each
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 355660d46612..af202c891ba7 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -196,6 +196,14 @@ struct resctrl_schema {
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+
+/*
+ * Update the ctrl_val and apply this config right now.
+ * Must be called on one of the domain's CPUs.
+ */
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val);
+
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (12 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:28   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
second. It reads the hardware register, calculates the bandwidth and
updates m->prev_bw_msr which is used to hold the previous hardware register
value.

Operating directly on hardware register values makes it difficult to make
this code architecture independent, so that it can be moved to /fs/,
making the mba_sc feature something resctrl supports with no additional
support from the architecture.
Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
register using __mon_event_count().

Change mbm_bw_count() to use the current chunks value from
__mon_event_count() to calculate bandwidth. This means it no longer
operates on hardware register values.
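
A minimal user-space sketch of the calculation described above, assuming
the value passed in is a running chunks total that only ever grows and
that the helper is called once per second. The names bw_state and
bw_count() are hypothetical stand-ins for struct mbm_state and
mbm_bw_count(); this is an illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the bandwidth fields of struct mbm_state. */
struct bw_state {
	uint64_t prev_bw_chunks;	/* chunks total seen on the previous call */
	uint32_t prev_bw;		/* most recent bandwidth in MBps */
	uint32_t delta_bw;		/* |current - previous| bandwidth */
	bool	 delta_comp;		/* recompute delta_bw on the next call */
};

/*
 * cur_chunks: running total of chunks (assumed to never decrease).
 * mon_scale:  bytes per chunk.
 * Shifting right by 20 converts bytes to MB; with one call per second
 * the result is in MBps.
 */
static void bw_count(struct bw_state *m, uint64_t cur_chunks, uint32_t mon_scale)
{
	uint64_t chunks, cur_bw;

	assert(cur_chunks >= m->prev_bw_chunks);

	chunks = cur_chunks - m->prev_bw_chunks;
	m->prev_bw_chunks = cur_chunks;

	cur_bw = (chunks * mon_scale) >> 20;

	if (m->delta_comp)
		m->delta_bw = (uint32_t)(cur_bw > m->prev_bw ?
					 cur_bw - m->prev_bw :
					 m->prev_bw - cur_bw);
	m->delta_comp = false;
	m->prev_bw = (uint32_t)cur_bw;
}
```

Because only the chunks total is consumed, nothing in this helper needs
to know the counter width or the raw register layout.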

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * This patch was rewritten
---
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++---------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1b07e49564cf..0a5721e1cc07 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -289,7 +289,7 @@ struct rftype {
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
- * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
+ * @prev_bw_chunks: Previous chunks value read for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
@@ -297,7 +297,7 @@ struct rftype {
 struct mbm_state {
 	u64	chunks;
 	u64	prev_msr;
-	u64	prev_bw_msr;
+	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
 	bool	delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6c8226987dd6..a1232462db14 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -315,7 +315,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 
 	if (rr->first) {
 		memset(m, 0, sizeof(struct mbm_state));
-		m->prev_bw_msr = m->prev_msr = tval;
+		m->prev_msr = tval;
 		return 0;
 	}
 
@@ -329,27 +329,32 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 }
 
 /*
+ * mbm_bw_count() - Update bw count from values previously read by
+ *		    __mon_event_count().
+ * @rmid:	The rmid used to identify the cached mbm_state.
+ * @rr:		The struct rmid_read populated by __mon_event_count().
+ *
  * Supporting function to calculate the memory bandwidth
- * and delta bandwidth in MBps.
+ * and delta bandwidth in MBps. The chunks value previously read by
+ * __mon_event_count() is compared with the chunks value from the previous
+ * invocation. This must be called once per second to maintain values in MBps.
  */
 static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 tval, cur_bw, chunks;
+	u64 cur_bw, chunks, cur_chunks;
 
-	tval = __rmid_read(rmid, rr->evtid);
-	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
-		return;
+	cur_chunks = rr->val;
+	chunks = cur_chunks - m->prev_bw_chunks;
+	m->prev_bw_chunks = cur_chunks;
 
-	chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
-	cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
+	cur_bw = (chunks * hw_res->mon_scale) >> 20;
 
 	if (m->delta_comp)
 		m->delta_bw = abs(cur_bw - m->prev_bw);
 	m->delta_comp = false;
 	m->prev_bw = cur_bw;
-	m->prev_bw_msr = tval;
 }
 
 /*
@@ -509,6 +514,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 	rr.first = false;
 	rr.r = r;
 	rr.d = d;
+	rr.val = 0;
 
 	/*
 	 * This is protected from concurrent reads from user
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (13 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:29   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. For bandwidth
counters the resctrl filesystem uses this to calculate the number of
bytes ever seen.

MPAM's scaling of counters can be changed at runtime, reducing the
resolution but increasing the range. When this is changed the prev_msr
values need to be converted by the architecture code.

Add an array for per-rmid private storage. The prev_msr and chunks
values will move here to allow resctrl_arch_rmid_read() to always
return the number of bytes read by this counter without assistance
from the filesystem. The values are moved in later patches when
the overflow and correction calls are moved into
resctrl_arch_rmid_read().

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 34 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 583fb41db06d..f527489a607a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -413,6 +413,8 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
 
 void domain_free(struct rdt_hw_domain *hw_dom)
 {
+	kfree(hw_dom->arch_mbm_total);
+	kfree(hw_dom->arch_mbm_local);
 	kfree(hw_dom->ctrl_val);
 	kfree(hw_dom);
 }
@@ -438,6 +440,33 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
+/**
+ * arch_domain_mbm_alloc() - Allocate arch private storage for the mbm counters
+ * @num_rmid:	The size of the mbm counter array
+ * @hw_dom:	The domain that owns the allocated arrays
+ *
+ * On error, call domain_free()
+ */
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+{
+	size_t tsize;
+
+	if (is_mbm_total_enabled()) {
+		tsize = sizeof(*hw_dom->arch_mbm_total);
+		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
+		if (!hw_dom->arch_mbm_total)
+			return -ENOMEM;
+	}
+	if (is_mbm_local_enabled()) {
+		tsize = sizeof(*hw_dom->arch_mbm_local);
+		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
+		if (!hw_dom->arch_mbm_local)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -487,6 +516,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}
 
+	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+		domain_free(hw_dom);
+		return;
+	}
+
 	list_add_tail(&d->list, add_pos);
 
 	err = resctrl_online_domain(r, d);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0a5721e1cc07..aaae900a8ef3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -303,17 +303,30 @@ struct mbm_state {
 	bool	delta_comp;
 };
 
+/**
+ * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
+ *			   return value.
+ * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
+ */
+struct arch_mbm_state {
+	u64	prev_msr;
+};
+
 /**
  * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
  *			  a resource
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
+ * @arch_mbm_total:	arch private state for MBM total bandwidth
+ * @arch_mbm_local:	arch private state for MBM local bandwidth
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
 struct rdt_hw_domain {
 	struct rdt_domain		d_resctrl;
 	u32				*ctrl_val;
+	struct arch_mbm_state		*arch_mbm_total;
+	struct arch_mbm_state		*arch_mbm_local;
 };
 
 static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (14 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-07  6:16   ` tan.shaopeng
  2021-10-01 16:02 ` [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() James Morse
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.
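
A minimal user-space sketch of the lookup-and-reset pattern described
above. The types, the fixed NUM_RMID size and the function names are
hypothetical; only the event id values and the idea of per-event,
per-rmid private state come from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_RMID 4	/* hypothetical; real systems discover this at boot */

enum event_id {
	EV_L3_OCCUP	= 0x01,
	EV_MBM_TOTAL	= 0x02,
	EV_MBM_LOCAL	= 0x03,
};

/* Hidden per-rmid state the architecture keeps between reads. */
struct arch_state {
	uint64_t prev_msr;
};

struct hw_domain {
	struct arch_state total[NUM_RMID];
	struct arch_state local[NUM_RMID];
};

/* Occupancy has no hidden state, so it maps to NULL. */
static struct arch_state *get_state(struct hw_domain *d, uint32_t rmid,
				    enum event_id ev)
{
	assert(rmid < NUM_RMID);

	switch (ev) {
	case EV_L3_OCCUP:
		return NULL;
	case EV_MBM_TOTAL:
		return &d->total[rmid];
	case EV_MBM_LOCAL:
		return &d->local[rmid];
	}

	return NULL;
}

/* Forget whatever was recorded for this rmid and event. */
static void reset_rmid(struct hw_domain *d, uint32_t rmid, enum event_id ev)
{
	struct arch_state *am = get_state(d, rmid, ev);

	if (am)
		memset(am, 0, sizeof(*am));
}
```

Resetting one event leaves the other event's state for the same rmid
untouched, which is why the event id is part of the interface.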

Signed-off-by: James Morse <james.morse@arm.com>

---
Changes since v1:
 * Added WARN_ON_ONCE() for a case that should never happen.
---
 arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++---------
 arch/x86/kernel/cpu/resctrl/monitor.c  | 35 +++++++++++++++++++++++++-
 include/linux/resctrl.h                | 23 +++++++++++++++++
 3 files changed, 62 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index aaae900a8ef3..f3f31315a907 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -22,14 +22,6 @@
 
 #define L2_QOS_CDP_ENABLE		0x01ULL
 
-/*
- * Event IDs are used to program IA32_QM_EVTSEL before reading event
- * counter from IA32_QM_CTR
- */
-#define QOS_L3_OCCUP_EVENT_ID		0x01
-#define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
-#define QOS_L3_MBM_LOCAL_EVENT_ID	0x03
-
 #define CQM_LIMBOCHECK_INTERVAL	1000
 
 #define MBM_CNTR_WIDTH_BASE		24
@@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
-	u32			evtid;
+	enum resctrl_event_id	evtid;
 	char			*name;
 	struct list_head	list;
 };
@@ -90,9 +82,9 @@ struct mon_evt {
 union mon_data_bits {
 	void *priv;
 	struct {
-		unsigned int rid	: 10;
-		unsigned int evtid	: 8;
-		unsigned int domid	: 14;
+		unsigned int rid		: 10;
+		enum resctrl_event_id evtid	: 8;
+		unsigned int domid		: 14;
 	} u;
 };
 
@@ -100,7 +92,7 @@ struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
 	struct rdt_domain	*d;
-	int			evtid;
+	enum resctrl_event_id	evtid;
 	bool			first;
 	u64			val;
 };
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index a1232462db14..35eef49954b0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -137,7 +137,37 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
 	return entry;
 }
 
-static u64 __rmid_read(u32 rmid, u32 eventid)
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+						 u32 rmid,
+						 enum resctrl_event_id eventid)
+{
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		return NULL;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return &hw_dom->arch_mbm_total[rmid];
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return &hw_dom->arch_mbm_local[rmid];
+	}
+
+	/* Never expect to get here */
+	WARN_ON_ONCE(1);
+
+	return NULL;
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+			     u32 rmid, enum resctrl_event_id eventid)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	struct arch_mbm_state *am;
+
+	am = get_arch_mbm_state(hw_dom, rmid, eventid);
+	if (am)
+		memset(am, 0, sizeof(*am));
+}
+
+static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
 {
 	u64 val;
 
@@ -291,6 +321,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 	struct mbm_state *m;
 	u64 chunks, tval;
 
+	if (rr->first)
+		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
+
 	tval = __rmid_read(rmid, rr->evtid);
 	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
 		return tval;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index af202c891ba7..04f30d80fc67 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -32,6 +32,16 @@ enum resctrl_conf_type {
 
 #define CDP_NUM_TYPES	(CDP_DATA + 1)
 
+/*
+ * Event IDs, the values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
+enum resctrl_event_id {
+	QOS_L3_OCCUP_EVENT_ID		= 0x01,
+	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
+	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
+};
+
 /**
  * struct resctrl_staged_config - parsed configuration to be applied
  * @new_ctrl:		new ctrl value to be loaded
@@ -209,4 +219,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 
+/**
+ * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
+ *			       and eventid.
+ * @r:		The domain's resource.
+ * @d:		The rmid's domain.
+ * @rmid:	The rmid whose counter values should be reset.
+ * @eventid:	The eventid whose counter values should be reset.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+			     u32 rmid, enum resctrl_event_id eventid);
+
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (15 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:29   ` Reinette Chatre
  2021-10-19 23:20   ` Babu Moger
  2021-10-01 16:02 ` [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
                   ` (7 subsequent siblings)
  24 siblings, 2 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

__rmid_read() selects the specified eventid and returns the counter
value from the msr. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be handled by architecture specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.
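
A minimal user-space sketch of the error translation this patch moves
behind the arch helper, assuming the error and unavailable flags live in
bits 63 and 62 of the counter value. decode_qm_ctr() and err_to_text()
are hypothetical names; the second mirrors how a caller such as
rdtgroup_mondata_show() can act on an errno instead of hardware bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RMID_VAL_ERROR		(1ULL << 63)
#define RMID_VAL_UNAVAIL	(1ULL << 62)

/*
 * Translate a raw counter value into an errno-style result.
 * *val is only written on success.
 */
static int decode_qm_ctr(uint64_t msr_val, uint64_t *val)
{
	assert(val != NULL);

	if (msr_val & RMID_VAL_ERROR)
		return -EIO;
	if (msr_val & RMID_VAL_UNAVAIL)
		return -EINVAL;

	*val = msr_val;

	return 0;
}

/* The caller no longer needs to know about hardware-specific bits. */
static const char *err_to_text(int err)
{
	if (err == -EIO)
		return "Error";
	if (err == -EINVAL)
		return "Unavailable";

	return NULL;	/* no error: print the value instead */
}
```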

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Return EINVAL from the impossible case in __mon_event_count() instead
   of an x86 hardware specific value.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
 include/linux/resctrl.h                   |  1 +
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 25baacd331e0..c8ca7184c6d9 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 
 	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
 
-	if (rr.val & RMID_VAL_ERROR)
+	if (rr.err == -EIO)
 		seq_puts(m, "Error\n");
-	else if (rr.val & RMID_VAL_UNAVAIL)
+	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
 	else
 		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f3f31315a907..eca7793d3342 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -40,7 +40,6 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
-
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
 	bool				enable_cdpl2;
@@ -94,6 +93,7 @@ struct rmid_read {
 	struct rdt_domain	*d;
 	enum resctrl_event_id	evtid;
 	bool			first;
+	int			err;
 	u64			val;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 35eef49954b0..cf35eaf01042 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(am, 0, sizeof(*am));
 }
 
-static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
-	u64 val;
+	u64 msr_val;
 
 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -180,14 +180,24 @@ static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
 	 * are error bits.
 	 */
 	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
-	rdmsrl(MSR_IA32_QM_CTR, val);
+	rdmsrl(MSR_IA32_QM_CTR, msr_val);
 
-	return val;
+	if (msr_val & RMID_VAL_ERROR)
+		return -EIO;
+	if (msr_val & RMID_VAL_UNAVAIL)
+		return -EINVAL;
+
+	*val = msr_val;
+
+	return 0;
 }
 
 static bool rmid_dirty(struct rmid_entry *entry)
 {
-	u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
+	u64 val = 0;
+
+	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
+		return true;
 
 	return val >= resctrl_cqm_threshold;
 }
@@ -259,8 +269,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r;
 	struct rdt_domain *d;
-	int cpu;
-	u64 val;
+	int cpu, err;
+	u64 val = 0;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
@@ -268,8 +278,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
-			if (val <= resctrl_cqm_threshold)
+			err = resctrl_arch_rmid_read(entry->rmid,
+						     QOS_L3_OCCUP_EVENT_ID,
+						     &val);
+			if (err || val <= resctrl_cqm_threshold)
 				continue;
 		}
 
@@ -319,15 +331,15 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m;
-	u64 chunks, tval;
+	u64 chunks, tval = 0;
 
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
 
-	tval = __rmid_read(rmid, rr->evtid);
-	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
-		return tval;
-	}
+	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+	if (rr->err)
+		return rr->err;
+
 	switch (rr->evtid) {
 	case QOS_L3_OCCUP_EVENT_ID:
 		rr->val += tval;
@@ -343,7 +355,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 		 * Code would never reach here because an invalid
 		 * event id would fail the __rmid_read.
 		 */
-		return RMID_VAL_ERROR;
+		return -EINVAL;
 	}
 
 	if (rr->first) {
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 04f30d80fc67..01bdd8be590b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -218,6 +218,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (16 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-15 22:30   ` Reinette Chatre
  2021-10-01 16:02 ` [PATCH v2 19/23] x86/resctrl: Move mbm_overflow_count() " James Morse
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a hardware register. Currently the function
returns the mbm values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the registers.

Pass the resource and domain to resctrl_arch_rmid_read(). This makes
rmid_dirty() too big. Instead, merge it with its only caller; the name
is kept as a local variable.

Signed-off-by: James Morse <james.morse@arm.com>
---
This is all a little noisy for __mon_event_count(), as the switch
statement work is now before the resctrl_arch_rmid_read() call.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 31 +++++++++++++++------------
 include/linux/resctrl.h               | 15 ++++++++++++-
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index cf35eaf01042..f833bc01aeac 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,10 +167,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(am, 0, sizeof(*am));
 }
 
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
+			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
 	u64 msr_val;
 
+	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+		return -EINVAL;
+
 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
 	 * with a valid event code for supported resource type and the bits
@@ -192,16 +196,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 	return 0;
 }
 
-static bool rmid_dirty(struct rmid_entry *entry)
-{
-	u64 val = 0;
-
-	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
-		return true;
-
-	return val >= resctrl_cqm_threshold;
-}
-
 /*
  * Check the RMIDs that are marked as busy for this domain. If the
  * reported LLC occupancy is below the threshold clear the busy bit and
@@ -213,6 +207,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 	struct rmid_entry *entry;
 	struct rdt_resource *r;
 	u32 crmid = 1, nrmid;
+	bool rmid_dirty;
+	u64 val = 0;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
@@ -228,7 +224,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(nrmid);
-		if (force_free || !rmid_dirty(entry)) {
+
+		if (resctrl_arch_rmid_read(r, d, entry->rmid,
+					   QOS_L3_OCCUP_EVENT_ID, &val))
+			rmid_dirty = true;
+		else
+			rmid_dirty = (val >= resctrl_cqm_threshold);
+
+		if (force_free || !rmid_dirty) {
 			clear_bit(entry->rmid, d->rmid_busy_llc);
 			if (!--entry->busy) {
 				rmid_limbo_count--;
@@ -278,7 +281,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			err = resctrl_arch_rmid_read(entry->rmid,
+			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
 			if (err || val <= resctrl_cqm_threshold)
@@ -336,7 +339,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
 
-	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
 	if (rr->err)
 		return rr->err;
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 01bdd8be590b..4215a0564206 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -218,7 +218,20 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
+
+/**
+ * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
+ *			      for this resource and domain.
+ * @r:			The resource that the counter should be read from.
+ * @d:			The domain that the counter should be read from.
+ * @rmid:		The rmid of the counter to read.
+ * @eventid:		The eventid to read, e.g. L3 occupancy.
+ * @val:		The result of the counter read in chunks.
+ *
+ * Returns 0 on success, or -EIO, -EINVAL etc on error.
+ */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+			   u32 rmid, enum resctrl_event_id eventid, u64 *val);
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.30.2



* [PATCH v2 19/23] x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (17 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:02 ` [PATCH v2 20/23] x86/resctrl: Move get_corrected_mbm_count() " James Morse
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. When reading a bandwidth
counter, mbm_overflow_count() must be used to correct for any possible
overflow.

mbm_overflow_count() is architecture specific, so its behaviour should
be part of resctrl_arch_rmid_read().

Move the mbm_overflow_count() calls into resctrl_arch_rmid_read().
This allows the resctrl filesystem's prev_msr to be removed in
favour of the architecture private version.

Signed-off-by: James Morse <james.morse@arm.com>
---
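For readers following along, here is a minimal user-space sketch of the
overflow handling that this patch moves behind resctrl_arch_rmid_read().
It mirrors the arithmetic of mbm_overflow_count() from the diff below;
the name overflow_delta() and the width check are assumptions added for
illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative copy of the helper's arithmetic. 'width' is the number of
 * valid bits in the hardware counter (1..64).
 */
static uint64_t overflow_delta(uint64_t prev_msr, uint64_t cur_msr,
			       unsigned int width)
{
	unsigned int shift;
	uint64_t chunks;

	/* A width of 0 would mean a shift by 64, which is undefined in C. */
	assert(width >= 1 && width <= 64);
	shift = 64 - width;

	/*
	 * Move the valid bits to the top of the 64-bit word so that the
	 * unsigned subtraction wraps modulo 2^width, then move them back.
	 */
	chunks = (cur_msr << shift) - (prev_msr << shift);
	return chunks >> shift;
}
```

With a 24-bit counter, a previous reading of 0xFFFFF0 followed by a
current reading of 0x000010 gives a delta of 0x20 chunks, even though
the raw value went backwards.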
 arch/x86/kernel/cpu/resctrl/internal.h |  2 --
 arch/x86/kernel/cpu/resctrl/monitor.c  | 35 +++++++++++++++-----------
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index eca7793d3342..2d0a6bba4a01 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -280,7 +280,6 @@ struct rftype {
 /**
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
  * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
@@ -288,7 +287,6 @@ struct rftype {
  */
 struct mbm_state {
 	u64	chunks;
-	u64	prev_msr;
 	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f833bc01aeac..4e8bb86ce4a5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -167,9 +167,20 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(am, 0, sizeof(*am));
 }
 
+static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+{
+	u64 shift = 64 - width, chunks;
+
+	chunks = (cur_msr << shift) - (prev_msr << shift);
+	return chunks >>= shift;
+}
+
 int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	struct arch_mbm_state *am;
 	u64 msr_val;
 
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
@@ -191,7 +202,13 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	if (msr_val & RMID_VAL_UNAVAIL)
 		return -EINVAL;
 
-	*val = msr_val;
+	am = get_arch_mbm_state(hw_dom, rmid, eventid);
+	if (am) {
+		*val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+		am->prev_msr = msr_val;
+	} else {
+		*val = msr_val;
+	}
 
 	return 0;
 }
@@ -322,19 +339,10 @@ void free_rmid(u32 rmid)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
-{
-	u64 shift = 64 - width, chunks;
-
-	chunks = (cur_msr << shift) - (prev_msr << shift);
-	return chunks >>= shift;
-}
-
 static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m;
-	u64 chunks, tval = 0;
+	u64 tval = 0;
 
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
@@ -363,13 +371,10 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 
 	if (rr->first) {
 		memset(m, 0, sizeof(struct mbm_state));
-		m->prev_msr = tval;
 		return 0;
 	}
 
-	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
-	m->chunks += chunks;
-	m->prev_msr = tval;
+	m->chunks += tval;
 
 	rr->val += get_corrected_mbm_count(rmid, m->chunks);
 
-- 
2.30.2



* [PATCH v2 20/23] x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (18 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 19/23] x86/resctrl: Move mbm_overflow_count() " James Morse
@ 2021-10-01 16:02 ` James Morse
  2021-10-01 16:03 ` [PATCH v2 21/23] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. When reading a bandwidth
counter, get_corrected_mbm_count() must be used to correct the
value read.

get_corrected_mbm_count() is architecture specific, so this work should be
done in resctrl_arch_rmid_read().

Move the function calls. This allows the resctrl filesystem's chunks
value to be removed in favour of the architecture private version.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h | 4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2d0a6bba4a01..65b472d6b146 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -279,14 +279,12 @@ struct rftype {
 
 /**
  * struct mbm_state - status for each MBM counter in each domain
- * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
  */
 struct mbm_state {
-	u64	chunks;
 	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
@@ -296,9 +294,11 @@ struct mbm_state {
 /**
  * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
  *			   return value.
+ * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr	Value of IA32_QM_CTR for this RMID last time we read it
  */
 struct arch_mbm_state {
+	u64	chunks;
 	u64	prev_msr;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 4e8bb86ce4a5..eb2502645433 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -204,7 +204,9 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 
 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
 	if (am) {
-		*val = mbm_overflow_count(am->prev_msr, msr_val, hw_res->mbm_width);
+		am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+						 hw_res->mbm_width);
+		*val = get_corrected_mbm_count(rmid, am->chunks);
 		am->prev_msr = msr_val;
 	} else {
 		*val = msr_val;
@@ -374,9 +376,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
 		return 0;
 	}
 
-	m->chunks += tval;
-
-	rr->val += get_corrected_mbm_count(rmid, m->chunks);
+	rr->val += tval;
 
 	return 0;
 }
-- 
2.30.2



* [PATCH v2 21/23] x86/resctrl: Rename and change the units of resctrl_cqm_threshold
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (19 preceding siblings ...)
  2021-10-01 16:02 ` [PATCH v2 20/23] x86/resctrl: Move get_corrected_mbm_count() " James Morse
@ 2021-10-01 16:03 ` James Morse
  2021-10-01 16:03 ` [PATCH v2 22/23] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_cqm_threshold is stored in a hardware specific chunk size,
but exposed to user-space as bytes.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the callers have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Signed-off-by: James Morse <james.morse@arm.com>
---
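A small user-space sketch of why the unit change matters, under stated
assumptions: the helper names below are hypothetical, and the mon_scale
value used in the note afterwards is only an example. The first function
shows the old scheme, where the user's byte value is rounded down to
chunks; the second shows the comparison in bytes that __check_limbo()
makes after this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old scheme: the threshold is stored in hardware chunks, rounding down. */
static uint32_t threshold_bytes_to_chunks(uint32_t bytes, uint32_t mon_scale)
{
	assert(mon_scale != 0);
	return bytes / mon_scale;
}

/*
 * New scheme: the threshold stays in bytes and the occupancy counter is
 * scaled up before the comparison.
 */
static bool rmid_is_dirty(uint64_t occupancy_chunks, uint32_t mon_scale,
			  uint32_t threshold_bytes)
{
	uint64_t occupancy_bytes = occupancy_chunks * mon_scale;

	return occupancy_bytes >= threshold_bytes;
}
```

Assuming a mon_scale of 65536, a user write of 100000 bytes was stored as
1 chunk and read back as 65536 bytes under the old scheme. With the
threshold kept in bytes, the value written is the value read back.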
 arch/x86/kernel/cpu/resctrl/internal.h |  1 -
 arch/x86/kernel/cpu/resctrl/monitor.c  | 34 ++++++++++++--------------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 ++-----
 include/linux/resctrl.h                |  2 ++
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 65b472d6b146..4569b4588185 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -97,7 +97,6 @@ struct rmid_read {
 	u64			val;
 };
 
-extern unsigned int resctrl_cqm_threshold;
 extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eb2502645433..d0c3a3db5f2a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -37,8 +37,8 @@ static LIST_HEAD(rmid_free_lru);
  * @rmid_limbo_count     count of currently unused but (potentially)
  *     dirty RMIDs.
  *     This counts RMIDs that no one is currently using but that
- *     may have a occupancy value > intel_cqm_threshold. User can change
- *     the threshold occupancy value.
+ *     may have a occupancy value > resctrl_rmid_realloc_threshold. User can
+ *     change the threshold occupancy value.
  */
 static unsigned int rmid_limbo_count;
 
@@ -59,10 +59,10 @@ bool rdt_mon_capable;
 unsigned int rdt_mon_features;
 
 /*
- * This is the threshold cache occupancy at which we will consider an
+ * This is the threshold cache occupancy in bytes at which we will consider an
  * RMID available for re-allocation.
  */
-unsigned int resctrl_cqm_threshold;
+unsigned int resctrl_rmid_realloc_threshold;
 
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
@@ -223,14 +223,13 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
  */
 void __check_limbo(struct rdt_domain *d, bool force_free)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rmid_entry *entry;
-	struct rdt_resource *r;
 	u32 crmid = 1, nrmid;
 	bool rmid_dirty;
 	u64 val = 0;
 
-	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
 	/*
 	 * Skip RMID 0 and start from RMID 1 and check all the RMIDs that
 	 * are marked as busy for occupancy < threshold. If the occupancy
@@ -245,10 +244,12 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 		entry = __rmid_entry(nrmid);
 
 		if (resctrl_arch_rmid_read(r, d, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val))
+					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
-		else
-			rmid_dirty = (val >= resctrl_cqm_threshold);
+		} else {
+			val *= hw_res->mon_scale;
+			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+		}
 
 		if (force_free || !rmid_dirty) {
 			clear_bit(entry->rmid, d->rmid_busy_llc);
@@ -289,13 +290,12 @@ int alloc_rmid(void)
 
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
-	struct rdt_resource *r;
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_domain *d;
 	int cpu, err;
 	u64 val = 0;
 
-	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
 	entry->busy = 0;
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
@@ -303,7 +303,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
-			if (err || val <= resctrl_cqm_threshold)
+			val *= hw_res->mon_scale;
+			if (err || val <= resctrl_rmid_realloc_threshold)
 				continue;
 		}
 
@@ -757,10 +758,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
-
-	/* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
-	resctrl_cqm_threshold /= hw_res->mon_scale;
+	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
 
 	ret = dom_data_init(r);
 	if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1207271cce23..bf0d13a5ce1f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
 static int max_threshold_occ_show(struct kernfs_open_file *of,
 				  struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
-	seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
+	seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);
 
 	return 0;
 }
@@ -1055,7 +1052,6 @@ static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
 static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 				       char *buf, size_t nbytes, loff_t off)
 {
-	struct rdt_hw_resource *hw_res;
 	unsigned int bytes;
 	int ret;
 
@@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 	if (bytes > (boot_cpu_data.x86_cache_size * 1024))
 		return -EINVAL;
 
-	hw_res = resctrl_to_arch_res(of->kn->parent->priv);
-	resctrl_cqm_threshold = bytes / hw_res->mon_scale;
+	resctrl_rmid_realloc_threshold = bytes;
 
 	return nbytes;
 }
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4215a0564206..8d297d014a16 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -246,4 +246,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
+extern unsigned int resctrl_rmid_realloc_threshold;
+
 #endif /* _RESCTRL_H */
-- 
2.30.2



* [PATCH v2 22/23] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (20 preceding siblings ...)
  2021-10-01 16:03 ` [PATCH v2 21/23] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
@ 2021-10-01 16:03 ` James Morse
  2021-10-01 16:03 ` [PATCH v2 23/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.

Signed-off-by: James Morse <james.morse@arm.com>
---
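As a worked example of the values involved, here is a user-space sketch
of the limit, the default threshold and the bounds check. The function
names are hypothetical; the arithmetic follows rdt_get_mon_l3_config()
and max_threshold_occ_write() in the diff below, and uses the same
unsigned int width as the kernel variables.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* The cache size is given in KB, so the limit in bytes is size * 1024. */
static uint32_t realloc_limit_bytes(uint32_t cache_size_kb)
{
	return cache_size_kb * 1024;
}

/* Default threshold: an equal share of the cache for every RMID. */
static uint32_t default_threshold_bytes(uint32_t limit_bytes, uint32_t num_rmid)
{
	assert(num_rmid != 0);
	return limit_bytes / num_rmid;
}

/* Mirrors the bounds check on a user supplied threshold: 0 if accepted. */
static int check_user_threshold(uint32_t bytes, uint32_t limit_bytes)
{
	return bytes > limit_bytes ? -EINVAL : 0;
}
```

For the 35MB LLC with 56 RMIDs mentioned in the code comment, the limit
is 36700160 bytes and the default threshold is 655360 bytes, which is
1/56 of the cache, or about 1.8%.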
 arch/x86/kernel/cpu/resctrl/monitor.c  | 9 +++++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
 include/linux/resctrl.h                | 1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d0c3a3db5f2a..becc7eb0cb7e 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -64,6 +64,11 @@ unsigned int rdt_mon_features;
  */
 unsigned int resctrl_rmid_realloc_threshold;
 
+/*
+ * This is the maximum value for the reallocation threshold, in bytes.
+ */
+unsigned int resctrl_rmid_realloc_limit;
+
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
 /*
@@ -739,9 +744,9 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-	unsigned int cl_size = boot_cpu_data.x86_cache_size;
 	int ret;
 
+	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
 	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
@@ -758,7 +763,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
+	resctrl_rmid_realloc_threshold = resctrl_rmid_realloc_limit / r->num_rmid;
 
 	ret = dom_data_init(r);
 	if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index bf0d13a5ce1f..f93b52ab6580 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 	if (ret)
 		return ret;
 
-	if (bytes > (boot_cpu_data.x86_cache_size * 1024))
+	if (bytes > resctrl_rmid_realloc_limit)
 		return -EINVAL;
 
 	resctrl_rmid_realloc_threshold = bytes;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8d297d014a16..5b1452bdbd7e 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -247,5 +247,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
 extern unsigned int resctrl_rmid_realloc_threshold;
+extern unsigned int resctrl_rmid_realloc_limit;
 
 #endif /* _RESCTRL_H */
-- 
2.30.2



* [PATCH v2 23/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (21 preceding siblings ...)
  2021-10-01 16:03 ` [PATCH v2 22/23] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
@ 2021-10-01 16:03 ` James Morse
  2021-10-13  2:09 ` [PATCH v2 00/23] " tan.shaopeng
  2021-10-19 23:17 ` Babu Moger
  24 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-01 16:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang, tan.shaopeng

resctrl_arch_rmid_read() returns a value in chunks, as read from the
hardware. This needs scaling to bytes by mon_scale, as provided by
the architecture code.

Now that resctrl_arch_rmid_read() performs the overflow handling and
corrections itself, it may as well return a value in bytes directly. This
allows the accesses to the architecture specific 'hw' structure to be removed.

Move the mon_scale conversion into resctrl_arch_rmid_read().
mbm_bw_count() is updated to calculate bandwidth from bytes.

Signed-off-by: James Morse <james.morse@arm.com>
---
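To show how the pieces fit together at the end of the series, here is a
user-space sketch of a read path that returns bytes. It is an
illustration under assumptions: struct demo_mbm_state and both function
names are hypothetical, and the get_corrected_mbm_count() step is left
out because its implementation is not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the architecture private per-RMID state. */
struct demo_mbm_state {
	uint64_t chunks;	/* running total, in hardware chunks */
	uint64_t prev_msr;	/* raw counter value at the previous read */
};

/*
 * Fold a new raw counter value into the state and return the running
 * total in bytes. The correction step is deliberately omitted.
 */
static uint64_t demo_read_bytes(struct demo_mbm_state *am, uint64_t msr_val,
				unsigned int width, uint32_t mon_scale)
{
	unsigned int shift;
	uint64_t delta;

	assert(width >= 1 && width <= 64);
	shift = 64 - width;

	/* Same wrap-safe subtraction as mbm_overflow_count(). */
	delta = ((msr_val << shift) - (am->prev_msr << shift)) >> shift;
	am->chunks += delta;
	am->prev_msr = msr_val;

	/* The scaling to bytes now happens here, not in the caller. */
	return am->chunks * mon_scale;
}

/* Caller side, as in mbm_bw_count(): a byte delta shifted down to MB. */
static uint64_t demo_bw_mb(uint64_t cur_bytes, uint64_t prev_bytes)
{
	return (cur_bytes - prev_bytes) >> 20;
}
```

The caller no longer needs mon_scale or the counter width; it only sees
byte totals and can derive bandwidth from the difference of two reads.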
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  6 ++----
 arch/x86/kernel/cpu/resctrl/internal.h    |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c     | 22 +++++++++-------------
 include/linux/resctrl.h                   |  2 +-
 4 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index c8ca7184c6d9..44cc49098549 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -549,7 +549,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
-	struct rdt_hw_resource *hw_res;
 	u32 resid, evtid, domid;
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
@@ -569,8 +568,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	domid = md.u.domid;
 	evtid = md.u.evtid;
 
-	hw_res = &rdt_resources_all[resid];
-	r = &hw_res->r_resctrl;
+	r = &rdt_resources_all[resid].r_resctrl;
 	d = rdt_find_domain(r, domid, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		ret = -ENOENT;
@@ -584,7 +582,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
 	else
-		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
+		seq_printf(m, "%llu\n", rr.val);
 
 out:
 	rdtgroup_kn_unlock(of->kn);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4569b4588185..3bf4b32fc531 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -278,13 +278,13 @@ struct rftype {
 
 /**
  * struct mbm_state - status for each MBM counter in each domain
- * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
+ * @prev_bw_bytes: Previous bytes value read for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
  */
 struct mbm_state {
-	u64	prev_bw_chunks;
+	u64	prev_bw_bytes;
 	u32	prev_bw;
 	u32	delta_bw;
 	bool	delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index becc7eb0cb7e..14bc843043da 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -186,7 +186,7 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	struct arch_mbm_state *am;
-	u64 msr_val;
+	u64 msr_val, chunks;
 
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
 		return -EINVAL;
@@ -211,10 +211,11 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	if (am) {
 		am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
 						 hw_res->mbm_width);
-		*val = get_corrected_mbm_count(rmid, am->chunks);
+		chunks = get_corrected_mbm_count(rmid, am->chunks);
+		*val = chunks * hw_res->mon_scale;
 		am->prev_msr = msr_val;
 	} else {
-		*val = msr_val;
+		*val = msr_val * hw_res->mon_scale;
 	}
 
 	return 0;
@@ -229,7 +230,6 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 void __check_limbo(struct rdt_domain *d, bool force_free)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rmid_entry *entry;
 	u32 crmid = 1, nrmid;
 	bool rmid_dirty;
@@ -252,7 +252,6 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
 		} else {
-			val *= hw_res->mon_scale;
 			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
 		}
 
@@ -296,7 +295,6 @@ int alloc_rmid(void)
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_domain *d;
 	int cpu, err;
 	u64 val = 0;
@@ -308,7 +306,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
-			val *= hw_res->mon_scale;
 			if (err || val <= resctrl_rmid_realloc_threshold)
 				continue;
 		}
@@ -400,15 +397,14 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
  */
 static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 cur_bw, chunks, cur_chunks;
+	u64 cur_bw, bytes, cur_bytes;
 
-	cur_chunks = rr->val;
-	chunks = cur_chunks - m->prev_bw_chunks;
-	m->prev_bw_chunks = cur_chunks;
+	cur_bytes = rr->val;
+	bytes = cur_bytes - m->prev_bw_bytes;
+	m->prev_bw_bytes = cur_bytes;
 
-	cur_bw = (chunks * hw_res->mon_scale) >> 20;
+	cur_bw = bytes >> 20;
 
 	if (m->delta_comp)
 		m->delta_bw = abs(cur_bw - m->prev_bw);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5b1452bdbd7e..e096474cd433 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -226,7 +226,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * @d:			The domain that the counter should be read from.
  * @rmid:		The rmid of the counter to read.
  * @eventid:		The eventid to read, e.g. L3 occupancy.
- * @val:		The result of the counter read in chunks.
+ * @val:		The result of the counter read in bytes.
  *
  * Returns 0 on success, or -EIO, -EINVAL etc on error.
  */
-- 
2.30.2



* RE: [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-10-01 16:02 ` [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
@ 2021-10-07  6:13   ` tan.shaopeng
  2021-10-27 16:50     ` James Morse
  2021-10-15 22:28   ` Reinette Chatre
  1 sibling, 1 reply; 61+ messages in thread
From: tan.shaopeng @ 2021-10-07  6:13 UTC (permalink / raw)
  To: 'James Morse', x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

> To determine whether the mba_MBps option to resctrl should be supported,
> resctrl tests the boot cpus' x86_vendor.
> 
> This isn't portable, and needs abstracting behind a helper so this check can be
> part of the filesystem code that moves to /fs/.
> 
> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
> on this system. An 'alloc_capable' test is added so that support for the controls
> isn't implied by the 'delay_linear' property, which is always true for MPAM.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>

> ---
> Changes since v1:
>  * Capitalisation
>  * Added MPAM example in commit message
>  * Fixed supports_mba_mbps() logic error in rdt_parse_param()
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 069c209be1d5..1207271cce23 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1948,17 +1948,26 @@ static void mba_sc_destroy(struct rdt_resource *r)
>  }
> 
>  /*
> - * Enable or disable the MBA software controller
> - * which helps user specify bandwidth in MBps.
>   * MBA software controller is supported only if
>   * MBM is supported and MBA is in linear scale.
>   */
> +static bool supports_mba_mbps(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> +
> +	return (is_mbm_enabled() &&
> +		r->alloc_capable && is_mba_linear());
> +}
> +
> +/*
> + * Enable or disable the MBA software controller
> + * which helps user specify bandwidth in MBps.
> + */
>  static int set_mba_sc(bool mba_sc)
>  {
>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> 
> -	if (!is_mbm_enabled() || !is_mba_linear() ||
> -	    mba_sc == is_mba_sc(r))
> +	if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
>  		return -EINVAL;
> 
>  	r->membw.mba_sc = mba_sc;
> @@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  		ctx->enable_cdpl2 = true;
>  		return 0;
>  	case Opt_mba_mbps:
> -		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		if (!supports_mba_mbps())
>  			return -EINVAL;
>  		ctx->enable_mba_mbps = true;
>  		return 0;
> --
> 2.30.2

Regards,
Shaopeng Tan


* RE: [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset
  2021-10-01 16:02 ` [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
@ 2021-10-07  6:16   ` tan.shaopeng
  0 siblings, 0 replies; 61+ messages in thread
From: tan.shaopeng @ 2021-10-07  6:16 UTC (permalink / raw)
  To: 'James Morse', x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

> To abstract the rmid counters into a helper that returns the number of bytes
> counted, architecture specific per-rmid state is needed.
> 
> It needs to be possible to reset this hidden state, as the values may outlive the
> life of an rmid, or the mount time of the filesystem.
> 
> mon_event_read() is called with first = true when an rmid is first allocated in
> mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from
> __mon_event_count()'s rr->first check.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>

> ---
> Changes since v1:
>  * Added WARN_ON_ONCE() for a case that should never happen.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++---------
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 35 +++++++++++++++++++++++++-
>  include/linux/resctrl.h                | 23 +++++++++++++++++
>  3 files changed, 62 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index aaae900a8ef3..f3f31315a907 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -22,14 +22,6 @@
> 
>  #define L2_QOS_CDP_ENABLE		0x01ULL
> 
> -/*
> - * Event IDs are used to program IA32_QM_EVTSEL before reading event
> - * counter from IA32_QM_CTR
> - */
> -#define QOS_L3_OCCUP_EVENT_ID		0x01
> -#define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
> -#define QOS_L3_MBM_LOCAL_EVENT_ID	0x03
> -
>  #define CQM_LIMBOCHECK_INTERVAL	1000
> 
>  #define MBM_CNTR_WIDTH_BASE		24
> @@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
> -	u32			evtid;
> +	enum resctrl_event_id	evtid;
>  	char			*name;
>  	struct list_head	list;
>  };
> @@ -90,9 +82,9 @@ struct mon_evt {
>  union mon_data_bits {
>  	void *priv;
>  	struct {
> -		unsigned int rid	: 10;
> -		unsigned int evtid	: 8;
> -		unsigned int domid	: 14;
> +		unsigned int rid		: 10;
> +		enum resctrl_event_id evtid	: 8;
> +		unsigned int domid		: 14;
>  	} u;
>  };
> 
> @@ -100,7 +92,7 @@ struct rmid_read {
>  	struct rdtgroup		*rgrp;
>  	struct rdt_resource	*r;
>  	struct rdt_domain	*d;
> -	int			evtid;
> +	enum resctrl_event_id	evtid;
>  	bool			first;
>  	u64			val;
>  };
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index a1232462db14..35eef49954b0 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -137,7 +137,37 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
>  	return entry;
>  }
> 
> -static u64 __rmid_read(u32 rmid, u32 eventid)
> +static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
> +						 u32 rmid,
> +						 enum resctrl_event_id eventid)
> +{
> +	switch (eventid) {
> +	case QOS_L3_OCCUP_EVENT_ID:
> +		return NULL;
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		return &hw_dom->arch_mbm_total[rmid];
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		return &hw_dom->arch_mbm_local[rmid];
> +	}
> +
> +	/* Never expect to get here */
> +	WARN_ON_ONCE(1);
> +
> +	return NULL;
> +}
> +
> +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> +			     u32 rmid, enum resctrl_event_id eventid)
> +{
> +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +	struct arch_mbm_state *am;
> +
> +	am = get_arch_mbm_state(hw_dom, rmid, eventid);
> +	if (am)
> +		memset(am, 0, sizeof(*am));
> +}
> +
> +static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
>  {
>  	u64 val;
> 
> @@ -291,6 +321,9 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>  	struct mbm_state *m;
>  	u64 chunks, tval;
> 
> +	if (rr->first)
> +		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
> +
>  	tval = __rmid_read(rmid, rr->evtid);
>  	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
>  		return tval;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index af202c891ba7..04f30d80fc67 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -32,6 +32,16 @@ enum resctrl_conf_type {
> 
>  #define CDP_NUM_TYPES	(CDP_DATA + 1)
> 
> +/*
> + * Event IDs, the values match those used to program IA32_QM_EVTSEL before
> + * reading IA32_QM_CTR on RDT systems.
> + */
> +enum resctrl_event_id {
> +	QOS_L3_OCCUP_EVENT_ID		= 0x01,
> +	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
> +	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
> +};
> +
>  /**
>   * struct resctrl_staged_config - parsed configuration to be applied
>   * @new_ctrl:		new ctrl value to be loaded
> @@ -209,4 +219,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
>  int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
>  void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> 
> +/**
> + * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> + *			       and eventid.
> + * @r:		The domain's resource.
> + * @d:		The rmid's domain.
> + * @rmid:	The rmid whose counter values should be reset.
> + * @eventid:	The eventid whose counter values should be reset.
> + *
> + * This can be called from any CPU.
> + */
> +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> +			     u32 rmid, enum resctrl_event_id eventid);
> +
>  #endif /* _RESCTRL_H */
> --
> 2.30.2

Regards,
Shaopeng Tan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (22 preceding siblings ...)
  2021-10-01 16:03 ` [PATCH v2 23/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
@ 2021-10-13  2:09 ` tan.shaopeng
  2021-10-19 23:17 ` Babu Moger
  24 siblings, 0 replies; 61+ messages in thread
From: tan.shaopeng @ 2021-10-13  2:09 UTC (permalink / raw)
  To: 'James Morse', x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

> Hello!
> 
> Patches 1&2 have been posted independently in case they are wanted as fixes.
> 
> The major change in this version is when the mba_mbps[] array is allocated.
> 
> ---
> The aim of this series is to insert a split between the parts of the monitor code
> that the architecture must implement, and those that are part of the resctrl
> filesystem. The eventual aim is to move all filesystem parts out to live in
> /fs/resctrl, so that resctrl can be wired up for MPAM.
> 
> What's MPAM? See the cover letter of a previous series. [1]
> 
> The series adds domain online/offline callbacks to allow the filesystem to
> manage some of its structures itself, then moves all the 'mba_sc' behaviour to
> be part of the filesystem.
> This means another architecture doesn't need to provide an mbps_val array.
> As its all software, the resctrl filesystem should be able to do this without any
> help from the architecture code.
> 
> Finally __rmid_read() is refactored to be the API call that the architecture
> provides to read a counter value. All the hardware specific overflow detection,
> scaling and value correction should occur behind this helper.
> 
> 
> This series is based on v5.15-rc3, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v2
> 
> [0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
> [1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/
> 
> [v1] https://lore.kernel.org/lkml/20210729223610.29373-1-james.morse@arm.com/

I have tested these patches on an Intel(R) Xeon(R) Gold 6254 CPU with the resctrl selftests.
No problems were found.

Shaopeng Tan

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled
  2021-10-01 16:02 ` [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled James Morse
@ 2021-10-15 22:19   ` Reinette Chatre
  0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> rdt_resources_all[] used to have extra entries for L2CODE/L2DATA.
> These were hidden from resctrl by the alloc_enabled value.
> 
> Now that the L2/L2CODE/L2DATA resources have been merged together,
> alloc_enabled doesn't mean anything, it always has the same value as
> alloc_capable which indicates CAT is supported by this cache.

This impacts all resources, not just cache, so perhaps it should read 
"... which indicates allocation is supported by this resource." or 
equivalent.

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work
  2021-10-01 16:02 ` [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work James Morse
@ 2021-10-15 22:19   ` Reinette Chatre
  2021-10-22 18:30     ` James Morse
  2021-10-19 23:19   ` Babu Moger
  1 sibling, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:

...

> @@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>   		return;
>   	}
>   
> -	if (r->mon_capable && domain_setup_mon_state(r, d)) {
> -		kfree(hw_dom->ctrl_val);
> -		kfree(hw_dom->mbps_val);
> -		kfree(hw_dom);
> -		return;
> -	}
> -
>   	list_add_tail(&d->list, add_pos);
>   
> -	/*
> -	 * If resctrl is mounted, add
> -	 * per domain monitor data directories.
> -	 */
> -	if (static_branch_unlikely(&rdt_mon_enable_key))
> -		mkdir_mondata_subdir_allrdtgrp(r, d);
> +	err = resctrl_online_domain(r, d);
> +	if (err) {
> +		list_del(&d->list);
> +		kfree(hw_dom->ctrl_val);
> +		kfree(hw_dom->mbps_val);
> +		kfree(d);

Even though this goes away in the next patch, I think this should rather be 
kfree(hw_dom).
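
To illustrate the point, here is a minimal user-space sketch of the
"generic struct embedded in an arch struct" layout. It is not the kernel
code: the demo_* names are hypothetical, free() stands in for kfree(), and
the offsetof() arithmetic stands in for container_of(). The allocation that
was made is the outer structure, so that is the pointer that should be
handed back to the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the generic, filesystem-visible domain. */
struct demo_domain {
	int id;
};

/* Hypothetical stand-in for the arch-private wrapper that is allocated. */
struct demo_hw_domain {
	struct demo_domain d_resctrl;	/* embedded generic part */
	unsigned int *ctrl_val;		/* arch-private array */
};

/* Recover the containing allocation from a pointer to the embedded member. */
static struct demo_hw_domain *demo_to_arch_dom(struct demo_domain *d)
{
	return (struct demo_hw_domain *)((char *)d -
			offsetof(struct demo_hw_domain, d_resctrl));
}

static struct demo_domain *demo_domain_alloc(int id, size_t num_closid)
{
	struct demo_hw_domain *hw_dom = calloc(1, sizeof(*hw_dom));

	if (!hw_dom)
		return NULL;

	hw_dom->ctrl_val = calloc(num_closid, sizeof(*hw_dom->ctrl_val));
	if (!hw_dom->ctrl_val) {
		free(hw_dom);
		return NULL;
	}

	hw_dom->d_resctrl.id = id;
	return &hw_dom->d_resctrl;
}

static void demo_domain_free(struct demo_domain *d)
{
	struct demo_hw_domain *hw_dom;

	assert(d != NULL);
	hw_dom = demo_to_arch_dom(d);

	/* Free what was allocated: the arrays, then the outer structure. */
	free(hw_dom->ctrl_val);
	free(hw_dom);
}
```

In this sketch the embedded struct happens to be the first member, so both
pointers have the same value and freeing 'd' would also work, but only by
accident of layout. Freeing the outer pointer stays correct if a member is
ever added in front.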

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-10-01 16:02 ` [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
@ 2021-10-15 22:26   ` Reinette Chatre
  2021-10-22 18:30     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:26 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> To support resctrl's MBA software controller, the architecture must provide
> a second configuration array to hold the mbps_val from user-space.
> 
> This complicates the interface between the architecture code.

This complicates the interface between the architecture code and ... ?

> 
> Make the filesystem parts of resctrl create an array for the mba_sc
> values when is_mba_sc() is set to true. The software controller
> can be changed to use this, allowing the architecture code to only
> consider the values configured in hardware.

This changes significantly more than just where the mbps_val array is 
hosted. It also changes how the life cycle of this array is managed. 
Previously it followed the domain, whether mba_sc was enabled or not. 
Now that it depends on mba_sc it is managed quite differently.

Could the changelog be upfront about this change and its motivation? 
Stating this would make this much easier to review and also the later 
patches where the original mbps_val initialization code is removed 
without replacement.

> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Added missing error handling to mba_sc_domain_allocate() in
>     domain_setup_mon_state()
>   * Added comment about mba_sc_domain_allocate() races
>   * Squashed out struct resctrl_mba_sc
>   * Moved mount time alloc/free calls to set_mba_sc().
>   * Removed mount check in resctrl_offline_domain()
>   * Reword commit message
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  1 -
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 67 ++++++++++++++++++++++++++
>   include/linux/resctrl.h                |  6 +++
>   3 files changed, 73 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e12b55f815bf..a7e2cbce29d5 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -36,7 +36,6 @@
>   #define MBM_OVERFLOW_INTERVAL		1000
>   #define MAX_MBA_BW			100u
>   #define MBA_IS_LINEAR			0x4
> -#define MBA_MAX_MBPS			U32_MAX
>   #define MAX_MBA_BW_AMD			0x800
>   #define MBM_CNTR_WIDTH_OFFSET_AMD	20
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 38670bb810cb..9d402bc8bdff 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1889,6 +1889,64 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
>   		l3_qos_cfg_update(&hw_res->cdp_enabled);
>   }
>   
> +static int mba_sc_domain_allocate(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	u32 num_closid = resctrl_arch_get_num_closid(r);
> +	int cpu = cpumask_any(&d->cpu_mask);
> +	int i;
> +
> +	/*
> +	 * d->mbps_val is allocated by a call to this function in set_mba_sc(),
> +	 * and domain_setup_mon_state(). Both calls are guarded by is_mba_sc(),
> +	 * which can only return true while the filesystem is mounted. The
> +	 * two calls are prevented from racing as rdt_get_tree() takes the
> +	 * cpuhp read lock before calling rdt_enable_ctx(ctx), which prevents
> +	 * it running concurrently with resctrl_online_domain().
> +	 */
> +	lockdep_assert_cpus_held();
> +
> +	d->mbps_val = kcalloc_node(num_closid, sizeof(*d->mbps_val),
> +				   GFP_KERNEL, cpu_to_node(cpu));
> +	if (!d->mbps_val)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_closid; i++)
> +		d->mbps_val[i] = MBA_MAX_MBPS;
> +
> +	return 0;
> +}
> +
> +static int mba_sc_allocate(struct rdt_resource *r)
> +{
> +	struct rdt_domain *d;
> +	int ret;
> +

Please initialize ret.
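
The concern is the empty-list case: if the loop body never runs, 'ret' is
returned without ever having been assigned. A small user-space sketch of the
same shape, using a plain array instead of a list and hypothetical demo_*
names throughout:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical domain; fail_alloc simulates an allocation failure. */
struct demo_domain {
	int fail_alloc;
	int allocated;
};

static int demo_domain_allocate(struct demo_domain *d)
{
	if (d->fail_alloc)
		return -ENOMEM;

	d->allocated = 1;
	return 0;
}

/* Allocate for every domain, stopping at the first failure. */
static int demo_allocate_all(struct demo_domain *doms, size_t n)
{
	int ret = 0;	/* well-defined result when there are no domains */
	size_t i;

	assert(doms != NULL || n == 0);

	for (i = 0; i < n; i++) {
		ret = demo_domain_allocate(&doms[i]);
		if (ret)
			break;
	}

	return ret;
}
```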

> +	list_for_each_entry(d, &r->domains, list) {
> +		ret = mba_sc_domain_allocate(r, d);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static void mba_sc_domain_destroy(struct rdt_resource *r,
> +				  struct rdt_domain *d)
> +{
> +	kfree(d->mbps_val);
> +	d->mbps_val = NULL;
> +}
> +
> +static void mba_sc_destroy(struct rdt_resource *r)
> +{
> +	struct rdt_domain *d;
> +
> +	lockdep_assert_cpus_held();
> +
> +	list_for_each_entry(d, &r->domains, list)
> +		mba_sc_domain_destroy(r, d);
> +}
> +
>   /*
>    * Enable or disable the MBA software controller
>    * which helps user specify bandwidth in MBps.
> @@ -1911,6 +1969,10 @@ static int set_mba_sc(bool mba_sc)
>   		setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
>   	}
>   
> +	if (is_mba_sc(r))
> +		return mba_sc_allocate(r);
> +
> +	mba_sc_destroy(r);
>   	return 0;
>   }
>   
> @@ -3259,6 +3321,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>   		__check_limbo(d, true);
>   		cancel_delayed_work(&d->cqm_limbo);
>   	}
> +	if (is_mba_sc(r))
> +		mba_sc_domain_destroy(r, d);
>   	bitmap_free(d->rmid_busy_llc);
>   	kfree(d->mbm_total);
>   	kfree(d->mbm_local);
> @@ -3291,6 +3355,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>   		}
>   	}
>   
> +	if (is_mba_sc(r))
> +		return mba_sc_domain_allocate(r, d);
> +
>   	return 0;
>   }
>   

Could this be done symmetrically? That is, allocate in 
resctrl_online_domain() and free in resctrl_offline_domain().

> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 5d283bdd6162..355660d46612 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,
>   
>   #endif
>   
> +/* max value for struct resctrl_mba_sc's mbps_val */
> +#define MBA_MAX_MBPS   U32_MAX

struct resctrl_mba_sc?

> +
>   /**
>    * enum resctrl_conf_type - The type of configuration.
>    * @CDP_NONE:	No prioritisation, both code and data are controlled or monitored.
> @@ -53,6 +56,8 @@ struct resctrl_staged_config {
>    * @cqm_work_cpu:	worker CPU for CQM h/w counters
>    * @plr:		pseudo-locked region (if any) associated with domain
>    * @staged_config:	parsed configuration to be applied
> + * @mbps_val:		Array of user specified control values for mba_sc,
> + *			indexed by closid

Could this inherit some of the useful kerneldoc associated with the 
mbps_val being replaced? That is, it exists when mba_sc is enabled and 
contains bandwidth values in MBps.

>    */
>   struct rdt_domain {
>   	struct list_head		list;
> @@ -67,6 +72,7 @@ struct rdt_domain {
>   	int				cqm_work_cpu;
>   	struct pseudo_lock_region	*plr;
>   	struct resctrl_staged_config	staged_config[CDP_NUM_TYPES];
> +	u32				*mbps_val;
>   };
>   
>   /**
> 

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-10-01 16:02 ` [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
@ 2021-10-15 22:26   ` Reinette Chatre
  2021-10-27 16:49     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:26 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> Updates to resctrl's software controller follow the same path as
> other configuration updates, but they don't modify the hardware state.
> rdtgroup_schemata_write() uses parse_line() and the resource's
> ctrlval_parse function to stage the configuration.

parse_ctrlval ?

> resctrl_arch_update_domains() then updates the mbps_val[] array
> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
> call that would update hardware.
> 
> This complicates the interface between resctrl's filesystem parts
> and architecture specific code. It should be possible for mba_sc
> to be completely implemented by the filesystem parts of resctrl. This
> would allow it to work on a second architecture with no additional code.
> 
> Change parse_bw() to write the configuration value directly to the
> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()

mbps_val[] array?

> to skip the call to resctrl_arch_update_domains(), meaning all the
> mba_sc specific code in resctrl_arch_update_domains() can be removed.
> On the read-side, show_doms() and update_mba_bw() are changed to read
> the mba_sc[] array from the domain structure. With this,

mbps_val[] ?

Should rdtgroup_size_show() also get a similar snippet?

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val
  2021-10-01 16:02 ` [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val James Morse
@ 2021-10-15 22:27   ` Reinette Chatre
  2021-10-27 16:49     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:27 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> The resctrl arch code provides a second configuration array mbps_val[]
> for the MBA software controller.
> 
> Since resctrl switched over to allocating and freeing its own array
> when needed, nothing uses the arch code version.

With the previous changes it is true that this array is no longer 
used. Even so, the code removed in this patch is not just the usage of 
the array but also its management ... especially how and when it is 
reset. While the array is no longer used, I think it is still important 
to ensure that all the array management is handled for the new mbps_val 
array. Perhaps just help the reader by stating that the values of the 
new array never need to be reset since it is always recreated, while the 
previous array stuck around during umount/mount.
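
As a rough user-space sketch of that life cycle (hypothetical demo_* names,
calloc()/free() standing in for the kernel allocators): because the array is
created and filled with the default each time the software controller is
enabled, values written during one enable period cannot leak into the next.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical default meaning "no limit". */
#define DEMO_MBA_MAX_MBPS	UINT32_MAX

struct demo_domain {
	uint32_t *mbps_val;	/* exists only while the controller is enabled */
};

/* Create the array and give every closid the default value. */
static int demo_mba_sc_domain_allocate(struct demo_domain *d, size_t num_closid)
{
	size_t i;

	assert(d->mbps_val == NULL);	/* never allocate twice */

	d->mbps_val = calloc(num_closid, sizeof(*d->mbps_val));
	if (!d->mbps_val)
		return -ENOMEM;

	for (i = 0; i < num_closid; i++)
		d->mbps_val[i] = DEMO_MBA_MAX_MBPS;

	return 0;
}

/* Drop the array; the next allocate starts again from the defaults. */
static void demo_mba_sc_domain_destroy(struct demo_domain *d)
{
	free(d->mbps_val);
	d->mbps_val = NULL;
}
```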

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-10-01 16:02 ` [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
  2021-10-07  6:13   ` tan.shaopeng
@ 2021-10-15 22:28   ` Reinette Chatre
  1 sibling, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:28 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> To determine whether the mba_MBps option to resctrl should be supported,
> resctrl tests the boot cpus' x86_vendor.

cpus -> CPUs

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly
  2021-10-01 16:02 ` [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
@ 2021-10-15 22:28   ` Reinette Chatre
  2021-10-27 16:49     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:28 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 9f45207a6c74..25baacd331e0 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
>   	return false;
>   }
>   
> +int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> +{
> +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +	u32 idx = get_config_index(closid, t);
> +	struct msr_param msr_param;
> +
> +	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
> +		return -EINVAL;
> +
> +	hw_dom->ctrl_val[idx] = cfg_val;
> +
> +	msr_param.res = r;
> +	msr_param.low = idx;
> +	msr_param.high = idx + 1;
> +
> +	rdt_ctrl_update(&msr_param);
> +

rdt_ctrl_update() will take its parameters and recompute the domain that 
is already available here ... it seems to take a few steps back before 
doing what is needed. Could msr_update() be called directly here instead?

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-01 16:02 ` [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks James Morse
@ 2021-10-15 22:28   ` Reinette Chatre
  2021-10-27 16:50     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:28 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
> second. It reads the hardware register, calculates the bandwidth and
> updates m->prev_bw_msr which is used to hold the previous hardware register
> value.
> 
> Operating directly on hardware register values makes it difficult to make
> this code architecture independent, so that it can be moved to /fs/,
> making the mba_sc feature something resctrl supports with no additional
> support from the architecture.
> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
> register using __mon_event_count().

Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local 
bandwidth when mba_sc is enabled") may explain how the code ended up the 
way it is.

> Change mbm_bw_count() to use the current chunks value from
> __mon_event_count() to calculate bandwidth. This means it no longer
> operates on hardware register values.

ok ... so could the patch just do this as it is stated here? The way it 
is implemented is very complicated and its correctness is hard (for me) 
to verify (more below).

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * This patch was rewritten
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
>   arch/x86/kernel/cpu/resctrl/monitor.c  | 24 +++++++++++++++---------
>   2 files changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 1b07e49564cf..0a5721e1cc07 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -289,7 +289,7 @@ struct rftype {
>    * struct mbm_state - status for each MBM counter in each domain
>    * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
>    * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
> - * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
> + * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
>    * @prev_bw:	The most recent bandwidth in MBps
>    * @delta_bw:	Difference between the current and previous bandwidth
>    * @delta_comp:	Indicates whether to compute the delta_bw
> @@ -297,7 +297,7 @@ struct rftype {
>   struct mbm_state {
>   	u64	chunks;
>   	u64	prev_msr;
> -	u64	prev_bw_msr;
> +	u64	prev_bw_chunks;
>   	u32	prev_bw;
>   	u32	delta_bw;
>   	bool	delta_comp;
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 6c8226987dd6..a1232462db14 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -315,7 +315,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   
>   	if (rr->first) {
>   		memset(m, 0, sizeof(struct mbm_state));
> -		m->prev_bw_msr = m->prev_msr = tval;
> +		m->prev_msr = tval;
>   		return 0;
>   	}
>   
> @@ -329,27 +329,32 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   }
>   
>   /*
> + * mbm_bw_count() - Update bw count from values previously read by
> + *		    __mon_event_count().
> + * @rmid:	The rmid used to identify the cached mbm_state.
> + * @rr:		The struct rmid_read populated by __mon_event_count().
> + *
>    * Supporting function to calculate the memory bandwidth
> - * and delta bandwidth in MBps.
> + * and delta bandwidth in MBps. The chunks value previously read by
> + * __mon_event_count() is compared with the chunks value from the previous
> + * invocation. This must be called oncer per second to maintain values in MBps.
>    */
>   static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>   {
>   	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>   	struct mbm_state *m = &rr->d->mbm_local[rmid];
> -	u64 tval, cur_bw, chunks;
> +	u64 cur_bw, chunks, cur_chunks;
>   
> -	tval = __rmid_read(rmid, rr->evtid);
> -	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
> -		return;
> +	cur_chunks = rr->val;
> +	chunks = cur_chunks - m->prev_bw_chunks;
> +	m->prev_bw_chunks = cur_chunks;
>   
> -	chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
> -	cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
> +	cur_bw = (chunks * hw_res->mon_scale) >> 20;

I find this quite confusing. What if a new m->prev_chunks is introduced 
instead and initialized in __mon_event_count() to the value of chunks, 
and then here in mbm_bw_count it could just retrieve it (chunks = 
m->prev_chunks).

>   
>   	if (m->delta_comp)
>   		m->delta_bw = abs(cur_bw - m->prev_bw);
>   	m->delta_comp = false;
>   	m->prev_bw = cur_bw;
> -	m->prev_bw_msr = tval;
>   }
>   
>   /*
> @@ -509,6 +514,7 @@ static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
>   	rr.first = false;
>   	rr.r = r;
>   	rr.d = d;
> +	rr.val = 0;
>   
>   	/*
>   	 * This is protected from concurrent reads from user
> 

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  2021-10-01 16:02 ` [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
@ 2021-10-15 22:29   ` Reinette Chatre
  0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:29 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> resctrl_arch_rmid_read() is intended as the function that an
> architecture agnostic resctrl filesystem driver can use to
> read a value in bytes from a counter. Currently the function returns

resctrl_arch_rmid_read() does not exist at this point and is also not 
introduced in this patch.

> the mbm values in chunks directly from hardware. For bandwidth

Could you please replace mbm with MBM throughout the series?

> counters the resctrl filesystem uses this to calculate the number of
> bytes ever seen.
> 
> MPAM's scaling of counters can be changed at runtime, reducing the
> resolution but increasing the range. When this is changed the prev_msr
> values need to be converted by the architecture code.
> 
> Add an array for per-rmid private storage. The prev_msr and chunks
> values will move here to allow resctrl_arch_rmid_read() to always
> return the number of bytes read by this counter without assistance
> from the filesystem. The values are moved in later patches when
> the overflow and correction calls are moved into
> resctrl_arch_rmid_read().
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/core.c     | 34 ++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++++++
>   2 files changed, 47 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 583fb41db06d..f527489a607a 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -413,6 +413,8 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
>   
>   void domain_free(struct rdt_hw_domain *hw_dom)
>   {
> +	kfree(hw_dom->arch_mbm_total);
> +	kfree(hw_dom->arch_mbm_local);
>   	kfree(hw_dom->ctrl_val);
>   	kfree(hw_dom);
>   }
> @@ -438,6 +440,33 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>   	return 0;
>   }
>   
> +/**
> + * arch_domain_mbm_alloc() - Allocate arch private storage for the mbm counters

mbm -> MBM in comments also please.

> + * @num_rmid:	The size of the mbm counter array
> + * @hw_dom:	The domain that owns the allocated the arrays

the allocated the -> the allocated

> + *
> + * On error, call domain_free()

When following kerneldoc please use "Return:" to indicate the return 
section. It will help to run scripts/kernel-doc on these changes to make 
sure no new kernel-doc issues are introduced.
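
For illustration only, a header with a "Return:" section could look roughly
like the sketch below. The description wording is my assumption, and the
struct and function body are trimmed-down userspace stand-ins (calloc() in
place of kcalloc()) so that the snippet compiles on its own.

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions) for the kernel types in the patch. */
struct arch_mbm_state {
	unsigned long long prev_msr;
};

struct rdt_hw_domain {
	struct arch_mbm_state *arch_mbm_total;
};

/**
 * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
 * @num_rmid:	Size of the MBM counter array
 * @hw_dom:	Domain that owns the allocated arrays
 *
 * Return: 0 on success, -ENOMEM if the allocation fails.
 */
static int arch_domain_mbm_alloc(unsigned int num_rmid,
				 struct rdt_hw_domain *hw_dom)
{
	/* calloc() stands in for kcalloc(..., GFP_KERNEL) in this sketch. */
	hw_dom->arch_mbm_total = calloc(num_rmid,
					sizeof(*hw_dom->arch_mbm_total));
	if (!hw_dom->arch_mbm_total)
		return -ENOMEM;

	return 0;
}
```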

> + */
> +static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> +{
> +	size_t tsize;
> +
> +	if (is_mbm_total_enabled()) {
> +		tsize = sizeof(*hw_dom->arch_mbm_total);
> +		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
> +		if (!hw_dom->arch_mbm_total)
> +			return -ENOMEM;
> +	}
> +	if (is_mbm_local_enabled()) {
> +		tsize = sizeof(*hw_dom->arch_mbm_local);
> +		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
> +		if (!hw_dom->arch_mbm_local)
> +			return -ENOMEM;
> +	}
> +

Proper cleanup on error should be done in this function. Please do not 
use domain_free() as cleanup for what can be done in this function. I 
see domain_free() as the higher level error control ... like when a 
wrapper function calls arch_domain_mbm_alloc() and then something else 
fails after that ... then domain_free() would be that higher level error 
handling.
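
As a rough, untested userspace sketch of what I mean (calloc()/free() stand
in for kcalloc()/kfree(), the struct is trimmed down, and the enable flags
and the allocation-failure hook are hypothetical names that exist only for
this example), the function could undo its own partial work like this:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions) for the kernel types in the patch. */
struct arch_mbm_state {
	unsigned long long prev_msr;
};

struct rdt_hw_domain {
	struct arch_mbm_state *arch_mbm_total;
	struct arch_mbm_state *arch_mbm_local;
};

/* Hypothetical replacements for is_mbm_total_enabled()/is_mbm_local_enabled(). */
static bool mbm_total_enabled = true;
static bool mbm_local_enabled = true;

/* Hypothetical hook: fail the Nth allocation (0 means never fail). */
static int fail_alloc_nr;
static int alloc_count;

static void *sketch_calloc(size_t n, size_t size)
{
	if (++alloc_count == fail_alloc_nr)
		return NULL;
	return calloc(n, size);
}

static int arch_domain_mbm_alloc(unsigned int num_rmid,
				 struct rdt_hw_domain *hw_dom)
{
	size_t tsize;

	if (mbm_total_enabled) {
		tsize = sizeof(*hw_dom->arch_mbm_total);
		hw_dom->arch_mbm_total = sketch_calloc(num_rmid, tsize);
		if (!hw_dom->arch_mbm_total)
			return -ENOMEM;
	}
	if (mbm_local_enabled) {
		tsize = sizeof(*hw_dom->arch_mbm_local);
		hw_dom->arch_mbm_local = sketch_calloc(num_rmid, tsize);
		if (!hw_dom->arch_mbm_local) {
			/* Undo what this function allocated itself. */
			free(hw_dom->arch_mbm_total);
			hw_dom->arch_mbm_total = NULL;
			return -ENOMEM;
		}
	}

	return 0;
}
```

With this shape the caller only needs domain_free() for failures that happen
after arch_domain_mbm_alloc() has returned successfully.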

> +	return 0;
> +}
> +
>   /*
>    * domain_add_cpu - Add a cpu to a resource's domain list.
>    *
> @@ -487,6 +516,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>   		return;
>   	}
>   
> +	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> +		domain_free(hw_dom);
> +		return;
> +	}
> +
>   	list_add_tail(&d->list, add_pos);
>   
>   	err = resctrl_online_domain(r, d);
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 0a5721e1cc07..aaae900a8ef3 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -303,17 +303,30 @@ struct mbm_state {
>   	bool	delta_comp;
>   };
>   
> +/**
> + * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
> + *			   return value.
> + * @prev_msr	Value of IA32_QM_CTR for this RMID last time we read it

Missing a ":"?
Please do not use "we".

This is a description of a struct ... can the doc elaborate on what 
"this RMID" means?

> + */
> +struct arch_mbm_state {
> +	u64	prev_msr;
> +};
> +
>   /**
>    * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
>    *			  a resource
>    * @d_resctrl:	Properties exposed to the resctrl file system
>    * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
> + * @arch_mbm_total:	arch private state for MBM total bandwidth
> + * @arch_mbm_local:	arch private state for MBM local bandwidth
>    *
>    * Members of this structure are accessed via helpers that provide abstraction.
>    */
>   struct rdt_hw_domain {
>   	struct rdt_domain		d_resctrl;
>   	u32				*ctrl_val;
> +	struct arch_mbm_state		*arch_mbm_total;
> +	struct arch_mbm_state		*arch_mbm_local;
>   };
>   
>   static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
> 

Reinette


* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-01 16:02 ` [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() James Morse
@ 2021-10-15 22:29   ` Reinette Chatre
  2021-10-19 23:20   ` Babu Moger
  1 sibling, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:29 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> __rmid_read() selects the specified eventid and returns the counter
> value from the msr. The error handling is architecture specific, and

msr -> MSR

> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
> 
> Error handling should be handled by architecture specific code, as
> a different architecture may have different requirements. MPAM's
> counters can report that they are 'not ready', requiring a second
> read after a short delay. This should be hidden from resctrl.
> 
> Make __rmid_read() the architecture specific function for reading
> a counter. Rename it resctrl_arch_rmid_read() and move the error
> handling into it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>   * Return EINVAL from the impossible case in __mon_event_count() instead
>     of an x86 hardware specific value.
> ---
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>   arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>   include/linux/resctrl.h                   |  1 +
>   4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 25baacd331e0..c8ca7184c6d9 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   
>   	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>   
> -	if (rr.val & RMID_VAL_ERROR)
> +	if (rr.err == -EIO)
>   		seq_puts(m, "Error\n");
> -	else if (rr.val & RMID_VAL_UNAVAIL)
> +	else if (rr.err == -EINVAL)
>   		seq_puts(m, "Unavailable\n");
>   	else
>   		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index f3f31315a907..eca7793d3342 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -40,7 +40,6 @@
>    */
>   #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>   
> -
>   struct rdt_fs_context {
>   	struct kernfs_fs_context	kfc;
>   	bool				enable_cdpl2;

Stray whitespace change here.

> @@ -94,6 +93,7 @@ struct rmid_read {
>   	struct rdt_domain	*d;
>   	enum resctrl_event_id	evtid;
>   	bool			first;
> +	int			err;
>   	u64			val;
>   };
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 35eef49954b0..cf35eaf01042 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -167,9 +167,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
>   		memset(am, 0, sizeof(*am));
>   }
>   
> -static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
> +int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>   {
> -	u64 val;
> +	u64 msr_val;
>   
>   	/*
>   	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
> @@ -180,14 +180,24 @@ static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
>   	 * are error bits.
>   	 */
>   	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
> -	rdmsrl(MSR_IA32_QM_CTR, val);
> +	rdmsrl(MSR_IA32_QM_CTR, msr_val);
>   
> -	return val;
> +	if (msr_val & RMID_VAL_ERROR)
> +		return -EIO;
> +	if (msr_val & RMID_VAL_UNAVAIL)
> +		return -EINVAL;
> +
> +	*val = msr_val;
> +
> +	return 0;
>   }
>   
>   static bool rmid_dirty(struct rmid_entry *entry)
>   {
> -	u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
> +	u64 val = 0;
> +
> +	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
> +		return true;
>   
>   	return val >= resctrl_cqm_threshold;
>   }
> @@ -259,8 +269,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   {
>   	struct rdt_resource *r;
>   	struct rdt_domain *d;
> -	int cpu;
> -	u64 val;
> +	int cpu, err;
> +	u64 val = 0;
>   
>   	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>   
> @@ -268,8 +278,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   	cpu = get_cpu();
>   	list_for_each_entry(d, &r->domains, list) {
>   		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
> -			val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
> -			if (val <= resctrl_cqm_threshold)
> +			err = resctrl_arch_rmid_read(entry->rmid,
> +						     QOS_L3_OCCUP_EVENT_ID,
> +						     &val);
> +			if (err || val <= resctrl_cqm_threshold)
>   				continue;
>   		}
>   
> @@ -319,15 +331,15 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   {
>   	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>   	struct mbm_state *m;
> -	u64 chunks, tval;
> +	u64 chunks, tval = 0;
>   
>   	if (rr->first)
>   		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
>   
> -	tval = __rmid_read(rmid, rr->evtid);
> -	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
> -		return tval;
> -	}
> +	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
> +	if (rr->err)
> +		return rr->err;
> +
>   	switch (rr->evtid) {
>   	case QOS_L3_OCCUP_EVENT_ID:
>   		rr->val += tval;
> @@ -343,7 +355,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   		 * Code would never reach here because an invalid
>   		 * event id would fail the __rmid_read.
>   		 */
> -		return RMID_VAL_ERROR;
> +		return -EINVAL;
>   	}
>   

mon_event_count() takes action based on the return value of 
__mon_event_count(), which I do not think has been taken into account in 
this patch.
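
To illustrate the concern with a userspace sketch: __mon_event_count() still
has a u64 return type, so a negative errno comes back as a very large
non-zero value. The functions and struct below are simplified stand-ins with
made-up names, not the real code, and I am assuming a caller that stores any
non-zero return value in rr->val.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down version of struct rmid_read. */
struct rmid_read_sketch {
	int err;
	uint64_t val;
};

/* Stand-in for __mon_event_count(): u64 return, negative errno on failure. */
static uint64_t count_sketch(struct rmid_read_sketch *rr, int read_err)
{
	rr->err = read_err;
	if (rr->err)
		return rr->err;	/* implicitly converted to a huge u64 */

	rr->val += 1;
	return 0;
}

/* Stand-in for a caller that treats any non-zero return as a value. */
static void caller_sketch(struct rmid_read_sketch *rr, int read_err)
{
	uint64_t ret_val = count_sketch(rr, read_err);

	if (ret_val)
		rr->val = ret_val;	/* errno ends up in the value field */
}
```

In this sketch a failed read leaves (u64)-EINVAL in the value field, which no
longer matches the RMID_VAL_ERROR/RMID_VAL_UNAVAIL bits that the old code
propagated this way.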


>   	if (rr->first) {
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 04f30d80fc67..01bdd8be590b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -218,6 +218,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
>   			    u32 closid, enum resctrl_conf_type type);
>   int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
>   void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> +int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
>   
>   /**
>    * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> 

Reinette


* Re: [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
  2021-10-01 16:02 ` [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
@ 2021-10-15 22:30   ` Reinette Chatre
  0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-15 22:30 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/1/2021 9:02 AM, James Morse wrote:
> resctrl_arch_rmid_read() is intended as the function that an
> architecture agnostic resctrl filesystem driver can use to
> read a value in bytes from a hardware register. Currently the function
> returns the mbm values in chunks directly from hardware.
> 
> To convert this to bytes, some correction and overflow calculations
> are needed. These depend on the resource and domain structures.
> Overflow detection requires the old chunks value. None of this
> is available to resctrl_arch_rmid_read(). MPAM requires the
> resource and domain structures to find the MMIO device that holds
> the registers.
> 
> Pass the resource and domain to resctrl_arch_rmid_read(). This make
> rmid_dirty() to big, instead merge it with its only caller, the name is

to big -> too big

> kept as a local variable.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> This is all a little noisy for __mon_event_count(), as the switch
> statement work is now before the resctrl_arch_rmid_read() call.
> ---
>   arch/x86/kernel/cpu/resctrl/monitor.c | 31 +++++++++++++++------------
>   include/linux/resctrl.h               | 15 ++++++++++++-
>   2 files changed, 31 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index cf35eaf01042..f833bc01aeac 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -167,10 +167,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
>   		memset(am, 0, sizeof(*am));
>   }
>   
> -int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
> +int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,

Please do not use tabs in the function parameters.

> +			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
>   {
>   	u64 msr_val;
>   
> +	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
> +		return -EINVAL;
> +
>   	/*
>   	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
>   	 * with a valid event code for supported resource type and the bits
> @@ -192,16 +196,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>   	return 0;
>   }
>   
> -static bool rmid_dirty(struct rmid_entry *entry)
> -{
> -	u64 val = 0;
> -
> -	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
> -		return true;
> -
> -	return val >= resctrl_cqm_threshold;
> -}
> -
>   /*
>    * Check the RMIDs that are marked as busy for this domain. If the
>    * reported LLC occupancy is below the threshold clear the busy bit and
> @@ -213,6 +207,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>   	struct rmid_entry *entry;
>   	struct rdt_resource *r;
>   	u32 crmid = 1, nrmid;
> +	bool rmid_dirty;
> +	u64 val = 0;
>   
>   	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>   
> @@ -228,7 +224,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>   			break;
>   
>   		entry = __rmid_entry(nrmid);
> -		if (force_free || !rmid_dirty(entry)) {
> +
> +		if (resctrl_arch_rmid_read(r, d, entry->rmid,
> +					   QOS_L3_OCCUP_EVENT_ID, &val))
> +			rmid_dirty = true;
> +		else
> +			rmid_dirty = (val >= resctrl_cqm_threshold);
> +
> +		if (force_free || !rmid_dirty) {
>   			clear_bit(entry->rmid, d->rmid_busy_llc);
>   			if (!--entry->busy) {
>   				rmid_limbo_count--;
> @@ -278,7 +281,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>   	cpu = get_cpu();
>   	list_for_each_entry(d, &r->domains, list) {
>   		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
> -			err = resctrl_arch_rmid_read(entry->rmid,
> +			err = resctrl_arch_rmid_read(r, d, entry->rmid,
>   						     QOS_L3_OCCUP_EVENT_ID,
>   						     &val);
>   			if (err || val <= resctrl_cqm_threshold)
> @@ -336,7 +339,7 @@ static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>   	if (rr->first)
>   		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
>   
> -	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
> +	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
>   	if (rr->err)
>   		return rr->err;
>   
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 01bdd8be590b..4215a0564206 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -218,7 +218,20 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
>   			    u32 closid, enum resctrl_conf_type type);
>   int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
>   void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
> -int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
> +
> +/**
> + * resctrl_arch_rmid_read() - Read the eventid counter correpsonding to rmid

correpsonding -> corresponding

> + *			      for this resource and domain.
> + * @r:			The resource that the counter should be read from.
> + * @d:			The domain that the counter should be read from.
> + * @rmid:		The rmid of the counter to read.
> + * @eventid:		The eventid to read, e.g. L3 occupancy.
> + * @val:		The result of the counter read in chunks.
> + *

"The" prefix can be removed from all descriptions to match style of 
other descriptions, also in description of resctrl_arch_rmid_read() below.

> + * Returns 0 on success, or -EIO, -EINVAL etc on error.

I do not think this is valid kernel-doc for the return section.

> + */
> +int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
> +			   u32 rmid, enum resctrl_event_id eventid, u64 *val);
>   
>   /**
>    * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> 

Reinette


* Re: [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
  2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (23 preceding siblings ...)
  2021-10-13  2:09 ` [PATCH v2 00/23] " tan.shaopeng
@ 2021-10-19 23:17 ` Babu Moger
  24 siblings, 0 replies; 61+ messages in thread
From: Babu Moger @ 2021-10-19 23:17 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi James,
Thanks for the patches. Sanity tested on an AMD box. Found one problem and
added a few comments.
thanks
Babu

On 10/1/21 11:02 AM, James Morse wrote:
> Hello!
> 
> Patches 1&2 have been posted independently in case they are wanted as fixes.
> 
> The major change in this version is when the mba_mbps[] array is allocated.
> 
> ---
> The aim of this series is to insert a split between the parts of the monitor
> code that the architecture must implement, and those that are part of the
> resctrl filesystem. The eventual aim is to move all filesystem parts out
> to live in /fs/resctrl, so that resctrl can be wired up for MPAM.
> 
> What's MPAM? See the cover letter of a previous series. [1]
> 
> The series adds domain online/offline callbacks to allow the filesystem to
> manage some of its structures itself, then moves all the 'mba_sc' behaviour
> to be part of the filesystem.
> This means another architecture doesn't need to provide an mbps_val array.
> As its all software, the resctrl filesystem should be able to do this without
> any help from the architecture code.
> 
> Finally __rmid_read() is refactored to be the API call that the architecture
> provides to read a counter value. All the hardware specific overflow detection,
> scaling and value correction should occur behind this helper.
> 
> 
> This series is based on v5.15-rc3, and can be retrieved from:
> git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v2
> 
> [0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
> [1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/
> 
> [v1] https://lore.kernel.org/lkml/20210729223610.29373-1-james.morse@arm.com/
> 
> 
> Thanks,
> 
> James Morse (23):
>   x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state()
>     fails
>   x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
>   x86/resctrl: Kill off alloc_enabled
>   x86/resctrl: Merge mon_capable and mon_enabled
>   x86/resctrl: Add domain online callback for resctrl work
>   x86/resctrl: Group struct rdt_hw_domain cleanup
>   x86/resctrl: Add domain offline callback for resctrl work
>   x86/resctrl: Create mba_sc configuration in the rdt_domain
>   x86/resctrl: Switch over to the resctrl mbps_val list
>   x86/resctrl: Remove architecture copy of mbps_val
>   x86/resctrl: Remove set_mba_sc()s control array re-initialisation
>   x86/resctrl: Abstract and use supports_mba_mbps()
>   x86/resctrl: Allow update_mba_bw() to update controls directly
>   x86/resctrl: Calculate bandwidth from the previous __mon_event_count()
>     chunks
>   x86/recstrl: Add per-rmid arch private storage for overflow and chunks
>   x86/recstrl: Allow per-rmid arch private storage to be reset
>   x86/resctrl: Abstract __rmid_read()
>   x86/resctrl: Pass the required parameters into
>     resctrl_arch_rmid_read()
>   x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
>   x86/resctrl: Move get_corrected_mbm_count() into
>     resctrl_arch_rmid_read()
>   x86/resctrl: Rename and change the units of resctrl_cqm_threshold
>   x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
>     boot_cpu_data
>   x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
> 
>  arch/x86/kernel/cpu/resctrl/core.c        | 116 ++++--------
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  75 +++++---
>  arch/x86/kernel/cpu/resctrl/internal.h    |  62 +++----
>  arch/x86/kernel/cpu/resctrl/monitor.c     | 200 ++++++++++++--------
>  arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   2 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 217 ++++++++++++++++++----
>  include/linux/resctrl.h                   |  60 +++++-
>  7 files changed, 480 insertions(+), 252 deletions(-)
> 

-- 
Thanks
Babu Moger


* Re: [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled
  2021-10-01 16:02 ` [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled James Morse
@ 2021-10-19 23:18   ` Babu Moger
  2021-10-22 18:30     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Babu Moger @ 2021-10-19 23:18 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi James,

On 10/1/21 11:02 AM, James Morse wrote:
> mon_enabled and mon_capable are always set as a pair by
> rdt_get_mon_l3_config().
> 
> There is no point having two values.
> 
> Merge them together.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Removed stray cdp_capable changes.
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 1 -
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
>  include/linux/resctrl.h                | 2 --
>  4 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 53f3d275a98f..8828b5c1b6d2 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
>  	for_each_rdt_resource(r)					      \
>  		if (r->mon_capable)
>  
> -#define for_each_mon_enabled_rdt_resource(r)				      \
> -	for_each_rdt_resource(r)					      \
> -		if (r->mon_enabled)
> -
>  /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
>  union cpuid_0x10_1_eax {
>  	struct {
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c9f0f3d63f75..37af1790337f 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -717,7 +717,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> -	r->mon_enabled = true;
>  
>  	return 0;
>  }
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e327f8d1c8a3..e243c7d15b81 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1765,7 +1765,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>  			goto out_destroy;
>  	}
>  
> -	for_each_mon_enabled_rdt_resource(r) {
> +	for_each_mon_capable_rdt_resource(r) {
>  		fflags =  r->fflags | RF_MON_INFO;
>  		sprintf(name, "%s_MON", r->name);
>  		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
> @@ -2504,7 +2504,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
>  	struct rdtgroup *prgrp, *crgrp;
>  	char name[32];
>  
> -	if (!r->mon_enabled)
> +	if (!r->mon_capable)
>  		return;
>  
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> @@ -2572,7 +2572,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>  	struct rdtgroup *prgrp, *crgrp;
>  	struct list_head *head;
>  
> -	if (!r->mon_enabled)
> +	if (!r->mon_capable)
>  		return;
>  
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> @@ -2642,7 +2642,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
>  	 * Create the subdirectories for each domain. Note that all events
>  	 * in a domain like L3 are grouped into a resource whose domain is L3
>  	 */
> -	for_each_mon_enabled_rdt_resource(r) {
> +	for_each_mon_capable_rdt_resource(r) {
>  		ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
>  		if (ret)
>  			goto out_destroy;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 386ab3a41500..8180c539800d 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -130,7 +130,6 @@ struct resctrl_schema;
>  /**
>   * struct rdt_resource - attributes of a resctrl resource
>   * @rid:		The index of the resource
> - * @mon_enabled:	Is monitoring enabled for this feature
>   * @alloc_capable:	Is allocation available on this machine
>   * @mon_capable:	Is monitor feature available on this machine
>   * @num_rmid:		Number of RMIDs available
> @@ -149,7 +148,6 @@ struct resctrl_schema;
>   */
>  struct rdt_resource {
>  	int			rid;
> -	bool			mon_enabled;
>  	bool			alloc_capable;
>  	bool			mon_capable;

Also, we should probably rename alloc_capable and mon_capable to
alloc_supported and mon_supported respectively. We don't have an option to
enable and disable these features. If a feature is supported, it is always
supported.

Thanks
Babu


* Re: [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work
  2021-10-01 16:02 ` [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work James Morse
  2021-10-15 22:19   ` Reinette Chatre
@ 2021-10-19 23:19   ` Babu Moger
  2021-10-22 18:30     ` James Morse
  1 sibling, 1 reply; 61+ messages in thread
From: Babu Moger @ 2021-10-19 23:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng



On 10/1/21 11:02 AM, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when CPU hotplug callbacks are triggered.
> 
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> allocate the memory.
> 
> Move domain_setup_mon_state(), the monitor subdir creation call and the
> mbm/limbo workers into a new resctrl_online_domain() call. These bits
> are not specific to the architecture. Grouping them in one function
> allows that code to be moved to /fs/ and re-used by another architecture.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Capitalisation
>  * Removed inline comment
>  * Added to the commit message
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     | 57 ++++------------------
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 -
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 ++++++++++++++++++++++++--
>  include/linux/resctrl.h                |  1 +
>  4 files changed, 69 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 2f87177f1f69..f1fa54de8136 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -443,42 +443,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>  	return 0;
>  }
>  
> -static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
> -{
> -	size_t tsize;
> -
> -	if (is_llc_occupancy_enabled()) {
> -		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
> -		if (!d->rmid_busy_llc)
> -			return -ENOMEM;
> -		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
> -	}
> -	if (is_mbm_total_enabled()) {
> -		tsize = sizeof(*d->mbm_total);
> -		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
> -		if (!d->mbm_total) {
> -			bitmap_free(d->rmid_busy_llc);
> -			return -ENOMEM;
> -		}
> -	}
> -	if (is_mbm_local_enabled()) {
> -		tsize = sizeof(*d->mbm_local);
> -		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
> -		if (!d->mbm_local) {
> -			bitmap_free(d->rmid_busy_llc);
> -			kfree(d->mbm_total);
> -			return -ENOMEM;
> -		}
> -	}
> -
> -	if (is_mbm_enabled()) {
> -		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
> -		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
> -	}
> -
> -	return 0;
> -}
> -
>  /*
>   * domain_add_cpu - Add a cpu to a resource's domain list.
>   *
> @@ -498,6 +462,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  	struct list_head *add_pos = NULL;
>  	struct rdt_hw_domain *hw_dom;
>  	struct rdt_domain *d;
> +	int err;
>  
>  	d = rdt_find_domain(r, id, &add_pos);
>  	if (IS_ERR(d)) {
> @@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  		return;
>  	}
>  
> -	if (r->mon_capable && domain_setup_mon_state(r, d)) {
> -		kfree(hw_dom->ctrl_val);
> -		kfree(hw_dom->mbps_val);
> -		kfree(hw_dom);
> -		return;
> -	}
> -
>  	list_add_tail(&d->list, add_pos);
>  
> -	/*
> -	 * If resctrl is mounted, add
> -	 * per domain monitor data directories.
> -	 */
> -	if (static_branch_unlikely(&rdt_mon_enable_key))
> -		mkdir_mondata_subdir_allrdtgrp(r, d);
> +	err = resctrl_online_domain(r, d);
> +	if (err) {
> +		list_del(&d->list);
> +		kfree(hw_dom->ctrl_val);
> +		kfree(hw_dom->mbps_val);
> +		kfree(d);
> +	}
>  }
>  
>  static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 8828b5c1b6d2..be48a682dbdb 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -524,8 +524,6 @@ void mon_event_count(void *info);
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>  void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>  				    unsigned int dom_id);
> -void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> -				    struct rdt_domain *d);
>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>  		    int evtid, int first);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e243c7d15b81..19691f9ab061 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2565,16 +2565,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>   * Add all subdirectories of mon_data for "ctrl_mon" groups
>   * and "monitor" groups with given domain id.
>   */
> -void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> -				    struct rdt_domain *d)
> +static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> +					   struct rdt_domain *d)
>  {
>  	struct kernfs_node *parent_kn;
>  	struct rdtgroup *prgrp, *crgrp;
>  	struct list_head *head;
>  
> -	if (!r->mon_capable)
> -		return;
> -
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>  		parent_kn = prgrp->mon.mon_data_kn;
>  		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
> @@ -3236,6 +3233,64 @@ static int __init rdtgroup_setup_root(void)
>  	return ret;
>  }
>  
> +static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	size_t tsize;
> +
> +	if (is_llc_occupancy_enabled()) {
> +		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
> +		if (!d->rmid_busy_llc)
> +			return -ENOMEM;
> +	}
> +	if (is_mbm_total_enabled()) {
> +		tsize = sizeof(*d->mbm_total);
> +		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
> +		if (!d->mbm_total) {
> +			bitmap_free(d->rmid_busy_llc);
> +			return -ENOMEM;
> +		}
> +	}
> +	if (is_mbm_local_enabled()) {
> +		tsize = sizeof(*d->mbm_local);
> +		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
> +		if (!d->mbm_local) {
> +			bitmap_free(d->rmid_busy_llc);
> +			kfree(d->mbm_total);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	int err;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);

Looks like lockdep_assert_held was not there in this sequence.
Are you concerned about this lock not being held?
Thanks
Babu Moger


* Re: [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work
  2021-10-01 16:02 ` [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work James Morse
@ 2021-10-19 23:19   ` Babu Moger
  2021-10-22 18:30     ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Babu Moger @ 2021-10-19 23:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng



On 10/1/21 11:02 AM, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when CPU hotplug callbacks are triggered.
> 
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> free the memory.
> 
> Move the monitor subdir removal and the cancelling of the mbm/limbo
> works into a new resctrl_offline_domain() call. These bits are not
> specific to the architecture. Grouping them in one function allows
> that code to be moved to /fs/ and re-used by another architecture.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Removed a redundant mon_capable check
>  * Capitalisation
>  * Removed inline comment
>  * Added to the commit message
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     | 26 ++---------------
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 --
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++---
>  include/linux/resctrl.h                |  1 +
>  4 files changed, 38 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 7a2c24c5652c..1dd8428df008 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -523,27 +523,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  
>  	cpumask_clear_cpu(cpu, &d->cpu_mask);
>  	if (cpumask_empty(&d->cpu_mask)) {
> -		/*
> -		 * If resctrl is mounted, remove all the
> -		 * per domain monitor data directories.
> -		 */
> -		if (static_branch_unlikely(&rdt_mon_enable_key))
> -			rmdir_mondata_subdir_allrdtgrp(r, d->id);
> +		resctrl_offline_domain(r, d);
>  		list_del(&d->list);
> -		if (r->mon_capable && is_mbm_enabled())
> -			cancel_delayed_work(&d->mbm_over);
> -		if (is_llc_occupancy_enabled() &&  has_busy_rmid(r, d)) {
> -			/*
> -			 * When a package is going down, forcefully
> -			 * decrement rmid->ebusy. There is no way to know
> -			 * that the L3 was flushed and hence may lead to
> -			 * incorrect counts in rare scenarios, but leaving
> -			 * the RMID as busy creates RMID leaks if the
> -			 * package never comes back.
> -			 */
> -			__check_limbo(d, true);
> -			cancel_delayed_work(&d->cqm_limbo);
> -		}
>  
>  		/*
>  		 * rdt_domain "d" is going to be freed below, so clear
> @@ -551,11 +532,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  		 */
>  		if (d->plr)
>  			d->plr->d = NULL;
> -
> -		bitmap_free(d->rmid_busy_llc);
> -		kfree(d->mbm_total);
> -		kfree(d->mbm_local);
>  		domain_free(hw_dom);
> +
>  		return;
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index be48a682dbdb..e12b55f815bf 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
>  int rdt_get_mon_l3_config(struct rdt_resource *r);
>  void mon_event_count(void *info);
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
> -void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> -				    unsigned int dom_id);
>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
>  		    int evtid, int first);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 19691f9ab061..38670bb810cb 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
>   * Remove all subdirectories of mon_data of ctrl_mon groups
>   * and monitor groups with given domain id.
>   */
> -void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
> +static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> +					   unsigned int dom_id)
>  {
>  	struct rdtgroup *prgrp, *crgrp;
>  	char name[32];
>  
> -	if (!r->mon_capable)
> -		return;
> -
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>  		sprintf(name, "mon_%s_%02d", r->name, dom_id);
>  		kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
> @@ -3233,6 +3231,39 @@ static int __init rdtgroup_setup_root(void)
>  	return ret;
>  }
>  
> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	lockdep_assert_held(&rdtgroup_mutex);

Is this really required?

> +
> +	if (!r->mon_capable)
> +		return;

I don't see the need for this check either.

Thanks
Babu


* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-01 16:02 ` [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() James Morse
  2021-10-15 22:29   ` Reinette Chatre
@ 2021-10-19 23:20   ` Babu Moger
  2021-10-20 18:15     ` Reinette Chatre
  1 sibling, 1 reply; 61+ messages in thread
From: Babu Moger @ 2021-10-19 23:20 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi James,

On 10/1/21 11:02 AM, James Morse wrote:
> __rmid_read() selects the specified eventid and returns the counter
> value from the msr. The error handling is architecture specific, and
> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
> 
> Error handling should be handled by architecture specific code, as
> a different architecture may have different requirements. MPAM's
> counters can report that they are 'not ready', requiring a second
> read after a short delay. This should be hidden from resctrl.
> 
> Make __rmid_read() the architecture specific function for reading
> a counter. Rename it resctrl_arch_rmid_read() and move the error
> handling into it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Return EINVAL from the impossible case in __mon_event_count() instead
>    of an x86 hardware specific value.
> ---
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>  arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>  arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>  include/linux/resctrl.h                   |  1 +
>  4 files changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 25baacd331e0..c8ca7184c6d9 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  
>  	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>  
> -	if (rr.val & RMID_VAL_ERROR)
> +	if (rr.err == -EIO)
>  		seq_puts(m, "Error\n");
> -	else if (rr.val & RMID_VAL_UNAVAIL)
> +	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
>  	else
>  		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);

This patch breaks the earlier fix
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625

When the user reads the events on the default monitoring group with
multiple subgroups, the events on all subgroups are consolidated
together. If the last rmid read resulted in an error, the whole
group will be reported as an error. The err field needs to be cleared.

Please add this patch to clear the error.

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..0e4addf237ec 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -444,6 +444,8 @@ void mon_event_count(void *info)
        /* Report error if none of rmid_reads are successful */
        if (ret_val)
                rr->val = ret_val;
+       else
+               rr->err  = 0;
 }

 /*



* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-19 23:20   ` Babu Moger
@ 2021-10-20 18:15     ` Reinette Chatre
  2021-10-20 19:22       ` Babu Moger
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-20 18:15 UTC (permalink / raw)
  To: Babu Moger, James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Babu,

On 10/19/2021 4:20 PM, Babu Moger wrote:
> Hi James,
> 
> On 10/1/21 11:02 AM, James Morse wrote:
>> __rmid_read() selects the specified eventid and returns the counter
>> value from the msr. The error handling is architecture specific, and
>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>
>> Error handling should be handled by architecture specific code, as
>> a different architecture may have different requirements. MPAM's
>> counters can report that they are 'not ready', requiring a second
>> read after a short delay. This should be hidden from resctrl.
>>
>> Make __rmid_read() the architecture specific function for reading
>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>> handling into it.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> Changes since v1:
>>   * Return EINVAL from the impossible case in __mon_event_count() instead
>>     of an x86 hardware specific value.
>> ---
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>>   arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>>   include/linux/resctrl.h                   |  1 +
>>   4 files changed, 31 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 25baacd331e0..c8ca7184c6d9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   
>>   	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>   
>> -	if (rr.val & RMID_VAL_ERROR)
>> +	if (rr.err == -EIO)
>>   		seq_puts(m, "Error\n");
>> -	else if (rr.val & RMID_VAL_UNAVAIL)
>> +	else if (rr.err == -EINVAL)
>>   		seq_puts(m, "Unavailable\n");
>>   	else
>>   		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
> 
> This patch breaks the earlier fix
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
> 
> When the user reads the events on the default monitoring group with
> multiple subgroups, the events on all subgroups are consolidated
> together. If the last rmid read resulted in an error, the whole
> group will be reported as an error. The err field needs to be cleared.
> 
> Please add this patch to clear the error.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 14bc843043da..0e4addf237ec 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>          /* Report error if none of rmid_reads are successful */
>          if (ret_val)
>                  rr->val = ret_val;
> +       else
> +               rr->err  = 0;
>   }
> 
>   /*
> 

Good catch, thank you.

Even so, I do not think mon_event_count()'s usage of __mon_event_count()
was taken into account by this patch, and it needs a bigger rework than the
above fixup. For example, if I understand correctly, ret_val is the error
and rr->val is no longer expected to contain the error after this patch. So
keeping that assignment to rr->val is not correct.

Reinette



* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-20 18:15     ` Reinette Chatre
@ 2021-10-20 19:22       ` Babu Moger
  2021-10-20 20:28         ` Reinette Chatre
  0 siblings, 1 reply; 61+ messages in thread
From: Babu Moger @ 2021-10-20 19:22 UTC (permalink / raw)
  To: Reinette Chatre, James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 10/20/21 1:15 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/19/2021 4:20 PM, Babu Moger wrote:
>> Hi James,
>>
>> On 10/1/21 11:02 AM, James Morse wrote:
>>> __rmid_read() selects the specified eventid and returns the counter
>>> value from the msr. The error handling is architecture specific, and
>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>
>>> Error handling should be handled by architecture specific code, as
>>> a different architecture may have different requirements. MPAM's
>>> counters can report that they are 'not ready', requiring a second
>>> read after a short delay. This should be hidden from resctrl.
>>>
>>> Make __rmid_read() the architecture specific function for reading
>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>> handling into it.
>>>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> ---
>>> Changes since v1:
>>>   * Return EINVAL from the impossible case in __mon_event_count() instead
>>>     of an x86 hardware specific value.
>>> ---
>>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>>>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>>>   arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>>>   include/linux/resctrl.h                   |  1 +
>>>   4 files changed, 31 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> index 25baacd331e0..c8ca7184c6d9 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>>> *arg)
>>>         mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>   -    if (rr.val & RMID_VAL_ERROR)
>>> +    if (rr.err == -EIO)
>>>           seq_puts(m, "Error\n");
>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>> +    else if (rr.err == -EINVAL)
>>>           seq_puts(m, "Unavailable\n");
>>>       else
>>>           seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>
>> This patch breaks the earlier fix
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
>>
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all subgroups are consolidated
>> together. If the last rmid read resulted in an error, the whole
>> group will be reported as an error. The err field needs to be cleared.
>>
>> Please add this patch to clear the error.
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 14bc843043da..0e4addf237ec 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>>          /* Report error if none of rmid_reads are successful */
>>          if (ret_val)
>>                  rr->val = ret_val;
>> +       else
>> +               rr->err  = 0;
>>   }
>>
>>   /*
>>
> 
> Good catch, thank you.
> 
> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
> was taken into account by this patch, and it needs a bigger rework than the
> above fixup. For example, if I understand correctly, ret_val is the error
> and rr->val is no longer expected to contain the error after this patch. So
> keeping that assignment to rr->val is not correct.

Yes. You are right. rr->val is not expected to contain the error.
Hopefully, this should help.

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..105d972cc511 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -441,9 +441,9 @@ void mon_event_count(void *info)
                }
        }

-       /* Report error if none of rmid_reads are successful */
-       if (ret_val)
-               rr->val = ret_val;
+       /* Clear the error if at least one of the rmid reads succeed */
+       if (ret_val == 0)
+               rr->err = 0;
 }

 /*


* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-20 19:22       ` Babu Moger
@ 2021-10-20 20:28         ` Reinette Chatre
  2021-10-27 16:50           ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-20 20:28 UTC (permalink / raw)
  To: Babu Moger, James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Babu,

On 10/20/2021 12:22 PM, Babu Moger wrote:
> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>> __rmid_read() selects the specified eventid and returns the counter
>>>> value from the msr. The error handling is architecture specific, and
>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>
>>>> Error handling should be handled by architecture specific code, as
>>>> a different architecture may have different requirements. MPAM's
>>>> counters can report that they are 'not ready', requiring a second
>>>> read after a short delay. This should be hidden from resctrl.
>>>>
>>>> Make __rmid_read() the architecture specific function for reading
>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>> handling into it.
>>>>
>>>> Signed-off-by: James Morse <james.morse@arm.com>
>>>> ---
>>>> Changes since v1:
>>>>    * Return EINVAL from the impossible case in __mon_event_count() instead
>>>>      of an x86 hardware specific value.
>>>> ---
>>>>    arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>>>>    arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>>>>    arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>>>>    include/linux/resctrl.h                   |  1 +
>>>>    4 files changed, 31 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>>>> *arg)
>>>>          mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>    -    if (rr.val & RMID_VAL_ERROR)
>>>> +    if (rr.err == -EIO)
>>>>            seq_puts(m, "Error\n");
>>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>>> +    else if (rr.err == -EINVAL)
>>>>            seq_puts(m, "Unavailable\n");
>>>>        else
>>>>            seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>
>>> This patch breaks the earlier fix
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
>>>
>>>
>>> When the user reads the events on the default monitoring group with
>>> multiple subgroups, the events on all subgroups are consolidated
>>> together. If the last rmid read resulted in an error, the whole
>>> group will be reported as an error. The err field needs to be cleared.
>>>
>>> Please add this patch to clear the error.
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 14bc843043da..0e4addf237ec 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>>>           /* Report error if none of rmid_reads are successful */
>>>           if (ret_val)
>>>                   rr->val = ret_val;
>>> +       else
>>> +               rr->err  = 0;
>>>    }
>>>
>>>    /*
>>>
>>
>> Good catch, thank you.
>>
>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>> was taken into account by this patch, and it needs a bigger rework than the
>> above fixup. For example, if I understand correctly, ret_val is the error
>> and rr->val is no longer expected to contain the error after this patch. So
>> keeping that assignment to rr->val is not correct.
> 
> Yes. You are right. rr->val is not expected to contain the error.
> Hopefully, this should help.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 14bc843043da..105d972cc511 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -441,9 +441,9 @@ void mon_event_count(void *info)
>                  }
>          }
> 
> -       /* Report error if none of rmid_reads are successful */
> -       if (ret_val)
> -               rr->val = ret_val;
> +       /* Clear the error if at least one of the rmid reads succeed */
> +       if (ret_val == 0)
> +               rr->err = 0;
>   }
> 
>   /*
> 

Yes, this looks good. If the first __mon_event_count() succeeds but a 
following one fails then the data still needs to be reported so the 
error code needs to be fixed up afterwards and cannot be done inside 
__mon_event_count(). Thank you very much.
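
For anyone following along, here is a minimal userspace sketch of the
fix-up agreed above. It is not the kernel code: struct rmid_read_sketch,
read_one() and read_all() are made-up names, and a negative sample simply
stands in for a counter read that failed with that errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct rmid_read. */
struct rmid_read_sketch {
	uint64_t val;	/* running sum of the successful reads */
	int err;	/* error left behind by the most recent failed read */
};

/*
 * Model one counter read. A negative sample stands for a failed read and
 * carries the errno value; anything else is added to the running total.
 */
static int read_one(struct rmid_read_sketch *rr, int64_t sample)
{
	if (sample < 0) {
		rr->err = (int)sample;
		return rr->err;
	}
	rr->val += (uint64_t)sample;
	return 0;
}

/*
 * Read the parent group and then every child group. ret_val stays
 * non-zero only if every read failed, so a single success is enough to
 * clear whatever error a later failed read left in rr->err.
 */
static void read_all(struct rmid_read_sketch *rr, const int64_t *samples,
		     size_t n)
{
	int ret_val = -EINVAL;
	size_t i;

	assert(n > 0);	/* there is always at least the parent group */

	rr->val = 0;
	rr->err = 0;

	for (i = 0; i < n; i++) {
		if (read_one(rr, samples[i]) == 0)
			ret_val = 0;
	}

	/* Clear the error if at least one of the reads succeeded. */
	if (ret_val == 0)
		rr->err = 0;
}
```

With samples { 100, -EIO, 50, -EINVAL } the sketch ends with val == 150
and err == 0, so the partial sum is still reported. With only failing
samples, err keeps the last error. The clearing has to happen in the
caller because each individual read cannot know whether an earlier one
succeeded.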

Reinette


* Re: [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work
  2021-10-15 22:19   ` Reinette Chatre
@ 2021-10-22 18:30     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-22 18:30 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:19, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
> 
>> @@ -527,21 +492,15 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>           return;
>>       }
>>   -    if (r->mon_capable && domain_setup_mon_state(r, d)) {
>> -        kfree(hw_dom->ctrl_val);
>> -        kfree(hw_dom->mbps_val);
>> -        kfree(hw_dom);
>> -        return;
>> -    }
>> -
>>       list_add_tail(&d->list, add_pos);
>>   -    /*
>> -     * If resctrl is mounted, add
>> -     * per domain monitor data directories.
>> -     */
>> -    if (static_branch_unlikely(&rdt_mon_enable_key))
>> -        mkdir_mondata_subdir_allrdtgrp(r, d);
>> +    err = resctrl_online_domain(r, d);
>> +    if (err) {
>> +        list_del(&d->list);
>> +        kfree(hw_dom->ctrl_val);
>> +        kfree(hw_dom->mbps_val);
>> +        kfree(d);
> 
> Even though this goes away in the next patch, I think this should rather be kfree(hw_dom).


Whoops, that's a rebase artefact from patch 2. Fixed.
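
A rough userspace illustration of why the containing structure is the
thing to free. It assumes, as the hw_dom/d naming in the hunk suggests,
that the generic domain is embedded in the arch-specific one. Every name
below is hypothetical, and the embedded member is deliberately not placed
first so that the two pointers differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the generic, filesystem-visible domain. */
struct domain_sketch {
	int id;
};

/* Hypothetical stand-in for the arch-specific wrapper that owns it. */
struct hw_domain_sketch {
	unsigned int *ctrl_val;
	struct domain_sketch d;		/* embedded, not a pointer */
};

/* Same idea as the kernel's container_of(), written out by hand. */
static struct hw_domain_sketch *to_hw_domain(struct domain_sketch *d)
{
	return (struct hw_domain_sketch *)
		((char *)d - offsetof(struct hw_domain_sketch, d));
}

/* One allocation for the wrapper, one for the array it owns. */
static struct domain_sketch *domain_alloc_sketch(int id)
{
	struct hw_domain_sketch *hw_dom = calloc(1, sizeof(*hw_dom));

	if (!hw_dom)
		return NULL;

	hw_dom->ctrl_val = calloc(4, sizeof(*hw_dom->ctrl_val));
	if (!hw_dom->ctrl_val) {
		free(hw_dom);
		return NULL;
	}

	hw_dom->d.id = id;
	return &hw_dom->d;
}

static void domain_free_sketch(struct domain_sketch *d)
{
	struct hw_domain_sketch *hw_dom = to_hw_domain(d);

	free(hw_dom->ctrl_val);
	/* Free what was allocated: d points into the middle of hw_dom. */
	free(hw_dom);
}
```

Passing d itself to free() here would hand the allocator a pointer it
never returned. Freeing the wrapper stays correct wherever the embedded
member sits in the structure.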


Thanks!

James


* Re: [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work
  2021-10-19 23:19   ` Babu Moger
@ 2021-10-22 18:30     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-22 18:30 UTC (permalink / raw)
  To: Babu Moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi Babu,

On 20/10/2021 00:19, Babu Moger wrote:
> On 10/1/21 11:02 AM, James Morse wrote:
>> Because domains are exposed to user-space via resctrl, the filesystem
>> must update its state when CPU hotplug callbacks are triggered.
>>
>> Some of this work is common to any architecture that would support
>> resctrl, but the work is tied up with the architecture code to
>> allocate the memory.
>>
>> Move domain_setup_mon_state(), the monitor subdir creation call and the
>> mbm/limbo workers into a new resctrl_online_domain() call. These bits
>> are not specific to the architecture. Grouping them in one function
>> allows that code to be moved to /fs/ and re-used by another architecture.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index e243c7d15b81..19691f9ab061 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c

>> @@ -3236,6 +3233,64 @@ static int __init rdtgroup_setup_root(void)
>>  	return ret;
>>  }
>>  
>> +static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	size_t tsize;
>> +
>> +	if (is_llc_occupancy_enabled()) {
>> +		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
>> +		if (!d->rmid_busy_llc)
>> +			return -ENOMEM;
>> +	}
>> +	if (is_mbm_total_enabled()) {
>> +		tsize = sizeof(*d->mbm_total);
>> +		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
>> +		if (!d->mbm_total) {
>> +			bitmap_free(d->rmid_busy_llc);
>> +			return -ENOMEM;
>> +		}
>> +	}
>> +	if (is_mbm_local_enabled()) {
>> +		tsize = sizeof(*d->mbm_local);
>> +		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
>> +		if (!d->mbm_local) {
>> +			bitmap_free(d->rmid_busy_llc);
>> +			kfree(d->mbm_total);
>> +			return -ENOMEM;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	int err;
>> +
>> +	lockdep_assert_held(&rdtgroup_mutex);

> Looks like lockdep_assert_held was not there in this sequence.
> Are you concerned about this lock not being held?

It's partly paranoia, partly documentation.

This documents that the caller has to take this mutex: it protects the domain
pointers that are written in domain_setup_mon_state(), and read by __mon_event_count() and
the like.

I have a patch later in the tree that splits the locking done by the arch-code from the
locking done by the filesystem. Today those are both rdtgroup_mutex.
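
To make the "documentation" point concrete, here is a small userspace
sketch of the caller-takes-the-lock contract. It is only an analogy:
struct tracked_mutex, assert_held() and the *_sketch() functions are
invented for illustration, and the held flag is a much weaker check than
lockdep, which tracks which task holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in: a mutex that remembers whether it is held. */
struct tracked_mutex {
	pthread_mutex_t lock;
	bool held;
};

static struct tracked_mutex group_mutex = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.held = false,
};

static void tracked_lock(struct tracked_mutex *m)
{
	pthread_mutex_lock(&m->lock);
	m->held = true;
}

static void tracked_unlock(struct tracked_mutex *m)
{
	m->held = false;
	pthread_mutex_unlock(&m->lock);
}

/* Loosely mirrors lockdep_assert_held(): abort if nobody took the lock. */
static void assert_held(const struct tracked_mutex *m)
{
	assert(m->held);
}

/* State that group_mutex protects in this sketch. */
static int domains_online;

/* Callee: relies on the caller holding group_mutex, and says so up front. */
static int online_domain_sketch(void)
{
	assert_held(&group_mutex);
	return ++domains_online;
}

/* Caller: takes the lock around the call, as a hotplug path would. */
static int hotplug_sketch(void)
{
	int n;

	tracked_lock(&group_mutex);
	n = online_domain_sketch();
	tracked_unlock(&group_mutex);
	return n;
}
```

Calling online_domain_sketch() without the lock trips the assertion
immediately, and the first line of the callee tells a reader what the
locking rule is even once caller and callee live in different files.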



Thanks,

James


* Re: [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled
  2021-10-19 23:18   ` Babu Moger
@ 2021-10-22 18:30     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-22 18:30 UTC (permalink / raw)
  To: Babu Moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi Babu,

On 20/10/2021 00:18, Babu Moger wrote:
> On 10/1/21 11:02 AM, James Morse wrote:
>> mon_enabled and mon_capable are always set as a pair by
>> rdt_get_mon_l3_config().
>>
>> There is no point having two values.
>>
>> Merge them together.

>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 386ab3a41500..8180c539800d 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -130,7 +130,6 @@ struct resctrl_schema;
>>  /**
>>   * struct rdt_resource - attributes of a resctrl resource
>>   * @rid:		The index of the resource
>> - * @mon_enabled:	Is monitoring enabled for this feature
>>   * @alloc_capable:	Is allocation available on this machine
>>   * @mon_capable:	Is monitor feature available on this machine
>>   * @num_rmid:		Number of RMIDs available
>> @@ -149,7 +148,6 @@ struct resctrl_schema;
>>   */
>>  struct rdt_resource {
>>  	int			rid;
>> -	bool			mon_enabled;
>>  	bool			alloc_capable;
>>  	bool			mon_capable;
> 
> Also we should probably rename alloc_capable and mon_capable to
> alloc_supported and mon_supported respectively. We don't have an option to
> enable and disable these features. If it is supported, it is always supported.

Does 'capable' imply the feature was enabled? I agree 'supported' is clearer now that the
schema/resource enable step has been folded away.

I'll put this on the TODO list...


Thanks,

James


* Re: [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work
  2021-10-19 23:19   ` Babu Moger
@ 2021-10-22 18:30     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-22 18:30 UTC (permalink / raw)
  To: Babu Moger, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, shameerali.kolothum.thodi,
	Jamie Iles, D Scott Phillips OS, lcherian, bobo.shaobowang,
	tan.shaopeng

Hi Babu,

On 20/10/2021 00:19, Babu Moger wrote:
> On 10/1/21 11:02 AM, James Morse wrote:
>> Because domains are exposed to user-space via resctrl, the filesystem
>> must update its state when CPU hotplug callbacks are triggered.
>>
>> Some of this work is common to any architecture that would support
>> resctrl, but the work is tied up with the architecture code to
>> free the memory.
>>
>> Move the monitor subdir removal and the cancelling of the mbm/limbo
>> works into a new resctrl_offline_domain() call. These bits are not
>> specific to the architecture. Grouping them in one function allows
>> that code to be moved to /fs/ and re-used by another architecture.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 19691f9ab061..38670bb810cb 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2499,14 +2499,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
>>   * Remove all subdirectories of mon_data of ctrl_mon groups
>>   * and monitor groups with given domain id.
>>   */
>> -void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
>> +static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>> +					   unsigned int dom_id)
>>  {
>>  	struct rdtgroup *prgrp, *crgrp;
>>  	char name[32];

>> -	if (!r->mon_capable)
>> -		return;

>>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>>  		sprintf(name, "mon_%s_%02d", r->name, dom_id);
>>  		kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
>> @@ -3233,6 +3231,39 @@ static int __init rdtgroup_setup_root(void)
>>  	return ret;
>>  }
>>  
>> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	lockdep_assert_held(&rdtgroup_mutex);
> 
> Is this really required?

It documents that the caller must take the lock. It's not so clear how walking
rdt_all_groups in rmdir_mondata_subdir_allrdtgrp() is safe now that the logic has moved to
another file (and these helpers will eventually move out to /fs/).

Who takes which lock changes later in the tree (after this series). These annotations make
it a lot clearer that these functions are changing from caller-takes-the-lock to
callee-takes-the-new-lock. Otherwise that patch would be much harder to review.


>> +
>> +	if (!r->mon_capable)
>> +		return;
> 
> I don't see the need for this check either.

It moved up from rmdir_mondata_subdir_allrdtgrp(), quoted above.
All of the work that moved from domain_remove_cpu() to resctrl_offline_domain() is about
monitors.

Sure, calling is_mbm_enabled(), is_llc_occupancy_enabled(), bitmap_free(NULL), and
kfree(NULL) twice isn't harmful in this case, but it's quicker to check the flag on the
resource and return early if nothing else needs doing.


Thanks,

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-10-15 22:26   ` Reinette Chatre
@ 2021-10-22 18:30     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-22 18:30 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:26, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
>> To support resctrl's MBA software controller, the architecture must provide
>> a second configuration array to hold the mbps_val from user-space.
>>
>> This complicates the interface between the architecture code.
> 
> This complicates the interface between the architecture code and ... ?

and the filesystem parts ... fixed.


>> Make the filesystem parts of resctrl create an array for the mba_sc
>> values when is_mba_sc() is set to true. The software controller
>> can be changed to use this, allowing the architecture code to only
>> consider the values configured in hardware.

> This changes significantly more than just where the mbps_val array is hosted. It also
> changes how the life cycle of this array is managed. Previously it followed the domain,
> whether mba_sc was enabled or not. Now that it depends on mba_sc it is managed quite
> differently.

> Could the changelog be upfront about this change and its motivation? Stating this would
> make this much easier to review and also the later patches where the original mbps_val
> initialization code is removed without replacement.

Yes, I'd not considered those as different things. I'll split this into two patches, and
move the change that only allocates the memory if it's going to be used to the end of the
series.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 38670bb810cb..9d402bc8bdff 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -3291,6 +3355,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>>           }
>>       }
>>   +    if (is_mba_sc(r))
>> +        return mba_sc_domain_allocate(r, d);
>> +
>>       return 0;
>>   }
>>   

> Could this be done symmetrically? That is, allocate in resctrl_online_domain() and free in
> resctrl_offline_domain().

Yes, that would be better!


>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 5d283bdd6162..355660d46612 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,
>>     #endif
>>   +/* max value for struct resctrl_mba_sc's mbps_val */
>> +#define MBA_MAX_MBPS   U32_MAX
> 
> struct resctrl_mba_sc?

It was squashed out of the previous version as it only had one member.
This should be struct rdt_domain.


>>   /**
>>    * enum resctrl_conf_type - The type of configuration.
>>    * @CDP_NONE:    No prioritisation, both code and data are controlled or monitored.
>> @@ -53,6 +56,8 @@ struct resctrl_staged_config {
>>    * @cqm_work_cpu:    worker CPU for CQM h/w counters
>>    * @plr:        pseudo-locked region (if any) associated with domain
>>    * @staged_config:    parsed configuration to be applied
>> + * @mbps_val:        Array of user specified control values for mba_sc,
>> + *            indexed by closid

> Could this inherit some of the useful kerneldoc associated with the mbps_val being
> replaced? That is, it exists when mba_sc is enabled and contains bandwidth values in MBps.

Yup, done.


Thanks,

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-10-15 22:26   ` Reinette Chatre
@ 2021-10-27 16:49     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-27 16:49 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:26, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
>> Updates to resctrl's software controller follow the same path as
>> other configuration updates, but they don't modify the hardware state.
>> rdtgroup_schemata_write() uses parse_line() and the resource's
>> ctrlval_parse function to stage the configuration.
> 
> parse_ctrlval ?
> 
>> resctrl_arch_update_domains() then updates the mbps_val[] array
>> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
>> call that would update hardware.
>>
>> This complicates the interface between resctrl's filesystem parts
>> and architecture specific code. It should be possible for mba_sc
>> to be completely implemented by the filesystem parts of resctrl. This
>> would allow it to work on a second architecture with no additional code.
>>
>> Change parse_bw() to write the configuration value directly to the
>> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
> 
> mbps_val[] array?
> 
>> to skip the call to resctrl_arch_update_domains(), meaning all the
>> mba_sc specific code in resctrl_arch_update_domains() can be removed.
>> On the read-side, show_doms() and update_mba_bw() are changed to read
>> the mba_sc[] array from the domain structure. With this,
> 
> mbps_val[] ?
> 
> Should rdtgroup_size_show() also get a similar snippet?

Yes! Good catch!


Thanks,

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val
  2021-10-15 22:27   ` Reinette Chatre
@ 2021-10-27 16:49     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-27 16:49 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:27, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
>> The resctrl arch code provides a second configuration array mbps_val[]
>> for the MBA software controller.
>>
>> Since resctrl switched over to allocating and freeing its own array
>> when needed, nothing uses the arch code version.
> 
> With the previous changes this is true, that this array is no longer used. Even so, the
> code removed in this patch is not just the usage of the array but also its management ...
> especially how and when it is reset. While the array is no longer used I think it is still
> important to ensure that all the array management is handled in the new mbps_val array.
> Perhaps just help the reader by stating that the values of the new array never need to be
> reset since it is always recreated while the previous array stuck around during umount/mount.

I've split those changes out as a separate patch which appears at the end of the series,
meaning the lifecycle stuff is unchanged by this point.


Thanks,

James


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly
  2021-10-15 22:28   ` Reinette Chatre
@ 2021-10-27 16:49     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-27 16:49 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:28, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 9f45207a6c74..25baacd331e0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
>>       return false;
>>   }
>>   +int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
>> +                u32 closid, enum resctrl_conf_type t, u32 cfg_val)
>> +{
>> +    struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
>> +    u32 idx = get_config_index(closid, t);
>> +    struct msr_param msr_param;
>> +
>> +    if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
>> +        return -EINVAL;
>> +
>> +    hw_dom->ctrl_val[idx] = cfg_val;
>> +
>> +    msr_param.res = r;
>> +    msr_param.low = idx;
>> +    msr_param.high = idx + 1;
>> +
>> +    rdt_ctrl_update(&msr_param);
>> +
> 
> rdt_ctrl_update() will take its parameters and recompute the domain that is already
> available here ... seems to take a few steps back and then do the needed. Could msr_update
> be called directly here instead?

Even better!


Thanks,

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-15 22:28   ` Reinette Chatre
@ 2021-10-27 16:50     ` James Morse
  2021-10-27 20:41       ` Reinette Chatre
  0 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-27 16:50 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 15/10/2021 23:28, Reinette Chatre wrote:
> On 10/1/2021 9:02 AM, James Morse wrote:
>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>> second. It reads the hardware register, calculates the bandwidth and
>> updates m->prev_bw_msr which is used to hold the previous hardware register
>> value.
>>
>> Operating directly on hardware register values makes it difficult to make
>> this code architecture independent, so that it can be moved to /fs/,
>> making the mba_sc feature something resctrl supports with no additional
>> support from the architecture.
>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>> register using __mon_event_count().
> 
> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
> is enabled") may explain how the code ended up the way it is.
> 
>> Change mbm_bw_count() to use the current chunks value from
>> __mon_event_count() to calculate bandwidth. This means it no longer
>> operates on hardware register values.
> 
> ok ... so could the patch just do this as it is stated here? The way it is implemented is
> very complicated and hard (for me) to verify the correctness (more below).

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 6c8226987dd6..a1232462db14 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c

>>   static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>   {
>>       struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>       struct mbm_state *m = &rr->d->mbm_local[rmid];
>> -    u64 tval, cur_bw, chunks;
>> +    u64 cur_bw, chunks, cur_chunks;
>>   -    tval = __rmid_read(rmid, rr->evtid);
>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>> -        return;
>> +    cur_chunks = rr->val;
>> +    chunks = cur_chunks - m->prev_bw_chunks;
>> +    m->prev_bw_chunks = cur_chunks;
>>   -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;

> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
> it could just retrieve it (chunks = m->prev_chunks).

(I agree the diff is noisy; it may be easier to read as a new function, as this is a
replacement, not a transformation, of the existing function.)

__mon_event_count() can also be triggered by user-space reading the file, so any of its
'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
get this in MB/s.
__mon_event_count()'s chunks value hasn't been through get_corrected_mbm_count(), so it
would need to be rr->val, which is what this code starts with for the "number of chunks
ever read by this counter".


The variable 'chunks' has been used too much here, so it's lost its meaning. How about:
|	current_chunk_count = rr->val;
|	delta_counter = current_chunk_count - m->prev_chunk_count;
|	cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
|
|	m->prev_chunk_count = current_chunk_count;


The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
deal with overflow of the hardware counter. Because a previously sanitised value is being
used, the hardware counter's resolution doesn't need to be considered.
(which helps make mba_sc architecture independent)
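
As a rough user-space illustration of the difference described above (a sketch
only, not the kernel code): raw_counter_delta() models the width-aware
subtraction that a raw hardware register needs, while bw_from_total() works on
an already-accumulated 64-bit total and so needs no counter width. The
struct bw_state type, both function names, and the meaning of mon_scale as
"bytes per chunk" are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-RMID state; not struct mbm_state. */
struct bw_state {
	uint64_t prev_chunk_count;	/* running total seen at the previous tick */
	uint64_t prev_bw;		/* bandwidth computed at the previous tick */
};

/*
 * Delta of two raw counter reads from a 'width'-bit counter. Shifting both
 * values up discards the bits above 'width', so a counter that wrapped
 * between the two reads still gives the right difference.
 */
static uint64_t raw_counter_delta(uint64_t prev_raw, uint64_t cur_raw,
				  unsigned int width)
{
	unsigned int shift;

	assert(width > 0 && width <= 64);
	shift = 64 - width;

	return ((cur_raw << shift) - (prev_raw << shift)) >> shift;
}

/*
 * Delta of two accumulated 64-bit totals. The totals were widened when they
 * were accumulated, so the hardware counter width is not needed here.
 * Assumes this is called once per second, so the result is in MB/s.
 */
static uint64_t bw_from_total(struct bw_state *m, uint64_t current_chunk_count,
			      uint64_t mon_scale)
{
	uint64_t delta_counter = current_chunk_count - m->prev_chunk_count;

	m->prev_chunk_count = current_chunk_count;
	m->prev_bw = (delta_counter * mon_scale) >> 20;	/* bytes to MB */

	return m->prev_bw;
}
```

With a 24-bit counter, a read of 0x10 following a read of 0xFFFFF0 is a delta
of 32 chunks; the same 32 chunks fall out of a plain subtraction once the
values are 64-bit running totals.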


Thanks,

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-10-07  6:13   ` tan.shaopeng
@ 2021-10-27 16:50     ` James Morse
  0 siblings, 0 replies; 61+ messages in thread
From: James Morse @ 2021-10-27 16:50 UTC (permalink / raw)
  To: tan.shaopeng, x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

On 07/10/2021 07:13, tan.shaopeng@fujitsu.com wrote:
>> To determine whether the mba_MBps option to resctrl should be supported,
>> resctrl tests the boot cpus' x86_vendor.
>>
>> This isn't portable, and needs abstracting behind a helper so this check can be
>> part of the filesystem code that moves to /fs/.
>>
>> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
>> on this system. An 'alloc_capable' test is added so that support for the controls
>> isn't implied by the 'delay_linear' property, which is always true for MPAM.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> 
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>

Thanks!

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-20 20:28         ` Reinette Chatre
@ 2021-10-27 16:50           ` James Morse
  2021-10-27 18:59             ` Babu Moger
  0 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-27 16:50 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette, Babu,

On 20/10/2021 21:28, Reinette Chatre wrote:
> On 10/20/2021 12:22 PM, Babu Moger wrote:
>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>> value from the msr. The error handling is architecture specific, and
>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>
>>>>> Error handling should be handled by architecture specific code, as
>>>>> a different architecture may have different requirements. MPAM's
>>>>> counters can report that they are 'not ready', requiring a second
>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>
>>>>> Make __rmid_read() the architecture specific function for reading
>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>> handling into it.


>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>          mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>>    -    if (rr.val & RMID_VAL_ERROR)
>>>>> +    if (rr.err == -EIO)
>>>>>            seq_puts(m, "Error\n");
>>>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>>>> +    else if (rr.err == -EINVAL)
>>>>>            seq_puts(m, "Unavailable\n");
>>>>>        else
>>>>>            seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>
>>>> This patch breaks the earlier fix
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625

Aha!


>>>> When the user reads the events on the default monitoring group with
>>>> multiple subgroups, the events on all subgroups are consolidated
>>>> together. If the last rmid read resulted in an error, then the whole
>>>> group will be reported as an error. The err field needs to be cleared.
>>>>
>>>> Please add this patch to clear the error.

>>> Good catch, thank you.
>>>
>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>> was taken into account by this patch and needs a bigger rework than the
>>> above fixup. For example, if I understand correctly ret_val is the error
>>> and rr->val no longer expected to contain the error after this patch. So
>>> keeping that assignment to rr->val is not correct.
>>
>> Yes. You are right. rr->val is not expected to contain the error.
>> Hopefully, this should help.

> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails
> then the data still needs to be reported so the error code needs to be fixed up afterwards
> and cannot be done inside __mon_event_count(). Thank you very much.

Thanks both! I should have worked this out when splitting msr_val into two values, which
end up getting set the same.

I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
I've replaced the rr->val chunk with:
|	/*
|	 * __mon_event_count() calls for newly created monitor groups may
|	 * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
|	 * If the first call for the control group succeeds, discard any error
|	 * set by reads of monitor groups.
|	 */
|	if (ret_val == 0)
|		rr->err = 0;
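
The fix-up above can be sketched in user-space C as follows. This is a
simplified illustration that follows only the block comment quoted above, not
the kernel implementation: struct read_result, read_one() and read_group() are
hypothetical names, and the hardware reads are replaced by caller-supplied
status codes and chunk counts.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rmid_read. */
struct read_result {
	uint64_t val;	/* sum of all successful reads */
	int err;	/* error left by the most recent failing read, 0 if none */
};

/* One simulated counter read: add to the sum on success, else record the error. */
static int read_one(struct read_result *rr, int status, uint64_t chunks)
{
	if (status) {
		rr->err = status;
		return status;
	}

	rr->val += chunks;
	return 0;
}

/*
 * Entry 0 models the control group, the remaining entries model its monitor
 * groups. If the control group read succeeded, an error left behind by a
 * monitor group (for example -EINVAL from a group that has seen no traffic)
 * is discarded so the summed data is still reported.
 */
static void read_group(struct read_result *rr, const int *status,
		       const uint64_t *chunks, size_t n)
{
	int ret_val;
	size_t i;

	rr->val = 0;
	rr->err = 0;
	if (n == 0)
		return;

	ret_val = read_one(rr, status[0], chunks[0]);
	for (i = 1; i < n; i++)
		read_one(rr, status[i], chunks[i]);

	if (ret_val == 0)
		rr->err = 0;
}
```

The error has to be cleared after the loop rather than inside read_one(),
because only the caller knows whether the first read succeeded.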


Thanks.

James

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
  2021-10-27 16:50           ` James Morse
@ 2021-10-27 18:59             ` Babu Moger
  0 siblings, 0 replies; 61+ messages in thread
From: Babu Moger @ 2021-10-27 18:59 UTC (permalink / raw)
  To: James Morse, Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/27/21 11:50 AM, James Morse wrote:
> Hi Reinette, Babu,
> 
> On 20/10/2021 21:28, Reinette Chatre wrote:
>> On 10/20/2021 12:22 PM, Babu Moger wrote:
>>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>>> value from the msr. The error handling is architecture specific, and
>>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>>
>>>>>> Error handling should be handled by architecture specific code, as
>>>>>> a different architecture may have different requirements. MPAM's
>>>>>> counters can report that they are 'not ready', requiring a second
>>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>>
>>>>>> Make __rmid_read() the architecture specific function for reading
>>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>>> handling into it.
> 
> 
>>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>>          mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>>>    -    if (rr.val & RMID_VAL_ERROR)
>>>>>> +    if (rr.err == -EIO)
>>>>>>            seq_puts(m, "Error\n");
>>>>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>>>>> +    else if (rr.err == -EINVAL)
>>>>>>            seq_puts(m, "Unavailable\n");
>>>>>>        else
>>>>>>            seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>>
>>>>> This patch breaks the earlier fix
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
> 
> Aha!
> 
> 
>>>>> When the user reads the events on the default monitoring group with
>>>>> multiple subgroups, the events on all subgroups are consolidated
>>>>> together. If the last rmid read resulted in an error, then the whole
>>>>> group will be reported as an error. The err field needs to be cleared.
>>>>>
>>>>> Please add this patch to clear the error.
> 
>>>> Good catch, thank you.
>>>>
>>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>>> was taken into account by this patch and needs a bigger rework than the
>>>> above fixup. For example, if I understand correctly ret_val is the error
>>>> and rr->val no longer expected to contain the error after this patch. So
>>>> keeping that assignment to rr->val is not correct.
>>>
>>> Yes. You are right. rr->val is not expected to contain the error.
>>> Hopefully, this should help.
> 
>> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails
>> then the data still needs to be reported so the error code needs to be fixed up afterwards
>> and cannot be done inside __mon_event_count(). Thank you very much.
> 
> Thanks both! I should have worked this out when splitting msr_val into two values, which
> end up getting set the same.
> 
> I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
> I've replaced the rr->val chunk with:
> |	/*
> |	 * __mon_event_count() calls for newly created monitor groups may
> |	 * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
> |	 * If the first call for the control group succeeds, discard any error
> |	 * set by reads of monitor groups.
> |	 */
> |	if (ret_val == 0)
> |		rr->err = 0;

Looks good.
Thanks
Babu

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-27 16:50     ` James Morse
@ 2021-10-27 20:41       ` Reinette Chatre
  2021-10-29 15:50         ` James Morse
  0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2021-10-27 20:41 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/27/2021 9:50 AM, James Morse wrote:
> On 15/10/2021 23:28, Reinette Chatre wrote:
>> On 10/1/2021 9:02 AM, James Morse wrote:
>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>> second. It reads the hardware register, calculates the bandwidth and
>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>> value.
>>>
>>> Operating directly on hardware register values makes it difficult to make
>>> this code architecture independent, so that it can be moved to /fs/,
>>> making the mba_sc feature something resctrl supports with no additional
>>> support from the architecture.
>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>> register using __mon_event_count().
>>
>> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
>> is enabled") may explain how the code ended up the way it is.
>>
>>> Change mbm_bw_count() to use the current chunks value from
>>> __mon_event_count() to calculate bandwidth. This means it no longer
>>> operates on hardware register values.
>>
>> ok ... so could the patch just do this as it is stated here? The way it is implemented is
>> very complicated and hard (for me) to verify the correctness (more below).
> 
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 6c8226987dd6..a1232462db14 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> 
>>>    static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>>    {
>>>        struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>>        struct mbm_state *m = &rr->d->mbm_local[rmid];
>>> -    u64 tval, cur_bw, chunks;
>>> +    u64 cur_bw, chunks, cur_chunks;
>>>    -    tval = __rmid_read(rmid, rr->evtid);
>>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>>> -        return;
>>> +    cur_chunks = rr->val;
>>> +    chunks = cur_chunks - m->prev_bw_chunks;
>>> +    m->prev_bw_chunks = cur_chunks;
>>>    -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;
> 
>> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
>> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
>> it could just retrieve it (chunks = m->prev_chunks).
> 
> (I agree the diff is noisy, it may be easier as a new function as this is a replacement
> not a transform of the existing function)
> 
> __mon_event_count() can also be triggered by user-space reading the file, so any of its
> 'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
> get this in MB/s.

The resource group's mutex is taken before __mon_event_count() is called 
via user-space or via the overflow handler so I think that 
mbm_bw_count() can assume that the prev values are from the 
__mon_event_count() called just before it.

> __mon_event_count()'s chunks values hasn't been through get_corrected_mbm_count(), so it
> would need to be rr->val, which is what this code starts with for the "number of chunks
> ever read by this counter".

The value could be corrected in mbm_bw_count(), no?

> 
> 
> The variable 'chunks' has been used too much here, so its lost its meaning. How about:
> |	current_chunk_count = rr->val;
> |	delta_counter = current_chunk_count - m->prev_chunk_count;
> |	cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
> |
> |	m->prev_chunk_count = current_chunk_count;
> 
> 
> The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
> do with overflow of the hardware counter. Because a previously sanitised value is being
> used, the hardware counters resolution doesn't need to be considered.
> (which helps make mba_sc architecture independent)

This is the part that is not obvious to me: is the difference between 
the two individually sanitized measurements the same as sanitizing the 
difference between the two measurements?

Reinette

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-27 20:41       ` Reinette Chatre
@ 2021-10-29 15:50         ` James Morse
  2021-10-29 22:22           ` Reinette Chatre
  0 siblings, 1 reply; 61+ messages in thread
From: James Morse @ 2021-10-29 15:50 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi Reinette,

On 27/10/2021 21:41, Reinette Chatre wrote:
> On 10/27/2021 9:50 AM, James Morse wrote:
>> On 15/10/2021 23:28, Reinette Chatre wrote:
>>> On 10/1/2021 9:02 AM, James Morse wrote:
>>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>>> second. It reads the hardware register, calculates the bandwidth and
>>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>>> value.
>>>>
>>>> Operating directly on hardware register values makes it difficult to make
>>>> this code architecture independent, so that it can be moved to /fs/,
>>>> making the mba_sc feature something resctrl supports with no additional
>>>> support from the architecture.
>>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>>> register using __mon_event_count().
>>>
>>> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
>>> is enabled") may explain how the code ended up the way it is.
>>>
>>>> Change mbm_bw_count() to use the current chunks value from
>>>> __mon_event_count() to calculate bandwidth. This means it no longer
>>>> operates on hardware register values.
>>>
>>> ok ... so could the patch just do this as it is stated here? The way it is implemented is
>>> very complicated and hard (for me) to verify the correctness (more below).
>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> index 6c8226987dd6..a1232462db14 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>
>>>>    static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>>>    {
>>>>        struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>>>        struct mbm_state *m = &rr->d->mbm_local[rmid];
>>>> -    u64 tval, cur_bw, chunks;
>>>> +    u64 cur_bw, chunks, cur_chunks;
>>>>    -    tval = __rmid_read(rmid, rr->evtid);
>>>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>>>> -        return;
>>>> +    cur_chunks = rr->val;
>>>> +    chunks = cur_chunks - m->prev_bw_chunks;
>>>> +    m->prev_bw_chunks = cur_chunks;
>>>>    -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>>>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>>>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;
>>
>>> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
>>> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
>>> it could just retrieve it (chunks = m->prev_chunks).
>>
>> (I agree the diff is noisy, it may be easier as a new function as this is a replacement
>> not a transform of the existing function)
>>
>> __mon_event_count() can also be triggered by user-space reading the file, so any of its
>> 'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
>> get this in MB/s.

> The resource group's mutex is taken before __mon_event_count() is called via user-space or
> via the overflow handler so I think that mbm_bw_count() can assume that the prev values
> are from the __mon_event_count() called just before it.

That is true. But changing this to work with the overflow+corrected value directly means
it doesn't need changing again as each of those steps is moved into the
architecture-specific function. Changing this would make the later patches noisier, and we
would have the same discussion on a later patch.


>> __mon_event_count()'s chunks value hasn't been through get_corrected_mbm_count(), so it
>> would need to be rr->val, which is what this code starts with for the "number of chunks
>> ever read by this counter".

> The value could be corrected in mbm_bw_count(), no?

It could, but the aim of the series is to move all the architecture specific behaviour
behind an arch helper.

MPAM's counters read in bytes, and when they don't, it's up to the MPAM
architecture-specific code to fix the hardware values before resctrl gets them.

There is no reason for the mba_sc code to be architecture specific, it operates on the
counters and controls.
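
As a rough illustration of that split (not the code from this series), the
filesystem side only needs a running byte total from some arch helper to work
out bandwidth. Everything named here (struct bw_state, bw_update_mbps() and
the prev_bytes field) is hypothetical:

```c
#include <stdint.h>

/* Hypothetical per-counter state kept by the architecture-independent code. */
struct bw_state {
	uint64_t prev_bytes;	/* running total seen at the previous tick */
	uint64_t cur_bw_mb;	/* MB transferred during the last interval */
};

/*
 * total_bytes is assumed to be an already sanitised running total:
 * overflow handling and any correction happened in the arch helper.
 * When called once per second, the delta in MB is also MB/s.
 */
static uint64_t bw_update_mbps(struct bw_state *s, uint64_t total_bytes)
{
	uint64_t delta = total_bytes - s->prev_bytes;

	s->prev_bytes = total_bytes;
	s->cur_bw_mb = delta >> 20;	/* bytes -> MB */
	return s->cur_bw_mb;
}
```

Nothing in this sketch looks at register widths or error bits, which is the
property that would let it live outside the architecture code.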


>> The variable 'chunks' has been used too much here, so it's lost its meaning. How about:
>> |    current_chunk_count = rr->val;
>> |    delta_counter = current_chunk_count - m->prev_chunk_count;
>> |    cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
>> |
>> |    m->prev_chunk_count = current_chunk_count;
>>
>>
>> The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
>> do with overflow of the hardware counter. Because a previously sanitised value is being
>> used, the hardware counter's resolution doesn't need to be considered.
>> (which helps make mba_sc architecture independent)

> This is the part that is not obvious to me: is the difference between the two individually
> sanitized measurements the same as sanitizing the difference between the two measurements?

I agree get_corrected_mbm_count()'s rmid check and shift hide what it is doing, but it
boils down to a multiply. The existing code is (a - b)*cf, which is the same as this
patch's a*cf - b*cf.

I'm not worried about this going wrong after 18-and-a-bit exabytes of data have been
transferred; at current memory speeds that would take decades. But: none of the 'cf'
values is greater than two, and the hardware register has two bits taken for error codes,
so there is no a or b that hardware can represent which, with a cf less than two,
overflows a 64-bit unsigned long.
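
To make the (a - b)*cf versus a*cf - b*cf point concrete, here is a small
sketch. It assumes the correction is a 20-bit fixed-point multiply, as the
shift in get_corrected_mbm_count() suggests. CF_FIXED, correct() and the two
wrappers are made-up names, and the factor is an arbitrary value between one
and two, not one taken from real hardware:

```c
#include <stdint.h>

/* Hypothetical correction factor of roughly 1.14 in 20-bit fixed point. */
#define CF_FIXED 1198372ULL

/*
 * Multiply by the factor, then drop the 20 fractional bits.
 * Assumes val is small enough that val * CF_FIXED fits in 64 bits.
 */
static uint64_t correct(uint64_t val)
{
	return (val * CF_FIXED) >> 20;
}

/* Old ordering: take the delta of the raw counts, then correct it. */
static uint64_t delta_then_correct(uint64_t a, uint64_t b)
{
	return correct(a - b);
}

/* New ordering: correct each running total, then take the delta. */
static uint64_t correct_then_delta(uint64_t a, uint64_t b)
{
	return correct(a) - correct(b);
}
```

For a >= b the two orderings agree exactly when nothing is lost to the shift.
Otherwise truncation can make the second result larger by at most one chunk,
which should be negligible once scaled to MB.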


Thanks,

James


* Re: [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
  2021-10-29 15:50         ` James Morse
@ 2021-10-29 22:22           ` Reinette Chatre
  0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2021-10-29 22:22 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang, tan.shaopeng

Hi James,

On 10/29/2021 8:50 AM, James Morse wrote:
> On 27/10/2021 21:41, Reinette Chatre wrote:
>> On 10/27/2021 9:50 AM, James Morse wrote:
>>> On 15/10/2021 23:28, Reinette Chatre wrote:
>>>> On 10/1/2021 9:02 AM, James Morse wrote:
>>>>> mbm_bw_count() is only called by the mbm_handle_overflow() worker once a
>>>>> second. It reads the hardware register, calculates the bandwidth and
>>>>> updates m->prev_bw_msr which is used to hold the previous hardware register
>>>>> value.
>>>>>
>>>>> Operating directly on hardware register values makes it difficult to make
>>>>> this code architecture independent, so that it can be moved to /fs/,
>>>>> making the mba_sc feature something resctrl supports with no additional
>>>>> support from the architecture.
>>>>> Prior to calling mbm_bw_count(), mbm_update() reads from the same hardware
>>>>> register using __mon_event_count().
>>>>
>>>> Looking back I think 06c5fe9b12dd ("x86/resctrl: Fix incorrect local bandwidth when mba_sc
>>>> is enabled") may explain how the code ended up the way it is.
>>>>
>>>>> Change mbm_bw_count() to use the current chunks value from
>>>>> __mon_event_count() to calculate bandwidth. This means it no longer
>>>>> operates on hardware register values.
>>>>
>>>> ok ... so could the patch just do this as it is stated here? The way it is implemented is
>>>> very complicated and hard (for me) to verify the correctness (more below).
>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> index 6c8226987dd6..a1232462db14 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>
>>>>>     static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
>>>>>     {
>>>>>         struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
>>>>>         struct mbm_state *m = &rr->d->mbm_local[rmid];
>>>>> -    u64 tval, cur_bw, chunks;
>>>>> +    u64 cur_bw, chunks, cur_chunks;
>>>>>     -    tval = __rmid_read(rmid, rr->evtid);
>>>>> -    if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
>>>>> -        return;
>>>>> +    cur_chunks = rr->val;
>>>>> +    chunks = cur_chunks - m->prev_bw_chunks;
>>>>> +    m->prev_bw_chunks = cur_chunks;
>>>>>     -    chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
>>>>> -    cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
>>>>> +    cur_bw = (chunks * hw_res->mon_scale) >> 20;
>>>
>>>> I find this quite confusing. What if a new m->prev_chunks is introduced instead and
>>>> initialized in __mon_event_count() to the value of chunks, and then here in mbm_bw_count
>>>> it could just retrieve it (chunks = m->prev_chunks).
>>>
>>> (I agree the diff is noisy, it may be easier as a new function as this is a replacement
>>> not a transform of the existing function)
>>>
>>> __mon_event_count() can also be triggered by user-space reading the file, so any of its
>>> 'prev' values should be ignored, as they aren't updated on the 1-second timer needed to
>>> get this in MB/s.
> 
>> The resource group's mutex is taken before __mon_event_count() is called via user-space or
>> via the overflow handler so I think that mbm_bw_count() can assume that the prev values
>> are from the __mon_event_count() called just before it.
> 
> That is true. But changing this to work with the overflow+corrected value directly means
> it doesn't need changing again as each of those steps is moved into the
> architecture-specific function. Changing this would make the later patches noisier, and we
> would have the same discussion on a later patch.

ok

> 
> 
>>> __mon_event_count()'s chunks value hasn't been through get_corrected_mbm_count(), so it
>>> would need to be rr->val, which is what this code starts with for the "number of chunks
>>> ever read by this counter".
> 
>> The value could be corrected in mbm_bw_count(), no?
> 
> It could, but the aim of the series is to move all the architecture specific behaviour
> behind an arch helper.

ok - I am still working on understanding how these helpers are organized

> 
> MPAM's counters read in bytes, and when they don't, it's up to the MPAM
> architecture-specific code to fix the hardware values before resctrl gets them.
> 
> There is no reason for the mba_sc code to be architecture specific, it operates on the
> counters and controls.
> 
> 
>>> The variable 'chunks' has been used too much here, so it's lost its meaning. How about:
>>> |    current_chunk_count = rr->val;
>>> |    delta_counter = current_chunk_count - m->prev_chunk_count;
>>> |    cur_bw = (delta_counter * hw_res->mon_scale) >> 20;
>>> |
>>> |    m->prev_chunk_count = current_chunk_count;
>>>
>>>
>>> The 'delta_counter' step was previously hidden in mbm_overflow_count(), which also had to
>>> do with overflow of the hardware counter. Because a previously sanitised value is being
>>> used, the hardware counter's resolution doesn't need to be considered.
>>> (which helps make mba_sc architecture independent)
> 
>> This is the part that is not obvious to me: is the difference between the two individually
>> sanitized measurements the same as sanitizing the difference between the two measurements?
> 
> I agree get_corrected_mbm_count()'s rmid check and shift hide what it is doing, but it
> boils down to a multiply. The existing code is (a - b)*cf, which is the same as this
> patch's a*cf - b*cf.
> 
> I'm not worried about this going wrong after 18-and-a-bit exabytes of data have been
> transferred; at current memory speeds that would take decades. But: none of the 'cf'
> values is greater than two, and the hardware register has two bits taken for error codes,
> so there is no a or b that hardware can represent which, with a cf less than two,
> overflows a 64-bit unsigned long.

Thank you for answering it in this way. This seems fair. Could the 
commit message please elaborate more on the changes involved? The 
current summary of "Change mbm_bw_count() to use the current chunks 
value from __mon_event_count() to calculate bandwidth." is too cryptic 
(for me).

Reinette




end of thread, other threads:[~2021-10-29 22:22 UTC | newest]

Thread overview: 61+ messages
2021-10-01 16:02 [PATCH v2 00/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
2021-10-01 16:02 ` [PATCH v2 01/23] x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails James Morse
2021-10-01 16:02 ` [PATCH v2 02/23] x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu() James Morse
2021-10-01 16:02 ` [PATCH v2 03/23] x86/resctrl: Kill off alloc_enabled James Morse
2021-10-15 22:19   ` Reinette Chatre
2021-10-01 16:02 ` [PATCH v2 04/23] x86/resctrl: Merge mon_capable and mon_enabled James Morse
2021-10-19 23:18   ` Babu Moger
2021-10-22 18:30     ` James Morse
2021-10-01 16:02 ` [PATCH v2 05/23] x86/resctrl: Add domain online callback for resctrl work James Morse
2021-10-15 22:19   ` Reinette Chatre
2021-10-22 18:30     ` James Morse
2021-10-19 23:19   ` Babu Moger
2021-10-22 18:30     ` James Morse
2021-10-01 16:02 ` [PATCH v2 06/23] x86/resctrl: Group struct rdt_hw_domain cleanup James Morse
2021-10-01 16:02 ` [PATCH v2 07/23] x86/resctrl: Add domain offline callback for resctrl work James Morse
2021-10-19 23:19   ` Babu Moger
2021-10-22 18:30     ` James Morse
2021-10-01 16:02 ` [PATCH v2 08/23] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
2021-10-15 22:26   ` Reinette Chatre
2021-10-22 18:30     ` James Morse
2021-10-01 16:02 ` [PATCH v2 09/23] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
2021-10-15 22:26   ` Reinette Chatre
2021-10-27 16:49     ` James Morse
2021-10-01 16:02 ` [PATCH v2 10/23] x86/resctrl: Remove architecture copy of mbps_val James Morse
2021-10-15 22:27   ` Reinette Chatre
2021-10-27 16:49     ` James Morse
2021-10-01 16:02 ` [PATCH v2 11/23] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
2021-10-01 16:02 ` [PATCH v2 12/23] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
2021-10-07  6:13   ` tan.shaopeng
2021-10-27 16:50     ` James Morse
2021-10-15 22:28   ` Reinette Chatre
2021-10-01 16:02 ` [PATCH v2 13/23] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
2021-10-15 22:28   ` Reinette Chatre
2021-10-27 16:49     ` James Morse
2021-10-01 16:02 ` [PATCH v2 14/23] x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks James Morse
2021-10-15 22:28   ` Reinette Chatre
2021-10-27 16:50     ` James Morse
2021-10-27 20:41       ` Reinette Chatre
2021-10-29 15:50         ` James Morse
2021-10-29 22:22           ` Reinette Chatre
2021-10-01 16:02 ` [PATCH v2 15/23] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
2021-10-15 22:29   ` Reinette Chatre
2021-10-01 16:02 ` [PATCH v2 16/23] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
2021-10-07  6:16   ` tan.shaopeng
2021-10-01 16:02 ` [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() James Morse
2021-10-15 22:29   ` Reinette Chatre
2021-10-19 23:20   ` Babu Moger
2021-10-20 18:15     ` Reinette Chatre
2021-10-20 19:22       ` Babu Moger
2021-10-20 20:28         ` Reinette Chatre
2021-10-27 16:50           ` James Morse
2021-10-27 18:59             ` Babu Moger
2021-10-01 16:02 ` [PATCH v2 18/23] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
2021-10-15 22:30   ` Reinette Chatre
2021-10-01 16:02 ` [PATCH v2 19/23] x86/resctrl: Move mbm_overflow_count() " James Morse
2021-10-01 16:02 ` [PATCH v2 20/23] x86/resctrl: Move get_corrected_mbm_count() " James Morse
2021-10-01 16:03 ` [PATCH v2 21/23] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
2021-10-01 16:03 ` [PATCH v2 22/23] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
2021-10-01 16:03 ` [PATCH v2 23/23] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
2021-10-13  2:09 ` [PATCH v2 00/23] " tan.shaopeng
2021-10-19 23:17 ` Babu Moger
