linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
@ 2021-07-29 22:35 James Morse
  2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
                   ` (19 more replies)
  0 siblings, 20 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi folks,

This series builds on v7 of "x86/resctrl: Merge the CDP resources", available
from [0] or [1]

The aim of this series is to insert a split between the parts of the monitor
code that the architecture must implement, and those that are part of the
resctrl filesystem. The eventual aim is to move all filesystem parts out
to live in /fs/resctrl, so that resctrl can be wired up for MPAM.

What's MPAM? See the cover letter of a previous series. [1]

The series adds domain online/offline callbacks to allow the filesystem to
manage some of its structures itself, then moves all the 'mba_sc' behaviour
to be part of the filesystem.
This means another architecture doesn't need to provide an mbps_val array.
As its all software, the resctrl filesystem should be able to do this without
any help from the architecture code.

Finally __rmid_read() is refactored to be the API call that the architecture
provides to read a counter value. All the hardware specific overflow detection,
scaling and value correction should occur behind this helper.


This series is based on v5.14-rc1, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_monitors_in_bytes/v1


[0] git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/resctrl_merge_cdp/v7
[1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/

Thanks,

James Morse (20):
  x86/resctrl: Kill off alloc_enabled
  x86/resctrl: Merge mon_capable and mon_enabled
  x86/resctrl: Add domain online callback for resctrl work
  x86/resctrl: Add domain offline callback for resctrl work
  x86/resctrl: Create mba_sc configuration in the rdt_domain
  x86/resctrl: Switch over to the resctrl mbps_val list
  x86/resctrl: Remove architecture copy of mbps_val
  x86/resctrl: Remove set_mba_sc()s control array re-initialisation
  x86/resctrl: Abstract and use supports_mba_mbps()
  x86/resctrl: Allow update_mba_bw() to update controls directly
  x86/resctrl: Calculate bandwidth from the total bytes counter
  x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  x86/recstrl: Allow per-rmid arch private storage to be reset
  x86/resctrl: Abstract __rmid_read()
  x86/resctrl: Pass the required parameters into
    resctrl_arch_rmid_read()
  x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
  x86/resctrl: Move get_corrected_mbm_count() into
    resctrl_arch_rmid_read()
  x86/resctrl: Rename and change the units of resctrl_cqm_threshold
  x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's
    boot_cpu_data
  x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes

 arch/x86/kernel/cpu/resctrl/core.c        |  96 +++-------
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  76 +++++---
 arch/x86/kernel/cpu/resctrl/internal.h    |  62 +++----
 arch/x86/kernel/cpu/resctrl/monitor.c     | 190 +++++++++++--------
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 217 ++++++++++++++++++----
 include/linux/resctrl.h                   |  69 ++++++-
 7 files changed, 462 insertions(+), 250 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-08-11 12:12   ` Jamie Iles
  2021-09-01 21:18   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled James Morse
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

rdt_resources_all[] used to have extra entries for L2CODE/L2DATA entries.
These were hidden from resctrl by the alloc_enabled value.

Now that the L2/L2CODE/L2DATA resources have been merged together,
alloc_enabled doesn't mean anything, it always has the same value as
alloc_capable which indicates CAT is supported by this cache.

Remove alloc_enabled and its helpers.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        | 4 ----
 arch/x86/kernel/cpu/resctrl/internal.h    | 4 ----
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 4 ++--
 include/linux/resctrl.h                   | 2 --
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4b8813bafffd..2a20fcce9cc2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -147,7 +147,6 @@ static inline void cache_alloc_hsw_probe(void)
 	r->cache.shareable_bits = 0xc0000;
 	r->cache.min_cbm_bits = 2;
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	rdt_alloc_capable = true;
 }
@@ -211,7 +210,6 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 	thread_throttle_mode_init();
 
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	return true;
 }
@@ -242,7 +240,6 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	r->data_width = 4;
 
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 
 	return true;
 }
@@ -261,7 +258,6 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
 	r->cache.shareable_bits = ebx & r->default_ctrl;
 	r->data_width = (r->cache.cbm_len + 3) / 4;
 	r->alloc_capable = true;
-	r->alloc_enabled = true;
 }
 
 static void rdt_get_cdp_config(int level)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d647188a43b..53f3d275a98f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 	for_each_rdt_resource(r)					      \
 		if (r->mon_capable)
 
-#define for_each_alloc_enabled_rdt_resource(r)				      \
-	for_each_rdt_resource(r)					      \
-		if (r->alloc_enabled)
-
 #define for_each_mon_enabled_rdt_resource(r)				      \
 	for_each_rdt_resource(r)					      \
 		if (r->mon_enabled)
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index db813f819ad6..f810969ced4b 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -835,7 +835,7 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
 	 * First determine which cpus have pseudo-locked regions
 	 * associated with them.
 	 */
-	for_each_alloc_enabled_rdt_resource(r) {
+	for_each_alloc_capable_rdt_resource(r) {
 		list_for_each_entry(d_i, &r->domains, list) {
 			if (d_i->plr)
 				cpumask_or(cpu_with_psl, cpu_with_psl,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 89123a4977cf..09d87dbd72ca 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2104,7 +2104,7 @@ static int schemata_list_create(void)
 	struct rdt_resource *r;
 	int ret = 0;
 
-	for_each_alloc_enabled_rdt_resource(r) {
+	for_each_alloc_capable_rdt_resource(r) {
 		if (resctrl_arch_get_cdp_enabled(r->rid)) {
 			ret = schemata_list_add(r, CDP_CODE);
 			if (ret)
@@ -2450,7 +2450,7 @@ static void rdt_kill_sb(struct super_block *sb)
 	set_mba_sc(false);
 
 	/*Put everything back to default values. */
-	for_each_alloc_enabled_rdt_resource(r)
+	for_each_alloc_capable_rdt_resource(r)
 		reset_all_ctrls(r);
 	cdp_disable_all();
 	rmdir_all_sub();
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 18dd764af0dd..ada0a02093a6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,6 @@ struct resctrl_schema;
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
- * @alloc_enabled:	Is allocation enabled on this machine
  * @mon_enabled:	Is monitoring enabled for this feature
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
@@ -150,7 +149,6 @@ struct resctrl_schema;
  */
 struct rdt_resource {
 	int			rid;
-	bool			alloc_enabled;
 	bool			mon_enabled;
 	bool			alloc_capable;
 	bool			mon_capable;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-08-11 12:15   ` Jamie Iles
  2021-07-29 22:35 ` [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work James Morse
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

mon_enabled and mon_capable are always set as a pair by
rdt_get_mon_l3_config().

There is no point having two values.

Merge them together.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
 arch/x86/kernel/cpu/resctrl/monitor.c  | 1 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
 include/linux/resctrl.h                | 4 ++--
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 53f3d275a98f..8828b5c1b6d2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -459,10 +459,6 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
 	for_each_rdt_resource(r)					      \
 		if (r->mon_capable)
 
-#define for_each_mon_enabled_rdt_resource(r)				      \
-	for_each_rdt_resource(r)					      \
-		if (r->mon_enabled)
-
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
 	struct {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eb227298487f..035e45e711ae 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -712,7 +712,6 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
-	r->mon_enabled = true;
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 09d87dbd72ca..3eb5f94a9e7d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1763,7 +1763,7 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 			goto out_destroy;
 	}
 
-	for_each_mon_enabled_rdt_resource(r) {
+	for_each_mon_capable_rdt_resource(r) {
 		fflags =  r->fflags | RF_MON_INFO;
 		sprintf(name, "%s_MON", r->name);
 		ret = rdtgroup_mkdir_info_resdir(r, name, fflags);
@@ -2502,7 +2502,7 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
 	struct rdtgroup *prgrp, *crgrp;
 	char name[32];
 
-	if (!r->mon_enabled)
+	if (!r->mon_capable)
 		return;
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2570,7 +2570,7 @@ void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	struct rdtgroup *prgrp, *crgrp;
 	struct list_head *head;
 
-	if (!r->mon_enabled)
+	if (!r->mon_capable)
 		return;
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -2640,7 +2640,7 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 	 * Create the subdirectories for each domain. Note that all events
 	 * in a domain like L3 are grouped into a resource whose domain is L3
 	 */
-	for_each_mon_enabled_rdt_resource(r) {
+	for_each_mon_capable_rdt_resource(r) {
 		ret = mkdir_mondata_subdir_alldom(kn, r, prgrp);
 		if (ret)
 			goto out_destroy;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index ada0a02093a6..d715df9de37f 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -130,7 +130,7 @@ struct resctrl_schema;
 /**
  * struct rdt_resource - attributes of a resctrl resource
  * @rid:		The index of the resource
- * @mon_enabled:	Is monitoring enabled for this feature
+ * @cdp_enabled		Is CDP enabled for this resource
  * @alloc_capable:	Is allocation available on this machine
  * @mon_capable:	Is monitor feature available on this machine
  * @num_rmid:		Number of RMIDs available
@@ -149,7 +149,7 @@ struct resctrl_schema;
  */
 struct rdt_resource {
 	int			rid;
-	bool			mon_enabled;
+	bool			cdp_enabled;
 	bool			alloc_capable;
 	bool			mon_capable;
 	int			num_rmid;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
  2021-07-29 22:35 ` [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-09-01 21:19   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 04/20] x86/resctrl: Add domain offline " James Morse
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when cpu hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
allocate the memory.

Move domain_setup_mon_state(), the monitor subdir creation call and the
mbm/limbo workers into a new resctrl_online_domain() call.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 53 +++------------------
 arch/x86/kernel/cpu/resctrl/internal.h |  2 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 65 ++++++++++++++++++++++++--
 include/linux/resctrl.h                |  1 +
 4 files changed, 67 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2a20fcce9cc2..718f8ff00d87 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -443,42 +443,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
-{
-	size_t tsize;
-
-	if (is_llc_occupancy_enabled()) {
-		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
-		if (!d->rmid_busy_llc)
-			return -ENOMEM;
-		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
-	}
-	if (is_mbm_total_enabled()) {
-		tsize = sizeof(*d->mbm_total);
-		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-		if (!d->mbm_total) {
-			bitmap_free(d->rmid_busy_llc);
-			return -ENOMEM;
-		}
-	}
-	if (is_mbm_local_enabled()) {
-		tsize = sizeof(*d->mbm_local);
-		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
-		if (!d->mbm_local) {
-			bitmap_free(d->rmid_busy_llc);
-			kfree(d->mbm_total);
-			return -ENOMEM;
-		}
-	}
-
-	if (is_mbm_enabled()) {
-		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
-		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
-	}
-
-	return 0;
-}
-
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -498,6 +462,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	struct list_head *add_pos = NULL;
 	struct rdt_hw_domain *hw_dom;
 	struct rdt_domain *d;
+	int err;
 
 	d = rdt_find_domain(r, id, &add_pos);
 	if (IS_ERR(d)) {
@@ -527,19 +492,13 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (r->mon_capable && domain_setup_mon_state(r, d)) {
-		kfree(d);
-		return;
-	}
-
 	list_add_tail(&d->list, add_pos);
 
-	/*
-	 * If resctrl is mounted, add
-	 * per domain monitor data directories.
-	 */
-	if (static_branch_unlikely(&rdt_mon_enable_key))
-		mkdir_mondata_subdir_allrdtgrp(r, d);
+	err = resctrl_online_domain(r, d);
+	if (err) {
+		list_del(&d->list);
+		kfree(d);
+	}
 }
 
 static void domain_remove_cpu(int cpu, struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8828b5c1b6d2..be48a682dbdb 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -524,8 +524,6 @@ void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 				    unsigned int dom_id);
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    struct rdt_domain *d);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3eb5f94a9e7d..e1af1d81b924 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2563,16 +2563,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
  * Add all subdirectories of mon_data for "ctrl_mon" groups
  * and "monitor" groups with given domain id.
  */
-void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    struct rdt_domain *d)
+static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+					   struct rdt_domain *d)
 {
 	struct kernfs_node *parent_kn;
 	struct rdtgroup *prgrp, *crgrp;
 	struct list_head *head;
 
-	if (!r->mon_capable)
-		return;
-
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		parent_kn = prgrp->mon.mon_data_kn;
 		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
@@ -3232,6 +3229,64 @@ static int __init rdtgroup_setup_root(void)
 	return ret;
 }
 
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
+{
+	size_t tsize;
+
+	if (is_llc_occupancy_enabled()) {
+		d->rmid_busy_llc = bitmap_zalloc(r->num_rmid, GFP_KERNEL);
+		if (!d->rmid_busy_llc)
+			return -ENOMEM;
+	}
+	if (is_mbm_total_enabled()) {
+		tsize = sizeof(*d->mbm_total);
+		d->mbm_total = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		if (!d->mbm_total) {
+			bitmap_free(d->rmid_busy_llc);
+			return -ENOMEM;
+		}
+	}
+	if (is_mbm_local_enabled()) {
+		tsize = sizeof(*d->mbm_local);
+		d->mbm_local = kcalloc(r->num_rmid, tsize, GFP_KERNEL);
+		if (!d->mbm_local) {
+			bitmap_free(d->rmid_busy_llc);
+			kfree(d->mbm_total);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int err;
+
+	lockdep_assert_held(&rdtgroup_mutex); // the arch code took this for us
+
+	if (!r->mon_capable)
+		return 0;
+
+	err = domain_setup_mon_state(r, d);
+	if (err)
+		return err;
+
+	if (is_mbm_enabled()) {
+		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
+		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+	}
+
+	if (is_llc_occupancy_enabled())
+		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+
+	/* If resctrl is mounted, add per domain monitor data directories. */
+	if (static_branch_unlikely(&rdt_enable_key))
+		mkdir_mondata_subdir_allrdtgrp(r, d);
+
+	return 0;
+}
+
 /*
  * rdtgroup_init - rdtgroup initialization
  *
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d715df9de37f..cb48976a8d88 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -195,5 +195,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 closid, enum resctrl_conf_type type,
 			     u32 *value);
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 04/20] x86/resctrl: Add domain offline callback for resctrl work
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (2 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-08-11 16:10   ` Jamie Iles
  2021-09-01 21:21   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Because domains are exposed to user-space via resctrl, the filesystem
must update its state when cpu hotplug callbacks are triggered.

Some of this work is common to any architecture that would support
resctrl, but the work is tied up with the architecture code to
free the memory.

Move the monitor subdir removal and the cancelling of the mbm/limbo
works into a new resctrl_offline_domain() call.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 24 +---------------
 arch/x86/kernel/cpu/resctrl/internal.h |  2 --
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++---
 include/linux/resctrl.h                |  1 +
 4 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 718f8ff00d87..56b3541617b5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -516,27 +516,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
-		/*
-		 * If resctrl is mounted, remove all the
-		 * per domain monitor data directories.
-		 */
-		if (static_branch_unlikely(&rdt_mon_enable_key))
-			rmdir_mondata_subdir_allrdtgrp(r, d->id);
+		resctrl_offline_domain(r, d);
 		list_del(&d->list);
-		if (r->mon_capable && is_mbm_enabled())
-			cancel_delayed_work(&d->mbm_over);
-		if (is_llc_occupancy_enabled() &&  has_busy_rmid(r, d)) {
-			/*
-			 * When a package is going down, forcefully
-			 * decrement rmid->ebusy. There is no way to know
-			 * that the L3 was flushed and hence may lead to
-			 * incorrect counts in rare scenarios, but leaving
-			 * the RMID as busy creates RMID leaks if the
-			 * package never comes back.
-			 */
-			__check_limbo(d, true);
-			cancel_delayed_work(&d->cqm_limbo);
-		}
 
 		/*
 		 * rdt_domain "d" is going to be freed below, so clear
@@ -547,9 +528,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 
 		kfree(hw_dom->ctrl_val);
 		kfree(hw_dom->mbps_val);
-		bitmap_free(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		kfree(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index be48a682dbdb..e12b55f815bf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -522,8 +522,6 @@ void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-				    unsigned int dom_id);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e1af1d81b924..cf0db0b7a5d0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2497,14 +2497,12 @@ static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
  * Remove all subdirectories of mon_data of ctrl_mon groups
  * and monitor groups with given domain id.
  */
-void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+					   unsigned int dom_id)
 {
 	struct rdtgroup *prgrp, *crgrp;
 	char name[32];
 
-	if (!r->mon_capable)
-		return;
-
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		sprintf(name, "mon_%s_%02d", r->name, dom_id);
 		kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
@@ -3229,6 +3227,39 @@ static int __init rdtgroup_setup_root(void)
 	return ret;
 }
 
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
+{
+	lockdep_assert_held(&rdtgroup_mutex); // the arch code took this for us
+
+	if (!r->mon_capable)
+		return;
+
+	/*
+	 * If resctrl is mounted, remove all the
+	 * per domain monitor data directories.
+	 */
+	if (static_branch_unlikely(&rdt_mon_enable_key))
+		rmdir_mondata_subdir_allrdtgrp(r, d->id);
+
+	if (r->mon_capable && is_mbm_enabled())
+		cancel_delayed_work(&d->mbm_over);
+	if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
+		/*
+		 * When a package is going down, forcefully
+		 * decrement rmid->ebusy. There is no way to know
+		 * that the L3 was flushed and hence may lead to
+		 * incorrect counts in rare scenarios, but leaving
+		 * the RMID as busy creates RMID leaks if the
+		 * package never comes back.
+		 */
+		__check_limbo(d, true);
+		cancel_delayed_work(&d->cqm_limbo);
+	}
+	bitmap_free(d->rmid_busy_llc);
+	kfree(d->mbm_total);
+	kfree(d->mbm_local);
+}
+
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 {
 	size_t tsize;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index cb48976a8d88..7aeeb4ac6bcc 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -196,5 +196,6 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 closid, enum resctrl_conf_type type,
 			     u32 *value);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (3 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 04/20] x86/resctrl: Add domain offline " James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-08-11 16:32   ` Jamie Iles
  2021-09-01 21:22   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

To support resctrl's MBA software controller, the architecture must provide
a second configuration array to hold the mbps_val from user-space.

This complicates the interface between the architecture code.

Make the filesystem parts of resctrl create an array for the mba_sc
values when the struct resctrl_schema is created. The software controller
can be changed to use this, allowing the architecture code to only
consider the values configured in hardware.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 69 ++++++++++++++++++++++++++
 include/linux/resctrl.h                | 13 +++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e12b55f815bf..a7e2cbce29d5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -36,7 +36,6 @@
 #define MBM_OVERFLOW_INTERVAL		1000
 #define MAX_MBA_BW			100u
 #define MBA_IS_LINEAR			0x4
-#define MBA_MAX_MBPS			U32_MAX
 #define MAX_MBA_BW_AMD			0x800
 #define MBM_CNTR_WIDTH_OFFSET_AMD	20
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cf0db0b7a5d0..185f9bb992d1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2030,6 +2030,60 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 			     struct rdtgroup *prgrp,
 			     struct kernfs_node **mon_data_kn);
 
+static int mba_sc_domain_allocate(struct rdt_resource *res,
+				  struct rdt_domain *d)
+{
+	u32 num_closid = closid_free_map_len;
+	int cpu = cpumask_any(&d->cpu_mask);
+	int i;
+
+	d->mba_sc = kcalloc_node(num_closid, sizeof(*d->mba_sc),
+				 GFP_KERNEL, cpu_to_node(cpu));
+	if (!d->mba_sc)
+		return -ENOMEM;
+
+	for (i = 0; i < num_closid; i++)
+		d->mba_sc[i].mbps_val = MBA_MAX_MBPS;
+
+	return 0;
+}
+
+static int mba_sc_allocate(struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+	int ret;
+
+	lockdep_assert_cpus_held();
+
+	if (!is_mba_sc(r))
+		return 0;
+
+	list_for_each_entry(d, &r->domains, list) {
+		ret = mba_sc_domain_allocate(r, d);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static void mba_sc_domain_destroy(struct rdt_resource *r,
+				  struct rdt_domain *d)
+{
+	kfree(d->mba_sc);
+	d->mba_sc = NULL;
+}
+
+static void mba_sc_destroy(struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+
+	lockdep_assert_cpus_held();
+
+	list_for_each_entry(d, &r->domains, list)
+		mba_sc_domain_destroy(r, d);
+}
+
 static int rdt_enable_ctx(struct rdt_fs_context *ctx)
 {
 	int ret = 0;
@@ -2117,17 +2171,27 @@ static int schemata_list_create(void)
 
 		if (ret)
 			break;
+
+		ret = mba_sc_allocate(r);
+		if (ret)
+			break;
 	}
 
 	return ret;
 }
 
+/*
+ * During rdt_kill_sb(), the mba_sc state is reset before
+ * destroy_schemata_list() is called: unconditionally try to free the
+ * array.
+ */
 static void schemata_list_destroy(void)
 {
 	struct resctrl_schema *s, *tmp;
 
 	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
 		list_del(&s->list);
+		mba_sc_destroy(s->res);
 		kfree(s);
 	}
 }
@@ -3255,6 +3319,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 		__check_limbo(d, true);
 		cancel_delayed_work(&d->cqm_limbo);
 	}
+	if (static_branch_unlikely(&rdt_enable_key) && is_mba_sc(r))
+		mba_sc_domain_destroy(r, d);
 	bitmap_free(d->rmid_busy_llc);
 	kfree(d->mbm_total);
 	kfree(d->mbm_local);
@@ -3287,6 +3353,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 		}
 	}
 
+	if (is_mba_sc(r))
+		mba_sc_domain_allocate(r, d);
+
 	return 0;
 }
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 7aeeb4ac6bcc..3c8522d63261 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m,
 
 #endif
 
+/* max value for struct resctrl_mba_sc's mbps_val */
+#define MBA_MAX_MBPS   U32_MAX
+
 /**
  * enum resctrl_conf_type - The type of configuration.
  * @CDP_NONE:	No prioritisation, both code and data are controlled or monitored.
@@ -39,6 +42,14 @@ struct resctrl_staged_config {
 	bool			have_new_ctrl;
 };
 
+/**
+ * struct resctrl_mba_sc - per-closid values for the control loop
+ * @mbps_val:		The user's specified control value
+ */
+struct resctrl_mba_sc {
+	u32		mbps_val;
+};
+
 /**
  * struct rdt_domain - group of CPUs sharing a resctrl resource
  * @list:		all instances of this resource
@@ -53,6 +64,7 @@ struct resctrl_staged_config {
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
  * @plr:		pseudo-locked region (if any) associated with domain
  * @staged_config:	parsed configuration to be applied
+ * @mba_sc:	the mba software controller properties, indexed by closid
  */
 struct rdt_domain {
 	struct list_head		list;
@@ -67,6 +79,7 @@ struct rdt_domain {
 	int				cqm_work_cpu;
 	struct pseudo_lock_region	*plr;
 	struct resctrl_staged_config	staged_config[CDP_NUM_TYPES];
+	struct resctrl_mba_sc		*mba_sc;
 };
 
 /**
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (4 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-09-01 21:25   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val James Morse
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Updates to resctrl's software controller follow the same path as
other configuration updates, but they don't modify the hardware state.
rdtgroup_schemata_write() uses parse_line() and the resource's
ctrlval_parse function to stage the configuration.
resctrl_arch_update_domains() then updates the mbps_val[] array
instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
call that would update hardware.

This complicates the interface between resctrl's filesystem parts
and architecture specific code. It should be possible for mba_sc
to be completely implemented by the filesystem parts of resctrl. This
would allow it to work on a second architecture with no additional code.

Change parse_bw() to write the configuration value directly to the
mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
to skip the call to resctrl_arch_update_domains(), meaning all the
mba_sc specific code in resctrl_arch_update_domains() can be removed.
On the read-side, show_doms() and update_mba_bw() are changed to read
the mba_sc[] array from the domain structure. With this,
resctrl_arch_get_config() no longer needs to consider mba_sc resources.

Change parse_bw() to write these values directly, meaning
rdtgroup_schemata_write() never needs to call update_domains()
for mba_sc resources.

Get show_doms() to test is_mba_sc() and retrieve the value
directly, instead of using get_config() for the hardware value.

This means the arch code's resctrl_arch_get_config() and
resctrl_arch_update_domains() no longer need to be aware of
mba_sc, and we can get rid of the update_mba_bw() code that
reaches into the hw_dom to get the msr value.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 45 ++++++++++++++---------
 arch/x86/kernel/cpu/resctrl/monitor.c     | 10 ++---
 2 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 9e1c6730520b..56789ea11185 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -61,6 +61,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 	     struct rdt_domain *d)
 {
 	struct resctrl_staged_config *cfg;
+	u32 closid = data->rdtgrp->closid;
 	struct rdt_resource *r = s->res;
 	unsigned long bw_val;
 
@@ -72,6 +73,12 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 
 	if (!bw_validate(data->buf, &bw_val, r))
 		return -EINVAL;
+
+	if (is_mba_sc(r)) {
+		d->mba_sc[closid].mbps_val = bw_val;
+		return 0;
+	}
+
 	cfg->new_ctrl = bw_val;
 	cfg->have_new_ctrl = true;
 
@@ -261,14 +268,13 @@ static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
 
 static bool apply_config(struct rdt_hw_domain *hw_dom,
 			 struct resctrl_staged_config *cfg, u32 idx,
-			 cpumask_var_t cpu_mask, bool mba_sc)
+			 cpumask_var_t cpu_mask)
 {
 	struct rdt_domain *dom = &hw_dom->d_resctrl;
-	u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;
 
-	if (cfg->new_ctrl != dc[idx]) {
+	if (cfg->new_ctrl != hw_dom->ctrl_val[idx]) {
 		cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
-		dc[idx] = cfg->new_ctrl;
+		hw_dom->ctrl_val[idx] = cfg->new_ctrl;
 
 		return true;
 	}
@@ -284,14 +290,12 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 	enum resctrl_conf_type t;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	bool mba_sc;
 	int cpu;
 	u32 idx;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
 
-	mba_sc = is_mba_sc(r);
 	msr_param.res = NULL;
 	list_for_each_entry(d, &r->domains, list) {
 		hw_dom = resctrl_to_arch_dom(d);
@@ -301,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 				continue;
 
 			idx = get_config_index(closid, t);
-			if (!apply_config(hw_dom, cfg, idx, cpu_mask, mba_sc))
+			if (!apply_config(hw_dom, cfg, idx, cpu_mask))
 				continue;
 
 			if (!msr_param.res) {
@@ -315,11 +319,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 		}
 	}
 
-	/*
-	 * Avoid writing the control msr with control values when
-	 * MBA software controller is enabled
-	 */
-	if (cpumask_empty(cpu_mask) || mba_sc)
+	if (cpumask_empty(cpu_mask))
 		goto done;
 	cpu = get_cpu();
 	/* Update resource control msr on this CPU if it's in cpu_mask. */
@@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
+
+		/*
+		 * Writes to mba_sc resources update the software controller,
+		 * not the control msr.
+		 */
+		if (is_mba_sc(r))
+			continue;
+
 		ret = resctrl_arch_update_domains(r, rdtgrp->closid);
 		if (ret)
 			goto out;
@@ -433,10 +441,7 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	u32 idx = get_config_index(closid, type);
 
-	if (!is_mba_sc(r))
-		*value = hw_dom->ctrl_val[idx];
-	else
-		*value = hw_dom->mbps_val[idx];
+	*value = hw_dom->ctrl_val[idx];
 }
 
 static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int closid)
@@ -451,8 +456,12 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
 		if (sep)
 			seq_puts(s, ";");
 
-		resctrl_arch_get_config(r, dom, closid, schema->conf_type,
-					&ctrl_val);
+		if (is_mba_sc(r))
+			ctrl_val = dom->mba_sc[closid].mbps_val;
+		else
+			resctrl_arch_get_config(r, dom, closid,
+						schema->conf_type, &ctrl_val);
+
 		seq_printf(s, r->format_str, dom->id, max_data_width,
 			   ctrl_val);
 		sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 035e45e711ae..dcf3a73e2c17 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -442,13 +442,11 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
 	cur_bw = pmbm_data->prev_bw;
-	resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE, &user_bw);
+	user_bw = dom_mba->mba_sc[closid].mbps_val;
 	delta_bw = pmbm_data->delta_bw;
-	/*
-	 * resctrl_arch_get_config() chooses the mbps/ctrl value to return
-	 * based on is_mba_sc(). For now, reach into the hw_dom.
-	 */
-	cur_msr_val = hw_dom_mba->ctrl_val[closid];
+
+	/* MBA monitor resource doesn't support CDP */
+	resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_NONE, &cur_msr_val);
 
 	/*
 	 * For Ctrl groups read data from child monitor groups.
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (5 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-09-01 21:26   ` Reinette Chatre
  2021-07-29 22:35 ` [PATCH v1 08/20] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

The resctrl arch code provides a second configuration array mbps_val[]
for the mba socftware controller.

Since resctrl switched over to allocating and freeing its own array
when needed, nothing uses the arch code version.

Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 20 ++++----------------
 arch/x86/kernel/cpu/resctrl/internal.h |  4 +---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 56b3541617b5..e864dbc6fe3d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -397,7 +397,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	return NULL;
 }
 
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
+void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	int i;
@@ -406,12 +406,9 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 	 * Initialize the Control MSRs to having no control.
 	 * For Cache Allocation: Set all bits in cbm
 	 * For Memory Allocation: Set b/w requested to 100%
-	 * and the bandwidth in MBps to U32_MAX
 	 */
-	for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
+	for (i = 0; i < hw_res->num_closid; i++, dc++)
 		*dc = r->default_ctrl;
-		*dm = MBA_MAX_MBPS;
-	}
 }
 
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
@@ -419,23 +416,15 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	struct msr_param m;
-	u32 *dc, *dm;
+	u32 *dc;
 
 	dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
 			   GFP_KERNEL);
 	if (!dc)
 		return -ENOMEM;
 
-	dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
-			   GFP_KERNEL);
-	if (!dm) {
-		kfree(dc);
-		return -ENOMEM;
-	}
-
 	hw_dom->ctrl_val = dc;
-	hw_dom->mbps_val = dm;
-	setup_default_ctrlval(r, dc, dm);
+	setup_default_ctrlval(r, dc);
 
 	m.low = 0;
 	m.high = hw_res->num_closid;
@@ -527,7 +516,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			d->plr->d = NULL;
 
 		kfree(hw_dom->ctrl_val);
-		kfree(hw_dom->mbps_val);
 		kfree(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a7e2cbce29d5..796e13a0e8dc 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -308,14 +308,12 @@ struct mbm_state {
  *			  a resource
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
- * @mbps_val:	When mba_sc is enabled, this holds the bandwidth in MBps
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
 struct rdt_hw_domain {
 	struct rdt_domain		d_resctrl;
 	u32				*ctrl_val;
-	u32				*mbps_val;
 };
 
 static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
@@ -529,7 +527,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
 void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
-void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
+void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
 u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 185f9bb992d1..297c20491549 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1906,7 +1906,7 @@ static int set_mba_sc(bool mba_sc)
 	r->membw.mba_sc = mba_sc;
 	list_for_each_entry(d, &r->domains, list) {
 		hw_dom = resctrl_to_arch_dom(d);
-		setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
+		setup_default_ctrlval(r, hw_dom->ctrl_val);
 	}
 
 	return 0;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 08/20] x86/resctrl: Remove set_mba_sc()s control array re-initialisation
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (6 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-07-29 22:35 ` [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

set_mba_sc() is called by rdt_enable_ctx() during mount()
and rdt_kill_sb(). It currently re-initialises the arch code's control
value array.

These values are already set to their default when the domain is created,
and when rdt_kill_sb() is called, (via reset_all_ctrls()). set_mba_sc()s
extra call to setup_default_ctrlval() isn't needed as the values are
already at their defaults due to the creation of the domain, or reset
during umount().

Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 297c20491549..e321ea5de562 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1896,18 +1896,12 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
 static int set_mba_sc(bool mba_sc)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
-	struct rdt_hw_domain *hw_dom;
-	struct rdt_domain *d;
 
 	if (!is_mbm_enabled() || !is_mba_linear() ||
 	    mba_sc == is_mba_sc(r))
 		return -EINVAL;
 
 	r->membw.mba_sc = mba_sc;
-	list_for_each_entry(d, &r->domains, list) {
-		hw_dom = resctrl_to_arch_dom(d);
-		setup_default_ctrlval(r, hw_dom->ctrl_val);
-	}
 
 	return 0;
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (7 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 08/20] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
@ 2021-07-29 22:35 ` James Morse
  2021-09-01 21:27   ` Reinette Chatre
  2021-09-24  6:23   ` tan.shaopeng
  2021-07-29 22:36 ` [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:35 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

To determine whether the mba_mbps option to resctrl should be supported,
resctrl tests the boot cpus' x86_vendor.

This isn't portable, and needs abstracting behind a helper so this check
can be part of the filesystem code that moves to /fs/.

Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
on this system. An 'alloc_capable' test is added so that this property
isn't implied by 'linear'.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e321ea5de562..4388685548a0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1888,17 +1888,26 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
 }
 
 /*
- * Enable or disable the MBA software controller
- * which helps user specify bandwidth in MBps.
  * MBA software controller is supported only if
  * MBM is supported and MBA is in linear scale.
  */
+static bool supports_mba_mbps(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
+	return (is_mbm_enabled() &&
+		r->alloc_capable && is_mba_linear());
+}
+
+/*
+ * Enable or disable the MBA software controller
+ * which helps user specify bandwidth in MBps.
+ */
 static int set_mba_sc(bool mba_sc)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
 
-	if (!is_mbm_enabled() || !is_mba_linear() ||
-	    mba_sc == is_mba_sc(r))
+	if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
 		return -EINVAL;
 
 	r->membw.mba_sc = mba_sc;
@@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		ctx->enable_cdpl2 = true;
 		return 0;
 	case Opt_mba_mbps:
-		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		if (supports_mba_mbps())
 			return -EINVAL;
 		ctx->enable_mba_mbps = true;
 		return 0;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (8 preceding siblings ...)
  2021-07-29 22:35 ` [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-09-01 21:28   ` Reinette Chatre
  2021-07-29 22:36 ` [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter James Morse
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

update_mba_bw() calculates a new control value for the MBA resource
based on the user provided mbps_val and the current measured
bandwidth. Some control values need remapping by delay_bw_map().

It does this by calling wrmsrl() directly. This needs splitting
up to be done by an architecture specific helper, so that the
remainder can eventually be moved to /fs/.

Add resctrl_arch_update_one() to apply one configuration value
to the provided resource and domain. This avoids the staging
and cross-calling that is only needed with changes made by
user-space. delay_bw_map() moves to be part of the arch code,
to maintain the 'percentage control' view of mba resources
in resctrl.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        |  2 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 21 +++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h    |  1 -
 arch/x86/kernel/cpu/resctrl/monitor.c     | 13 ++++---------
 include/linux/resctrl.h                   |  8 ++++++++
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e864dbc6fe3d..8a3c13c6c19f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -296,7 +296,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
  * that can be written to QOS_MSRs.
  * There are currently no SKUs which support non linear delay values.
  */
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
 {
 	if (r->membw.delay_linear)
 		return MAX_MBA_BW - bw;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 56789ea11185..5104f39928fd 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
 	return false;
 }
 
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	u32 idx = get_config_index(closid, t);
+	struct msr_param msr_param;
+
+	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+		return -EINVAL;
+
+	hw_dom->ctrl_val[idx] = cfg_val;
+
+	msr_param.res = r;
+	msr_param.low = idx;
+	msr_param.high = idx + 1;
+
+	rdt_ctrl_update(&msr_param);
+
+	return 0;
+}
+
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 {
 	struct resctrl_staged_config *cfg;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 796e13a0e8dc..1b07e49564cf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -528,7 +528,6 @@ void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
 void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
-u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
 void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index dcf3a73e2c17..b178329d3661 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -415,10 +415,8 @@ void mon_event_count(void *info)
  */
 static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 {
-	u32 closid, rmid, cur_msr, cur_msr_val, new_msr_val;
+	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
-	struct rdt_hw_resource *hw_r_mba;
-	struct rdt_hw_domain *hw_dom_mba;
 	u32 cur_bw, delta_bw, user_bw;
 	struct rdt_resource *r_mba;
 	struct rdt_domain *dom_mba;
@@ -428,8 +426,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	if (!is_mbm_local_enabled())
 		return;
 
-	hw_r_mba = &rdt_resources_all[RDT_RESOURCE_MBA];
-	r_mba = &hw_r_mba->r_resctrl;
+	r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
+
 	closid = rgrp->closid;
 	rmid = rgrp->mon.rmid;
 	pmbm_data = &dom_mbm->mbm_local[rmid];
@@ -439,7 +437,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 		pr_warn_once("Failure to get domain for MBA update\n");
 		return;
 	}
-	hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
 	cur_bw = pmbm_data->prev_bw;
 	user_bw = dom_mba->mba_sc[closid].mbps_val;
@@ -481,9 +478,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 		return;
 	}
 
-	cur_msr = hw_r_mba->msr_base + closid;
-	wrmsrl(cur_msr, delay_bw_map(new_msr_val, r_mba));
-	hw_dom_mba->ctrl_val[closid] = new_msr_val;
+	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
 
 	/*
 	 * Delta values are updated dynamically package wise for each
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 3c8522d63261..4fe2d5500315 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -205,6 +205,14 @@ struct resctrl_schema {
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+
+/*
+ * Update the ctrl_val and apply this config right now.
+ * Must be called on one of the domains cpus.
+ */
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 closid, enum resctrl_conf_type t, u32 cfg_val);
+
 void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 closid, enum resctrl_conf_type type,
 			     u32 *value);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (9 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-09-01 21:31   ` Reinette Chatre
  2021-07-29 22:36 ` [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

mbm_bw_count() maintains its own copy of prev_msr to allow it to
calculate the bandwidth as the number of chunks counted since the
last time mbm_bw_count() was invoked.

The prev_msr and chunks values are in a format specific to the
architecture. MPAM's monitor scaling can be enabled for some
counters and not others. If the value is changed once the bandwidth
counter passes some threshold, the prev_msr values need to be
converted to the new scale. Having two prev_msr values is a
hindrance to moving this logic behind an architecture specific
function that returns the counters number of bytes.

Change mbm_bw_count() to calculate the total number of bytes the
counter has seen in the same way as __mon_event_count(), then
calculate the bandwidth from that. prev_bw_msr is replaced by
prev_bw_chunks, the chunks value from the last time mbm_bw_count()
was invoked.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 15 ++++++++++-----
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1b07e49564cf..0a5721e1cc07 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -289,7 +289,7 @@ struct rftype {
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
- * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
+ * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
@@ -297,7 +297,7 @@ struct rftype {
 struct mbm_state {
 	u64	chunks;
 	u64	prev_msr;
-	u64	prev_bw_msr;
+	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
 	bool	delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b178329d3661..af60e154f0ed 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -316,7 +316,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 
 	if (rr->first) {
 		memset(m, 0, sizeof(struct mbm_state));
-		m->prev_bw_msr = m->prev_msr = tval;
+		m->prev_bw_chunks = m->prev_msr = tval;
 		return 0;
 	}
 
@@ -337,20 +337,25 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 tval, cur_bw, chunks;
+	u64 tval, cur_bw, chunks, bw_chunks;
 
 	tval = __rmid_read(rmid, rr->evtid);
 	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
 		return;
 
-	chunks = mbm_overflow_count(m->prev_bw_msr, tval, hw_res->mbm_width);
-	cur_bw = (get_corrected_mbm_count(rmid, chunks) * hw_res->mon_scale) >> 20;
+	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
+	m->chunks += chunks;
+	m->prev_msr = tval;
+	bw_chunks = get_corrected_mbm_count(rmid, m->chunks);
+
+	cur_bw = (bw_chunks - m->prev_bw_chunks) * hw_res->mon_scale;
+	cur_bw >>= 20;
 
 	if (m->delta_comp)
 		m->delta_bw = abs(cur_bw - m->prev_bw);
 	m->delta_comp = false;
 	m->prev_bw = cur_bw;
-	m->prev_bw_msr = tval;
+	m->prev_bw_chunks = bw_chunks;
 }
 
 /*
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (10 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-08-11 17:14   ` Jamie Iles
  2021-07-29 22:36 ` [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. For bandwidth
counters the resctrl filesystem provides the number of bytes
ever seen. Internally mba_sc uses a second prev_bw_msr to calculate
the bytes read since the last mba_sc invocation.

MPAM's scaling of counters can be changed at runtime, reducing the
resolution but increasing the range. When this is changed the prev_msr
values need to be converted by the architecture code.

Add storage for per-rmid private storage. The prev_msr and chunks
values will move here to allow resctrl_arch_rmid_read() to always
return the number of bytes read by this counter.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     | 29 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 8a3c13c6c19f..27c93d12ca27 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -432,6 +432,28 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
+{
+	size_t tsize;
+
+	if (is_mbm_total_enabled()) {
+		tsize = sizeof(*hw_dom->arch_mbm_total);
+		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
+		if (!hw_dom->arch_mbm_total)
+			return -ENOMEM;
+	}
+	if (is_mbm_local_enabled()) {
+		tsize = sizeof(*hw_dom->arch_mbm_local);
+		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
+		if (!hw_dom->arch_mbm_local) {
+			kfree(hw_dom->arch_mbm_total);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -481,6 +503,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}
 
+	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+		kfree(d);
+		return;
+	}
+
 	list_add_tail(&d->list, add_pos);
 
 	err = resctrl_online_domain(r, d);
@@ -516,6 +543,8 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			d->plr->d = NULL;
 
 		kfree(hw_dom->ctrl_val);
+		kfree(hw_dom->arch_mbm_total);
+		kfree(hw_dom->arch_mbm_local);
 		kfree(hw_dom);
 		return;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0a5721e1cc07..aaae900a8ef3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -303,17 +303,30 @@ struct mbm_state {
 	bool	delta_comp;
 };
 
+/**
+ * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
+ *			   return value.
+ * @prev_msr	Value of IA32_QM_CTR for this RMID last time we read it
+ */
+struct arch_mbm_state {
+	u64	prev_msr;
+};
+
 /**
  * struct rdt_hw_domain - Arch private attributes of a set of CPUs that share
  *			  a resource
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
+ * @arch_mbm_total:	arch private state for MBM total bandwidth
+ * @arch_mbm_local:	arch private state for MBM local bandwidth
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
 struct rdt_hw_domain {
 	struct rdt_domain		d_resctrl;
 	u32				*ctrl_val;
+	struct arch_mbm_state		*arch_mbm_total;
+	struct arch_mbm_state		*arch_mbm_local;
 };
 
 static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (11 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-09-24  6:34   ` tan.shaopeng
  2021-07-29 22:36 ` [PATCH v1 14/20] x86/resctrl: Abstract __rmid_read() James Morse
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

To abstract the rmid counters into a helper that returns the number
of bytes counted, architecture specific per-rmid state is needed.

It needs to be possible to reset this hidden state, as the values
may outlive the life of an rmid, or the mount time of the filesystem.

mon_event_read() is called with first = true when an rmid is first
allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid()
and call it from __mon_event_count()'s rr->first check.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++--------
 arch/x86/kernel/cpu/resctrl/monitor.c  | 38 +++++++++++++++++++++-----
 include/linux/resctrl.h                | 23 ++++++++++++++++
 3 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index aaae900a8ef3..f3f31315a907 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -22,14 +22,6 @@
 
 #define L2_QOS_CDP_ENABLE		0x01ULL
 
-/*
- * Event IDs are used to program IA32_QM_EVTSEL before reading event
- * counter from IA32_QM_CTR
- */
-#define QOS_L3_OCCUP_EVENT_ID		0x01
-#define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
-#define QOS_L3_MBM_LOCAL_EVENT_ID	0x03
-
 #define CQM_LIMBOCHECK_INTERVAL	1000
 
 #define MBM_CNTR_WIDTH_BASE		24
@@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
-	u32			evtid;
+	enum resctrl_event_id	evtid;
 	char			*name;
 	struct list_head	list;
 };
@@ -90,9 +82,9 @@ struct mon_evt {
 union mon_data_bits {
 	void *priv;
 	struct {
-		unsigned int rid	: 10;
-		unsigned int evtid	: 8;
-		unsigned int domid	: 14;
+		unsigned int rid		: 10;
+		enum resctrl_event_id evtid	: 8;
+		unsigned int domid		: 14;
 	} u;
 };
 
@@ -100,7 +92,7 @@ struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
 	struct rdt_domain	*d;
-	int			evtid;
+	enum resctrl_event_id	evtid;
 	bool			first;
 	u64			val;
 };
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index af60e154f0ed..3b8b29470a5c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -137,7 +137,34 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
 	return entry;
 }
 
-static u64 __rmid_read(u32 rmid, u32 eventid)
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain *hw_dom,
+						 u32 rmid,
+						 enum resctrl_event_id eventid)
+{
+	switch (eventid) {
+	case QOS_L3_OCCUP_EVENT_ID:
+		return NULL;
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return &hw_dom->arch_mbm_total[rmid];
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return &hw_dom->arch_mbm_local[rmid];
+	}
+
+	return NULL;
+}
+
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+			     u32 rmid, enum resctrl_event_id eventid)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	struct arch_mbm_state *m;
+
+	m = get_arch_mbm_state(hw_dom, rmid, eventid);
+	if (m)
+		memset(m, 0, sizeof(*m));
+}
+
+static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
 {
 	u64 val;
 
@@ -291,6 +318,9 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	struct mbm_state *m;
 	u64 chunks, tval;
 
+	if (rr->first)
+		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
+
 	tval = __rmid_read(rmid, rr->evtid);
 	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
 		rr->val = tval;
@@ -306,12 +336,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	case QOS_L3_MBM_LOCAL_EVENT_ID:
 		m = &rr->d->mbm_local[rmid];
 		break;
-	default:
-		/*
-		 * Code would never reach here because
-		 * an invalid event id would fail the __rmid_read.
-		 */
-		return -EINVAL;
 	}
 
 	if (rr->first) {
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4fe2d5500315..79e83ce3dfbc 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -32,6 +32,16 @@ enum resctrl_conf_type {
 
 #define CDP_NUM_TYPES	(CDP_DATA + 1)
 
+/*
+ * Event IDs, the values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
+enum resctrl_event_id {
+	QOS_L3_OCCUP_EVENT_ID		= 0x01,
+	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
+	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
+};
+
 /**
  * struct resctrl_staged_config - parsed configuration to be applied
  * @new_ctrl:		new ctrl value to be loaded
@@ -219,4 +229,17 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 
+/**
+ * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
+ *			       and eventid.
+ * @r:		The domain's resource.
+ * @d:		The rmid's domain.
+ * @rmid:	The rmid whose counter values should be reset.
+ * @eventid:	The eventid whose counter values should be reset.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
+			     u32 rmid, enum resctrl_event_id eventid);
+
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 14/20] x86/resctrl: Abstract __rmid_read()
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (12 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 15/20] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

__rmid_read() selects the specified eventid and returns the counter
value from the msr. The error handling is architecture specific, and
handled by the callers, rdtgroup_mondata_show() and __mon_event_count().

Error handling should be handled by architecture specific code, as
a different architecture may have different requirements. MPAM's
counters can report that they are 'not ready', requiring a second
read after a short delay. This should be hidden from resctrl.

Make __rmid_read() the architecture specific function for reading
a counter. Rename it resctrl_arch_rmid_read() and move the error
handling into it.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
 arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 44 ++++++++++++++---------
 include/linux/resctrl.h                   |  1 +
 4 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 5104f39928fd..3269ee954941 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 
 	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
 
-	if (rr.val & RMID_VAL_ERROR)
+	if (rr.err == -EIO)
 		seq_puts(m, "Error\n");
-	else if (rr.val & RMID_VAL_UNAVAIL)
+	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
 	else
 		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f3f31315a907..eca7793d3342 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -40,7 +40,6 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
-
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
 	bool				enable_cdpl2;
@@ -94,6 +93,7 @@ struct rmid_read {
 	struct rdt_domain	*d;
 	enum resctrl_event_id	evtid;
 	bool			first;
+	int			err;
 	u64			val;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3b8b29470a5c..e7c43c40ff28 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -164,9 +164,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(m, 0, sizeof(*m));
 }
 
-static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
-	u64 val;
+	u64 msr_val;
 
 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -177,14 +177,24 @@ static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
 	 * are error bits.
 	 */
 	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
-	rdmsrl(MSR_IA32_QM_CTR, val);
+	rdmsrl(MSR_IA32_QM_CTR, msr_val);
 
-	return val;
+	if (msr_val & RMID_VAL_ERROR)
+		return -EIO;
+	if (msr_val & RMID_VAL_UNAVAIL)
+		return -EINVAL;
+
+	*val = msr_val;
+
+	return 0;
 }
 
 static bool rmid_dirty(struct rmid_entry *entry)
 {
-	u64 val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
+	u64 val = 0;
+
+	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
+		return true;
 
 	return val >= resctrl_cqm_threshold;
 }
@@ -256,8 +266,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r;
 	struct rdt_domain *d;
-	int cpu;
-	u64 val;
+	int cpu, err;
+	u64 val = 0;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
@@ -265,8 +275,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			val = __rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID);
-			if (val <= resctrl_cqm_threshold)
+			err = resctrl_arch_rmid_read(entry->rmid,
+						     QOS_L3_OCCUP_EVENT_ID,
+						     &val);
+			if (err || val <= resctrl_cqm_threshold)
 				continue;
 		}
 
@@ -316,16 +328,15 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m;
-	u64 chunks, tval;
+	u64 chunks, tval = 0;
 
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
 
-	tval = __rmid_read(rmid, rr->evtid);
-	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
-		rr->val = tval;
+	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+	if (rr->err)
 		return -EINVAL;
-	}
+
 	switch (rr->evtid) {
 	case QOS_L3_OCCUP_EVENT_ID:
 		rr->val += tval;
@@ -361,10 +372,9 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 tval, cur_bw, chunks, bw_chunks;
+	u64 tval = 0, cur_bw, chunks, bw_chunks;
 
-	tval = __rmid_read(rmid, rr->evtid);
-	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+	if (resctrl_arch_rmid_read(rmid, rr->evtid, &tval))
 		return;
 
 	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 79e83ce3dfbc..543f6d0599a9 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -228,6 +228,7 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 *value);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 15/20] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (13 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 14/20] x86/resctrl: Abstract __rmid_read() James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 16/20] x86/resctrl: Move mbm_overflow_count() " James Morse
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware.

To convert this to bytes, some correction and overflow calculations
are needed. These depend on the resource and domain structures.
Overflow detection requires the old chunks value. None of this
is available to resctrl_arch_rmid_read(). MPAM requires the
resource and domain structures to find the MMIO device that holds
the counter.

Pass the resource and domain to resctrl_arch_rmid_read(). The
old chunks value is available as prev_msr or prev_bw_msr. Merge
rmid_dirty() with its only caller, the name is kept as a local
variable.
This is all a little noisy for __mon_event_count(), as the switch
statement work is now before the resctrl_arch_rmid_read() call.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 33 +++++++++++++++------------
 include/linux/resctrl.h               | 15 +++++++++++-
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e7c43c40ff28..aa85cfd95904 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -164,10 +164,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(m, 0, sizeof(*m));
 }
 
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
+			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
 	u64 msr_val;
 
+	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
+		return -EINVAL;
+
 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
 	 * with a valid event code for supported resource type and the bits
@@ -189,16 +193,6 @@ int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 	return 0;
 }
 
-static bool rmid_dirty(struct rmid_entry *entry)
-{
-	u64 val = 0;
-
-	if (resctrl_arch_rmid_read(entry->rmid, QOS_L3_OCCUP_EVENT_ID, &val))
-		return true;
-
-	return val >= resctrl_cqm_threshold;
-}
-
 /*
  * Check the RMIDs that are marked as busy for this domain. If the
  * reported LLC occupancy is below the threshold clear the busy bit and
@@ -210,6 +204,8 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 	struct rmid_entry *entry;
 	struct rdt_resource *r;
 	u32 crmid = 1, nrmid;
+	bool rmid_dirty;
+	u64 val = 0;
 
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
@@ -225,7 +221,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(nrmid);
-		if (force_free || !rmid_dirty(entry)) {
+
+		if (resctrl_arch_rmid_read(r, d, entry->rmid,
+					   QOS_L3_OCCUP_EVENT_ID, &val))
+			rmid_dirty = true;
+		else
+			rmid_dirty = (val >= resctrl_cqm_threshold);
+
+		if (force_free || !rmid_dirty) {
 			clear_bit(entry->rmid, d->rmid_busy_llc);
 			if (!--entry->busy) {
 				rmid_limbo_count--;
@@ -275,7 +278,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
-			err = resctrl_arch_rmid_read(entry->rmid,
+			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
 			if (err || val <= resctrl_cqm_threshold)
@@ -333,7 +336,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
 
-	rr->err = resctrl_arch_rmid_read(rmid, rr->evtid, &tval);
+	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval);
 	if (rr->err)
 		return -EINVAL;
 
@@ -374,7 +377,7 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
 	u64 tval = 0, cur_bw, chunks, bw_chunks;
 
-	if (resctrl_arch_rmid_read(rmid, rr->evtid, &tval))
+	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval))
 		return;
 
 	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 543f6d0599a9..530df8e195c0 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -228,7 +228,20 @@ void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 *value);
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
-int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res);
+
+/**
+ * resctrl_arch_rmid_read() - Read the eventid counter correpsonding to rmid
+ *			      for this resource and domain.
+ * @r:			The resource that the counter should be read from.
+ * @d:			The domain that the counter should be read from.
+ * @rmid:		The rmid of the counter to read.
+ * @eventid:		The eventid to read, e.g. L3 occupancy.
+ * @val:		The result of the counter read in chunks.
+ *
+ * Returns 0 on success, or -EIO, -EINVAL etc on error.
+ */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
+			   u32 rmid, enum resctrl_event_id eventid, u64 *val);
 
 /**
  * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 16/20] x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (14 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 15/20] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 17/20] x86/resctrl: Move get_corrected_mbm_count() " James Morse
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. When reading a bandwidth
counter, mbm_overflow_count() must be used to correct for any possible
overflow.

mbm_overflow_count() is architecture specific, its behaviour should
be part of resctrl_arch_rmid_read().

Move the mbm_overflow_count() calls into resctrl_arch_rmid_read().
This allows the resctrl filesystems's prev_msr to be removed in
favour of the architecture private version.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 --
 arch/x86/kernel/cpu/resctrl/monitor.c  | 42 ++++++++++++++------------
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index eca7793d3342..2d0a6bba4a01 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -280,7 +280,6 @@ struct rftype {
 /**
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr:	Value of IA32_QM_CTR for this RMID last time we read it
  * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
@@ -288,7 +287,6 @@ struct rftype {
  */
 struct mbm_state {
 	u64	chunks;
-	u64	prev_msr;
 	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index aa85cfd95904..39f7e74a4236 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -164,9 +164,20 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(m, 0, sizeof(*m));
 }
 
+static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+{
+	u64 shift = 64 - width, chunks;
+
+	chunks = (cur_msr << shift) - (prev_msr << shift);
+	return chunks >>= shift;
+}
+
 int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 			   u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+	struct arch_mbm_state *m;
 	u64 msr_val;
 
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
@@ -188,7 +199,13 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	if (msr_val & RMID_VAL_UNAVAIL)
 		return -EINVAL;
 
-	*val = msr_val;
+	m = get_arch_mbm_state(hw_dom, rmid, eventid);
+	if (m) {
+		*val = mbm_overflow_count(m->prev_msr, msr_val, hw_res->mbm_width);
+		m->prev_msr = msr_val;
+	} else {
+		*val = msr_val;
+	}
 
 	return 0;
 }
@@ -319,19 +336,10 @@ void free_rmid(u32 rmid)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
-{
-	u64 shift = 64 - width, chunks;
-
-	chunks = (cur_msr << shift) - (prev_msr << shift);
-	return chunks >>= shift;
-}
-
 static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 {
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m;
-	u64 chunks, tval = 0;
+	u64 tval = 0;
 
 	if (rr->first)
 		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
@@ -354,13 +362,11 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 
 	if (rr->first) {
 		memset(m, 0, sizeof(struct mbm_state));
-		m->prev_bw_chunks = m->prev_msr = tval;
+		m->prev_bw_chunks = tval;
 		return 0;
 	}
 
-	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
-	m->chunks += chunks;
-	m->prev_msr = tval;
+	m->chunks += tval;
 
 	rr->val += get_corrected_mbm_count(rmid, m->chunks);
 
@@ -375,14 +381,12 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 tval = 0, cur_bw, chunks, bw_chunks;
+	u64 tval = 0, cur_bw, bw_chunks;
 
 	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval))
 		return;
 
-	chunks = mbm_overflow_count(m->prev_msr, tval, hw_res->mbm_width);
-	m->chunks += chunks;
-	m->prev_msr = tval;
+	m->chunks += tval;
 	bw_chunks = get_corrected_mbm_count(rmid, m->chunks);
 
 	cur_bw = (bw_chunks - m->prev_bw_chunks) * hw_res->mon_scale;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 17/20] x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (15 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 16/20] x86/resctrl: Move mbm_overflow_count() " James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 18/20] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_arch_rmid_read() is intended as the function that an
architecture agnostic resctrl filesystem driver can use to
read a value in bytes from a counter. Currently the function returns
the mbm values in chunks directly from hardware. When reading a bandwidth
counter, get_corrected_mbm_count() must be used to correct the
value read.

get_corrected_mbm_count() is architecture specific, this work should be
done in resctrl_arch_rmid_read().

Move the function calls. This allows the resctrl filesystems's chunks
value to be removed in favour of the architecture private version.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c  | 15 ++++++---------
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 2d0a6bba4a01..65b472d6b146 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -279,14 +279,12 @@ struct rftype {
 
 /**
  * struct mbm_state - status for each MBM counter in each domain
- * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
  */
 struct mbm_state {
-	u64	chunks;
 	u64	prev_bw_chunks;
 	u32	prev_bw;
 	u32	delta_bw;
@@ -296,9 +294,11 @@ struct mbm_state {
 /**
  * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
  *			   return value.
+ * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr	Value of IA32_QM_CTR for this RMID last time we read it
  */
 struct arch_mbm_state {
+	u64	chunks;
 	u64	prev_msr;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 39f7e74a4236..9321c758752a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -201,7 +201,9 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 
 	m = get_arch_mbm_state(hw_dom, rmid, eventid);
 	if (m) {
-		*val = mbm_overflow_count(m->prev_msr, msr_val, hw_res->mbm_width);
+		m->chunks += mbm_overflow_count(m->prev_msr, msr_val,
+						hw_res->mbm_width);
+		*val = get_corrected_mbm_count(rmid, m->chunks);
 		m->prev_msr = msr_val;
 	} else {
 		*val = msr_val;
@@ -366,9 +368,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 		return 0;
 	}
 
-	m->chunks += tval;
-
-	rr->val += get_corrected_mbm_count(rmid, m->chunks);
+	rr->val += tval;
 
 	return 0;
 }
@@ -381,14 +381,11 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 tval = 0, cur_bw, bw_chunks;
+	u64 cur_bw, bw_chunks = 0;
 
-	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &tval))
+	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &bw_chunks))
 		return;
 
-	m->chunks += tval;
-	bw_chunks = get_corrected_mbm_count(rmid, m->chunks);
-
 	cur_bw = (bw_chunks - m->prev_bw_chunks) * hw_res->mon_scale;
 	cur_bw >>= 20;
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 18/20] x86/resctrl: Rename and change the units of resctrl_cqm_threshold
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (16 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 17/20] x86/resctrl: Move get_corrected_mbm_count() " James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 19/20] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
  2021-07-29 22:36 ` [PATCH v1 20/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_cqm_threshold is stored in a hardware specific chunk size,
and exposed to user-space.

This means the filesystem parts of resctrl need to know how the hardware
counts, to convert the user provided byte value to chunks. The interface
between the architecture's resctrl code and the filesystem ought to
treat everything as bytes.

Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read()
still returns its value in chunks, so this needs converting to bytes.
As all the callers have been touched, rename the variable to
resctrl_rmid_realloc_threshold, which describes what the value is for.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 -
 arch/x86/kernel/cpu/resctrl/monitor.c  | 34 ++++++++++++--------------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 ++-----
 include/linux/resctrl.h                |  2 ++
 4 files changed, 20 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 65b472d6b146..4569b4588185 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -97,7 +97,6 @@ struct rmid_read {
 	u64			val;
 };
 
-extern unsigned int resctrl_cqm_threshold;
 extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9321c758752a..8e3efb0909a0 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -37,8 +37,8 @@ static LIST_HEAD(rmid_free_lru);
  * @rmid_limbo_count     count of currently unused but (potentially)
  *     dirty RMIDs.
  *     This counts RMIDs that no one is currently using but that
- *     may have a occupancy value > intel_cqm_threshold. User can change
- *     the threshold occupancy value.
+ *     may have a occupancy value > resctrl_rmid_realloc_threshold. User can
+ *     change the threshold occupancy value.
  */
 static unsigned int rmid_limbo_count;
 
@@ -59,10 +59,10 @@ bool rdt_mon_capable;
 unsigned int rdt_mon_features;
 
 /*
- * This is the threshold cache occupancy at which we will consider an
+ * This is the threshold cache occupancy in bytes at which we will consider an
  * RMID available for re-allocation.
  */
-unsigned int resctrl_cqm_threshold;
+unsigned int resctrl_rmid_realloc_threshold;
 
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
@@ -220,14 +220,13 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
  */
 void __check_limbo(struct rdt_domain *d, bool force_free)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rmid_entry *entry;
-	struct rdt_resource *r;
 	u32 crmid = 1, nrmid;
 	bool rmid_dirty;
 	u64 val = 0;
 
-	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
 	/*
 	 * Skip RMID 0 and start from RMID 1 and check all the RMIDs that
 	 * are marked as busy for occupancy < threshold. If the occupancy
@@ -242,10 +241,12 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 		entry = __rmid_entry(nrmid);
 
 		if (resctrl_arch_rmid_read(r, d, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val))
+					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
-		else
-			rmid_dirty = (val >= resctrl_cqm_threshold);
+		} else {
+			val *= hw_res->mon_scale;
+			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+		}
 
 		if (force_free || !rmid_dirty) {
 			clear_bit(entry->rmid, d->rmid_busy_llc);
@@ -286,13 +287,12 @@ int alloc_rmid(void)
 
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
-	struct rdt_resource *r;
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_domain *d;
 	int cpu, err;
 	u64 val = 0;
 
-	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-
 	entry->busy = 0;
 	cpu = get_cpu();
 	list_for_each_entry(d, &r->domains, list) {
@@ -300,7 +300,8 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
-			if (err || val <= resctrl_cqm_threshold)
+			val *= hw_res->mon_scale;
+			if (err || val <= resctrl_rmid_realloc_threshold)
 				continue;
 		}
 
@@ -736,10 +737,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;
-
-	/* h/w works in units of "boot_cpu_data.x86_cache_occ_scale" */
-	resctrl_cqm_threshold /= hw_res->mon_scale;
+	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
 
 	ret = dom_data_init(r);
 	if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4388685548a0..d33442cd5975 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1030,10 +1030,7 @@ static int rdt_delay_linear_show(struct kernfs_open_file *of,
 static int max_threshold_occ_show(struct kernfs_open_file *of,
 				  struct seq_file *seq, void *v)
 {
-	struct rdt_resource *r = of->kn->parent->priv;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-
-	seq_printf(seq, "%u\n", resctrl_cqm_threshold * hw_res->mon_scale);
+	seq_printf(seq, "%u\n", resctrl_rmid_realloc_threshold);
 
 	return 0;
 }
@@ -1055,7 +1052,6 @@ static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
 static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 				       char *buf, size_t nbytes, loff_t off)
 {
-	struct rdt_hw_resource *hw_res;
 	unsigned int bytes;
 	int ret;
 
@@ -1066,8 +1062,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 	if (bytes > (boot_cpu_data.x86_cache_size * 1024))
 		return -EINVAL;
 
-	hw_res = resctrl_to_arch_res(of->kn->parent->priv);
-	resctrl_cqm_threshold = bytes / hw_res->mon_scale;
+	resctrl_rmid_realloc_threshold = bytes;
 
 	return nbytes;
 }
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 530df8e195c0..26e75c07a8ac 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -256,4 +256,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
+extern unsigned int resctrl_rmid_realloc_threshold;
+
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 19/20] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (17 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 18/20] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
@ 2021-07-29 22:36 ` James Morse
  2021-07-29 22:36 ` [PATCH v1 20/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_rmid_realloc_threshold can be set by user-space. The maximum
value is specified by the architecture.

Currently max_threshold_occ_write() reads the maximum value from
boot_cpu_data.x86_cache_size, which is not portable to another
architecture.

Add resctrl_rmid_realloc_limit to describe the maximum size in bytes
that user-space can set the threshold to.

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c  | 9 +++++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
 include/linux/resctrl.h                | 1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 8e3efb0909a0..18eba1bb032f 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -64,6 +64,11 @@ unsigned int rdt_mon_features;
  */
 unsigned int resctrl_rmid_realloc_threshold;
 
+/*
+ * This is the maximum value for the reallocation threshold, in bytes.
+ */
+unsigned int resctrl_rmid_realloc_limit;
+
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
 /*
@@ -718,9 +723,9 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-	unsigned int cl_size = boot_cpu_data.x86_cache_size;
 	int ret;
 
+	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
 	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
@@ -737,7 +742,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	 *
 	 * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
 	 */
-	resctrl_rmid_realloc_threshold = cl_size * 1024 / r->num_rmid;
+	resctrl_rmid_realloc_threshold = resctrl_rmid_realloc_limit / r->num_rmid;
 
 	ret = dom_data_init(r);
 	if (ret)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index d33442cd5975..20c871ad5166 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1059,7 +1059,7 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 	if (ret)
 		return ret;
 
-	if (bytes > (boot_cpu_data.x86_cache_size * 1024))
+	if (bytes > resctrl_rmid_realloc_limit)
 		return -EINVAL;
 
 	resctrl_rmid_realloc_threshold = bytes;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 26e75c07a8ac..ce7c7472da7d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -257,5 +257,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
 extern unsigned int resctrl_rmid_realloc_threshold;
+extern unsigned int resctrl_rmid_realloc_limit;
 
 #endif /* _RESCTRL_H */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 20/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
  2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
                   ` (18 preceding siblings ...)
  2021-07-29 22:36 ` [PATCH v1 19/20] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
@ 2021-07-29 22:36 ` James Morse
  19 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-07-29 22:36 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

resctrl_arch_rmid_read() returns a value in chunks, as read from the
hardware. This needs scaling to bytes by mon_scale, as provided by
the architecture code.

Now that resctrl_arch_rmid_read() performs the overflow and corrections
itself, it may as well return a value in bytes directly.

Move the mon_scale conversion into resctrl_arch_rmid_read().

Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  6 ++----
 arch/x86/kernel/cpu/resctrl/internal.h    |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c     | 22 +++++++++-------------
 include/linux/resctrl.h                   |  2 +-
 4 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 3269ee954941..6b6598811922 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -549,7 +549,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
-	struct rdt_hw_resource *hw_res;
 	u32 resid, evtid, domid;
 	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
@@ -569,8 +568,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	domid = md.u.domid;
 	evtid = md.u.evtid;
 
-	hw_res = &rdt_resources_all[resid];
-	r = &hw_res->r_resctrl;
+	r = &rdt_resources_all[resid].r_resctrl;
 	d = rdt_find_domain(r, domid, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		ret = -ENOENT;
@@ -584,7 +582,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
 	else
-		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
+		seq_printf(m, "%llu\n", rr.val);
 
 out:
 	rdtgroup_kn_unlock(of->kn);
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4569b4588185..3bf4b32fc531 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -278,13 +278,13 @@ struct rftype {
 
 /**
  * struct mbm_state - status for each MBM counter in each domain
- * @prev_bw_chunks: Previous chunks value read when for bandwidth calculation
+ * @prev_bw_bytes: Previous bytes value read when for bandwidth calculation
  * @prev_bw:	The most recent bandwidth in MBps
  * @delta_bw:	Difference between the current and previous bandwidth
  * @delta_comp:	Indicates whether to compute the delta_bw
  */
 struct mbm_state {
-	u64	prev_bw_chunks;
+	u64	prev_bw_bytes;
 	u32	prev_bw;
 	u32	delta_bw;
 	bool	delta_comp;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 18eba1bb032f..7de6e7aa2778 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -183,7 +183,7 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 	struct arch_mbm_state *m;
-	u64 msr_val;
+	u64 msr_val, chunks;
 
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
 		return -EINVAL;
@@ -208,10 +208,11 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 	if (m) {
 		m->chunks += mbm_overflow_count(m->prev_msr, msr_val,
 						hw_res->mbm_width);
-		*val = get_corrected_mbm_count(rmid, m->chunks);
+		chunks = get_corrected_mbm_count(rmid, m->chunks);
+		*val = chunks * hw_res->mon_scale;
 		m->prev_msr = msr_val;
 	} else {
-		*val = msr_val;
+		*val = msr_val * hw_res->mon_scale;
 	}
 
 	return 0;
@@ -226,7 +227,6 @@ int resctrl_arch_rmid_read(struct rdt_resource	*r, struct rdt_domain *d,
 void __check_limbo(struct rdt_domain *d, bool force_free)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rmid_entry *entry;
 	u32 crmid = 1, nrmid;
 	bool rmid_dirty;
@@ -249,7 +249,6 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
 					   QOS_L3_OCCUP_EVENT_ID, &val)) {
 			rmid_dirty = true;
 		} else {
-			val *= hw_res->mon_scale;
 			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
 		}
 
@@ -293,7 +292,6 @@ int alloc_rmid(void)
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct rdt_domain *d;
 	int cpu, err;
 	u64 val = 0;
@@ -305,7 +303,6 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
 						     &val);
-			val *= hw_res->mon_scale;
 			if (err || val <= resctrl_rmid_realloc_threshold)
 				continue;
 		}
@@ -370,7 +367,7 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
 
 	if (rr->first) {
 		memset(m, 0, sizeof(struct mbm_state));
-		m->prev_bw_chunks = tval;
+		m->prev_bw_bytes = tval;
 		return 0;
 	}
 
@@ -385,21 +382,20 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
  */
 static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(rr->r);
 	struct mbm_state *m = &rr->d->mbm_local[rmid];
-	u64 cur_bw, bw_chunks = 0;
+	u64 cur_bw, bw_bytes = 0;
 
-	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &bw_chunks))
+	if (resctrl_arch_rmid_read(rr->r, rr->d, rmid, rr->evtid, &bw_bytes))
 		return;
 
-	cur_bw = (bw_chunks - m->prev_bw_chunks) * hw_res->mon_scale;
+	cur_bw = (bw_bytes - m->prev_bw_bytes);
 	cur_bw >>= 20;
 
 	if (m->delta_comp)
 		m->delta_bw = abs(cur_bw - m->prev_bw);
 	m->delta_comp = false;
 	m->prev_bw = cur_bw;
-	m->prev_bw_chunks = bw_chunks;
+	m->prev_bw_bytes = bw_bytes;
 }
 
 /*
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index ce7c7472da7d..f63af6d4c10e 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -236,7 +236,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
  * @d:			The domain that the counter should be read from.
  * @rmid:		The rmid of the counter to read.
  * @eventid:		The eventid to read, e.g. L3 occupancy.
- * @val:		The result of the counter read in chunks.
+ * @val:		The result of the counter read in bytes.
  *
  * Returns 0 on success, or -EIO, -EINVAL etc on error.
  */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled
  2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
@ 2021-08-11 12:12   ` Jamie Iles
  2021-09-01 21:18   ` Reinette Chatre
  1 sibling, 0 replies; 50+ messages in thread
From: Jamie Iles @ 2021-08-11 12:12 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

On Thu, Jul 29, 2021 at 10:35:51PM +0000, James Morse wrote:
> rdt_resources_all[] used to have extra entries for L2CODE/L2DATA entries.
> These were hidden from resctrl by the alloc_enabled value.
> 
> Now that the L2/L2CODE/L2DATA resources have been merged together,
> alloc_enabled doesn't mean anything, it always has the same value as
> alloc_capable which indicates CAT is supported by this cache.
> 
> Remove alloc_enabled and its helpers.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

There's a comment referring to alloc_enabled in rdtgroup_create_info_dir 
that can be replaced with alloc_capable.

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled
  2021-07-29 22:35 ` [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled James Morse
@ 2021-08-11 12:15   ` Jamie Iles
  2021-08-11 15:16     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Jamie Iles @ 2021-08-11 12:15 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

On Thu, Jul 29, 2021 at 10:35:52PM +0000, James Morse wrote:
> mon_enabled and mon_capable are always set as a pair by
> rdt_get_mon_l3_config().
> 
> There is no point having two values.
> 
> Merge them together.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 4 ----
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 1 -
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++----
>  include/linux/resctrl.h                | 4 ++--
>  4 files changed, 6 insertions(+), 11 deletions(-)
> 
...
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index ada0a02093a6..d715df9de37f 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -130,7 +130,7 @@ struct resctrl_schema;
>  /**
>   * struct rdt_resource - attributes of a resctrl resource
>   * @rid:		The index of the resource
> - * @mon_enabled:	Is monitoring enabled for this feature
> + * @cdp_enabled		Is CDP enabled for this resource
>   * @alloc_capable:	Is allocation available on this machine
>   * @mon_capable:	Is monitor feature available on this machine
>   * @num_rmid:		Number of RMIDs available
> @@ -149,7 +149,7 @@ struct resctrl_schema;
>   */
>  struct rdt_resource {
>  	int			rid;
> -	bool			mon_enabled;
> +	bool			cdp_enabled;

Nothing is setting cdp_enabled in this patch, is this intended to be 
here?

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled
  2021-08-11 12:15   ` Jamie Iles
@ 2021-08-11 15:16     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-08-11 15:16 UTC (permalink / raw)
  To: Jamie Iles
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, lcherian,
	bobo.shaobowang

Hi Jamie,

On 11/08/2021 13:15, Jamie Iles wrote:
> On Thu, Jul 29, 2021 at 10:35:52PM +0000, James Morse wrote:
>> mon_enabled and mon_capable are always set as a pair by
>> rdt_get_mon_l3_config().
>>
>> There is no point having two values.
>>
>> Merge them together.

>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index ada0a02093a6..d715df9de37f 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -130,7 +130,7 @@ struct resctrl_schema;
>>  /**
>>   * struct rdt_resource - attributes of a resctrl resource
>>   * @rid:		The index of the resource
>> - * @mon_enabled:	Is monitoring enabled for this feature
>> + * @cdp_enabled		Is CDP enabled for this resource
>>   * @alloc_capable:	Is allocation available on this machine
>>   * @mon_capable:	Is monitor feature available on this machine
>>   * @num_rmid:		Number of RMIDs available
>> @@ -149,7 +149,7 @@ struct resctrl_schema;
>>   */
>>  struct rdt_resource {
>>  	int			rid;
>> -	bool			mon_enabled;
>> +	bool			cdp_enabled;
> 
> Nothing is setting cdp_enabled in this patch, is this intended to be 
> here?

Bother, that is the result of a merge conflict from changes from the previous series!

Thanks, I've removed it.


James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 04/20] x86/resctrl: Add domain offline callback for resctrl work
  2021-07-29 22:35 ` [PATCH v1 04/20] x86/resctrl: Add domain offline " James Morse
@ 2021-08-11 16:10   ` Jamie Iles
  2021-09-01 21:21   ` Reinette Chatre
  1 sibling, 0 replies; 50+ messages in thread
From: Jamie Iles @ 2021-08-11 16:10 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

On Thu, Jul 29, 2021 at 10:35:54PM +0000, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when cpu hotplug callbacks are triggered.
> 
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> free the memory.
> 
> Move the monitor subdir removal and the cancelling of the mbm/limbo
> works into a new resctrl_offline_domain() call.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     | 24 +---------------
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 --
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 39 +++++++++++++++++++++++---
>  include/linux/resctrl.h                |  1 +
>  4 files changed, 37 insertions(+), 29 deletions(-)
...
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e1af1d81b924..cf0db0b7a5d0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
...
> @@ -3229,6 +3227,39 @@ static int __init rdtgroup_setup_root(void)
>  	return ret;
>  }
>  
> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	lockdep_assert_held(&rdtgroup_mutex); // the arch code took this for us
> +
> +	if (!r->mon_capable)
> +		return;
> +
> +	/*
> +	 * If resctrl is mounted, remove all the
> +	 * per domain monitor data directories.
> +	 */
> +	if (static_branch_unlikely(&rdt_mon_enable_key))
> +		rmdir_mondata_subdir_allrdtgrp(r, d->id);
> +
> +	if (r->mon_capable && is_mbm_enabled())
> +		cancel_delayed_work(&d->mbm_over);

There's a redundant r->mon_capable check here.

> +	if (is_llc_occupancy_enabled() && has_busy_rmid(r, d)) {
> +		/*
> +		 * When a package is going down, forcefully
> +		 * decrement rmid->ebusy. There is no way to know
> +		 * that the L3 was flushed and hence may lead to
> +		 * incorrect counts in rare scenarios, but leaving
> +		 * the RMID as busy creates RMID leaks if the
> +		 * package never comes back.
> +		 */
> +		__check_limbo(d, true);
> +		cancel_delayed_work(&d->cqm_limbo);
> +	}
> +	bitmap_free(d->rmid_busy_llc);
> +	kfree(d->mbm_total);
> +	kfree(d->mbm_local);
> +}

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-07-29 22:35 ` [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
@ 2021-08-11 16:32   ` Jamie Iles
  2021-08-31 16:24     ` James Morse
  2021-09-01 21:22   ` Reinette Chatre
  1 sibling, 1 reply; 50+ messages in thread
From: Jamie Iles @ 2021-08-11 16:32 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

On Thu, Jul 29, 2021 at 10:35:55PM +0000, James Morse wrote:
> To support resctrl's MBA software controller, the architecture must provide
> a second configuration array to hold the mbps_val from user-space.
> 
> This complicates the interface between the architecture code.
> 
> Make the filesystem parts of resctrl create an array for the mba_sc
> values when the struct resctrl_schema is created. The software controller
> can be changed to use this, allowing the architecture code to only
> consider the values configured in hardware.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  1 -
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 69 ++++++++++++++++++++++++++
>  include/linux/resctrl.h                | 13 +++++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e12b55f815bf..a7e2cbce29d5 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -36,7 +36,6 @@
>  #define MBM_OVERFLOW_INTERVAL		1000
>  #define MAX_MBA_BW			100u
>  #define MBA_IS_LINEAR			0x4
> -#define MBA_MAX_MBPS			U32_MAX
>  #define MAX_MBA_BW_AMD			0x800
>  #define MBM_CNTR_WIDTH_OFFSET_AMD	20
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cf0db0b7a5d0..185f9bb992d1 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2030,6 +2030,60 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
>  			     struct rdtgroup *prgrp,
>  			     struct kernfs_node **mon_data_kn);
>  
> +static int mba_sc_domain_allocate(struct rdt_resource *res,
> +				  struct rdt_domain *d)
> +{
> +	u32 num_closid = closid_free_map_len;
> +	int cpu = cpumask_any(&d->cpu_mask);
> +	int i;
> +
> +	d->mba_sc = kcalloc_node(num_closid, sizeof(*d->mba_sc),
> +				 GFP_KERNEL, cpu_to_node(cpu));
> +	if (!d->mba_sc)
> +		return -ENOMEM;

If a CPU was hotplugged before resctrl is mounted then isn't it possible 
for this to already be allocated?  I might be misunderstanding the flows 
here though...

> +	for (i = 0; i < num_closid; i++)
> +		d->mba_sc[i].mbps_val = MBA_MAX_MBPS;
> +
> +	return 0;
> +}
> +
> +static int mba_sc_allocate(struct rdt_resource *r)
> +{
> +	struct rdt_domain *d;
> +	int ret;
> +
> +	lockdep_assert_cpus_held();
> +
> +	if (!is_mba_sc(r))
> +		return 0;
> +
> +	list_for_each_entry(d, &r->domains, list) {
> +		ret = mba_sc_domain_allocate(r, d);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static void mba_sc_domain_destroy(struct rdt_resource *r,
> +				  struct rdt_domain *d)
> +{
> +	kfree(d->mba_sc);
> +	d->mba_sc = NULL;
> +}
> +
> +static void mba_sc_destroy(struct rdt_resource *r)
> +{
> +	struct rdt_domain *d;
> +
> +	lockdep_assert_cpus_held();
> +
> +	list_for_each_entry(d, &r->domains, list)
> +		mba_sc_domain_destroy(r, d);
> +}
> +
>  static int rdt_enable_ctx(struct rdt_fs_context *ctx)
>  {
>  	int ret = 0;
> @@ -2117,17 +2171,27 @@ static int schemata_list_create(void)
>  
>  		if (ret)
>  			break;
> +
> +		ret = mba_sc_allocate(r);
> +		if (ret)
> +			break;
>  	}
>  
>  	return ret;
>  }
>  
> +/*
> + * During rdt_kill_sb(), the mba_sc state is reset before
> + * destroy_schemata_list() is called: unconditionally try to free the
> + * array.
> + */
>  static void schemata_list_destroy(void)
>  {
>  	struct resctrl_schema *s, *tmp;
>  
>  	list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
>  		list_del(&s->list);
> +		mba_sc_destroy(s->res);
>  		kfree(s);
>  	}
>  }
> @@ -3255,6 +3319,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>  		__check_limbo(d, true);
>  		cancel_delayed_work(&d->cqm_limbo);
>  	}
> +	if (static_branch_unlikely(&rdt_enable_key) && is_mba_sc(r))
> +		mba_sc_domain_destroy(r, d);
>  	bitmap_free(d->rmid_busy_llc);
>  	kfree(d->mbm_total);
>  	kfree(d->mbm_local);
> @@ -3287,6 +3353,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>  		}
>  	}
>  
> +	if (is_mba_sc(r))
> +		mba_sc_domain_allocate(r, d);

This looks to be missing an error check.

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  2021-07-29 22:36 ` [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
@ 2021-08-11 17:14   ` Jamie Iles
  2021-08-31 16:25     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Jamie Iles @ 2021-08-11 17:14 UTC (permalink / raw)
  To: James Morse
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, Jamie Iles, D Scott Phillips OS,
	lcherian, bobo.shaobowang

Hi James,

On Thu, Jul 29, 2021 at 10:36:02PM +0000, James Morse wrote:
> resctrl_arch_rmid_read() is intended as the function that an
> architecture agnostic resctrl filesystem driver can use to
> read a value in bytes from a counter. Currently the function returns
> the mbm values in chunks directly from hardware. For bandwidth
> counters the resctrl filesystem provides the number of bytes
> ever seen. Internally mba_sc uses a second prev_bw_msr to calculate
> the bytes read since the last mba_sc invocation.
> 
> MPAM's scaling of counters can be changed at runtime, reducing the
> resolution but increasing the range. When this is changed the prev_msr
> values need to be converted by the architecture code.
> 
> Add storage for per-rmid private storage. The prev_msr and chunks
> values will move here to allow resctrl_arch_rmid_read() to always
> return the number of bytes read by this counter.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     | 29 ++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h | 13 ++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 8a3c13c6c19f..27c93d12ca27 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -432,6 +432,28 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>  	return 0;
>  }
>  
> +static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
> +{
> +	size_t tsize;
> +
> +	if (is_mbm_total_enabled()) {
> +		tsize = sizeof(*hw_dom->arch_mbm_total);
> +		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
> +		if (!hw_dom->arch_mbm_total)
> +			return -ENOMEM;
> +	}
> +	if (is_mbm_local_enabled()) {
> +		tsize = sizeof(*hw_dom->arch_mbm_local);
> +		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
> +		if (!hw_dom->arch_mbm_local) {
> +			kfree(hw_dom->arch_mbm_total);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * domain_add_cpu - Add a cpu to a resource's domain list.
>   *
> @@ -481,6 +503,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  		return;
>  	}
>  
> +	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> +		kfree(d);

Does this error path and subsequent ones in domain_add_cpu also need to 
undo the allocations from arch_domain_mbm_alloc?

Thanks,

Jamie

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-08-11 16:32   ` Jamie Iles
@ 2021-08-31 16:24     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-08-31 16:24 UTC (permalink / raw)
  To: Jamie Iles
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, lcherian,
	bobo.shaobowang

Hi Jamie,

On 11/08/2021 17:32, Jamie Iles wrote:
> On Thu, Jul 29, 2021 at 10:35:55PM +0000, James Morse wrote:
>> To support resctrl's MBA software controller, the architecture must provide
>> a second configuration array to hold the mbps_val from user-space.
>>
>> This complicates the interface between the architecture code.
>>
>> Make the filesystem parts of resctrl create an array for the mba_sc
>> values when the struct resctrl_schema is created. The software controller
>> can be changed to use this, allowing the architecture code to only
>> consider the values configured in hardware.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index cf0db0b7a5d0..185f9bb992d1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2030,6 +2030,60 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
>>  			     struct rdtgroup *prgrp,
>>  			     struct kernfs_node **mon_data_kn);
>>  
>> +static int mba_sc_domain_allocate(struct rdt_resource *res,
>> +				  struct rdt_domain *d)
>> +{
>> +	u32 num_closid = closid_free_map_len;
>> +	int cpu = cpumask_any(&d->cpu_mask);
>> +	int i;
>> +
>> +	d->mba_sc = kcalloc_node(num_closid, sizeof(*d->mba_sc),
>> +				 GFP_KERNEL, cpu_to_node(cpu));
>> +	if (!d->mba_sc)
>> +		return -ENOMEM;
> 
> If a CPU was hotplugged before resctrl is mounted then isn't it possible 
> for this to already be allocated?  I might be misunderstanding the flows 
> here though...

Yeah, its tortuous. All this is behind an is_mba_sc(r), check, which can only return true
while the filesystem is mounted. cpus_read_lock() is what ensures the mount-time setup
doesn't race with the hotplug callbacks.


>> +	for (i = 0; i < num_closid; i++)
>> +		d->mba_sc[i].mbps_val = MBA_MAX_MBPS;
>> +
>> +	return 0;
>> +}
>> +
>> +static int mba_sc_allocate(struct rdt_resource *r)
>> +{
>> +	struct rdt_domain *d;
>> +	int ret;
>> +
>> +	lockdep_assert_cpus_held();
>> +
>> +	if (!is_mba_sc(r))
>> +		return 0;
>> +
>> +	list_for_each_entry(d, &r->domains, list) {
>> +		ret = mba_sc_domain_allocate(r, d);
>> +		if (ret)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}


>> @@ -3287,6 +3353,9 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
>>  		}
>>  	}
>>  
>> +	if (is_mba_sc(r))
>> +		mba_sc_domain_allocate(r, d);
> 
> This looks to be missing an error check.

Fixed, thanks.

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks
  2021-08-11 17:14   ` Jamie Iles
@ 2021-08-31 16:25     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-08-31 16:25 UTC (permalink / raw)
  To: Jamie Iles
  Cc: x86, linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Babu Moger,
	shameerali.kolothum.thodi, D Scott Phillips OS, lcherian,
	bobo.shaobowang

Hi Jamie,

On 11/08/2021 18:14, Jamie Iles wrote:
> On Thu, Jul 29, 2021 at 10:36:02PM +0000, James Morse wrote:
>> resctrl_arch_rmid_read() is intended as the function that an
>> architecture agnostic resctrl filesystem driver can use to
>> read a value in bytes from a counter. Currently the function returns
>> the mbm values in chunks directly from hardware. For bandwidth
>> counters the resctrl filesystem provides the number of bytes
>> ever seen. Internally mba_sc uses a second prev_bw_msr to calculate
>> the bytes read since the last mba_sc invocation.
>>
>> MPAM's scaling of counters can be changed at runtime, reducing the
>> resolution but increasing the range. When this is changed the prev_msr
>> values need to be converted by the architecture code.
>>
>> Add storage for per-rmid private storage. The prev_msr and chunks
>> values will move here to allow resctrl_arch_rmid_read() to always
>> return the number of bytes read by this counter.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 8a3c13c6c19f..27c93d12ca27 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -432,6 +432,28 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)

>> +static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>> +{
>> +	size_t tsize;
>> +
>> +	if (is_mbm_total_enabled()) {
>> +		tsize = sizeof(*hw_dom->arch_mbm_total);
>> +		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
>> +		if (!hw_dom->arch_mbm_total)
>> +			return -ENOMEM;
>> +	}
>> +	if (is_mbm_local_enabled()) {
>> +		tsize = sizeof(*hw_dom->arch_mbm_local);
>> +		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
>> +		if (!hw_dom->arch_mbm_local) {
>> +			kfree(hw_dom->arch_mbm_total);
>> +			return -ENOMEM;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +

>> @@ -481,6 +503,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>  		return;
>>  	}
>>  
>> +	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
>> +		kfree(d);
> 
> Does this error path and subsequent ones in domain_add_cpu also need to 
> undo the allocations from arch_domain_mbm_alloc?

This one needs to free the ctrl_val array, as does the path from resctrl_online_domain().
Yes, resctrl_online_domain() needs to free the two mbm arrays. Thanks!

Its probably cleanest to add some helper in patch 3 that kfree()s anything that might have
been allocated, and add-to/remote-from that over time. The root structure is kzalloc()d ,
so shouldn't need an elaborate and verbose unwind/goto-nest.


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled
  2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
  2021-08-11 12:12   ` Jamie Iles
@ 2021-09-01 21:18   ` Reinette Chatre
  1 sibling, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:18 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> rdt_resources_all[] used to have extra entries for L2CODE/L2DATA entries.

The double "entries" seem redundant above. Perhaps just 
"rdt_resources_all[] used to have entries for L2CODE/L2DATA resources." ?

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work
  2021-07-29 22:35 ` [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work James Morse
@ 2021-09-01 21:19   ` Reinette Chatre
  2021-09-17 16:57     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:19 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when cpu hotplug callbacks are triggered.

Could you please replace "cpu" with "CPU" throughout the series in 
changelogs and comments?

> 
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> allocate the memory.

(I read the above as problem statement.)

> 
> Move domain_setup_mon_state(), the monitor subdir creation call and the
> mbm/limbo workers into a new resctrl_online_domain() call.

(I read the above as what the code does.)

Could you please add description on what the patch does to address the 
problem you describe?

...

> +}
> +
> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	int err;
> +
> +	lockdep_assert_held(&rdtgroup_mutex); // the arch code took this for us
> +

Please do not use trailing comments.

> +	if (!r->mon_capable)
> +		return 0;
> +
> +	err = domain_setup_mon_state(r, d);
> +	if (err)
> +		return err;
> +
> +	if (is_mbm_enabled()) {
> +		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
> +		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
> +	}
> +
> +	if (is_llc_occupancy_enabled())
> +		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
> +

You also seem to address an issue where this work was not properly 
cleaned up on the error paths of the replaced domain_setup_mon_state(). 
Thank you.

> +	/* If resctrl is mounted, add per domain monitor data directories. */
> +	if (static_branch_unlikely(&rdt_enable_key))

Should this be rdt_mon_enable_key instead?

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 04/20] x86/resctrl: Add domain offline callback for resctrl work
  2021-07-29 22:35 ` [PATCH v1 04/20] x86/resctrl: Add domain offline " James Morse
  2021-08-11 16:10   ` Jamie Iles
@ 2021-09-01 21:21   ` Reinette Chatre
  1 sibling, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:21 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> Because domains are exposed to user-space via resctrl, the filesystem
> must update its state when cpu hotplug callbacks are triggered.

cpu -> CPU please.

> 
> Some of this work is common to any architecture that would support
> resctrl, but the work is tied up with the architecture code to
> free the memory.
> 
> Move the monitor subdir removal and the cancelling of the mbm/limbo
> works into a new resctrl_offline_domain() call.

Same as previous patch, could you please elaborate how the code change 
you describe fixes the problem you state earlier.

...

> +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	lockdep_assert_held(&rdtgroup_mutex); // the arch code took this for us
> +

No trailing comments please.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-07-29 22:35 ` [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
  2021-08-11 16:32   ` Jamie Iles
@ 2021-09-01 21:22   ` Reinette Chatre
  2021-09-17 16:57     ` James Morse
  1 sibling, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:22 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> To support resctrl's MBA software controller, the architecture must provide
> a second configuration array to hold the mbps_val from user-space.
> 
> This complicates the interface between the architecture code.

This sentence seems incomplete. I was expecting something like " 
complicates the interface between the architecture code and ..."

> 
> Make the filesystem parts of resctrl create an array for the mba_sc
> values when the struct resctrl_schema is created. The software controller
> can be changed to use this, allowing the architecture code to only
> consider the values configured in hardware.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  1 -
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 69 ++++++++++++++++++++++++++
>   include/linux/resctrl.h                | 13 +++++
>   3 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e12b55f815bf..a7e2cbce29d5 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -36,7 +36,6 @@
>   #define MBM_OVERFLOW_INTERVAL		1000
>   #define MAX_MBA_BW			100u
>   #define MBA_IS_LINEAR			0x4
> -#define MBA_MAX_MBPS			U32_MAX
>   #define MAX_MBA_BW_AMD			0x800
>   #define MBM_CNTR_WIDTH_OFFSET_AMD	20
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index cf0db0b7a5d0..185f9bb992d1 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2030,6 +2030,60 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
>   			     struct rdtgroup *prgrp,
>   			     struct kernfs_node **mon_data_kn);
>   
> +static int mba_sc_domain_allocate(struct rdt_resource *res,
> +				  struct rdt_domain *d)
> +{
> +	u32 num_closid = closid_free_map_len;
> +	int cpu = cpumask_any(&d->cpu_mask);
> +	int i;
> +
> +	d->mba_sc = kcalloc_node(num_closid, sizeof(*d->mba_sc),
> +				 GFP_KERNEL, cpu_to_node(cpu));
> +	if (!d->mba_sc)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_closid; i++)
> +		d->mba_sc[i].mbps_val = MBA_MAX_MBPS;
> +
> +	return 0;
> +}
> +

I had the same initial reaction as Jamie and noted your answer to him. 
Considering the intricate flow here could you please add some comments 
to these functions that explains the calling flows in support of their 
safety?

...

>   /**
>    * struct rdt_domain - group of CPUs sharing a resctrl resource
>    * @list:		all instances of this resource
> @@ -53,6 +64,7 @@ struct resctrl_staged_config {
>    * @cqm_work_cpu:	worker CPU for CQM h/w counters
>    * @plr:		pseudo-locked region (if any) associated with domain
>    * @staged_config:	parsed configuration to be applied
> + * @mba_sc:	the mba software controller properties, indexed by closid
>    */
>   struct rdt_domain {
>   	struct list_head		list;
> @@ -67,6 +79,7 @@ struct rdt_domain {
>   	int				cqm_work_cpu;
>   	struct pseudo_lock_region	*plr;
>   	struct resctrl_staged_config	staged_config[CDP_NUM_TYPES];
> +	struct resctrl_mba_sc		*mba_sc;
>   };

Why is this additional abstraction needed? As I understand the usage 
struct resctrl_mba_sc would always only have the one member so why not 
have mbps_val within rdt_domain?

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-07-29 22:35 ` [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
@ 2021-09-01 21:25   ` Reinette Chatre
  2021-09-17 16:57     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:25 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> Updates to resctrl's software controller follow the same path as
> other configuration updates, but they don't modify the hardware state.
> rdtgroup_schemata_write() uses parse_line() and the resource's
> ctrlval_parse function to stage the configuration.
> resctrl_arch_update_domains() then updates the mbps_val[] array
> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
> call that would update hardware.
> 
> This complicates the interface between resctrl's filesystem parts
> and architecture specific code. It should be possible for mba_sc
> to be completely implemented by the filesystem parts of resctrl. This
> would allow it to work on a second architecture with no additional code.
> 
> Change parse_bw() to write the configuration value directly to the
> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
> to skip the call to resctrl_arch_update_domains(), meaning all the
> mba_sc specific code in resctrl_arch_update_domains() can be removed.
> On the read-side, show_doms() and update_mba_bw() are changed to read
> the mba_sc[] array from the domain structure. With this,
> resctrl_arch_get_config() no longer needs to consider mba_sc resources.
> 
> Change parse_bw() to write these values directly, meaning
> rdtgroup_schemata_write() never needs to call update_domains()
> for mba_sc resources.

The above paragraph seems to contain duplicate information from the 
paragraph that precedes it.

> 
> Get show_doms() to test is_mba_sc() and retrieve the value
> directly, instead of using get_config() for the hardware value.
> 
> This means the arch code's resctrl_arch_get_config() and
> resctrl_arch_update_domains() no longer need to be aware of
> mba_sc, and we can get rid of the update_mba_bw() code that
> reaches into the hw_dom to get the msr value.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---

...
> @@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>   
>   	list_for_each_entry(s, &resctrl_schema_all, list) {
>   		r = s->res;
> +
> +		/*
> +		 * Writes to mba_sc resources update the software controller,
> +		 * not the control msr.
> +		 */
> +		if (is_mba_sc(r))
> +			continue;
> +

A few resources can be updated in a single write to the schemata file. 
It is thus possible to update the cache allocation resource as well as 
memory bandwidth allocation in a single write. As I understand this 
change in this scenario all configuration updates will be skipped, not 
just the memory bandwidth allocation ones.

>   		ret = resctrl_arch_update_domains(r, rdtgrp->closid);
>   		if (ret)
>   			goto out;

Reinette


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val
  2021-07-29 22:35 ` [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val James Morse
@ 2021-09-01 21:26   ` Reinette Chatre
  2021-09-17 16:57     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:26 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> The resctrl arch code provides a second configuration array mbps_val[]
> for the mba socftware controller.

"mba socftware" -> "MBA software"

> 
> Since resctrl switched over to allocating and freeing its own array
> when needed, nothing uses the arch code version.
> 
> Remove it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/core.c     | 20 ++++----------------
>   arch/x86/kernel/cpu/resctrl/internal.h |  4 +---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c |  2 +-
>   3 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 56b3541617b5..e864dbc6fe3d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -397,7 +397,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
>   	return NULL;
>   }
>   
> -void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
> +void setup_default_ctrlval(struct rdt_resource *r, u32 *dc)
>   {
>   	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>   	int i;
> @@ -406,12 +406,9 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
>   	 * Initialize the Control MSRs to having no control.
>   	 * For Cache Allocation: Set all bits in cbm
>   	 * For Memory Allocation: Set b/w requested to 100%
> -	 * and the bandwidth in MBps to U32_MAX
>   	 */
> -	for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
> +	for (i = 0; i < hw_res->num_closid; i++, dc++)
>   		*dc = r->default_ctrl;
> -		*dm = MBA_MAX_MBPS;
> -	}

Since this function used to reset the array to default I was expecting 
its callers to now reset the new array (more below).

>   }
>   
>   static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
> @@ -419,23 +416,15 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>   	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>   	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
>   	struct msr_param m;
> -	u32 *dc, *dm;
> +	u32 *dc;
>   
>   	dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val),
>   			   GFP_KERNEL);
>   	if (!dc)
>   		return -ENOMEM;
>   
> -	dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val),
> -			   GFP_KERNEL);
> -	if (!dm) {
> -		kfree(dc);
> -		return -ENOMEM;
> -	}
> -
>   	hw_dom->ctrl_val = dc;
> -	hw_dom->mbps_val = dm;
> -	setup_default_ctrlval(r, dc, dm);
> +	setup_default_ctrlval(r, dc);
>   
>   	m.low = 0;
>   	m.high = hw_res->num_closid;
> @@ -527,7 +516,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>   			d->plr->d = NULL;
>   
>   		kfree(hw_dom->ctrl_val);
> -		kfree(hw_dom->mbps_val);
>   		kfree(hw_dom);
>   		return;
>   	}
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a7e2cbce29d5..796e13a0e8dc 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -308,14 +308,12 @@ struct mbm_state {
>    *			  a resource
>    * @d_resctrl:	Properties exposed to the resctrl file system
>    * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
> - * @mbps_val:	When mba_sc is enabled, this holds the bandwidth in MBps
>    *
>    * Members of this structure are accessed via helpers that provide abstraction.
>    */
>   struct rdt_hw_domain {
>   	struct rdt_domain		d_resctrl;
>   	u32				*ctrl_val;
> -	u32				*mbps_val;
>   };
>   
>   static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
> @@ -529,7 +527,7 @@ void mbm_setup_overflow_handler(struct rdt_domain *dom,
>   void mbm_handle_overflow(struct work_struct *work);
>   void __init intel_rdt_mbm_apply_quirk(void);
>   bool is_mba_sc(struct rdt_resource *r);
> -void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm);
> +void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
>   u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
>   void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
>   void cqm_handle_limbo(struct work_struct *work);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 185f9bb992d1..297c20491549 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1906,7 +1906,7 @@ static int set_mba_sc(bool mba_sc)
>   	r->membw.mba_sc = mba_sc;
>   	list_for_each_entry(d, &r->domains, list) {
>   		hw_dom = resctrl_to_arch_dom(d);
> -		setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
> +		setup_default_ctrlval(r, hw_dom->ctrl_val);
>   	}
>   

I am wondering why new array is not reset instead of original at this 
call site? oh ... I see it is removed in following patch BUT it mentions 
that it is ok because reset_all_ctrls() does similar reset ... but it 
does not seem to do so for mbps_val.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-07-29 22:35 ` [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
@ 2021-09-01 21:27   ` Reinette Chatre
  2021-09-17 16:57     ` James Morse
  2021-09-24  6:23   ` tan.shaopeng
  1 sibling, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:27 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:35 PM, James Morse wrote:
> To determine whether the mba_mbps option to resctrl should be supported,

mba_mbps -> mba_MBps

> resctrl tests the boot cpus' x86_vendor.

CPU

> 
> This isn't portable, and needs abstracting behind a helper so this check
> can be part of the filesystem code that moves to /fs/.
> 
> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
> on this system. An 'alloc_capable' test is added so that this property
> isn't implied by 'linear'.

Why can linear not imply alloc_capable? It is a property of a MBA 
resource so if it is set then it has to be a MBA resource.

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
>   1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e321ea5de562..4388685548a0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1888,17 +1888,26 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
>   }
>   
>   /*
> - * Enable or disable the MBA software controller
> - * which helps user specify bandwidth in MBps.
>    * MBA software controller is supported only if
>    * MBM is supported and MBA is in linear scale.
>    */
> +static bool supports_mba_mbps(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> +
> +	return (is_mbm_enabled() &&
> +		r->alloc_capable && is_mba_linear());

As mentioned above I do not see need for the alloc_capable check since 
this is hardcoded to be a MBA resource for which delay_linear can only 
be set if alloc_capable is set.

> +}
> +
> +/*
> + * Enable or disable the MBA software controller
> + * which helps user specify bandwidth in MBps.
> + */
>   static int set_mba_sc(bool mba_sc)
>   {
>   	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>   
> -	if (!is_mbm_enabled() || !is_mba_linear() ||
> -	    mba_sc == is_mba_sc(r))
> +	if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
>   		return -EINVAL;
>   
>   	r->membw.mba_sc = mba_sc;
> @@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
>   		ctx->enable_cdpl2 = true;
>   		return 0;
>   	case Opt_mba_mbps:
> -		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)

I am not familiar with the history but it looks like AMD does not 
support linear (based on __rdt_get_mem_config_amd()) which may be the 
reason for this test and thus moving it to a feature check is ok.

> +		if (supports_mba_mbps())
>   			return -EINVAL;
>   		ctx->enable_mba_mbps = true;
>   		return 0;
> 

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly
  2021-07-29 22:36 ` [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
@ 2021-09-01 21:28   ` Reinette Chatre
  0 siblings, 0 replies; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:28 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 7/29/2021 3:36 PM, James Morse wrote:
> update_mba_bw() calculates a new control value for the MBA resource
> based on the user provided mbps_val and the current measured
> bandwidth. Some control values need remapping by delay_bw_map().
> 
> It does this by calling wrmsrl() directly. This needs splitting
> up to be done by an architecture specific helper, so that the
> remainder can eventually be moved to /fs/.
> 
> Add resctrl_arch_update_one() to apply one configuration value
> to the provided resource and domain. This avoids the staging
> and cross-calling that is only needed with changes made by
> user-space. delay_bw_map() moves to be part of the arch code,
> to maintain the 'percentage control' view of mba resources

mba -> MBA

> in resctrl.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   arch/x86/kernel/cpu/resctrl/core.c        |  2 +-
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 21 +++++++++++++++++++++
>   arch/x86/kernel/cpu/resctrl/internal.h    |  1 -
>   arch/x86/kernel/cpu/resctrl/monitor.c     | 13 ++++---------
>   include/linux/resctrl.h                   |  8 ++++++++
>   5 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index e864dbc6fe3d..8a3c13c6c19f 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -296,7 +296,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
>    * that can be written to QOS_MSRs.
>    * There are currently no SKUs which support non linear delay values.
>    */
> -u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
> +static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
>   {
>   	if (r->membw.delay_linear)
>   		return MAX_MBA_BW - bw;
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 56789ea11185..5104f39928fd 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -282,6 +282,27 @@ static bool apply_config(struct rdt_hw_domain *hw_dom,
>   	return false;
>   }
>   
> +int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> +{
> +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +	u32 idx = get_config_index(closid, t);
> +	struct msr_param msr_param;
> +
> +	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
> +		return -EINVAL;
> +
> +	hw_dom->ctrl_val[idx] = cfg_val;
> +
> +	msr_param.res = r;
> +	msr_param.low = idx;
> +	msr_param.high = idx + 1;
> +
> +	rdt_ctrl_update(&msr_param);
> +
> +	return 0;
> +}
> +
>   int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
>   {
>   	struct resctrl_staged_config *cfg;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 796e13a0e8dc..1b07e49564cf 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -528,7 +528,6 @@ void mbm_handle_overflow(struct work_struct *work);
>   void __init intel_rdt_mbm_apply_quirk(void);
>   bool is_mba_sc(struct rdt_resource *r);
>   void setup_default_ctrlval(struct rdt_resource *r, u32 *dc);
> -u32 delay_bw_map(unsigned long bw, struct rdt_resource *r);
>   void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
>   void cqm_handle_limbo(struct work_struct *work);
>   bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index dcf3a73e2c17..b178329d3661 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -415,10 +415,8 @@ void mon_event_count(void *info)
>    */
>   static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   {
> -	u32 closid, rmid, cur_msr, cur_msr_val, new_msr_val;
> +	u32 closid, rmid, cur_msr_val, new_msr_val;
>   	struct mbm_state *pmbm_data, *cmbm_data;
> -	struct rdt_hw_resource *hw_r_mba;
> -	struct rdt_hw_domain *hw_dom_mba;
>   	u32 cur_bw, delta_bw, user_bw;
>   	struct rdt_resource *r_mba;
>   	struct rdt_domain *dom_mba;
> @@ -428,8 +426,8 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   	if (!is_mbm_local_enabled())
>   		return;
>   
> -	hw_r_mba = &rdt_resources_all[RDT_RESOURCE_MBA];
> -	r_mba = &hw_r_mba->r_resctrl;
> +	r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> +
>   	closid = rgrp->closid;
>   	rmid = rgrp->mon.rmid;
>   	pmbm_data = &dom_mbm->mbm_local[rmid];
> @@ -439,7 +437,6 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   		pr_warn_once("Failure to get domain for MBA update\n");
>   		return;
>   	}
> -	hw_dom_mba = resctrl_to_arch_dom(dom_mba);
>   
>   	cur_bw = pmbm_data->prev_bw;
>   	user_bw = dom_mba->mba_sc[closid].mbps_val;
> @@ -481,9 +478,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
>   		return;
>   	}
>   
> -	cur_msr = hw_r_mba->msr_base + closid;
> -	wrmsrl(cur_msr, delay_bw_map(new_msr_val, r_mba));
> -	hw_dom_mba->ctrl_val[closid] = new_msr_val;
> +	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
>   

This can now fail ... but looks to be ok considering the earlier test.

>   	/*
>   	 * Delta values are updated dynamically package wise for each
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 3c8522d63261..4fe2d5500315 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -205,6 +205,14 @@ struct resctrl_schema {
>   /* The number of closid supported by this resource regardless of CDP */
>   u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>   int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
> +
> +/*
> + * Update the ctrl_val and apply this config right now.
> + * Must be called on one of the domains cpus.

domains' CPUs

> + */
> +int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 closid, enum resctrl_conf_type t, u32 cfg_val);
> +
>   void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
>   			     u32 closid, enum resctrl_conf_type type,
>   			     u32 *value);
> 

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter
  2021-07-29 22:36 ` [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter James Morse
@ 2021-09-01 21:31   ` Reinette Chatre
  2021-09-17 16:58     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-01 21:31 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

Apologies but I find the changelog hard to understand.

On 7/29/2021 3:36 PM, James Morse wrote:
> mbm_bw_count() maintains its own copy of prev_msr to allow it to
> calculate the bandwidth as the number of chunks counted since the
> last time mbm_bw_count() was invoked.

ok, I understand there is an extra copy

> 
> The prev_msr and chunks values are in a format specific to the
> architecture. MPAM's monitor scaling can be enabled for some
> counters and not others. If the value is changed once the bandwidth
> counter passes some threshold, the prev_msr values need to be
> converted to the new scale. Having two prev_msr values is a
> hindrance to moving this logic behind an architecture specific
> function that returns the counters number of bytes.

Above talks about format ... it is not clear how it connects to 
introduction where there was mention of an "extra copy"

> 
> Change mbm_bw_count() to calculate the total number of bytes the
> counter has seen in the same way as __mon_event_count(), then
> calculate the bandwidth from that. prev_bw_msr is replaced by
> prev_bw_chunks, the chunks value from the last time mbm_bw_count()
> was invoked.

It is not clear to me how this change connects to either the extra copy 
or the format comments. It describes the changes made but it is not 
clear what the problem being solved is.

It is also not clear to me what the impact of this change on the non 
software controller flow is (where data is read from user space). What 
is the impact now that these two flows no longer track the previous read 
value separately? Would user space still see similar data as before this 
change? How would the software controller sharing control of prev_msr 
affect the data?

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-09-01 21:27   ` Reinette Chatre
@ 2021-09-17 16:57     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-09-17 16:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:27, Reinette Chatre wrote:
> On 7/29/2021 3:35 PM, James Morse wrote:
>> To determine whether the mba_mbps option to resctrl should be supported,
> 
> mba_mbps -> mba_MBps
> 
>> resctrl tests the boot cpus' x86_vendor.
> 
> CPU
> 
>>
>> This isn't portable, and needs abstracting behind a helper so this check
>> can be part of the filesystem code that moves to /fs/.

>> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
>> on this system. An 'alloc_capable' test is added so that this property
>> isn't implied by 'linear'.

> Why can linear not imply alloc_capable? It is a property of a MBA resource so if it is set
> then it has to be a MBA resource.

mba_sc depends on controls and monitors. It checks for monitors with is_mbm_enabled(), and
assumes that the delay_linear property being true means the control feature must exist. I
think this is fragile.

For MPAM, delay_linear is always true. Any way of controlling the bandwidth has this
property. Its very likely to be set statically in the struct, regardless of whether the
controls exist. alloc_capable seems to mean 'controls exist'.

This is to avoid an annoying bug where supports_mba_mbps() reports true, but the controls
aren't available. Or equally annoying, for the arch code to set/clear that flag in
addition to alloc_capable.

I'll add some verbage about MPAM to the commit message.



Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val
  2021-09-01 21:26   ` Reinette Chatre
@ 2021-09-17 16:57     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-09-17 16:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:26, Reinette Chatre wrote:
> On 7/29/2021 3:35 PM, James Morse wrote:
>> The resctrl arch code provides a second configuration array mbps_val[]
>> for the mba socftware controller.
> 
> "mba socftware" -> "MBA software"

(fixed)

>> Since resctrl switched over to allocating and freeing its own array
>> when needed, nothing uses the arch code version.
>>
>> Remove it.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 56b3541617b5..e864dbc6fe3d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c

>> @@ -406,12 +406,9 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
>>        * Initialize the Control MSRs to having no control.
>>        * For Cache Allocation: Set all bits in cbm
>>        * For Memory Allocation: Set b/w requested to 100%
>> -     * and the bandwidth in MBps to U32_MAX
>>        */
>> -    for (i = 0; i < hw_res->num_closid; i++, dc++, dm++) {
>> +    for (i = 0; i < hw_res->num_closid; i++, dc++)
>>           *dc = r->default_ctrl;
>> -        *dm = MBA_MAX_MBPS;
>> -    }
> 
> Since this function used to reset the array to default I was expecting its callers to now
> reset the new array (more below).

dm/mbps_val? Its initialised when its allocated by mba_sc_domain_allocate() when the
filesystem is mounted, or the domain online callback is made.


>>   }
>>     static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 185f9bb992d1..297c20491549 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1906,7 +1906,7 @@ static int set_mba_sc(bool mba_sc)
>>       r->membw.mba_sc = mba_sc;
>>       list_for_each_entry(d, &r->domains, list) {
>>           hw_dom = resctrl_to_arch_dom(d);
>> -        setup_default_ctrlval(r, hw_dom->ctrl_val, hw_dom->mbps_val);
>> +        setup_default_ctrlval(r, hw_dom->ctrl_val);
>>       }
>>   
> 
> I am wondering why new array is not reset instead of original at this call site?

set_mba_sc() doesn't need to. During mount mba_sc is set to true in rdt_enable_ctx(),
which is before the mbps_val[] array is allocated. Once its allocated, its initialised.
While the filesystem is mounted, and domains come and go, the array is re-allocated and
re-initialised. On unmount, the array is freed when the schemata list is destroyed.

I think moving the allocation/free into set_mba_sc() will make this clearer.

> oh ... I
> see it is removed in following patch BUT it mentions that it is ok because
> reset_all_ctrls() does similar reset ... but it does not seem to do so for mbps_val.

That is about the ctrl_val[] array. The reset_all_ctrls() call should reset the hardware
that the arch code looks after. After these changes the mbps_val[] array is a resctrl
filesystem thing that the arch code doesn't need to know anything about.


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-09-01 21:25   ` Reinette Chatre
@ 2021-09-17 16:57     ` James Morse
  2021-09-17 18:20       ` Reinette Chatre
  0 siblings, 1 reply; 50+ messages in thread
From: James Morse @ 2021-09-17 16:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:25, Reinette Chatre wrote:
> On 7/29/2021 3:35 PM, James Morse wrote:
>> Updates to resctrl's software controller follow the same path as
>> other configuration updates, but they don't modify the hardware state.
>> rdtgroup_schemata_write() uses parse_line() and the resource's
>> ctrlval_parse function to stage the configuration.
>> resctrl_arch_update_domains() then updates the mbps_val[] array
>> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
>> call that would update hardware.
>>
>> This complicates the interface between resctrl's filesystem parts
>> and architecture specific code. It should be possible for mba_sc
>> to be completely implemented by the filesystem parts of resctrl. This
>> would allow it to work on a second architecture with no additional code.
>>
>> Change parse_bw() to write the configuration value directly to the
>> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
>> to skip the call to resctrl_arch_update_domains(), meaning all the
>> mba_sc specific code in resctrl_arch_update_domains() can be removed.
>> On the read-side, show_doms() and update_mba_bw() are changed to read
>> the mba_sc[] array from the domain structure. With this,
>> resctrl_arch_get_config() no longer needs to consider mba_sc resources.
>>
>> Change parse_bw() to write these values directly, meaning
>> rdtgroup_schemata_write() never needs to call update_domains()
>> for mba_sc resources.

> The above paragraph seems to contain duplicate information from the paragraph that
> precedes it.

Looks like two commit messages got combined. I've removed this, and the below paragraphs
as its already covered.


>> Get show_doms() to test is_mba_sc() and retrieve the value
>> directly, instead of using get_config() for the hardware value.
>>
>> This means the arch code's resctrl_arch_get_config() and
>> resctrl_arch_update_domains() no longer need to be aware of
>> mba_sc, and we can get rid of the update_mba_bw() code that
>> reaches into the hw_dom to get the msr value.

>> @@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>>         list_for_each_entry(s, &resctrl_schema_all, list) {
>>           r = s->res;
>> +
>> +        /*
>> +         * Writes to mba_sc resources update the software controller,
>> +         * not the control msr.
>> +         */
>> +        if (is_mba_sc(r))
>> +            continue;
>> +
> 
> A few resources can be updated in a single write to the schemata file. It is thus possible
> to update the cache allocation resource as well as memory bandwidth allocation in a single
> write.

i.e. echo "L3:0=7ff;1=7ff\nMB:0=100;1=50" > schemata

> As I understand this change in this scenario all configuration updates will be
> skipped, not just the memory bandwidth allocation ones.

The loop is per-schema, so its not a problem for L2/L3. This would only be a problem if
the is_mba_sc() resource had multiple schema. Only CDP does this, which the MBA controls
don't support.


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain
  2021-09-01 21:22   ` Reinette Chatre
@ 2021-09-17 16:57     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-09-17 16:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:22, Reinette Chatre wrote:
> On 7/29/2021 3:35 PM, James Morse wrote:
>> To support resctrl's MBA software controller, the architecture must provide
>> a second configuration array to hold the mbps_val from user-space.
>>
>> This complicates the interface between the architecture code.
> 
> This sentence seems incomplete. I was expecting something like " complicates the interface
> between the architecture code and ..."

Yup, looks like I got distracted while writing that.


>> Make the filesystem parts of resctrl create an array for the mba_sc
>> values when the struct resctrl_schema is created. The software controller
>> can be changed to use this, allowing the architecture code to only
>> consider the values configured in hardware.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index cf0db0b7a5d0..185f9bb992d1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2030,6 +2030,60 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
>>                    struct rdtgroup *prgrp,
>>                    struct kernfs_node **mon_data_kn);
>>   +static int mba_sc_domain_allocate(struct rdt_resource *res,
>> +                  struct rdt_domain *d)
>> +{
>> +    u32 num_closid = closid_free_map_len;
>> +    int cpu = cpumask_any(&d->cpu_mask);
>> +    int i;
>> +
>> +    d->mba_sc = kcalloc_node(num_closid, sizeof(*d->mba_sc),
>> +                 GFP_KERNEL, cpu_to_node(cpu));
>> +    if (!d->mba_sc)
>> +        return -ENOMEM;
>> +
>> +    for (i = 0; i < num_closid; i++)
>> +        d->mba_sc[i].mbps_val = MBA_MAX_MBPS;
>> +
>> +    return 0;
>> +}
>> +
> 
> I had the same initial reaction as Jamie and noted your answer to him. Considering the
> intricate flow here could you please add some comments to these functions that explains
> the calling flows in support of their safety?

After a shuffle to make this allocated at a more obvious time, it has this comment:
|	/*
|	 * d->mbps_val is allocated by a call to this function in set_mba_sc(),
|	 * and domain_setup_mon_state(). Both calls are guarded by is_mba_sc(),
|	 * which can only return true while the filesystem is mounted. The
|	 * two calls are prevented from racing as rdt_get_tree() takes the
|	 * cpuhp read lock before calling rdt_enable_ctx(ctx), which prevents
|	 * it running concurrently with resctrl_online_domain().
|	 */


>>   /**
>>    * struct rdt_domain - group of CPUs sharing a resctrl resource
>>    * @list:        all instances of this resource
>> @@ -53,6 +64,7 @@ struct resctrl_staged_config {
>>    * @cqm_work_cpu:    worker CPU for CQM h/w counters
>>    * @plr:        pseudo-locked region (if any) associated with domain
>>    * @staged_config:    parsed configuration to be applied
>> + * @mba_sc:    the mba software controller properties, indexed by closid
>>    */
>>   struct rdt_domain {
>>       struct list_head        list;
>> @@ -67,6 +79,7 @@ struct rdt_domain {
>>       int                cqm_work_cpu;
>>       struct pseudo_lock_region    *plr;
>>       struct resctrl_staged_config    staged_config[CDP_NUM_TYPES];
>> +    struct resctrl_mba_sc        *mba_sc;
>>   };
> 
> Why is this additional abstraction needed? As I understand the usage struct resctrl_mba_sc
> would always only have the one member so why not have mbps_val within rdt_domain?

Probably because I thought some form of prev_bw/delta_bw would go in here too, once the
variables need to calculate bytes had moved to be part of the architecture specific code.
In the end it didn't make sense because there are two mbm_state arrays, so can't be a
single struct array.

I'll roll it out,


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work
  2021-09-01 21:19   ` Reinette Chatre
@ 2021-09-17 16:57     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-09-17 16:57 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:19, Reinette Chatre wrote:
> On 7/29/2021 3:35 PM, James Morse wrote:
>> +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>> +{

>> +
>> +    if (is_mbm_enabled()) {
>> +        INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
>> +        mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
>> +    }
>> +
>> +    if (is_llc_occupancy_enabled())
>> +        INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
>> +
> 
> You also seem to address an issue where this work was not properly cleaned up on the error
> paths of the replaced domain_setup_mon_state(). Thank you.

Oops ... if I'd thought of it like that I'd have sent it as a fix!
(huh, it'll need a backport too, bother)


>> +    /* If resctrl is mounted, add per domain monitor data directories. */
>> +    if (static_branch_unlikely(&rdt_enable_key))
> 
> Should this be rdt_mon_enable_key instead?

Yup, fixed.


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter
  2021-09-01 21:31   ` Reinette Chatre
@ 2021-09-17 16:58     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-09-17 16:58 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 01/09/2021 22:31, Reinette Chatre wrote:
> Apologies but I find the changelog hard to understand.

No problem, clearly room for improvement!


> On 7/29/2021 3:36 PM, James Morse wrote:
>> mbm_bw_count() maintains its own copy of prev_msr to allow it to
>> calculate the bandwidth as the number of chunks counted since the
>> last time mbm_bw_count() was invoked.
> 
> ok, I understand there is an extra copy

The point I was trying to get across here is mbm_bw_count() is holding a hardware value,
which means it isn't architecture agnostic. Calculating bytes first paves the way to using
an arch helper that returns bytes.

This was originally later in the series, and it looks like it got damaged during a rebase.
I've rewritten it to calculate bandwidth based on the value read by the previous
__mon_event_count(), this is simpler and less noisy for the rest of the series.



Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-09-17 16:57     ` James Morse
@ 2021-09-17 18:20       ` Reinette Chatre
  2021-10-01 16:02         ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2021-09-17 18:20 UTC (permalink / raw)
  To: James Morse, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi James,

On 9/17/2021 9:57 AM, James Morse wrote:
> Hi Reinette,
> 
> On 01/09/2021 22:25, Reinette Chatre wrote:
>> On 7/29/2021 3:35 PM, James Morse wrote:
>>> Updates to resctrl's software controller follow the same path as
>>> other configuration updates, but they don't modify the hardware state.
>>> rdtgroup_schemata_write() uses parse_line() and the resource's
>>> ctrlval_parse function to stage the configuration.
>>> resctrl_arch_update_domains() then updates the mbps_val[] array
>>> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
>>> call that would update hardware.
>>>
>>> This complicates the interface between resctrl's filesystem parts
>>> and architecture specific code. It should be possible for mba_sc
>>> to be completely implemented by the filesystem parts of resctrl. This
>>> would allow it to work on a second architecture with no additional code.
>>>
>>> Change parse_bw() to write the configuration value directly to the
>>> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
>>> to skip the call to resctrl_arch_update_domains(), meaning all the
>>> mba_sc specific code in resctrl_arch_update_domains() can be removed.
>>> On the read-side, show_doms() and update_mba_bw() are changed to read
>>> the mba_sc[] array from the domain structure. With this,
>>> resctrl_arch_get_config() no longer needs to consider mba_sc resources.
>>>
>>> Change parse_bw() to write these values directly, meaning
>>> rdtgroup_schemata_write() never needs to call update_domains()
>>> for mba_sc resources.
> 
>> The above paragraph seems to contain duplicate information from the paragraph that
>> precedes it.
> 
> Looks like two commit messages got combined. I've removed this, and the below paragraphs
> as its already covered.
> 
> 
>>> Get show_doms() to test is_mba_sc() and retrieve the value
>>> directly, instead of using get_config() for the hardware value.
>>>
>>> This means the arch code's resctrl_arch_get_config() and
>>> resctrl_arch_update_domains() no longer need to be aware of
>>> mba_sc, and we can get rid of the update_mba_bw() code that
>>> reaches into the hw_dom to get the msr value.
> 
>>> @@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>>>          list_for_each_entry(s, &resctrl_schema_all, list) {
>>>            r = s->res;
>>> +
>>> +        /*
>>> +         * Writes to mba_sc resources update the software controller,
>>> +         * not the control msr.
>>> +         */
>>> +        if (is_mba_sc(r))
>>> +            continue;
>>> +
>>
>> A few resources can be updated in a single write to the schemata file. It is thus possible
>> to update the cache allocation resource as well as memory bandwidth allocation in a single
>> write.
> 
> i.e. echo "L3:0=7ff;1=7ff\nMB:0=100;1=50" > schemata

I do not think something like the above would show the issue. If you 
want to test this via the shell you need to use ANSI-C quoting. 
Adjusting what you show to something like:

echo -n $'L3:0=7ff;1=7ff\nMB:0=100;1=50\n'

>> As I understand this change in this scenario all configuration updates will be
>> skipped, not just the memory bandwidth allocation ones.
> 
> The loop is per-schema, so its not a problem for L2/L3. This would only be a problem if
> the is_mba_sc() resource had multiple schema. Only CDP does this, which the MBA controls
> don't support.

The loop iterates through the entire buffer provided to the schemata 
file and the buffer could contain multiple schema. This is more typical 
when interacting with the schemata file with a SDK perhaps.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-07-29 22:35 ` [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
  2021-09-01 21:27   ` Reinette Chatre
@ 2021-09-24  6:23   ` tan.shaopeng
  2021-10-01 16:01     ` James Morse
  1 sibling, 1 reply; 50+ messages in thread
From: tan.shaopeng @ 2021-09-24  6:23 UTC (permalink / raw)
  To: 'James Morse', 'x86@kernel.org',
	'linux-kernel@vger.kernel.org'
  Cc: 'Fenghua Yu', 'Reinette Chatre',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'H Peter Anvin',
	'Babu Moger',
	'shameerali.kolothum.thodi@huawei.com',
	'Jamie Iles', 'D Scott Phillips OS',
	'lcherian@marvell.com',
	'bobo.shaobowang@huawei.com'

Hi James,

> To determine whether the mba_mbps option to resctrl should be supported,
> resctrl tests the boot cpus' x86_vendor.
> 
> This isn't portable, and needs abstracting behind a helper so this check can be
> part of the filesystem code that moves to /fs/.
> 
> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
> on this system. An 'alloc_capable' test is added so that this property isn't
> implied by 'linear'.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e321ea5de562..4388685548a0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1888,17 +1888,26 @@ void rdt_domain_reconfigure_cdp(struct
> rdt_resource *r)  }
> 
>  /*
> - * Enable or disable the MBA software controller
> - * which helps user specify bandwidth in MBps.
>   * MBA software controller is supported only if
>   * MBM is supported and MBA is in linear scale.
>   */
> +static bool supports_mba_mbps(void)
> +{
> +	struct rdt_resource *r =
> +&rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> +
> +	return (is_mbm_enabled() &&
> +		r->alloc_capable && is_mba_linear()); }
> +
> +/*
> + * Enable or disable the MBA software controller
> + * which helps user specify bandwidth in MBps.
> + */
>  static int set_mba_sc(bool mba_sc)
>  {
>  	struct rdt_resource *r =
> &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> 
> -	if (!is_mbm_enabled() || !is_mba_linear() ||
> -	    mba_sc == is_mba_sc(r))
> +	if (!supports_mba_mbps() || mba_sc == is_mba_sc(r))
>  		return -EINVAL;
> 
>  	r->membw.mba_sc = mba_sc;
> @@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc,
> struct fs_parameter *param)
>  		ctx->enable_cdpl2 = true;
>  		return 0;
>  	case Opt_mba_mbps:
> -		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		if (supports_mba_mbps())
>  			return -EINVAL;
I think if(!supports_mba_mbps()) is correct, isn't it? 

Thanks,

Shaopeng Tan
>  		ctx->enable_mba_mbps = true;
>  		return 0;
> --
> 2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset
  2021-07-29 22:36 ` [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
@ 2021-09-24  6:34   ` tan.shaopeng
  2021-10-01 16:01     ` James Morse
  0 siblings, 1 reply; 50+ messages in thread
From: tan.shaopeng @ 2021-09-24  6:34 UTC (permalink / raw)
  To: 'James Morse', 'x86@kernel.org',
	'linux-kernel@vger.kernel.org'
  Cc: 'Fenghua Yu', 'Reinette Chatre',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'H Peter Anvin',
	'Babu Moger',
	'shameerali.kolothum.thodi@huawei.com',
	'Jamie Iles', 'D Scott Phillips OS',
	'lcherian@marvell.com',
	'bobo.shaobowang@huawei.com',
	tan.shaopeng

Hi James,

> To abstract the rmid counters into a helper that returns the number of bytes
> counted, architecture specific per-rmid state is needed.
> 
> It needs to be possible to reset this hidden state, as the values may outlive the
> life of an rmid, or the mount time of the filesystem.
> 
> mon_event_read() is called with first = true when an rmid is first allocated in
> mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from
> __mon_event_count()'s rr->first check.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 18 ++++--------
> arch/x86/kernel/cpu/resctrl/monitor.c  | 38
> +++++++++++++++++++++-----
>  include/linux/resctrl.h                | 23 ++++++++++++++++
>  3 files changed, 59 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> b/arch/x86/kernel/cpu/resctrl/internal.h
> index aaae900a8ef3..f3f31315a907 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -22,14 +22,6 @@
> 
>  #define L2_QOS_CDP_ENABLE		0x01ULL
> 
> -/*
> - * Event IDs are used to program IA32_QM_EVTSEL before reading event
> - * counter from IA32_QM_CTR
> - */
> -#define QOS_L3_OCCUP_EVENT_ID		0x01
> -#define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
> -#define QOS_L3_MBM_LOCAL_EVENT_ID	0x03
> -
>  #define CQM_LIMBOCHECK_INTERVAL	1000
> 
>  #define MBM_CNTR_WIDTH_BASE		24
> @@ -73,7 +65,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
> -	u32			evtid;
> +	enum resctrl_event_id	evtid;
>  	char			*name;
>  	struct list_head	list;
>  };
> @@ -90,9 +82,9 @@ struct mon_evt {
>  union mon_data_bits {
>  	void *priv;
>  	struct {
> -		unsigned int rid	: 10;
> -		unsigned int evtid	: 8;
> -		unsigned int domid	: 14;
> +		unsigned int rid		: 10;
> +		enum resctrl_event_id evtid	: 8;
> +		unsigned int domid		: 14;
>  	} u;
>  };
> 
> @@ -100,7 +92,7 @@ struct rmid_read {
>  	struct rdtgroup		*rgrp;
>  	struct rdt_resource	*r;
>  	struct rdt_domain	*d;
> -	int			evtid;
> +	enum resctrl_event_id	evtid;
>  	bool			first;
>  	u64			val;
>  };
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index af60e154f0ed..3b8b29470a5c 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -137,7 +137,34 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
>  	return entry;
>  }
> 
> -static u64 __rmid_read(u32 rmid, u32 eventid)
> +static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain
> *hw_dom,
> +						 u32 rmid,
> +						 enum resctrl_event_id
> eventid)
> +{
> +	switch (eventid) {
> +	case QOS_L3_OCCUP_EVENT_ID:
> +		return NULL;
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		return &hw_dom->arch_mbm_total[rmid];
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		return &hw_dom->arch_mbm_local[rmid];
> +	}
> +

Since it is unexpected to come here,
it might be better to add WARN_ON.

In addition, I have tested these patches on Intel(R) Xeon(R) Gold 6254 CPU with resctrl selftest. It is no problem.

Thanks,

Shaopeng Tan

> +	return NULL;
> +}
> +
> +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> +			     u32 rmid, enum resctrl_event_id eventid) {
> +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +	struct arch_mbm_state *m;
> +
> +	m = get_arch_mbm_state(hw_dom, rmid, eventid);
> +	if (m)
> +		memset(m, 0, sizeof(*m));
> +}
> +
> +static u64 __rmid_read(u32 rmid, enum resctrl_event_id eventid)
>  {
>  	u64 val;
> 
> @@ -291,6 +318,9 @@ static int __mon_event_count(u32 rmid, struct
> rmid_read *rr)
>  	struct mbm_state *m;
>  	u64 chunks, tval;
> 
> +	if (rr->first)
> +		resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
> +
>  	tval = __rmid_read(rmid, rr->evtid);
>  	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
>  		rr->val = tval;
> @@ -306,12 +336,6 @@ static int __mon_event_count(u32 rmid, struct
> rmid_read *rr)
>  	case QOS_L3_MBM_LOCAL_EVENT_ID:
>  		m = &rr->d->mbm_local[rmid];
>  		break;
> -	default:
> -		/*
> -		 * Code would never reach here because
> -		 * an invalid event id would fail the __rmid_read.
> -		 */
> -		return -EINVAL;
>  	}
> 
>  	if (rr->first) {
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index
> 4fe2d5500315..79e83ce3dfbc 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -32,6 +32,16 @@ enum resctrl_conf_type {
> 
>  #define CDP_NUM_TYPES	(CDP_DATA + 1)
> 
> +/*
> + * Event IDs, the values match those used to program IA32_QM_EVTSEL
> +before
> + * reading IA32_QM_CTR on RDT systems.
> + */
> +enum resctrl_event_id {
> +	QOS_L3_OCCUP_EVENT_ID		= 0x01,
> +	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
> +	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
> +};
> +
>  /**
>   * struct resctrl_staged_config - parsed configuration to be applied
>   * @new_ctrl:		new ctrl value to be loaded
> @@ -219,4 +229,17 @@ void resctrl_arch_get_config(struct rdt_resource *r,
> struct rdt_domain *d,  int resctrl_online_domain(struct rdt_resource *r, struct
> rdt_domain *d);  void resctrl_offline_domain(struct rdt_resource *r, struct
> rdt_domain *d);
> 
> +/**
> + * resctrl_arch_reset_rmid() - Reset any private state associated with rmid
> + *			       and eventid.
> + * @r:		The domain's resource.
> + * @d:		The rmid's domain.
> + * @rmid:	The rmid whose counter values should be reset.
> + * @eventid:	The eventid whose counter values should be reset.
> + *
> + * This can be called from any CPU.
> + */
> +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
> +			     u32 rmid, enum resctrl_event_id eventid);
> +
>  #endif /* _RESCTRL_H */
> --
> 2.30.2


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset
  2021-09-24  6:34   ` tan.shaopeng
@ 2021-10-01 16:01     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-10-01 16:01 UTC (permalink / raw)
  To: tan.shaopeng, 'x86@kernel.org',
	'linux-kernel@vger.kernel.org'
  Cc: 'Fenghua Yu', 'Reinette Chatre',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'H Peter Anvin',
	'Babu Moger',
	'shameerali.kolothum.thodi@huawei.com',
	'Jamie Iles', 'D Scott Phillips OS',
	'lcherian@marvell.com',
	'bobo.shaobowang@huawei.com'

Hi Shaopeng Tan,

On 24/09/2021 07:34, tan.shaopeng@fujitsu.com wrote:
>> To abstract the rmid counters into a helper that returns the number of bytes
>> counted, architecture specific per-rmid state is needed.
>>
>> It needs to be possible to reset this hidden state, as the values may outlive the
>> life of an rmid, or the mount time of the filesystem.
>>
>> mon_event_read() is called with first = true when an rmid is first allocated in
>> mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from
>> __mon_event_count()'s rr->first check.


>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index af60e154f0ed..3b8b29470a5c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -137,7 +137,34 @@ static inline struct rmid_entry *__rmid_entry(u32 rmid)
>>  	return entry;
>>  }
>>
>> -static u64 __rmid_read(u32 rmid, u32 eventid)
>> +static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_domain
>> *hw_dom,
>> +						 u32 rmid,
>> +						 enum resctrl_event_id
>> eventid)
>> +{
>> +	switch (eventid) {
>> +	case QOS_L3_OCCUP_EVENT_ID:
>> +		return NULL;
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		return &hw_dom->arch_mbm_total[rmid];
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		return &hw_dom->arch_mbm_local[rmid];
>> +	}
>> +
> 
> Since it is unexpected to come here,
> it might be better to add WARN_ON.

Sure. (it'll be the 'once' version to avoid spamming the console)
I'm relying on the compiler generating a warning a built-time here if a new enum entry is
ever added, but it can't hurt to warning if someone passes something totally crazy to it.


> In addition, I have tested these patches on Intel(R) Xeon(R) Gold 6254 CPU with
> resctrl selftest. It is no problem.

Good to know, thanks!


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps()
  2021-09-24  6:23   ` tan.shaopeng
@ 2021-10-01 16:01     ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-10-01 16:01 UTC (permalink / raw)
  To: tan.shaopeng, 'x86@kernel.org',
	'linux-kernel@vger.kernel.org'
  Cc: 'Fenghua Yu', 'Reinette Chatre',
	'Thomas Gleixner', 'Ingo Molnar',
	'Borislav Petkov', 'H Peter Anvin',
	'Babu Moger',
	'shameerali.kolothum.thodi@huawei.com',
	'Jamie Iles', 'D Scott Phillips OS',
	'lcherian@marvell.com',
	'bobo.shaobowang@huawei.com'

Hi  Shaopeng Tan,

On 24/09/2021 07:23, tan.shaopeng@fujitsu.com wrote:
>> To determine whether the mba_mbps option to resctrl should be supported,
>> resctrl tests the boot cpus' x86_vendor.
>>
>> This isn't portable, and needs abstracting behind a helper so this check can be
>> part of the filesystem code that moves to /fs/.
>>
>> Re-use the tests set_mba_sc() does to determine if the mba_sc is supported
>> on this system. An 'alloc_capable' test is added so that this property isn't
>> implied by 'linear'.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index e321ea5de562..4388685548a0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2317,7 +2326,7 @@ static int rdt_parse_param(struct fs_context *fc,
>> struct fs_parameter *param)
>>  		ctx->enable_cdpl2 = true;
>>  		return 0;
>>  	case Opt_mba_mbps:
>> -		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>> +		if (supports_mba_mbps())
>>  			return -EINVAL;
> I think if(!supports_mba_mbps()) is correct, isn't it? 

Ooops!

(looks like I messed this up when fixing the name)


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list
  2021-09-17 18:20       ` Reinette Chatre
@ 2021-10-01 16:02         ` James Morse
  0 siblings, 0 replies; 50+ messages in thread
From: James Morse @ 2021-10-01 16:02 UTC (permalink / raw)
  To: Reinette Chatre, x86, linux-kernel
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Babu Moger, shameerali.kolothum.thodi, Jamie Iles,
	D Scott Phillips OS, lcherian, bobo.shaobowang

Hi Reinette,

On 17/09/2021 19:20, Reinette Chatre wrote:
> On 9/17/2021 9:57 AM, James Morse wrote:
>> On 01/09/2021 22:25, Reinette Chatre wrote:
>>> On 7/29/2021 3:35 PM, James Morse wrote:
>>>> Updates to resctrl's software controller follow the same path as
>>>> other configuration updates, but they don't modify the hardware state.
>>>> rdtgroup_schemata_write() uses parse_line() and the resource's
>>>> ctrlval_parse function to stage the configuration.
>>>> resctrl_arch_update_domains() then updates the mbps_val[] array
>>>> instead, and resctrl_arch_update_domains() skips the rdt_ctrl_update()
>>>> call that would update hardware.
>>>>
>>>> This complicates the interface between resctrl's filesystem parts
>>>> and architecture specific code. It should be possible for mba_sc
>>>> to be completely implemented by the filesystem parts of resctrl. This
>>>> would allow it to work on a second architecture with no additional code.
>>>>
>>>> Change parse_bw() to write the configuration value directly to the
>>>> mba_sc[] array in the domain structure. Change rdtgroup_schemata_write()
>>>> to skip the call to resctrl_arch_update_domains(), meaning all the
>>>> mba_sc specific code in resctrl_arch_update_domains() can be removed.
>>>> On the read-side, show_doms() and update_mba_bw() are changed to read
>>>> the mba_sc[] array from the domain structure. With this,
>>>> resctrl_arch_get_config() no longer needs to consider mba_sc resources.
>>>>
>>>> Change parse_bw() to write these values directly, meaning
>>>> rdtgroup_schemata_write() never needs to call update_domains()
>>>> for mba_sc resources.
>>
>>> The above paragraph seems to contain duplicate information from the paragraph that
>>> precedes it.
>>
>> Looks like two commit messages got combined. I've removed this, and the below paragraphs
>> as its already covered.
>>
>>
>>>> Get show_doms() to test is_mba_sc() and retrieve the value
>>>> directly, instead of using get_config() for the hardware value.
>>>>
>>>> This means the arch code's resctrl_arch_get_config() and
>>>> resctrl_arch_update_domains() no longer need to be aware of
>>>> mba_sc, and we can get rid of the update_mba_bw() code that
>>>> reaches into the hw_dom to get the msr value.
>>
>>>> @@ -406,6 +406,14 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
>>>>          list_for_each_entry(s, &resctrl_schema_all, list) {
>>>>            r = s->res;
>>>> +
>>>> +        /*
>>>> +         * Writes to mba_sc resources update the software controller,
>>>> +         * not the control msr.
>>>> +         */
>>>> +        if (is_mba_sc(r))
>>>> +            continue;
>>>> +
>>>
>>> A few resources can be updated in a single write to the schemata file. It is thus possible
>>> to update the cache allocation resource as well as memory bandwidth allocation in a single
>>> write.
>>
>> i.e. echo "L3:0=7ff;1=7ff\nMB:0=100;1=50" > schemata
> 
> I do not think something like the above would show the issue. If you want to test this via
> the shell you need to use ANSI-C quoting. Adjusting what you show to something like:
> 
> echo -n $'L3:0=7ff;1=7ff\nMB:0=100;1=50\n'
> 
>>> As I understand this change in this scenario all configuration updates will be
>>> skipped, not just the memory bandwidth allocation ones.
>>
>> The loop is per-schema, so its not a problem for L2/L3. This would only be a problem if
>> the is_mba_sc() resource had multiple schema. Only CDP does this, which the MBA controls
>> don't support.


> The loop iterates through the entire buffer provided to the schemata file and the buffer
> could contain multiple schema. This is more typical when interacting with the schemata
> file with a SDK perhaps.

I think we are talking about different loops. The diff didn't include much context.
With more context:

| ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
| 				  char *buf, size_t nbytes, loff_t off)
| {
[...]
|
| 	while ((tok = strsep(&buf, "\n")) != NULL) {
| 		resname = strim(strsep(&tok, ":"));
| 		if (!tok) {
| 			rdt_last_cmd_puts("Missing ':'\n");
| 			ret = -EINVAL;
| 			goto out;
| 		}
|		if (tok[0] == '\0') {
| 			rdt_last_cmd_printf("Missing '%s' value\n", resname);
| 			ret = -EINVAL;
| 			goto out;
| 		}
| 		ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
| 		if (ret)
| 			goto out;
| 	}

This is the loop that iterates over the buffer. A break in here would cause the problem
you describe.

|
| 	list_for_each_entry(s, &resctrl_schema_all, list) {
| 		r = s->res;
|
| 		/*
| 		 * Writes to mba_sc resources update the software controller,
| 		 * not the control msr.
| 		 */
| 		if (is_mba_sc(r))
| 			continue;
|
| 		ret = resctrl_arch_update_domains(r, rdtgrp->closid);
| 		if (ret)
| 			goto out;
| 	}

Whereas this one is per-schema. The continue skips the call to update the hardware for
mba_sc, because this will be done by update_mba_bw() when it is next called.

Updating multiple resources with one schema write would be dealt with by the first loop.
The whole buffer is parsed, (unless there is an error). This patch doesn't affect that.
The second loop is is about updating the hardware to match the freshly parsed config.


Thanks,

James

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2021-10-01 16:02 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 22:35 [PATCH v1 00/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse
2021-07-29 22:35 ` [PATCH v1 01/20] x86/resctrl: Kill off alloc_enabled James Morse
2021-08-11 12:12   ` Jamie Iles
2021-09-01 21:18   ` Reinette Chatre
2021-07-29 22:35 ` [PATCH v1 02/20] x86/resctrl: Merge mon_capable and mon_enabled James Morse
2021-08-11 12:15   ` Jamie Iles
2021-08-11 15:16     ` James Morse
2021-07-29 22:35 ` [PATCH v1 03/20] x86/resctrl: Add domain online callback for resctrl work James Morse
2021-09-01 21:19   ` Reinette Chatre
2021-09-17 16:57     ` James Morse
2021-07-29 22:35 ` [PATCH v1 04/20] x86/resctrl: Add domain offline " James Morse
2021-08-11 16:10   ` Jamie Iles
2021-09-01 21:21   ` Reinette Chatre
2021-07-29 22:35 ` [PATCH v1 05/20] x86/resctrl: Create mba_sc configuration in the rdt_domain James Morse
2021-08-11 16:32   ` Jamie Iles
2021-08-31 16:24     ` James Morse
2021-09-01 21:22   ` Reinette Chatre
2021-09-17 16:57     ` James Morse
2021-07-29 22:35 ` [PATCH v1 06/20] x86/resctrl: Switch over to the resctrl mbps_val list James Morse
2021-09-01 21:25   ` Reinette Chatre
2021-09-17 16:57     ` James Morse
2021-09-17 18:20       ` Reinette Chatre
2021-10-01 16:02         ` James Morse
2021-07-29 22:35 ` [PATCH v1 07/20] x86/resctrl: Remove architecture copy of mbps_val James Morse
2021-09-01 21:26   ` Reinette Chatre
2021-09-17 16:57     ` James Morse
2021-07-29 22:35 ` [PATCH v1 08/20] x86/resctrl: Remove set_mba_sc()s control array re-initialisation James Morse
2021-07-29 22:35 ` [PATCH v1 09/20] x86/resctrl: Abstract and use supports_mba_mbps() James Morse
2021-09-01 21:27   ` Reinette Chatre
2021-09-17 16:57     ` James Morse
2021-09-24  6:23   ` tan.shaopeng
2021-10-01 16:01     ` James Morse
2021-07-29 22:36 ` [PATCH v1 10/20] x86/resctrl: Allow update_mba_bw() to update controls directly James Morse
2021-09-01 21:28   ` Reinette Chatre
2021-07-29 22:36 ` [PATCH v1 11/20] x86/resctrl: Calculate bandwidth from the total bytes counter James Morse
2021-09-01 21:31   ` Reinette Chatre
2021-09-17 16:58     ` James Morse
2021-07-29 22:36 ` [PATCH v1 12/20] x86/recstrl: Add per-rmid arch private storage for overflow and chunks James Morse
2021-08-11 17:14   ` Jamie Iles
2021-08-31 16:25     ` James Morse
2021-07-29 22:36 ` [PATCH v1 13/20] x86/recstrl: Allow per-rmid arch private storage to be reset James Morse
2021-09-24  6:34   ` tan.shaopeng
2021-10-01 16:01     ` James Morse
2021-07-29 22:36 ` [PATCH v1 14/20] x86/resctrl: Abstract __rmid_read() James Morse
2021-07-29 22:36 ` [PATCH v1 15/20] x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() James Morse
2021-07-29 22:36 ` [PATCH v1 16/20] x86/resctrl: Move mbm_overflow_count() " James Morse
2021-07-29 22:36 ` [PATCH v1 17/20] x86/resctrl: Move get_corrected_mbm_count() " James Morse
2021-07-29 22:36 ` [PATCH v1 18/20] x86/resctrl: Rename and change the units of resctrl_cqm_threshold James Morse
2021-07-29 22:36 ` [PATCH v1 19/20] x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data James Morse
2021-07-29 22:36 ` [PATCH v1 20/20] x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes James Morse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).