linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix
@ 2022-08-22 13:42 Babu Moger
  2022-08-22 13:42 ` [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
                   ` (9 more replies)
  0 siblings, 10 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:42 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

New AMD processors can now support following QoS features.
1. Slow Memory Bandwidth Configuration
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC)
   The bandwidth monitoring events mbm_total_event and mbm_local_event 
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not enough
   to count all the different types are memory events. With the feature
   BMEC, the users have the option to configure mbm_total_event and
   mbm_local_event to count the specific type of events.

   Following are the bitmaps of events supported.
   Bits    Description
      6       Dirty Victims from the QOS domain to all types of memory
      5       Reads to slow memory in the non-local NUMA domain
      4       Reads to slow memory in the local NUMA domain
      3       Non-temporal writes to non-local NUMA domain
      2       Non-temporal writes to local NUMA domain
      1       Reads to memory in the non-local NUMA domain
      0       Reads to memory in the local NUMA domain

This series adds support for these features.

Feature description is available in the specification, "AMD64 Technology Platform Quality
of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---

v3:
  a. Rebased the patches to latest tip. Resolved some conflicts.
     https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  b. Taken care of feedback from Bagas Sanjaya.
  c. Added Reviewed by from Mingo.
  Note: I am still looking for comments from Reinette or Fenghua.

v2:
  a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
  b. Added the patch to fix CBM issue on AMD. This was originally discussed
     https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
  https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/

v1:
  https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/

Babu Moger (10):
      x86/resctrl: Fix min_cbm_bits for AMD
      x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
      x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
      x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
      x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
      x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
      x86/resctrl: Add sysfs interface files to read/write event configuration
      x86/resctrl: Add the sysfs interface to read the event configuration
      x86/resctrl: Add sysfs interface to write the event configuration
      Documentation/x86: Update resctrl_ui.rst for new features


 Documentation/x86/resctrl.rst             | 126 ++++++++++++
 arch/x86/include/asm/cpufeatures.h        |   2 +
 arch/x86/kernel/cpu/resctrl/core.c        |  70 ++++++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |  26 +++
 arch/x86/kernel/cpu/resctrl/monitor.c     |  16 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 237 +++++++++++++++++++++-
 arch/x86/kernel/cpu/scattered.c           |   2 +
 include/linux/resctrl.h                   |   1 +
 9 files changed, 472 insertions(+), 10 deletions(-)

--


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
@ 2022-08-22 13:42 ` Babu Moger
  2022-08-23 20:56   ` Reinette Chatre
  2022-08-22 13:42 ` [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:42 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

AMD systems support zero CBM (capacity bit mask) for L3 allocation.
That is reflected in rdt_init_res_defs_amd() by:

	r->cache.arch_has_empty_bitmaps = true;

However given the unified code in cbm_validate(), checking for:
	val == 0 && !arch_has_empty_bitmaps

is not enough because of another check in cbm_validate():

	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)

The default value of r->cache.min_cbm_bits = 1.

Leading to:

	$ cd /sys/fs/resctrl
	$ mkdir foo
	$ cd foo
	$ echo L3:0=0 > schemata
          -bash: echo: write error: Invalid argument
	$ cat /sys/fs/resctrl/info/last_cmd_status
	  Need at least 1 bits in the mask

Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also remove
the default setting of min_cbm_bits and initialize it separately.

After the fix
	$ cd /sys/fs/resctrl
	$ mkdir foo
	$ cd foo
	$ echo L3:0=0 > schemata
	$ cat /sys/fs/resctrl/info/last_cmd_status
	  ok

Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bb1c3f5f60c8..a5c51a14fbce 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L3,
 			.name			= "L3",
 			.cache_level		= 3,
-			.cache = {
-				.min_cbm_bits	= 1,
-			},
 			.domains		= domain_init(RDT_RESOURCE_L3),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
@@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L2,
 			.name			= "L2",
 			.cache_level		= 2,
-			.cache = {
-				.min_cbm_bits	= 1,
-			},
 			.domains		= domain_init(RDT_RESOURCE_L2),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
@@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
 			r->cache.arch_has_sparse_bitmaps = false;
 			r->cache.arch_has_empty_bitmaps = false;
 			r->cache.arch_has_per_cpu_cfg = false;
+			r->cache.min_cbm_bits = 1;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
 			hw_res->msr_update = mba_wrmsr_intel;
@@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
 			r->cache.arch_has_sparse_bitmaps = true;
 			r->cache.arch_has_empty_bitmaps = true;
 			r->cache.arch_has_per_cpu_cfg = true;
+			r->cache.min_cbm_bits = 0;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
  2022-08-22 13:42 ` [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
@ 2022-08-22 13:42 ` Babu Moger
  2022-08-23 22:47   ` Reinette Chatre
  2022-08-22 13:43 ` [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:42 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
enforcement policies can be applied to external slow memory connected
to the host. QOS enforcement is accomplished by assigning a Class Of
Service (COS) to a processor and specifying allocations or limits for
that COS for each resource to be allocated.

This feature is identified by the CPUID Function 8000_0020_EBX_x0.

CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
Bits    Field Name      Description
2       L3SBE           L3 external slow memory bandwidth enforcement

Feature description is available in the specification, "AMD64 Technology Platform Quality
of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 235dc85c91c3..1815435c9c88 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -304,6 +304,7 @@
 #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
+#define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index fd44b54c90d5..885ecf46abb2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
  2022-08-22 13:42 ` [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
  2022-08-22 13:42 ` [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-08-22 13:43 ` Babu Moger
  2022-08-24 17:39   ` Reinette Chatre
  2022-08-26 14:59   ` Moger, Babu
  2022-08-22 13:43 ` [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:43 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS
enforcement policies on the external slow memory.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index a5c51a14fbce..6c38427b71a2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SB",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d647188a43b..24a1dfeb6cb2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -418,6 +418,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (2 preceding siblings ...)
  2022-08-22 13:43 ` [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-08-22 13:43 ` Babu Moger
  2022-08-23 22:47   ` Reinette Chatre
  2022-08-22 13:43 ` [PATCH v3 05/10] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:43 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c        |   50 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +
 arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    9 +++--
 4 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6c38427b71a2..36ad97ab7342 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -253,6 +253,37 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	return true;
 }
 
+static bool __rdt_get_s_mem_config_amd(struct rdt_resource *r)
+{
+	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	union cpuid_0x10_3_eax eax;
+	union cpuid_0x10_x_edx edx;
+	u32 ebx, ecx;
+
+	cpuid_count(0x80000020, 2, &eax.full, &ebx, &ecx, &edx.full);
+	hw_res->num_closid = edx.split.cos_max + 1;
+	r->default_ctrl = MAX_MBA_BW_AMD;
+
+	/* AMD does not use delay */
+	r->membw.delay_linear = false;
+	r->membw.arch_needs_linear = false;
+
+	/*
+	 * AMD does not use memory delay throttle model to control
+	 * the allocation like Intel does.
+	 */
+	r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+	r->membw.min_bw = 0;
+	r->membw.bw_gran = 1;
+	/* Max value is 2048, Data width should be 4 in decimal */
+	r->data_width = 4;
+
+	r->alloc_capable = true;
+	r->alloc_enabled = true;
+
+	return true;
+}
+
 static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
 {
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -787,6 +818,19 @@ static __init bool get_mem_config(void)
 	return false;
 }
 
+static __init bool get_s_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_s_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -817,6 +861,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;
 
+	if (get_s_mem_config())
+		ret = true;
+
 	return ret;
 }
 
@@ -908,6 +955,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..11ec3577db40 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -203,7 +203,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;
 
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 24a1dfeb6cb2..c049a274383c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -14,6 +14,7 @@
 #define MSR_IA32_L2_CBM_BASE		0xd10
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f276aff521e8..fc5286067201 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1218,7 +1218,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
@@ -1399,7 +1399,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 				ctrl = resctrl_arch_get_config(r, d,
 							       rdtgrp->closid,
 							       schema->conf_type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -2807,7 +2808,9 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
+
 			rdtgroup_init_mba(r);
 		} else {
 			ret = rdtgroup_init_cat(s, rdtgrp->closid);



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 05/10] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (3 preceding siblings ...)
  2022-08-22 13:43 ` [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
@ 2022-08-22 13:43 ` Babu Moger
  2022-08-22 13:43 ` [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration Babu Moger
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:43 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

Support of this feature is available via CPUID Fn8000_0020_EBX_x0 (ECX=0).
Bits    Field Name       Description
3        EVT_CFG         Bandwidth Monitoring Event Configuration (BMEC)

Currently, the bandwidth monitoring events mbm_total_bytes and
mbm_local_bytes are set to count all the total and local reads/writes
respectively. With the introduction of SLOW memory, the two counters are
not enough to count all the different types of memory events. With the
feature BMEC, the users have the option to configure mbm_total_bytes and
mbm_local_bytes to count the specific type of events.

Each BMEC event has a configuration MSR, QOS_EVT_CFG (0x000_0400h +
EventID) which contains one field for each Bandwidth Type that can be used
to configure the bandwidth event to track any combination of supported
bandwidth types. The event will count requests from every Bandwidth Type
bit that is set in the corresponding configuration register.

Following are the types of events supported:

====    ========================================================
Bits    Description
====    ========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1815435c9c88..a4ee02a56d54 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -305,6 +305,7 @@
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+18) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 885ecf46abb2..7981df0b910e 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,             CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (4 preceding siblings ...)
  2022-08-22 13:43 ` [PATCH v3 05/10] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-08-22 13:43 ` Babu Moger
  2022-08-24 21:15   ` Reinette Chatre
  2022-08-22 13:43 ` [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:43 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
are configurable when this feature is present.

Set mon_configurable if the feature is available.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/monitor.c  |   14 ++++++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   17 +++++++++++++++++
 include/linux/resctrl.h                |    1 +
 3 files changed, 32 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eaf25a234ff5..b9de417dac1c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -682,6 +682,16 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
+
+void __rdt_get_mon_l3_config_amd(struct rdt_resource *r)
+{
+	/*
+	 * Check if CPU supports the Bandwidth Monitoring Event Configuration
+	 */
+	if (boot_cpu_has(X86_FEATURE_BMEC))
+		r->mon_configurable = true;
+}
+
 int rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
@@ -714,6 +724,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (ret)
 		return ret;
 
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		__rdt_get_mon_l3_config_amd(r);
+
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index fc5286067201..855483b297a8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_mon_configurable_show(struct kernfs_open_file *of,
+				     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->mon_configurable);
+
+	return 0;
+}
+
 static int rdt_mon_features_show(struct kernfs_open_file *of,
 				 struct seq_file *seq, void *v)
 {
@@ -1447,6 +1457,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_num_rmids_show,
 		.fflags		= RF_MON_INFO,
 	},
+	{
+		.name		= "mon_configurable",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_mon_configurable_show,
+		.fflags		= RF_MON_INFO,
+	},
 	{
 		.name		= "cbm_mask",
 		.mode		= 0444,
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 21deb5212bbd..4ee2b606ac14 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -154,6 +154,7 @@ struct rdt_resource {
 	bool			mon_enabled;
 	bool			alloc_capable;
 	bool			mon_capable;
+	bool			mon_configurable;
 	int			num_rmid;
 	int			cache_level;
 	struct resctrl_cache	cache;



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (5 preceding siblings ...)
  2022-08-22 13:43 ` [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration Babu Moger
@ 2022-08-22 13:43 ` Babu Moger
  2022-08-24 21:15   ` Reinette Chatre
  2022-08-22 13:44 ` [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the " Babu Moger
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:43 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Add two new sysfs files to read/write the event configuration if
the feature Bandwidth Monitoring Event Configuration (BMEC) is
supported. The file mbm_local_config is for the configuration
of the event mbm_local_bytes and the file mbm_total_config is
for the configuration of mbm_total_bytes.

$ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config

$ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/internal.h |    3 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  |    2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c049a274383c..fc725f5e9024 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -72,11 +72,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @config:	current configuration
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	u32			evtid;
 	char			*name;
+	char			*config;
 	struct list_head	list;
 };
 
@@ -95,6 +97,7 @@ union mon_data_bits {
 		unsigned int rid	: 10;
 		unsigned int evtid	: 8;
 		unsigned int domid	: 14;
+		unsigned int mon_config : 32;
 	} u;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b9de417dac1c..3f900241dbab 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -656,11 +656,13 @@ static struct mon_evt llc_occupancy_event = {
 static struct mon_evt mbm_total_event = {
 	.name		= "mbm_total_bytes",
 	.evtid		= QOS_L3_MBM_TOTAL_EVENT_ID,
+	.config 	= "mbm_total_config",
 };
 
 static struct mon_evt mbm_local_event = {
 	.name		= "mbm_local_bytes",
 	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
+	.config 	= "mbm_local_config",
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 855483b297a8..30d2182d4fda 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
 	.seq_show		= rdtgroup_mondata_show,
 };
 
+static const struct kernfs_ops kf_mondata_config_ops = {
+	.atomic_write_len       = PAGE_SIZE,
+};
+
 static bool is_cpu_list(struct kernfs_open_file *of)
 {
 	struct rftype *rft = of->kn->priv;
@@ -2534,6 +2538,25 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
 	}
 }
 
+static int mon_config_addfile(struct kernfs_node *parent_kn, const char *name,
+			      void *priv)
+{
+	struct kernfs_node *kn;
+	int ret = 0;
+
+	kn = __kernfs_create_file(parent_kn, name, 0644,
+			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
+			&kf_mondata_config_ops, priv, NULL, NULL);
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		kernfs_remove(kn);
+
+	return ret;
+}
+
 static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 				struct rdt_domain *d,
 				struct rdt_resource *r, struct rdtgroup *prgrp)
@@ -2568,6 +2591,15 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		if (ret)
 			goto out_destroy;
 
+		/* Create the sysfs event configuration files */
+		if (r->mon_configurable &&
+		    (mevt->evtid == QOS_L3_MBM_TOTAL_EVENT_ID ||
+		     mevt->evtid == QOS_L3_MBM_LOCAL_EVENT_ID)) {
+			ret = mon_config_addfile(kn, mevt->config, priv.priv);
+			if (ret)
+				goto out_destroy;
+		}
+
 		if (is_mbm_event(mevt->evtid))
 			mon_event_read(&rr, r, d, prgrp, mevt->evtid, true);
 	}



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (6 preceding siblings ...)
  2022-08-22 13:43 ` [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
@ 2022-08-22 13:44 ` Babu Moger
  2022-08-22 13:47   ` Bagas Sanjaya
  2022-08-24 21:16   ` Reinette Chatre
  2022-08-22 13:44 ` [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write " Babu Moger
  2022-08-22 13:45 ` [PATCH v3 10/10] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
  9 siblings, 2 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:44 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

The current event configuration can be viewed by the user by reading
the sysfs configuration file.

Following are the types of events supported.
====================================================================
Bits    Description
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain

By default the mbm_total_bytes configuration is set to 0x7f to count
all the types of events and mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

$cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
0x7f

$cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
0x15

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/internal.h |   21 ++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   70 ++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index fc725f5e9024..457666709386 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -15,6 +15,7 @@
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
@@ -50,6 +51,26 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM		BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM		BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM		BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM		BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define  DIRTY_VICTIS_TO_ALL_MEM	BIT(6)
 
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 30d2182d4fda..e1847d49fa15 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -254,8 +254,78 @@ static const struct kernfs_ops kf_mondata_ops = {
 	.seq_show		= rdtgroup_mondata_show,
 };
 
+/*
+ * This is called via IPI to read the CQM/MBM counters
+ * in a domain.
+ */
+void mon_event_config_read(void *info)
+{
+	union mon_data_bits *md = info;
+	u32 evtid = md->u.evtid;
+	u32 h, msr_index;
+
+	switch (evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		return; /* Not expected to come here */
+	}
+
+	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, h);
+}
+
+void mondata_config_read(struct rdt_domain *d, union mon_data_bits *md)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, md, 1);
+}
+
+int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
+{
+	struct kernfs_open_file *of = m->private;
+	struct rdt_hw_resource *hw_res;
+	u32 resid, evtid, domid;
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	union mon_data_bits md;
+	struct rdt_domain *d;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	md.priv = of->kn->priv;
+	resid = md.u.rid;
+	domid = md.u.domid;
+	evtid = md.u.evtid;
+
+	hw_res = &rdt_resources_all[resid];
+	r = &hw_res->r_resctrl;
+
+	d = rdt_find_domain(r, domid, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	mondata_config_read(d, &md);
+
+	seq_printf(m, "0x%x\n", md.u.mon_config);
+
+out:
+	rdtgroup_kn_unlock(of->kn);
+	return ret;
+}
+
 static const struct kernfs_ops kf_mondata_config_ops = {
 	.atomic_write_len       = PAGE_SIZE,
+	.seq_show               = rdtgroup_mondata_config_show,
 };
 
 static bool is_cpu_list(struct kernfs_open_file *of)



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write the event configuration
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (7 preceding siblings ...)
  2022-08-22 13:44 ` [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the " Babu Moger
@ 2022-08-22 13:44 ` Babu Moger
  2022-08-24 21:16   ` Reinette Chatre
  2022-08-22 13:45 ` [PATCH v3 10/10] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
  9 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:44 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Add the sysfs interface to write the event configuration for the
MBM configurable events. The event configuration can be changed by
writing to the sysfs file for that specific event.

Following are the types of events supported.
==================================================================
Bits    Description
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain

By default the mbm_total_bytes configuration is set to 0x7f to count
all the types of events and mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

For example:
To change the mbm_total_bytes to count all the reads, run the command.
$echo  0x33 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config

To change the mbm_local_bytes to count all the slow memory reads, run
the command.
$echo  0x30 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  109 ++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e1847d49fa15..83c8780726ff 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -323,9 +323,118 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
 	return ret;
 }
 
+/*
+ * This is called via IPI to read the CQM/MBM counters
+ * in a domain.
+ */
+void mon_event_config_write(void *info)
+{
+	union mon_data_bits *md = info;
+	u32 evtid = md->u.evtid;
+	u32 msr_index;
+
+	switch (evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		return; /* Not expected to come here */
+	}
+
+	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, 0);
+}
+
+ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_hw_resource *hw_res;
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	unsigned int mon_config;
+	cpumask_var_t cpu_mask;
+	union mon_data_bits md;
+	struct rdt_domain *d;
+	u32 resid, domid;
+	int ret = 0, cpu;
+
+	ret = kstrtouint(buf, 0, &mon_config);
+	if (ret)
+		return ret;
+
+	rdt_last_cmd_clear();
+
+	/* mon_config cannot be more than the supported set of events */
+	if (mon_config > GENMASK(6, 0)) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	cpus_read_lock();
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		return -ENOENT;
+		goto e_unlock;
+	}
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto e_unlock;
+	}
+
+
+	md.priv = of->kn->priv;
+	resid = md.u.rid;
+	domid = md.u.domid;
+
+	hw_res = &rdt_resources_all[resid];
+	r = &hw_res->r_resctrl;
+	d = rdt_find_domain(r, domid, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		ret = -ENOENT;
+		goto e_cpumask;
+	}
+
+	md.u.mon_config = mon_config & 0xFF;
+
+	/* Pick all the CPUs in the domain instance */
+	for_each_cpu(cpu, &d->cpu_mask)
+		cpumask_set_cpu(cpu, cpu_mask);
+
+	cpu = get_cpu();
+	/* Update MSR_IA32_EVT_CFG_BASE MSR on this cpu if it's in cpu_mask */
+	if (cpumask_test_cpu(cpu, cpu_mask))
+		mon_event_config_write(&md);
+
+	/* Update MSR_IA32_EVT_CFG_BASE MSR on all other cpus in cpu_mask */
+	smp_call_function_many(cpu_mask, mon_event_config_write, &md, 1);
+	put_cpu();
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared, and the U-bit for every
+	 * RMID will be set on the next read to any BwEvent for every RMID.
+	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
+	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+
+e_cpumask:
+	free_cpumask_var(cpu_mask);
+
+e_unlock:
+	rdtgroup_kn_unlock(of->kn);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 static const struct kernfs_ops kf_mondata_config_ops = {
 	.atomic_write_len       = PAGE_SIZE,
 	.seq_show               = rdtgroup_mondata_config_show,
+	.write                  = rdtgroup_mondata_config_write,
 };
 
 static bool is_cpu_list(struct kernfs_open_file *of)



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 10/10] Documentation/x86: Update resctrl_ui.rst for new features
  2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (8 preceding siblings ...)
  2022-08-22 13:44 ` [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write " Babu Moger
@ 2022-08-22 13:45 ` Babu Moger
  9 siblings, 0 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-22 13:45 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, hpa, linux-kernel,
	linux-doc, bagasdotme

Update the documentation for the new features:
1. Slow Memory Bandwidth allocation.
   With this feature, the QOS  enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC).
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not
   enough to count all the different types are memory events. With the
   feature BMEC, the users have the option to configure mbm_total_bytes
   and mbm_local_bytes to count the specific type of events.

Also add configuration instructions with examples.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 Documentation/x86/resctrl.rst |  126 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 71a531061e4e..871d0f031ab5 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -167,6 +167,12 @@ with the following files:
 		bytes) at which a previously used LLC_occupancy
 		counter can be considered for re-use.
 
+"mon_configurable":
+                Provides the information if the events mbm_total and
+                mbm_local are configurable. See the configuration
+                details for "mbm_total_config" and "mbm_local_config"
+                for more information.
+
 Finally, in the top level of the "info" directory there is a file
 named "last_cmd_status". This is reset with every "command" issued
 via the file system (making new directories or writing to any of the
@@ -264,6 +270,32 @@ When monitoring is enabled all MON groups will also contain:
 	the sum for all tasks in the CTRL_MON group and all tasks in
 	MON groups. Please see example section for more details on usage.
 
+"mbm_total_config", "mbm_local_config":
+        This contains the current event configuration for the events
+        mbm_total_bytes and mbm_local_bytes, respectively, when the
+        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
+        These files are organized by L3 domains under the subdirectories
+        "mon_L3_00" and "mon_L3_01". When BMEC is supported, the events
+        mbm_local_bytes and mbm_total_bytes are configurable.
+
+        Following are the types of events supported:
+
+        ====    ========================================================
+        Bits    Description
+        ====    ========================================================
+        6       Dirty Victims from the QOS domain to all types of memory
+        5       Reads to slow memory in the non-local NUMA domain
+        4       Reads to slow memory in the local NUMA domain
+        3       Non-temporal writes to non-local NUMA domain
+        2       Non-temporal writes to local NUMA domain
+        1       Reads to memory in the non-local NUMA domain
+        0       Reads to memory in the local NUMA domain
+        ====    ========================================================
+
+        By default, the mbm_total_bytes configuration is set to 0x7F to count
+        all the event types and the mbm_local_bytes configuration is set to
+        0x15 to count all the local memory events.
+
 Resource allocation rules
 -------------------------
 
@@ -464,6 +496,14 @@ Memory bandwidth domain is L3 cache.
 
 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
 
+Slow Memory bandwidth Allocation (when supported)
+-------------------------------------------------
+
+Slow Memory b/w domain is L3 cache.
+::
+
+	SB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +519,44 @@ which you wish to change.  E.g.
   L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
   L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the state of all resources
+on all domains. When writing the memory bandwidth allocation you
+only need to specify those values in an absolute number expressed
+in 1/8 GB/s increments. To allocate bandwidth limit of 2GB, you
+need to specify the value 16 (16 * 1/8 = 2).  E.g.
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with slow memory
+-------------------------------------------------------------------
+Reading the schemata file will show the state of all resources
+on all domains. When writing the memory bandwidth allocation you
+only need to specify those values in an absolute number expressed
+in 1/8 GB/s increments. To allocate bandwidth limit of 8GB, you
+need to specify the value 64 (64 * 1/8 = 8).  E.g.
+::
+
+  # cat schemata
+    SB:0=2048;1=2048;2=2048;3=2048
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SB:1=64" > schemata
+  # cat schemata
+    SB:0=2048;1=  64;2=2048;3=2048
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an
@@ -1210,6 +1288,54 @@ View the llc occupancy snapshot::
   # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   11234000
 
+Example 5 (Configure and Monitor specific event types)
+------------------------------------------------------
+
+A single socket system which has real time tasks running on cores 0-4
+and non real time tasks on other CPUs. We want to monitor the memory
+bandwidth allocation for specific events.
+::
+
+  # mount -t resctrl resctrl /sys/fs/resctrl
+  # cd /sys/fs/resctrl
+  # mkdir p1
+
+Move the CPUs 0-4 over to p1::
+
+  # echo 0xf > p1/cpus
+
+View the current mbm_local_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_bytes
+  112501
+
+Change the mbm_local_bytes to count mon-temporal writes to both local
+and non-local NUMA domain. Refer to event supported bitmap under
+mbm_local_config::
+
+  # echo 0xc > /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_config
+
+View the updated mbm_local_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_bytes
+  12601
+
+Similar experiment on mbm_total_bytes. First view the current mbm_total_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes
+  1532501
+
+Change the mbm_total_bytes to count only reads to slow memory on both local
+and non-local NUMA domain. Refer to event supported bitmap under
+mbm_total_config::
+
+  # echo 0x30 > /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_config
+
+View the updated mbm_total_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes
+  104562
+
 Intel RDT Errata
 ================
 



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:44 ` [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the " Babu Moger
@ 2022-08-22 13:47   ` Bagas Sanjaya
  2022-08-22 13:50     ` Moger, Babu
  2022-08-22 13:55     ` Moger, Babu
  2022-08-24 21:16   ` Reinette Chatre
  1 sibling, 2 replies; 44+ messages in thread
From: Bagas Sanjaya @ 2022-08-22 13:47 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc

On 8/22/22 20:44, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the sysfs configuration file.
> 
> Following are the types of events supported.
> ====================================================================
> Bits    Description
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> 

If the table above was in Documentation/, Sphinx would flag it as
malformed table. Regardless (because it is in the patch description),
I'd like to see it properly formatted.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:47   ` Bagas Sanjaya
@ 2022-08-22 13:50     ` Moger, Babu
  2022-08-22 13:55     ` Moger, Babu
  1 sibling, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-22 13:50 UTC (permalink / raw)
  To: Bagas Sanjaya, fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc


On 8/22/22 08:47, Bagas Sanjaya wrote:
> On 8/22/22 20:44, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the sysfs configuration file.
>>
>> Following are the types of events supported.
>> ====================================================================
>> Bits    Description
>> 6       Dirty Victims from the QOS domain to all types of memory
>> 5       Reads to slow memory in the non-local NUMA domain
>> 4       Reads to slow memory in the local NUMA domain
>> 3       Non-temporal writes to non-local NUMA domain
>> 2       Non-temporal writes to local NUMA domain
>> 1       Reads to memory in the non-local NUMA domain
>> 0       Reads to memory in the local NUMA domain
>>
> If the table above was in Documentation/, Sphinx would flag it as
> malformed table. Regardless (because it is in the patch description),
> I'd like to see it properly formatted.
Sure. Will do.
>
> Thanks.
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:47   ` Bagas Sanjaya
  2022-08-22 13:50     ` Moger, Babu
@ 2022-08-22 13:55     ` Moger, Babu
  2022-08-23  1:55       ` Bagas Sanjaya
  1 sibling, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-22 13:55 UTC (permalink / raw)
  To: Bagas Sanjaya, fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc


On 8/22/22 08:47, Bagas Sanjaya wrote:
> On 8/22/22 20:44, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the sysfs configuration file.
>>
>> Following are the types of events supported.
>> ====================================================================
>> Bits    Description
>> 6       Dirty Victims from the QOS domain to all types of memory
>> 5       Reads to slow memory in the non-local NUMA domain
>> 4       Reads to slow memory in the local NUMA domain
>> 3       Non-temporal writes to non-local NUMA domain
>> 2       Non-temporal writes to local NUMA domain
>> 1       Reads to memory in the non-local NUMA domain
>> 0       Reads to memory in the local NUMA domain
>>
> If the table above was in Documentation/, Sphinx would flag it as
> malformed table. Regardless (because it is in the patch description),
> I'd like to see it properly formatted.

Actually, I ran "make htmldocs" and didn't see any warnings. I will try to
fix this though. 


>
> Thanks.
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:55     ` Moger, Babu
@ 2022-08-23  1:55       ` Bagas Sanjaya
  0 siblings, 0 replies; 44+ messages in thread
From: Bagas Sanjaya @ 2022-08-23  1:55 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc

On 8/22/22 20:55, Moger, Babu wrote:
>> If the table above was in Documentation/, Sphinx would flag it as
>> malformed table. Regardless (because it is in the patch description),
>> I'd like to see it properly formatted.
> 
> Actually, I ran "make htmldocs" and didn't see any warnings. I will try to
> fix this though. 
> 

That's because the table is in patch description, not in the diff.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD
  2022-08-22 13:42 ` [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
@ 2022-08-23 20:56   ` Reinette Chatre
  2022-08-24 15:58     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-23 20:56 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:42 AM, Babu Moger wrote:
> AMD systems support zero CBM (capacity bit mask) for L3 allocation.
> That is reflected in rdt_init_res_defs_amd() by:
> 
> 	r->cache.arch_has_empty_bitmaps = true;
> 
> However given the unified code in cbm_validate(), checking for:
> 	val == 0 && !arch_has_empty_bitmaps
> 
> is not enough because of another check in cbm_validate():
> 
> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
> 
> The default value of r->cache.min_cbm_bits = 1.
> 
> Leading to:
> 
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
>           -bash: echo: write error: Invalid argument
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  Need at least 1 bits in the mask
> 
> Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also remove
> the default setting of min_cbm_bits and initialize it separately.
> 
> After the fix
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  ok
> 
> Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
> Signed-off-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index bb1c3f5f60c8..a5c51a14fbce 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.rid			= RDT_RESOURCE_L3,
>  			.name			= "L3",
>  			.cache_level		= 3,
> -			.cache = {
> -				.min_cbm_bits	= 1,
> -			},
>  			.domains		= domain_init(RDT_RESOURCE_L3),
>  			.parse_ctrlval		= parse_cbm,
>  			.format_str		= "%d=%0*x",
> @@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.rid			= RDT_RESOURCE_L2,
>  			.name			= "L2",
>  			.cache_level		= 2,
> -			.cache = {
> -				.min_cbm_bits	= 1,
> -			},
>  			.domains		= domain_init(RDT_RESOURCE_L2),
>  			.parse_ctrlval		= parse_cbm,
>  			.format_str		= "%d=%0*x",
> @@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
>  			r->cache.arch_has_sparse_bitmaps = false;
>  			r->cache.arch_has_empty_bitmaps = false;
>  			r->cache.arch_has_per_cpu_cfg = false;
> +			r->cache.min_cbm_bits = 1;
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
>  			hw_res->msr_update = mba_wrmsr_intel;
> @@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
>  			r->cache.arch_has_sparse_bitmaps = true;
>  			r->cache.arch_has_empty_bitmaps = true;
>  			r->cache.arch_has_per_cpu_cfg = true;
> +			r->cache.min_cbm_bits = 0;
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>  			hw_res->msr_update = mba_wrmsr_amd;
> 
> 

Thank you for putting this together.

This change makes arch_has_empty_bitmaps redundant. Can it be removed?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-22 13:42 ` [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-08-23 22:47   ` Reinette Chatre
  2022-08-25 22:42     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-23 22:47 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:42 AM, Babu Moger wrote:
> Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
> enforcement policies can be applied to external slow memory connected
> to the host. QOS enforcement is accomplished by assigning a Class Of
> Service (COS) to a processor and specifying allocations or limits for
> that COS for each resource to be allocated.
> 
> This feature is identified by the CPUID Function 8000_0020_EBX_x0.
> 
> CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
> Bits    Field Name      Description
> 2       L3SBE           L3 external slow memory bandwidth enforcement
> 
> Feature description is available in the specification, "AMD64 Technology Platform Quality
> of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---

resctrl currently supports "memory bandwidth allocation" and this series adds
"slow memory bandwidth allocation". Could you please provide more detail about
what the difference is between "MBA" and "SMBA"? It is clear that the implementation
treats them as different resources, but both resources are associated with L3 cache
domains and (from what I understand) throttling always occurs at the CPU. Can both
types of memory resources thus be seen as downstream from L3 cache? How can
a user know what memory is considered when configuring MBA and what memory is
considered when configuring SMBA? Additionally, I do find the term "slow" to be
vague as a way to distinguish between different memory types. What is the
definition of "slow"? Would all "slow" memory on the system support SMBA?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-08-22 13:43 ` [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
@ 2022-08-23 22:47   ` Reinette Chatre
  2022-08-24 16:48     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-23 22:47 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:43 AM, Babu Moger wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6c38427b71a2..36ad97ab7342 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -253,6 +253,37 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>  	return true;
>  }
>  
> +static bool __rdt_get_s_mem_config_amd(struct rdt_resource *r)
> +{
> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +	union cpuid_0x10_3_eax eax;
> +	union cpuid_0x10_x_edx edx;
> +	u32 ebx, ecx;
> +
> +	cpuid_count(0x80000020, 2, &eax.full, &ebx, &ecx, &edx.full);
> +	hw_res->num_closid = edx.split.cos_max + 1;
> +	r->default_ctrl = MAX_MBA_BW_AMD;
> +
> +	/* AMD does not use delay */
> +	r->membw.delay_linear = false;
> +	r->membw.arch_needs_linear = false;
> +
> +	/*
> +	 * AMD does not use memory delay throttle model to control
> +	 * the allocation like Intel does.
> +	 */
> +	r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
> +	r->membw.min_bw = 0;
> +	r->membw.bw_gran = 1;
> +	/* Max value is 2048, Data width should be 4 in decimal */
> +	r->data_width = 4;
> +
> +	r->alloc_capable = true;
> +	r->alloc_enabled = true;
> +
> +	return true;
> +}
> +


From what I can tell this new function is almost identical to (it differs with
one character from) __rdt_get_mem_config_amd(). Could it be refactored to
avoid such duplication?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD
  2022-08-23 20:56   ` Reinette Chatre
@ 2022-08-24 15:58     ` Moger, Babu
  0 siblings, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-24 15:58 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/23/22 15:56, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:42 AM, Babu Moger wrote:
>> AMD systems support zero CBM (capacity bit mask) for L3 allocation.
>> That is reflected in rdt_init_res_defs_amd() by:
>>
>> 	r->cache.arch_has_empty_bitmaps = true;
>>
>> However given the unified code in cbm_validate(), checking for:
>> 	val == 0 && !arch_has_empty_bitmaps
>>
>> is not enough because of another check in cbm_validate():
>>
>> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
>>
>> The default value of r->cache.min_cbm_bits = 1.
>>
>> Leading to:
>>
>> 	$ cd /sys/fs/resctrl
>> 	$ mkdir foo
>> 	$ cd foo
>> 	$ echo L3:0=0 > schemata
>>           -bash: echo: write error: Invalid argument
>> 	$ cat /sys/fs/resctrl/info/last_cmd_status
>> 	  Need at least 1 bits in the mask
>>
>> Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also remove
>> the default setting of min_cbm_bits and initialize it separately.
>>
>> After the fix
>> 	$ cd /sys/fs/resctrl
>> 	$ mkdir foo
>> 	$ cd foo
>> 	$ echo L3:0=0 > schemata
>> 	$ cat /sys/fs/resctrl/info/last_cmd_status
>> 	  ok
>>
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20220517001234.3137157-1-eranian%40google.com%2F&amp;data=05%7C01%7Cbabu.moger%40amd.com%7Cd695c39eb7d94d659db008da8549ed74%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637968849807214070%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=prSwl8bTHx0y3o9hBNxp3u%2FNu8SrNcEWUwDOCQCVURk%3D&amp;reserved=0
>> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/kernel/cpu/resctrl/core.c |    8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index bb1c3f5f60c8..a5c51a14fbce 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>>  			.rid			= RDT_RESOURCE_L3,
>>  			.name			= "L3",
>>  			.cache_level		= 3,
>> -			.cache = {
>> -				.min_cbm_bits	= 1,
>> -			},
>>  			.domains		= domain_init(RDT_RESOURCE_L3),
>>  			.parse_ctrlval		= parse_cbm,
>>  			.format_str		= "%d=%0*x",
>> @@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>>  			.rid			= RDT_RESOURCE_L2,
>>  			.name			= "L2",
>>  			.cache_level		= 2,
>> -			.cache = {
>> -				.min_cbm_bits	= 1,
>> -			},
>>  			.domains		= domain_init(RDT_RESOURCE_L2),
>>  			.parse_ctrlval		= parse_cbm,
>>  			.format_str		= "%d=%0*x",
>> @@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
>>  			r->cache.arch_has_sparse_bitmaps = false;
>>  			r->cache.arch_has_empty_bitmaps = false;
>>  			r->cache.arch_has_per_cpu_cfg = false;
>> +			r->cache.min_cbm_bits = 1;
>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>  			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
>>  			hw_res->msr_update = mba_wrmsr_intel;
>> @@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
>>  			r->cache.arch_has_sparse_bitmaps = true;
>>  			r->cache.arch_has_empty_bitmaps = true;
>>  			r->cache.arch_has_per_cpu_cfg = true;
>> +			r->cache.min_cbm_bits = 0;
>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>  			hw_res->msr_update = mba_wrmsr_amd;
>>
>>
> Thank you for putting this together.
>
> This change makes arch_has_empty_bitmaps redundant. Can it be removed?

Hi Reinette,

Actually, I thought about that. Sure. We can remove it.

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-08-23 22:47   ` Reinette Chatre
@ 2022-08-24 16:48     ` Moger, Babu
  0 siblings, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-24 16:48 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Reinette,

On 8/23/22 17:47, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:43 AM, Babu Moger wrote:
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 6c38427b71a2..36ad97ab7342 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -253,6 +253,37 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>>  	return true;
>>  }
>>  
>> +static bool __rdt_get_s_mem_config_amd(struct rdt_resource *r)
>> +{
>> +	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +	union cpuid_0x10_3_eax eax;
>> +	union cpuid_0x10_x_edx edx;
>> +	u32 ebx, ecx;
>> +
>> +	cpuid_count(0x80000020, 2, &eax.full, &ebx, &ecx, &edx.full);
>> +	hw_res->num_closid = edx.split.cos_max + 1;
>> +	r->default_ctrl = MAX_MBA_BW_AMD;
>> +
>> +	/* AMD does not use delay */
>> +	r->membw.delay_linear = false;
>> +	r->membw.arch_needs_linear = false;
>> +
>> +	/*
>> +	 * AMD does not use memory delay throttle model to control
>> +	 * the allocation like Intel does.
>> +	 */
>> +	r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
>> +	r->membw.min_bw = 0;
>> +	r->membw.bw_gran = 1;
>> +	/* Max value is 2048, Data width should be 4 in decimal */
>> +	r->data_width = 4;
>> +
>> +	r->alloc_capable = true;
>> +	r->alloc_enabled = true;
>> +
>> +	return true;
>> +}
>> +
>
> From what I can tell this new function is almost identical to (it differs with
> one character from) __rdt_get_mem_config_amd(). Could it be refactored to
> avoid such duplication?

Yes. Sure. We can do that.

Thanks


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-08-22 13:43 ` [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-08-24 17:39   ` Reinette Chatre
  2022-08-26 14:59   ` Moger, Babu
  1 sibling, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2022-08-24 17:39 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:43 AM, Babu Moger wrote:
> Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS
> enforcement policies on the external slow memory.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index a5c51a14fbce..6c38427b71a2 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.fflags			= RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_SMBA] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_SMBA,
> +			.name			= "SB",

This name seems very cryptic to me. I do not know how you picked this
name but thought it worth mentioning that names need not be limited to
two characters. There is already support to ensure fields are lined up
(see "max_name_width") in support of the longer "L2CODE", "L2DATA",
"L3CODE", and "L3DATA" resource names.

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-22 13:43 ` [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration Babu Moger
@ 2022-08-24 21:15   ` Reinette Chatre
  2022-08-25 15:11     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-24 21:15 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:43 AM, Babu Moger wrote:
> Newer AMD processors support the new feature Bandwidth Monitoring Event
> Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
> are configurable when this feature is present.
> 
> Set mon_configurable if the feature is available.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c  |   14 ++++++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   17 +++++++++++++++++
>  include/linux/resctrl.h                |    1 +
>  3 files changed, 32 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index eaf25a234ff5..b9de417dac1c 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -682,6 +682,16 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>  		list_add_tail(&mbm_local_event.list, &r->evt_list);
>  }
>  
> +
> +void __rdt_get_mon_l3_config_amd(struct rdt_resource *r)
> +{
> +	/*
> +	 * Check if CPU supports the Bandwidth Monitoring Event Configuration
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_BMEC))
> +		r->mon_configurable = true;
> +}

Could this rather use rdt_cpu_has() with the added support for disabling
the feature via kernel parameter?


> +
>  int rdt_get_mon_l3_config(struct rdt_resource *r)
>  {
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
> @@ -714,6 +724,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  	if (ret)
>  		return ret;
>  
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> +		__rdt_get_mon_l3_config_amd(r);
> +
> +

Why is this vendor check needed? Is X86_FEATURE_BMEC not sufficient?

>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index fc5286067201..855483b297a8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static int rdt_mon_configurable_show(struct kernfs_open_file *of,
> +				     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	seq_printf(seq, "%d\n", r->mon_configurable);

Why is this file needed? It seems that the next patches also introduce
files in support of this new feature that will make the actual configuration
data accessible - those files are only created if this feature is supported.
Would those files not be sufficient for user space to learn about the feature
support?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-08-22 13:43 ` [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
@ 2022-08-24 21:15   ` Reinette Chatre
  2022-08-26 16:07     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-24 21:15 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:43 AM, Babu Moger wrote:
> Add two new sysfs files to read/write the event configuration if
> the feature Bandwidth Monitoring Event Configuration (BMEC) is
> supported. The file mbm_local_config is for the configuration
> of the event mbm_local_bytes and the file mbm_total_config is
> for the configuration of mbm_total_bytes.
> 
> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> 
> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |    3 +++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    2 ++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   32 ++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index c049a274383c..fc725f5e9024 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -72,11 +72,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> + * @config:	current configuration
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	u32			evtid;
>  	char			*name;
> +	char			*config;
>  	struct list_head	list;
>  };
>  
> @@ -95,6 +97,7 @@ union mon_data_bits {
>  		unsigned int rid	: 10;
>  		unsigned int evtid	: 8;
>  		unsigned int domid	: 14;
> +		unsigned int mon_config : 32;
>  	} u;
>  };
>  

This does not seem to be used in this patch.

> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index b9de417dac1c..3f900241dbab 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -656,11 +656,13 @@ static struct mon_evt llc_occupancy_event = {
>  static struct mon_evt mbm_total_event = {
>  	.name		= "mbm_total_bytes",
>  	.evtid		= QOS_L3_MBM_TOTAL_EVENT_ID,
> +	.config 	= "mbm_total_config",
>  };
>  
>  static struct mon_evt mbm_local_event = {
>  	.name		= "mbm_local_bytes",
>  	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
> +	.config 	= "mbm_local_config",
>  };
>  
>  /*
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 855483b297a8..30d2182d4fda 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
>  	.seq_show		= rdtgroup_mondata_show,
>  };
>  
> +static const struct kernfs_ops kf_mondata_config_ops = {
> +	.atomic_write_len       = PAGE_SIZE,
> +};
> +
>  static bool is_cpu_list(struct kernfs_open_file *of)
>  {
>  	struct rftype *rft = of->kn->priv;
> @@ -2534,6 +2538,25 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
>  	}
>  }
>  
> +static int mon_config_addfile(struct kernfs_node *parent_kn, const char *name,
> +			      void *priv)
> +{
> +	struct kernfs_node *kn;
> +	int ret = 0;
> +
> +	kn = __kernfs_create_file(parent_kn, name, 0644,
> +			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
> +			&kf_mondata_config_ops, priv, NULL, NULL);
> +	if (IS_ERR(kn))
> +		return PTR_ERR(kn);
> +
> +	ret = rdtgroup_kn_set_ugid(kn);
> +	if (ret)
> +		kernfs_remove(kn);
> +
> +	return ret;
> +}
> +
>  static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>  				struct rdt_domain *d,
>  				struct rdt_resource *r, struct rdtgroup *prgrp)
> @@ -2568,6 +2591,15 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>  		if (ret)
>  			goto out_destroy;
>  
> +		/* Create the sysfs event configuration files */
> +		if (r->mon_configurable &&
> +		    (mevt->evtid == QOS_L3_MBM_TOTAL_EVENT_ID ||
> +		     mevt->evtid == QOS_L3_MBM_LOCAL_EVENT_ID)) {
> +			ret = mon_config_addfile(kn, mevt->config, priv.priv);
> +			if (ret)
> +				goto out_destroy;
> +		}
> +

This seems complex to have event features embedded in the code in this way. Could
the events not be configured during system enumeration? For example, instead
of hardcoding the config like above to always set:

  static struct mon_evt mbm_local_event = {
  	.name		= "mbm_local_bytes",
  	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
 +	.config 	= "mbm_local_config",


What if instead this information is dynamically set in rdt_get_mon_l3_config()? To
make things simpler struct mon_evt could get a new member "configurable" and the
events that actually support configuration will have this set only
if system has X86_FEATURE_BMEC (struct rdt_resource->configurable then
becomes unnecessary?). Being configurable thus becomes an event property, not
a resource property. The "config" member introduced here could then be "config_name".

I think doing so will also make this file creation simpler with a single 
mon_config_addfile() (possibly with more parameters) used to add both files to
avoid the code duplication introduced by mon_config_addfile() above.

What do you think?

>  		if (is_mbm_event(mevt->evtid))
>  			mon_event_read(&rr, r, d, prgrp, mevt->evtid, true);
>  	}
> 
> 

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-22 13:44 ` [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the " Babu Moger
  2022-08-22 13:47   ` Bagas Sanjaya
@ 2022-08-24 21:16   ` Reinette Chatre
  2022-08-26 16:49     ` Moger, Babu
  1 sibling, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-24 21:16 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:44 AM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the sysfs configuration file.
> 
> Following are the types of events supported.
> ====================================================================
> Bits    Description
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> 
> By default the mbm_total_bytes configuration is set to 0x7f to count
> all the types of events and mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> $cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> 0x7f
> 
> $cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> 0x15
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |   21 ++++++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   70 ++++++++++++++++++++++++++++++++
>  2 files changed, 91 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index fc725f5e9024..457666709386 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -15,6 +15,7 @@
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
>  #define MSR_IA32_QM_CTR			0x0c8e
>  #define MSR_IA32_QM_EVTSEL		0x0c8d
> @@ -50,6 +51,26 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>  
> +/* Reads to Local DRAM Memory */
> +#define READS_TO_LOCAL_MEM		BIT(0)
> +
> +/* Reads to Remote DRAM Memory */
> +#define READS_TO_REMOTE_MEM		BIT(1)
> +
> +/* Non-Temporal Writes to Local Memory */
> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
> +
> +/* Non-Temporal Writes to Remote Memory */
> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
> +
> +/* Reads to Local Memory the system identifies as "Slow Memory" */
Seems unexpected character slipped into "identifies".

> +#define READS_TO_LOCAL_S_MEM		BIT(4)
> +
> +/* Reads to Remote Memory the system identifies as "Slow Memory" */

here also

> +#define READS_TO_REMOTE_S_MEM		BIT(5)
> +
> +/* Dirty Victims to All Types of Memory */
> +#define  DIRTY_VICTIS_TO_ALL_MEM	BIT(6)

Is this intended to be "DIRTY_VICTIMS_TO_ALL_MEM" ?

>  
>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 30d2182d4fda..e1847d49fa15 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -254,8 +254,78 @@ static const struct kernfs_ops kf_mondata_ops = {
>  	.seq_show		= rdtgroup_mondata_show,
>  };
>  
> +/*
> + * This is called via IPI to read the CQM/MBM counters
> + * in a domain.
> + */
> +void mon_event_config_read(void *info)
> +{
> +	union mon_data_bits *md = info;
> +	u32 evtid = md->u.evtid;
> +	u32 h, msr_index;
> +
> +	switch (evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		return; /* Not expected to come here */

Please no inline comments.

> +	}
> +
> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, h);
> +}
> +
> +void mondata_config_read(struct rdt_domain *d, union mon_data_bits *md)
> +{
> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, md, 1);
> +}
> +
> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
> +{
> +	struct kernfs_open_file *of = m->private;
> +	struct rdt_hw_resource *hw_res;
> +	u32 resid, evtid, domid;
> +	struct rdtgroup *rdtgrp;
> +	struct rdt_resource *r;
> +	union mon_data_bits md;
> +	struct rdt_domain *d;
> +	int ret = 0;
> +
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	md.priv = of->kn->priv;
> +	resid = md.u.rid;
> +	domid = md.u.domid;
> +	evtid = md.u.evtid;
> +
> +	hw_res = &rdt_resources_all[resid];
> +	r = &hw_res->r_resctrl;
> +
> +	d = rdt_find_domain(r, domid, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	mondata_config_read(d, &md);
> +
> +	seq_printf(m, "0x%x\n", md.u.mon_config);

Looking at this patch and the next, the sysfs files are introduced to read
from and write to the configuration register. From what I can tell the
data is never used internally (what did I miss?). Why is the value of the
configuration register stored? 

> +
> +out:
> +	rdtgroup_kn_unlock(of->kn);
> +	return ret;
> +}
> +
>  static const struct kernfs_ops kf_mondata_config_ops = {
>  	.atomic_write_len       = PAGE_SIZE,
> +	.seq_show               = rdtgroup_mondata_config_show,
>  };
>  
>  static bool is_cpu_list(struct kernfs_open_file *of)
> 
> 


Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write the event configuration
  2022-08-22 13:44 ` [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write " Babu Moger
@ 2022-08-24 21:16   ` Reinette Chatre
  2022-08-26 18:17     ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-24 21:16 UTC (permalink / raw)
  To: Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/22/2022 6:44 AM, Babu Moger wrote:
> Add the sysfs interface to write the event configuration for the
> MBM configurable events. The event configuration can be changed by
> writing to the sysfs file for that specific event.
> 
> Following are the types of events supported.
> ==================================================================
> Bits    Description
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> 
> By default the mbm_total_bytes configuration is set to 0x7f to count
> all the types of events and mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> For example:
> To change the mbm_total_bytes to count all the reads, run the command.
> $echo  0x33 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> 
> To change the mbm_local_bytes to count all the slow memory reads, run
> the command.
> $echo  0x30 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  109 ++++++++++++++++++++++++++++++++
>  1 file changed, 109 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e1847d49fa15..83c8780726ff 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -323,9 +323,118 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>  	return ret;
>  }
>  
> +/*
> + * This is called via IPI to read the CQM/MBM counters
> + * in a domain.
> + */
> +void mon_event_config_write(void *info)
> +{
> +	union mon_data_bits *md = info;
> +	u32 evtid = md->u.evtid;
> +	u32 msr_index;
> +
> +	switch (evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		return; /* Not expected to come here */

Please no inline comments.

> +	}
> +
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, 0);
> +}
> +
> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_hw_resource *hw_res;
> +	struct rdtgroup *rdtgrp;
> +	struct rdt_resource *r;
> +	unsigned int mon_config;
> +	cpumask_var_t cpu_mask;
> +	union mon_data_bits md;
> +	struct rdt_domain *d;
> +	u32 resid, domid;
> +	int ret = 0, cpu;
> +
> +	ret = kstrtouint(buf, 0, &mon_config);
> +	if (ret)
> +		return ret;
> +
> +	rdt_last_cmd_clear();
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (mon_config > GENMASK(6, 0)) {

Could this "GENMASK()" be given a name and be moved closer to where
the bits are defined in internal.h? This will be easier to read and if any
new bits are added it would hopefully be noticed more easily to get
an update also.

> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	cpus_read_lock();
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		return -ENOENT;
> +		goto e_unlock;
> +	}
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		ret = -ENOMEM;
> +		goto e_unlock;
> +	}
> +
> +
> +	md.priv = of->kn->priv;
> +	resid = md.u.rid;
> +	domid = md.u.domid;
> +
> +	hw_res = &rdt_resources_all[resid];
> +	r = &hw_res->r_resctrl;
> +	d = rdt_find_domain(r, domid, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		ret = -ENOENT;
> +		goto e_cpumask;
> +	}
> +
> +	md.u.mon_config = mon_config & 0xFF;

Same question as previous patch. I do not see this internal
value used in the code, is storing it necessary?

> +
> +	/* Pick all the CPUs in the domain instance */
> +	for_each_cpu(cpu, &d->cpu_mask)
> +		cpumask_set_cpu(cpu, cpu_mask);
> +
> +	cpu = get_cpu();
> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on this cpu if it's in cpu_mask */

Please always use caps for CPU.

> +	if (cpumask_test_cpu(cpu, cpu_mask))
> +		mon_event_config_write(&md);
> +
> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all other cpus in cpu_mask */
> +	smp_call_function_many(cpu_mask, mon_event_config_write, &md, 1);
> +	put_cpu();

I do not think we need to propagate this pattern more in the resctrl code.
How about on_each_cpu_mask()? 

> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
> +	 * RMID will be set on the next read to any BwEvent for every RMID.
> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);

Could you please check if this is sufficient? Note how "mon_event_read()"
is called with "first = true" right after the mon sysfs files are created to
clear the state _and_ initialize the state to set the "prev" MSRs correctly.


> +
> +e_cpumask:
> +	free_cpumask_var(cpu_mask);
> +
> +e_unlock:
> +	rdtgroup_kn_unlock(of->kn);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  static const struct kernfs_ops kf_mondata_config_ops = {
>  	.atomic_write_len       = PAGE_SIZE,
>  	.seq_show               = rdtgroup_mondata_config_show,
> +	.write                  = rdtgroup_mondata_config_write,
>  };
>  
>  static bool is_cpu_list(struct kernfs_open_file *of)
> 
> 

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-24 21:15   ` Reinette Chatre
@ 2022-08-25 15:11     ` Moger, Babu
  2022-08-25 15:56       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-25 15:11 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Reinette,

On 8/24/22 16:15, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:43 AM, Babu Moger wrote:
>> Newer AMD processors support the new feature Bandwidth Monitoring Event
>> Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
>> are configurable when this feature is present.
>>
>> Set mon_configurable if the feature is available.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |   14 ++++++++++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   17 +++++++++++++++++
>>  include/linux/resctrl.h                |    1 +
>>  3 files changed, 32 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index eaf25a234ff5..b9de417dac1c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -682,6 +682,16 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>>  		list_add_tail(&mbm_local_event.list, &r->evt_list);
>>  }
>>  
>> +
>> +void __rdt_get_mon_l3_config_amd(struct rdt_resource *r)
>> +{
>> +	/*
>> +	 * Check if CPU supports the Bandwidth Monitoring Event Configuration
>> +	 */
>> +	if (boot_cpu_has(X86_FEATURE_BMEC))
>> +		r->mon_configurable = true;
>> +}
> Could this rather use rdt_cpu_has() with the added support for disabling
> the feature via kernel parameter?

Yes, That is possible. We could do that.


>
>
>> +
>>  int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  {
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>> @@ -714,6 +724,10 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>> +		__rdt_get_mon_l3_config_amd(r);
>> +
>> +
> Why is this vendor check needed? Is X86_FEATURE_BMEC not sufficient?
Yes. I can remove the vendor check.
>
>>  	l3_mon_evt_init(r);
>>  
>>  	r->mon_capable = true;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index fc5286067201..855483b297a8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +static int rdt_mon_configurable_show(struct kernfs_open_file *of,
>> +				     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	seq_printf(seq, "%d\n", r->mon_configurable);
> Why is this file needed? It seems that the next patches also introduce
> files in support of this new feature that will make the actual configuration
> data accessible - those files are only created if this feature is supported.
> Would those files not be sufficient for user space to learn about the feature
> support?

This is part of /sys/fs/resctrl/info/L3_MON# directory which basically has
the information about all the monitoring features. As this is one of the
mon features, I have added it there. Also, this can be used from the
utility(like pqos or rdtset) to check if the configuring the monitor is
supported without looking at individual files. It makes things easier.

Thanks

Babu


> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-25 15:11     ` Moger, Babu
@ 2022-08-25 15:56       ` Reinette Chatre
  2022-08-25 20:44         ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-25 15:56 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme, Luck, Tony

Hi Babu,

On 8/25/2022 8:11 AM, Moger, Babu wrote:
> On 8/24/22 16:15, Reinette Chatre wrote:
>> On 8/22/2022 6:43 AM, Babu Moger wrote:
>>> Newer AMD processors support the new feature Bandwidth Monitoring Event
>>> Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
>>> are configurable when this feature is present.
>>>
>>> Set mon_configurable if the feature is available.
>>>

...

>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index fc5286067201..855483b297a8 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>>>  	return 0;
>>>  }
>>>  
>>> +static int rdt_mon_configurable_show(struct kernfs_open_file *of,
>>> +				     struct seq_file *seq, void *v)
>>> +{
>>> +	struct rdt_resource *r = of->kn->parent->priv;
>>> +
>>> +	seq_printf(seq, "%d\n", r->mon_configurable);
>> Why is this file needed? It seems that the next patches also introduce
>> files in support of this new feature that will make the actual configuration
>> data accessible - those files are only created if this feature is supported.
>> Would those files not be sufficient for user space to learn about the feature
>> support?
> 
> This is part of /sys/fs/resctrl/info/L3_MON# directory which basically has
> the information about all the monitoring features. As this is one of the
> mon features, I have added it there. Also, this can be used from the
> utility(like pqos or rdtset) to check if the configuring the monitor is
> supported without looking at individual files. It makes things easier.

I understand the motivation. My concern is that this is a resource wide
file that will display a binary value that, if true, currently means two
events are configurable. We need to consider how things can change in the
future. We should consider that this is only the beginning of monitoring
configuration and need this interface to be ready for future changes. For
example, what if all of the monitoring events are configurable? Let's say,
for example, in future AMD hardware the "llc_occupancy" event also becomes
configurable, how should info/L3_MON/configurable be interpreted? On some
machines it would thus mean that mbm_total_bytes and mbm_local_bytes are
configurable and on some machines it would mean that mbm_total_bytes,
mbm_local_bytes, and llc_occupancy are configurable. This does not make
it easy for utilities.

So, in this series the info directory on a system that supports BMEC
would look like:

info/L3_MON/mon_features:llc_occupancy
info/L3_MON/mon_features:mbm_total_bytes
info/L3_MON/mon_features:mbm_local_bytes
info/L3_MON/configurable:1

Would information like below not be more specific?
info/L3_MON/mon_features:llc_occupancy
info/L3_MON/mon_features:mbm_total_bytes
info/L3_MON/mon_features:mbm_local_bytes
info/L3_MON/mon_features:mbm_total_config
info/L3_MON/mon_features:mbm_local_config

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-25 15:56       ` Reinette Chatre
@ 2022-08-25 20:44         ` Moger, Babu
  2022-08-25 21:24           ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-25 20:44 UTC (permalink / raw)
  To: Reinette Chatre, babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme, Luck, Tony


On 8/25/2022 10:56 AM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/25/2022 8:11 AM, Moger, Babu wrote:
>> On 8/24/22 16:15, Reinette Chatre wrote:
>>> On 8/22/2022 6:43 AM, Babu Moger wrote:
>>>> Newer AMD processors support the new feature Bandwidth Monitoring Event
>>>> Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
>>>> are configurable when this feature is present.
>>>>
>>>> Set mon_configurable if the feature is available.
>>>>
> ...
>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index fc5286067201..855483b297a8 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>>>>   	return 0;
>>>>   }
>>>>   
>>>> +static int rdt_mon_configurable_show(struct kernfs_open_file *of,
>>>> +				     struct seq_file *seq, void *v)
>>>> +{
>>>> +	struct rdt_resource *r = of->kn->parent->priv;
>>>> +
>>>> +	seq_printf(seq, "%d\n", r->mon_configurable);
>>> Why is this file needed? It seems that the next patches also introduce
>>> files in support of this new feature that will make the actual configuration
>>> data accessible - those files are only created if this feature is supported.
>>> Would those files not be sufficient for user space to learn about the feature
>>> support?
>> This is part of /sys/fs/resctrl/info/L3_MON# directory which basically has
>> the information about all the monitoring features. As this is one of the
>> mon features, I have added it there. Also, this can be used from the
>> utility(like pqos or rdtset) to check if the configuring the monitor is
>> supported without looking at individual files. It makes things easier.
> I understand the motivation. My concern is that this is a resource wide
> file that will display a binary value that, if true, currently means two
> events are configurable. We need to consider how things can change in the
> future. We should consider that this is only the beginning of monitoring
> configuration and need this interface to be ready for future changes. For
> example, what if all of the monitoring events are configurable? Let's say,
> for example, in future AMD hardware the "llc_occupancy" event also becomes
> configurable, how should info/L3_MON/configurable be interpreted? On some
> machines it would thus mean that mbm_total_bytes and mbm_local_bytes are
> configurable and on some machines it would mean that mbm_total_bytes,
> mbm_local_bytes, and llc_occupancy are configurable. This does not make
> it easy for utilities.
>
> So, in this series the info directory on a system that supports BMEC
> would look like:
>
> info/L3_MON/mon_features:llc_occupancy
> info/L3_MON/mon_features:mbm_total_bytes
> info/L3_MON/mon_features:mbm_local_bytes
> info/L3_MON/configurable:1
>
> Would information like below not be more specific?
> info/L3_MON/mon_features:llc_occupancy
> info/L3_MON/mon_features:mbm_total_bytes
> info/L3_MON/mon_features:mbm_local_bytes
> info/L3_MON/mon_features:mbm_total_config
> info/L3_MON/mon_features:mbm_local_config

Hi Reinette,

  Yes. That is more specific.

So, basically your idea is to print the information from mon_evt 
structure if mon_configarable is set in the resource structure.

Some thing like ..

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 83c8780726ff..46c6bf3f08e3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1194,8 +1194,11 @@ static int rdt_mon_features_show(struct 
kernfs_open_file *of,
         struct rdt_resource *r = of->kn->parent->priv;
         struct mon_evt *mevt;

-       list_for_each_entry(mevt, &r->evt_list, list)
+       list_for_each_entry(mevt, &r->evt_list, list) {
                 seq_printf(seq, "%s\n", mevt->name);
+               if (r->mon_configurable)
+                       seq_printf(seq, "%s\n", mevt->config);
+       }

         return 0;
  }

Is that the idea?

Thanks
Babu


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-25 20:44         ` Moger, Babu
@ 2022-08-25 21:24           ` Reinette Chatre
  2022-08-26 14:30             ` Babu Moger
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-25 21:24 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme, Luck, Tony

Hi Babu,

On 8/25/2022 1:44 PM, Moger, Babu wrote:
> 
> On 8/25/2022 10:56 AM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/25/2022 8:11 AM, Moger, Babu wrote:
>>> On 8/24/22 16:15, Reinette Chatre wrote:
>>>> On 8/22/2022 6:43 AM, Babu Moger wrote:
>>>>> Newer AMD processors support the new feature Bandwidth Monitoring Event
>>>>> Configuration (BMEC). The events mbm_total_bytes and mbm_local_bytes
>>>>> are configurable when this feature is present.
>>>>>
>>>>> Set mon_configurable if the feature is available.
>>>>>
>> ...
>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> index fc5286067201..855483b297a8 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>>> @@ -995,6 +995,16 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
>>>>>       return 0;
>>>>>   }
>>>>>   +static int rdt_mon_configurable_show(struct kernfs_open_file *of,
>>>>> +                     struct seq_file *seq, void *v)
>>>>> +{
>>>>> +    struct rdt_resource *r = of->kn->parent->priv;
>>>>> +
>>>>> +    seq_printf(seq, "%d\n", r->mon_configurable);
>>>> Why is this file needed? It seems that the next patches also introduce
>>>> files in support of this new feature that will make the actual configuration
>>>> data accessible - those files are only created if this feature is supported.
>>>> Would those files not be sufficient for user space to learn about the feature
>>>> support?
>>> This is part of /sys/fs/resctrl/info/L3_MON# directory which basically has
>>> the information about all the monitoring features. As this is one of the
>>> mon features, I have added it there. Also, this can be used from the
>>> utility(like pqos or rdtset) to check if the configuring the monitor is
>>> supported without looking at individual files. It makes things easier.
>> I understand the motivation. My concern is that this is a resource wide
>> file that will display a binary value that, if true, currently means two
>> events are configurable. We need to consider how things can change in the
>> future. We should consider that this is only the beginning of monitoring
>> configuration and need this interface to be ready for future changes. For
>> example, what if all of the monitoring events are configurable? Let's say,
>> for example, in future AMD hardware the "llc_occupancy" event also becomes
>> configurable, how should info/L3_MON/configurable be interpreted? On some
>> machines it would thus mean that mbm_total_bytes and mbm_local_bytes are
>> configurable and on some machines it would mean that mbm_total_bytes,
>> mbm_local_bytes, and llc_occupancy are configurable. This does not make
>> it easy for utilities.
>>
>> So, in this series the info directory on a system that supports BMEC
>> would look like:
>>
>> info/L3_MON/mon_features:llc_occupancy
>> info/L3_MON/mon_features:mbm_total_bytes
>> info/L3_MON/mon_features:mbm_local_bytes
>> info/L3_MON/configurable:1
>>
>> Would information like below not be more specific?
>> info/L3_MON/mon_features:llc_occupancy
>> info/L3_MON/mon_features:mbm_total_bytes
>> info/L3_MON/mon_features:mbm_local_bytes
>> info/L3_MON/mon_features:mbm_total_config
>> info/L3_MON/mon_features:mbm_local_config
> 
> Hi Reinette,
> 
>  Yes. That is more specific.
> 
> So, basically your idea is to print the information from mon_evt structure if mon_configarable is set in the resource structure.
> 
> Some thing like ..
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 83c8780726ff..46c6bf3f08e3 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1194,8 +1194,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>         struct rdt_resource *r = of->kn->parent->priv;
>         struct mon_evt *mevt;
> 
> -       list_for_each_entry(mevt, &r->evt_list, list)
> +       list_for_each_entry(mevt, &r->evt_list, list) {
>                 seq_printf(seq, "%s\n", mevt->name);
> +               if (r->mon_configurable)
> +                       seq_printf(seq, "%s\n", mevt->config);
> +       }
> 
>         return 0;
>  }
> 
> Is that the idea?


I do not see why struct rdt_resource->configurable is needed. Again, this
is a resource wide property with an implicit meaning related to only two
event counters. Again, what if AMD later makes the llc_occupancy event counter
configurable? How can resctrl know, using "r->mon_configurable" whether
it should print mevt->config? 

How about:
		if (mevt->config)
			seq_printf(seq, "%s\n", mevt->config);

As I mentioned in [1], mevt->config can be set in rdt_get_mon_l3_config()
based on a check on the BMEC feature instead of hardcoded as it is now.
Or, if the string manipulation is of concern the hardcoding of mevt->config
(perhaps then mevt->config_name) could remain and a new mevt->configurable
could be set from rdt_get_mon_l3_config() and then the above could be:

		if (mevt->configurable)
			seq_printf(seq, "%s\n", mevt->config_name);

Reinette

[1] https://lore.kernel.org/lkml/c5777707-746e-edab-2ce2-3405ff55be56@intel.com/

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-23 22:47   ` Reinette Chatre
@ 2022-08-25 22:42     ` Moger, Babu
  2022-08-26 16:17       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-25 22:42 UTC (permalink / raw)
  To: Reinette Chatre, Babu Moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/23/2022 5:47 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:42 AM, Babu Moger wrote:
>> Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
>> enforcement policies can be applied to external slow memory connected
>> to the host. QOS enforcement is accomplished by assigning a Class Of
>> Service (COS) to a processor and specifying allocations or limits for
>> that COS for each resource to be allocated.
>>
>> This feature is identified by the CPUID Function 8000_0020_EBX_x0.
>>
>> CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
>> Bits    Field Name      Description
>> 2       L3SBE           L3 external slow memory bandwidth enforcement
>>
>> Feature description is available in the specification, "AMD64 Technology Platform Quality
>> of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".
>>
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-quality-service-extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7C4385e95126b24de58aec08da85597c88%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637968916632283680%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=dKSxIinxZGybtACbs3%2FVZr4zbeAvXYc%2FezVivq3xjx0%3D&amp;reserved=0
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbabu.moger%40amd.com%7C4385e95126b24de58aec08da85597c88%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637968916632283680%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=f7MJmrwkBGxq8BuWjNY6Ze9NdzJc6NOkXxNjUZk5c4U%3D&amp;reserved=0
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
> resctrl currently supports "memory bandwidth allocation" and this series adds
> "slow memory bandwidth allocation". Could you please provide more detail about
> what the difference is between "MBA" and "SMBA"? It is clear that the implementation
In this case the slow memory means memory attached to CXL device.
> treats them as different resources, but both resources are associated with L3 cache
> domains and (from what I understand) throttling always occurs at the CPU. Can both
> types of memory resources thus be seen as downstream from L3 cache? How can
Yes. that is correct. They are seen as downstream from L3.
> a user know what memory is considered when configuring MBA and what memory is
> considered when configuring SMBA? Additionally, I do find the term "slow" to be

This memory completely transparent to OS with little bit higher latency 
that regular main memory.

Yes. I know slow word is bit vague. I am not an expert of CXL. But i see 
that word slow is being used to refer the CXL memory to differentiate it 
from regular memory.

> vague as a way to distinguish between different memory types. What is the
> definition of "slow"? Would all "slow" memory on the system support SMBA?

Yes. All the slow memory in the system can support SMBA.

Thanks

Babu

>
> Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration
  2022-08-25 21:24           ` Reinette Chatre
@ 2022-08-26 14:30             ` Babu Moger
  0 siblings, 0 replies; 44+ messages in thread
From: Babu Moger @ 2022-08-26 14:30 UTC (permalink / raw)
  To: reinette.chatre
  Cc: babu.moger, bagasdotme, bp, corbet, dave.hansen, eranian,
	fenghua.yu, hpa, linux-doc, linux-kernel, mingo, tglx, tony.luck,
	x86

Hi Reinette,
   Some reason this thread did not land in my mailbox. Replying using git sendmail to the thread 

Yes. This should work. I will try that. Thanks

Babu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-08-22 13:43 ` [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
  2022-08-24 17:39   ` Reinette Chatre
@ 2022-08-26 14:59   ` Moger, Babu
  1 sibling, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 14:59 UTC (permalink / raw)
  To: fenghua.yu, reinette.chatre, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Reinette,

On 8/22/22 08:43, Babu Moger wrote:
> Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS
> enforcement policies on the external slow memory.
>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
>  2 files changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index a5c51a14fbce..6c38427b71a2 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.fflags			= RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_SMBA] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_SMBA,
> +			.name			= "SB",


For some reason some of the messages are not landing in the mailbox.

Yes.  I will change it to "SMBA" instead of just "SB".

Thanks

Babu


> +			.cache_level		= 3,
> +			.domains		= domain_init(RDT_RESOURCE_SMBA),
> +			.parse_ctrlval		= parse_bw,
> +			.format_str		= "%d=%*u",
> +			.fflags			= RFTYPE_RES_MB,
> +		},
> +	},
>  };
>  
>  /*
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 1d647188a43b..24a1dfeb6cb2 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -418,6 +418,7 @@ enum resctrl_res_level {
>  	RDT_RESOURCE_L3,
>  	RDT_RESOURCE_L2,
>  	RDT_RESOURCE_MBA,
> +	RDT_RESOURCE_SMBA,
>  
>  	/* Must be the last */
>  	RDT_NUM_RESOURCES,
>
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-08-24 21:15   ` Reinette Chatre
@ 2022-08-26 16:07     ` Moger, Babu
  2022-08-26 16:35       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 16:07 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Reinette,

On 8/24/22 16:15, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:43 AM, Babu Moger wrote:
>> Add two new sysfs files to read/write the event configuration if
>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
>> supported. The file mbm_local_config is for the configuration
>> of the event mbm_local_bytes and the file mbm_total_config is
>> for the configuration of mbm_total_bytes.
>>
>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>
>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |    3 +++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |    2 ++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   32 ++++++++++++++++++++++++++++++++
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index c049a274383c..fc725f5e9024 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -72,11 +72,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>>   * struct mon_evt - Entry in the event list of a resource
>>   * @evtid:		event id
>>   * @name:		name of the event
>> + * @config:	current configuration
>>   * @list:		entry in &rdt_resource->evt_list
>>   */
>>  struct mon_evt {
>>  	u32			evtid;
>>  	char			*name;
>> +	char			*config;
>>  	struct list_head	list;
>>  };
>>  
>> @@ -95,6 +97,7 @@ union mon_data_bits {
>>  		unsigned int rid	: 10;
>>  		unsigned int evtid	: 8;
>>  		unsigned int domid	: 14;
>> +		unsigned int mon_config : 32;
>>  	} u;
>>  };
>>  
> This does not seem to be used in this patch.


I will move it next patch if required.

>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index b9de417dac1c..3f900241dbab 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -656,11 +656,13 @@ static struct mon_evt llc_occupancy_event = {
>>  static struct mon_evt mbm_total_event = {
>>  	.name		= "mbm_total_bytes",
>>  	.evtid		= QOS_L3_MBM_TOTAL_EVENT_ID,
>> +	.config 	= "mbm_total_config",
>>  };
>>  
>>  static struct mon_evt mbm_local_event = {
>>  	.name		= "mbm_local_bytes",
>>  	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
>> +	.config 	= "mbm_local_config",
>>  };
>>  
>>  /*
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 855483b297a8..30d2182d4fda 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
>>  	.seq_show		= rdtgroup_mondata_show,
>>  };
>>  
>> +static const struct kernfs_ops kf_mondata_config_ops = {
>> +	.atomic_write_len       = PAGE_SIZE,
>> +};
>> +
>>  static bool is_cpu_list(struct kernfs_open_file *of)
>>  {
>>  	struct rftype *rft = of->kn->priv;
>> @@ -2534,6 +2538,25 @@ void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r, unsigned int dom_id)
>>  	}
>>  }
>>  
>> +static int mon_config_addfile(struct kernfs_node *parent_kn, const char *name,
>> +			      void *priv)
>> +{
>> +	struct kernfs_node *kn;
>> +	int ret = 0;
>> +
>> +	kn = __kernfs_create_file(parent_kn, name, 0644,
>> +			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
>> +			&kf_mondata_config_ops, priv, NULL, NULL);
>> +	if (IS_ERR(kn))
>> +		return PTR_ERR(kn);
>> +
>> +	ret = rdtgroup_kn_set_ugid(kn);
>> +	if (ret)
>> +		kernfs_remove(kn);
>> +
>> +	return ret;
>> +}
>> +
>>  static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>  				struct rdt_domain *d,
>>  				struct rdt_resource *r, struct rdtgroup *prgrp)
>> @@ -2568,6 +2591,15 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>  		if (ret)
>>  			goto out_destroy;
>>  
>> +		/* Create the sysfs event configuration files */
>> +		if (r->mon_configurable &&
>> +		    (mevt->evtid == QOS_L3_MBM_TOTAL_EVENT_ID ||
>> +		     mevt->evtid == QOS_L3_MBM_LOCAL_EVENT_ID)) {
>> +			ret = mon_config_addfile(kn, mevt->config, priv.priv);
>> +			if (ret)
>> +				goto out_destroy;
>> +		}
>> +
> This seems complex to have event features embedded in the code in this way. Could
> the events not be configured during system enumeration? For example, instead
> of hardcoding the config like above to always set:
>
>   static struct mon_evt mbm_local_event = {
>   	.name		= "mbm_local_bytes",
>   	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
>  +	.config 	= "mbm_local_config",
>
>
> What if instead this information is dynamically set in rdt_get_mon_l3_config()? To
> make things simpler struct mon_evt could get a new member "configurable" and the
> events that actually support configuration will have this set only
> if system has X86_FEATURE_BMEC (struct rdt_resource->configurable then
> becomes unnecessary?). Being configurable thus becomes an event property, not
> a resource property. The "config" member introduced here could then be "config_name".
>
> I think doing so will also make this file creation simpler with a single 
> mon_config_addfile() (possibly with more parameters) used to add both files to
> avoid the code duplication introduced by mon_config_addfile() above.
>
> What do you think?

Yes. We could do that. Something like this.

struct mon_evt {
        u32                     evtid;
        char                    *name;

+      bool                     configurable;

         char                    *config;
        struct list_head        list;
};

Set the configurable if  the  system has X86_FEATURE_BMEC feature in
rdt_get_mon_l3_config.

Create both files  mbm_local_bytes and  mbm_local_config in mon_addfile.

Change the mon_addfile to pass mon_evt structure, so it have all
information to create both the files.

Then we can remove  rdt_resource->configurable. 

Does that make sense?

Thanks

Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-25 22:42     ` Moger, Babu
@ 2022-08-26 16:17       ` Reinette Chatre
  2022-08-29 23:25         ` Babu Moger
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-26 16:17 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme, Luck, Tony

Hi Babu,

On 8/25/2022 3:42 PM, Moger, Babu wrote:
> 
> On 8/23/2022 5:47 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 8/22/2022 6:42 AM, Babu Moger wrote:
>>> Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
>>> enforcement policies can be applied to external slow memory connected
>>> to the host. QOS enforcement is accomplished by assigning a Class Of
>>> Service (COS) to a processor and specifying allocations or limits for
>>> that COS for each resource to be allocated.
>>>
>>> This feature is identified by the CPUID Function 8000_0020_EBX_x0.
>>>
>>> CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
>>> Bits    Field Name      Description
>>> 2       L3SBE           L3 external slow memory bandwidth enforcement
>>>
>>> Feature description is available in the specification, "AMD64 Technology Platform Quality
>>> of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".
>>>

(snip modified links)

>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>>> ---
>> resctrl currently supports "memory bandwidth allocation" and this series adds
>> "slow memory bandwidth allocation". Could you please provide more detail about
>> what the difference is between "MBA" and "SMBA"? It is clear that the implementation

> In this case the slow memory means memory attached to CXL device.

When you say "in this case", is there another case?

Should "Slow Memory Bandwidth Allocation" thus be considered to be "CXL.mem
Memory Bandwidth Allocation"? Why not call it "CXL(.mem?) Memory Bandwith
Allocation"?

I am not familiar with CXL so please correct me where I am
wrong. From what I understand CXL.mem is a protocol and devices that implement
it can have different memory types ... some faster than others. So, even if
SMBA supports "CXL.mem" devices, could a system have multiple CXL.mem devices,
some faster than others? Would all be configured the same with SMBA (they
would all be classified as "slow" and throttled the same)?

>> treats them as different resources, but both resources are associated with L3 cache
>> domains and (from what I understand) throttling always occurs at the CPU. Can both
>> types of memory resources thus be seen as downstream from L3 cache? How can

> Yes. that is correct. They are seen as downstream from L3.


>> a user know what memory is considered when configuring MBA and what memory is
>> considered when configuring SMBA? Additionally, I do find the term "slow" to be
> 
> This memory completely transparent to OS with little bit higher latency that regular main memory.

I do not think these devices are invisible to the OS though (after
reading Documentation/driver-api/cxl/memory-devices.rst and
Documentation/ABI/testing/sysfs-class-cxl).

Is there not a way to provide some more clarity to users on what
would be throttled? 

> 
> Yes. I know slow word is bit vague. I am not an expert of CXL. But i see that word slow is being used to refer the CXL memory to differentiate it from regular memory.

What is very vague to me is how a user is intended to use this feature.
Would the "SMBA" resource be available only when CXL.mem devices are present
on the system? Since this is a CPU feature it is unclear to me whether
presence of CXL.mem devices would be known at the time "SMBA" is enumerated.
Could the "SMBA" resource thus exist without memory to throttle?

>> vague as a way to distinguish between different memory types. What is the
>> definition of "slow"? Would all "slow" memory on the system support SMBA?
> 
> Yes. All the slow memory in the system can support SMBA.
> 

How does a user know which memory on the system is "slow memory"?

It remains unclear to me how a user is intended to use this feature.

How will a user know which devices/memory (if any) are being
throttled by "SMBA"?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-08-26 16:07     ` Moger, Babu
@ 2022-08-26 16:35       ` Reinette Chatre
  2022-08-26 16:57         ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-26 16:35 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/26/2022 9:07 AM, Moger, Babu wrote:
> On 8/24/22 16:15, Reinette Chatre wrote:
>> On 8/22/2022 6:43 AM, Babu Moger wrote:

...

>>>  static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>>  				struct rdt_domain *d,
>>>  				struct rdt_resource *r, struct rdtgroup *prgrp)
>>> @@ -2568,6 +2591,15 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>>  		if (ret)
>>>  			goto out_destroy;
>>>  
>>> +		/* Create the sysfs event configuration files */
>>> +		if (r->mon_configurable &&
>>> +		    (mevt->evtid == QOS_L3_MBM_TOTAL_EVENT_ID ||
>>> +		     mevt->evtid == QOS_L3_MBM_LOCAL_EVENT_ID)) {
>>> +			ret = mon_config_addfile(kn, mevt->config, priv.priv);
>>> +			if (ret)
>>> +				goto out_destroy;
>>> +		}
>>> +
>> This seems complex to have event features embedded in the code in this way. Could
>> the events not be configured during system enumeration? For example, instead
>> of hardcoding the config like above to always set:
>>
>>   static struct mon_evt mbm_local_event = {
>>   	.name		= "mbm_local_bytes",
>>   	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
>>  +	.config 	= "mbm_local_config",
>>
>>
>> What if instead this information is dynamically set in rdt_get_mon_l3_config()? To
>> make things simpler struct mon_evt could get a new member "configurable" and the
>> events that actually support configuration will have this set only
>> if system has X86_FEATURE_BMEC (struct rdt_resource->configurable then
>> becomes unnecessary?). Being configurable thus becomes an event property, not
>> a resource property. The "config" member introduced here could then be "config_name".
>>
>> I think doing so will also make this file creation simpler with a single 
>> mon_config_addfile() (possibly with more parameters) used to add both files to
>> avoid the code duplication introduced by mon_config_addfile() above.
>>
>> What do you think?
> 
> Yes. We could do that. Something like this.
> 
> struct mon_evt {
>         u32                     evtid;
>         char                    *name;
> 
> +      bool                     configurable;
> 
>          char                    *config;
>         struct list_head        list;
> };
> 
> Set the configurable if  the  system has X86_FEATURE_BMEC feature in
> rdt_get_mon_l3_config.

This would work (using bool in struct is something resctrl already do
in many places). I also think that "config" should rather be named to
"config_name" to make clear that it is not the actual configuration of
the event.
Remember to update struct mon_evt's kerneldoc (I just noticed it is
missing from this series).

> 
> Create both files  mbm_local_bytes and  mbm_local_config in mon_addfile.
> 
> Change the mon_addfile to pass mon_evt structure, so it have all
> information to create both the files.

Providing the structure to the function would make all the information
available but I am not sure that doing so would make it easy to eliminate the
duplicate code needed to create the other file. Giving more parameters
to mon_addfile() is another option but it should be more clear to
you as you write this code.

> 
> Then we can remove  rdt_resource->configurable. 
> 
> Does that make sense?
> 

Yes.

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-24 21:16   ` Reinette Chatre
@ 2022-08-26 16:49     ` Moger, Babu
  2022-08-26 17:34       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 16:49 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/24/22 16:16, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:44 AM, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the sysfs configuration file.
>>
>> Following are the types of events supported.
>> ====================================================================
>> Bits    Description
>> 6       Dirty Victims from the QOS domain to all types of memory
>> 5       Reads to slow memory in the non-local NUMA domain
>> 4       Reads to slow memory in the local NUMA domain
>> 3       Non-temporal writes to non-local NUMA domain
>> 2       Non-temporal writes to local NUMA domain
>> 1       Reads to memory in the non-local NUMA domain
>> 0       Reads to memory in the local NUMA domain
>>
>> By default the mbm_total_bytes configuration is set to 0x7f to count
>> all the types of events and mbm_local_bytes configuration is set to
>> 0x15 to count all the local memory events.
>>
>> $cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>> 0x7f
>>
>> $cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>> 0x15
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |   21 ++++++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   70 ++++++++++++++++++++++++++++++++
>>  2 files changed, 91 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index fc725f5e9024..457666709386 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -15,6 +15,7 @@
>>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  
>>  #define MSR_IA32_QM_CTR			0x0c8e
>>  #define MSR_IA32_QM_EVTSEL		0x0c8d
>> @@ -50,6 +51,26 @@
>>   */
>>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>>  
>> +/* Reads to Local DRAM Memory */
>> +#define READS_TO_LOCAL_MEM		BIT(0)
>> +
>> +/* Reads to Remote DRAM Memory */
>> +#define READS_TO_REMOTE_MEM		BIT(1)
>> +
>> +/* Non-Temporal Writes to Local Memory */
>> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
>> +
>> +/* Non-Temporal Writes to Remote Memory */
>> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
>> +
>> +/* Reads to Local Memory the system identifies as "Slow Memory" */
> Seems unexpected character slipped into "identifies".
Sure. Will fix it
>
>> +#define READS_TO_LOCAL_S_MEM		BIT(4)
>> +
>> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
> here also
sure. Will fix it.
>
>> +#define READS_TO_REMOTE_S_MEM		BIT(5)
>> +
>> +/* Dirty Victims to All Types of Memory */
>> +#define  DIRTY_VICTIS_TO_ALL_MEM	BIT(6)
> Is this intended to be "DIRTY_VICTIMS_TO_ALL_MEM" ?
Yes. that is what spec says.
>
>>  
>>  struct rdt_fs_context {
>>  	struct kernfs_fs_context	kfc;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 30d2182d4fda..e1847d49fa15 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -254,8 +254,78 @@ static const struct kernfs_ops kf_mondata_ops = {
>>  	.seq_show		= rdtgroup_mondata_show,
>>  };
>>  
>> +/*
>> + * This is called via IPI to read the CQM/MBM counters
>> + * in a domain.
>> + */
>> +void mon_event_config_read(void *info)
>> +{
>> +	union mon_data_bits *md = info;
>> +	u32 evtid = md->u.evtid;
>> +	u32 h, msr_index;
>> +
>> +	switch (evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		return; /* Not expected to come here */
> Please no inline comments.

Ok


>
>> +	}
>> +
>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, h);
>> +}
>> +
>> +void mondata_config_read(struct rdt_domain *d, union mon_data_bits *md)
>> +{
>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, md, 1);
>> +}
>> +
>> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>> +{
>> +	struct kernfs_open_file *of = m->private;
>> +	struct rdt_hw_resource *hw_res;
>> +	u32 resid, evtid, domid;
>> +	struct rdtgroup *rdtgrp;
>> +	struct rdt_resource *r;
>> +	union mon_data_bits md;
>> +	struct rdt_domain *d;
>> +	int ret = 0;
>> +
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	md.priv = of->kn->priv;
>> +	resid = md.u.rid;
>> +	domid = md.u.domid;
>> +	evtid = md.u.evtid;
>> +
>> +	hw_res = &rdt_resources_all[resid];
>> +	r = &hw_res->r_resctrl;
>> +
>> +	d = rdt_find_domain(r, domid, NULL);
>> +	if (IS_ERR_OR_NULL(d)) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	mondata_config_read(d, &md);
>> +
>> +	seq_printf(m, "0x%x\n", md.u.mon_config);
> Looking at this patch and the next, the sysfs files are introduced to read
> from and write to the configuration register. From what I can tell the
> data is never used internally (what did I miss?). Why is the value of the
> configuration register stored? 

You didn't miss anything. We don't need to store it.  But we need it as
part of mon_data_bits structure because, it need to be passed to
mon_event_config_read and rdtgroup_mondata_config_write.

In these functions we need evtid and also config value (mon_config).

Thanks

Babu

>
>> +
>> +out:
>> +	rdtgroup_kn_unlock(of->kn);
>> +	return ret;
>> +}
>> +
>>  static const struct kernfs_ops kf_mondata_config_ops = {
>>  	.atomic_write_len       = PAGE_SIZE,
>> +	.seq_show               = rdtgroup_mondata_config_show,
>>  };
>>  
>>  static bool is_cpu_list(struct kernfs_open_file *of)
>>
>>
>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-08-26 16:35       ` Reinette Chatre
@ 2022-08-26 16:57         ` Moger, Babu
  0 siblings, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 16:57 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/26/22 11:35, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/26/2022 9:07 AM, Moger, Babu wrote:
>> On 8/24/22 16:15, Reinette Chatre wrote:
>>> On 8/22/2022 6:43 AM, Babu Moger wrote:
> ...
>
>>>>  static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>>>  				struct rdt_domain *d,
>>>>  				struct rdt_resource *r, struct rdtgroup *prgrp)
>>>> @@ -2568,6 +2591,15 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>>>>  		if (ret)
>>>>  			goto out_destroy;
>>>>  
>>>> +		/* Create the sysfs event configuration files */
>>>> +		if (r->mon_configurable &&
>>>> +		    (mevt->evtid == QOS_L3_MBM_TOTAL_EVENT_ID ||
>>>> +		     mevt->evtid == QOS_L3_MBM_LOCAL_EVENT_ID)) {
>>>> +			ret = mon_config_addfile(kn, mevt->config, priv.priv);
>>>> +			if (ret)
>>>> +				goto out_destroy;
>>>> +		}
>>>> +
>>> This seems complex to have event features embedded in the code in this way. Could
>>> the events not be configured during system enumeration? For example, instead
>>> of hardcoding the config like above to always set:
>>>
>>>   static struct mon_evt mbm_local_event = {
>>>   	.name		= "mbm_local_bytes",
>>>   	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
>>>  +	.config 	= "mbm_local_config",
>>>
>>>
>>> What if instead this information is dynamically set in rdt_get_mon_l3_config()? To
>>> make things simpler struct mon_evt could get a new member "configurable" and the
>>> events that actually support configuration will have this set only
>>> if system has X86_FEATURE_BMEC (struct rdt_resource->configurable then
>>> becomes unnecessary?). Being configurable thus becomes an event property, not
>>> a resource property. The "config" member introduced here could then be "config_name".
>>>
>>> I think doing so will also make this file creation simpler with a single 
>>> mon_config_addfile() (possibly with more parameters) used to add both files to
>>> avoid the code duplication introduced by mon_config_addfile() above.
>>>
>>> What do you think?
>> Yes. We could do that. Something like this.
>>
>> struct mon_evt {
>>         u32                     evtid;
>>         char                    *name;
>>
>> +      bool                     configurable;
>>
>>          char                    *config;
>>         struct list_head        list;
>> };
>>
>> Set the configurable if  the  system has X86_FEATURE_BMEC feature in
>> rdt_get_mon_l3_config.
> This would work (using bool in struct is something resctrl already do
> in many places). I also think that "config" should rather be named to
> "config_name" to make clear that it is not the actual configuration of
> the event.
Sure.
> Remember to update struct mon_evt's kerneldoc (I just noticed it is
> missing from this series).

Oh,, Will do.

Thanks

Babu


>
>> Create both files  mbm_local_bytes and  mbm_local_config in mon_addfile.
>>
>> Change the mon_addfile to pass mon_evt structure, so it have all
>> information to create both the files.
> Providing the structure to the function would make all the information
> available but I am not sure that doing so would make it easy to eliminate the
> duplicate code needed to create the other file. Giving more parameters
> to mon_addfile() is another option but it should be more clear to
> you as you write this code.
>
>> Then we can remove  rdt_resource->configurable. 
>>
>> Does that make sense?
>>
> Yes.
>
> Reinette
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-26 16:49     ` Moger, Babu
@ 2022-08-26 17:34       ` Reinette Chatre
  2022-08-26 18:34         ` Moger, Babu
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2022-08-26 17:34 UTC (permalink / raw)
  To: babu.moger, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme

Hi Babu,

On 8/26/2022 9:49 AM, Moger, Babu wrote:
> On 8/24/22 16:16, Reinette Chatre wrote:
>> On 8/22/2022 6:44 AM, Babu Moger wrote:

...

>>> +#define READS_TO_REMOTE_S_MEM		BIT(5)
>>> +
>>> +/* Dirty Victims to All Types of Memory */
>>> +#define  DIRTY_VICTIS_TO_ALL_MEM	BIT(6)
>> Is this intended to be "DIRTY_VICTIMS_TO_ALL_MEM" ?
> Yes. that is what spec says.

You did notice the typo, right?

>>> +	}
>>> +
>>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, h);
>>> +}
>>> +
>>> +void mondata_config_read(struct rdt_domain *d, union mon_data_bits *md)
>>> +{
>>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, md, 1);
>>> +}
>>> +
>>> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>>> +{
>>> +	struct kernfs_open_file *of = m->private;
>>> +	struct rdt_hw_resource *hw_res;
>>> +	u32 resid, evtid, domid;
>>> +	struct rdtgroup *rdtgrp;
>>> +	struct rdt_resource *r;
>>> +	union mon_data_bits md;
>>> +	struct rdt_domain *d;
>>> +	int ret = 0;
>>> +
>>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>> +	if (!rdtgrp) {
>>> +		ret = -ENOENT;
>>> +		goto out;
>>> +	}
>>> +
>>> +	md.priv = of->kn->priv;
>>> +	resid = md.u.rid;
>>> +	domid = md.u.domid;
>>> +	evtid = md.u.evtid;
>>> +
>>> +	hw_res = &rdt_resources_all[resid];
>>> +	r = &hw_res->r_resctrl;
>>> +
>>> +	d = rdt_find_domain(r, domid, NULL);
>>> +	if (IS_ERR_OR_NULL(d)) {
>>> +		ret = -ENOENT;
>>> +		goto out;
>>> +	}
>>> +
>>> +	mondata_config_read(d, &md);
>>> +
>>> +	seq_printf(m, "0x%x\n", md.u.mon_config);
>> Looking at this patch and the next, the sysfs files are introduced to read
>> from and write to the configuration register. From what I can tell the
>> data is never used internally (what did I miss?). Why is the value of the
>> configuration register stored? 
> 
> You didn't miss anything. We don't need to store it.  But we need it as
> part of mon_data_bits structure because, it need to be passed to
> mon_event_config_read and rdtgroup_mondata_config_write.

These functions are introduced here ... so it is only needed because
the demand is created here also. This can be changed, no? 

> 
> In these functions we need evtid and also config value (mon_config).
> 

I see no need to pass evtid so deep - it can be checked right in
rdtgroup_mondata_config_show() and then an appropriate wrapper
can be called to just return the config value. Even if had to also
pass evtid through many layers you could create a temporary structure
to do so and not unnecessarily  increase the size of a long lived
system structure to satisfy this temporary need.

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write the event configuration
  2022-08-24 21:16   ` Reinette Chatre
@ 2022-08-26 18:17     ` Moger, Babu
  0 siblings, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 18:17 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/24/22 16:16, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/22/2022 6:44 AM, Babu Moger wrote:
>> Add the sysfs interface to write the event configuration for the
>> MBM configurable events. The event configuration can be changed by
>> writing to the sysfs file for that specific event.
>>
>> Following are the types of events supported.
>> ==================================================================
>> Bits    Description
>> 6       Dirty Victims from the QOS domain to all types of memory
>> 5       Reads to slow memory in the non-local NUMA domain
>> 4       Reads to slow memory in the local NUMA domain
>> 3       Non-temporal writes to non-local NUMA domain
>> 2       Non-temporal writes to local NUMA domain
>> 1       Reads to memory in the non-local NUMA domain
>> 0       Reads to memory in the local NUMA domain
>>
>> By default the mbm_total_bytes configuration is set to 0x7f to count
>> all the types of events and mbm_local_bytes configuration is set to
>> 0x15 to count all the local memory events.
>>
>> For example:
>> To change the mbm_total_bytes to count all the reads, run the command.
>> $echo  0x33 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>
>> To change the mbm_local_bytes to count all the slow memory reads, run
>> the command.
>> $echo  0x30 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Reviewed-by: Ingo Molnar <mingo@kernel.org>
>> ---
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  109 ++++++++++++++++++++++++++++++++
>>  1 file changed, 109 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index e1847d49fa15..83c8780726ff 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -323,9 +323,118 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * This is called via IPI to read the CQM/MBM counters
>> + * in a domain.
>> + */
>> +void mon_event_config_write(void *info)
>> +{
>> +	union mon_data_bits *md = info;
>> +	u32 evtid = md->u.evtid;
>> +	u32 msr_index;
>> +
>> +	switch (evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		return; /* Not expected to come here */
> Please no inline comments.
Ok sure.
>
>> +	}
>> +
>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, 0);
>> +}
>> +
>> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
>> +				       char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_hw_resource *hw_res;
>> +	struct rdtgroup *rdtgrp;
>> +	struct rdt_resource *r;
>> +	unsigned int mon_config;
>> +	cpumask_var_t cpu_mask;
>> +	union mon_data_bits md;
>> +	struct rdt_domain *d;
>> +	u32 resid, domid;
>> +	int ret = 0, cpu;
>> +
>> +	ret = kstrtouint(buf, 0, &mon_config);
>> +	if (ret)
>> +		return ret;
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	/* mon_config cannot be more than the supported set of events */
>> +	if (mon_config > GENMASK(6, 0)) {
> Could this "GENMASK()" be given a name and be moved closer to where
> the bits are defined in internal.h? This will be easier to read and if any
> new bits are added it would hopefully be noticed more easily to get
> an update also.
Ok. Sure..
>
>> +		rdt_last_cmd_puts("Invalid event configuration\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	cpus_read_lock();
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp) {
>> +		return -ENOENT;
>> +		goto e_unlock;
>> +	}
>> +
>> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
>> +		ret = -ENOMEM;
>> +		goto e_unlock;
>> +	}
>> +
>> +
>> +	md.priv = of->kn->priv;
>> +	resid = md.u.rid;
>> +	domid = md.u.domid;
>> +
>> +	hw_res = &rdt_resources_all[resid];
>> +	r = &hw_res->r_resctrl;
>> +	d = rdt_find_domain(r, domid, NULL);
>> +	if (IS_ERR_OR_NULL(d)) {
>> +		ret = -ENOENT;
>> +		goto e_cpumask;
>> +	}
>> +
>> +	md.u.mon_config = mon_config & 0xFF;
> Same question as previous patch. I do not see this internal
> value used in the code, is storing it necessary?

Storing is not necessary.  As I commented before we need mon_config  as
part of mon_data_bits structure because, it need to be passed to
mon_event_config_read and rdtgroup_mondata_config_write.

In these functions we need evtid and also config value (mon_config).

>
>> +
>> +	/* Pick all the CPUs in the domain instance */
>> +	for_each_cpu(cpu, &d->cpu_mask)
>> +		cpumask_set_cpu(cpu, cpu_mask);
>> +
>> +	cpu = get_cpu();
>> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on this cpu if it's in cpu_mask */
> Please always use caps for CPU.
Sure
>
>> +	if (cpumask_test_cpu(cpu, cpu_mask))
>> +		mon_event_config_write(&md);
>> +
>> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all other cpus in cpu_mask */
>> +	smp_call_function_many(cpu_mask, mon_event_config_write, &md, 1);
>> +	put_cpu();
> I do not think we need to propagate this pattern more in the resctrl code.
> How about on_each_cpu_mask()? 
Yea. We can do that. It probably better to replace all the current
smp_call_function_many in resctrl code.
>
>> +
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
>> +	 * RMID will be set on the next read to any BwEvent for every RMID.
>> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> Could you please check if this is sufficient? Note how "mon_event_read()"
> is called with "first = true" right after the mon sysfs files are created to
> clear the state _and_ initialize the state to set the "prev" MSRs correctly.

I see that it is set

m->prev_bw_msr = m->prev_msr = tval;

With above code we are setting

m->prev_bw_msr = m->prev_msr = 0;

I cannot think of any problems this could cause. Please let me know if you
see a problem.

Thanks

Babu Moger

>
>
>> +
>> +e_cpumask:
>> +	free_cpumask_var(cpu_mask);
>> +
>> +e_unlock:
>> +	rdtgroup_kn_unlock(of->kn);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  static const struct kernfs_ops kf_mondata_config_ops = {
>>  	.atomic_write_len       = PAGE_SIZE,
>>  	.seq_show               = rdtgroup_mondata_config_show,
>> +	.write                  = rdtgroup_mondata_config_write,
>>  };
>>  
>>  static bool is_cpu_list(struct kernfs_open_file *of)
>>
>>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-08-26 17:34       ` Reinette Chatre
@ 2022-08-26 18:34         ` Moger, Babu
  0 siblings, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-26 18:34 UTC (permalink / raw)
  To: Reinette Chatre, fenghua.yu, tglx, mingo, bp
  Cc: eranian, dave.hansen, x86, hpa, corbet, linux-kernel, linux-doc,
	bagasdotme


On 8/26/22 12:34, Reinette Chatre wrote:
> Hi Babu,
>
> On 8/26/2022 9:49 AM, Moger, Babu wrote:
>> On 8/24/22 16:16, Reinette Chatre wrote:
>>> On 8/22/2022 6:44 AM, Babu Moger wrote:
> ...
>
>>>> +#define READS_TO_REMOTE_S_MEM		BIT(5)
>>>> +
>>>> +/* Dirty Victims to All Types of Memory */
>>>> +#define  DIRTY_VICTIS_TO_ALL_MEM	BIT(6)
>>> Is this intended to be "DIRTY_VICTIMS_TO_ALL_MEM" ?
>> Yes. that is what spec says.
> You did notice the typo, right?

oh. yea. Thanks


>
>>>> +	}
>>>> +
>>>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, h);
>>>> +}
>>>> +
>>>> +void mondata_config_read(struct rdt_domain *d, union mon_data_bits *md)
>>>> +{
>>>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, md, 1);
>>>> +}
>>>> +
>>>> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>>>> +{
>>>> +	struct kernfs_open_file *of = m->private;
>>>> +	struct rdt_hw_resource *hw_res;
>>>> +	u32 resid, evtid, domid;
>>>> +	struct rdtgroup *rdtgrp;
>>>> +	struct rdt_resource *r;
>>>> +	union mon_data_bits md;
>>>> +	struct rdt_domain *d;
>>>> +	int ret = 0;
>>>> +
>>>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>>>> +	if (!rdtgrp) {
>>>> +		ret = -ENOENT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	md.priv = of->kn->priv;
>>>> +	resid = md.u.rid;
>>>> +	domid = md.u.domid;
>>>> +	evtid = md.u.evtid;
>>>> +
>>>> +	hw_res = &rdt_resources_all[resid];
>>>> +	r = &hw_res->r_resctrl;
>>>> +
>>>> +	d = rdt_find_domain(r, domid, NULL);
>>>> +	if (IS_ERR_OR_NULL(d)) {
>>>> +		ret = -ENOENT;
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	mondata_config_read(d, &md);
>>>> +
>>>> +	seq_printf(m, "0x%x\n", md.u.mon_config);
>>> Looking at this patch and the next, the sysfs files are introduced to read
>>> from and write to the configuration register. From what I can tell the
>>> data is never used internally (what did I miss?). Why is the value of the
>>> configuration register stored? 
>> You didn't miss anything. We don't need to store it.  But we need it as
>> part of mon_data_bits structure because, it need to be passed to
>> mon_event_config_read and rdtgroup_mondata_config_write.
> These functions are introduced here ... so it is only needed because
> the demand is created here also. This can be changed, no? 

I think we can change that.


>
>> In these functions we need evtid and also config value (mon_config).
>>
> I see no need to pass evtid so deep - it can be checked right in
> rdtgroup_mondata_config_show() and then an appropriate wrapper
> can be called to just return the config value. Even if had to also
> pass evtid through many layers you could create a temporary structure
> to do so and not unnecessarily  increase the size of a long lived
> system structure to satisfy this temporary need.

Yea. I think we can do that. Let me try that.

Thanks

Babu

>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-26 16:17       ` Reinette Chatre
@ 2022-08-29 23:25         ` Babu Moger
  2022-08-30 16:39           ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Babu Moger @ 2022-08-29 23:25 UTC (permalink / raw)
  To: reinette.chatre
  Cc: babu.moger, bagasdotme, bp, corbet, dave.hansen, eranian,
	fenghua.yu, hpa, linux-doc, linux-kernel, mingo, tglx, tony.luck,
	x86

Hi Reinette,
   Some reason this thread did not land in my mailbox. Replying using git sendmail to the thread 

>(snip modified links)

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

>When you say "in this case", is there another case?

There is no other interface. It is only CXL memory device.

>
>Should "Slow Memory Bandwidth Allocation" thus be considered to be "CXL.mem
>Memory Bandwidth Allocation"? Why not call it "CXL(.mem?) Memory Bandwith
>Allocation"?

Checked with our team here. The currently only supported slow memory is CXL.mem
device. As for the naming, the "slow" memory landscape is still evolving.
While CXL.mem is the only known type supported right now. The specs says
"Slow Memory Bandwidth Allocation". So, we would prefer to keep it that way.

>
>I am not familiar with CXL so please correct me where I am
>wrong. From what I understand CXL.mem is a protocol and devices that implement
>it can have different memory types ... some faster than others. So, even if
>SMBA supports "CXL.mem" devices, could a system have multiple CXL.mem devices,
>some faster than others? Would all be configured the same with SMBA (they
>would all be classified as "slow" and throttled the same)?

I have not tested the multiple devices with different memory speeds here.
But checking with team here says it should just work the same way. It appears
that the throttling logic groups all the slow sources together and applies
the limit on them as a whole.

>
>
>
>I do not think these devices are invisible to the OS though (after
>reading Documentation/driver-api/cxl/memory-devices.rst and
>Documentation/ABI/testing/sysfs-class-cxl).
>
>Is there not a way to provide some more clarity to users on what
>would be throttled? 
>
>
>How does a user know which memory on the system is "slow memory"?
>
>It remains unclear to me how a user is intended to use this feature.
>
>How will a user know which devices/memory (if any) are being
>throttled by "SMBA"?
>
This is a new technology. I am still learning. 

Currently, I have tested with CXL 1.1 type of device. CXL 1.1 uses a simple
topology structure of direct attachment between host (such as a CPU or GPU)
and CXL device.

#numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB
node 0 free: 59542 MB
node 1 cpus:             (CPU list is emply. Node 1 have CXL memory)
node 1 size: 16122 MB    (There is 16GB CXL memory) 
node 1 free: 15627 MB    
node distances:
node   0   1
  0:  10  50
  1:  50  10

The cpu-cxl node distance is greater than cpu-to-cpu distances.

We can also verify using lsmem
 
#lsmem --output RANGE,SIZE,STATE,NODE,ZONES,BLOCK
RANGE                                 SIZE  STATE NODE  ZONES BLOCK
0x0000000000000000-0x000000007fffffff   2G online    0   None     0
0x0000000080000000-0x00000000ffffffff   2G online    0  DMA32     1
0x0000000100000000-0x0000000fffffffff  60G online    0 Normal  2-31
0x0000001000000000-0x000000107fffffff   2G online    0   None    32
0x0000001080000000-0x000000147fffffff  16G online    1 Normal 33-40

Memory block size:         2G
Total online memory:      82G
Total offline memory:      0B


We can also verify using ACPI SRAT table and memory maps.

Thanks
Babu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-29 23:25         ` Babu Moger
@ 2022-08-30 16:39           ` Reinette Chatre
       [not found]             ` <3aa991a8-ac08-297d-8328-5380897f6dd9@amd.com>
  2022-08-30 22:28             ` Moger, Babu
  0 siblings, 2 replies; 44+ messages in thread
From: Reinette Chatre @ 2022-08-30 16:39 UTC (permalink / raw)
  To: Babu Moger
  Cc: bagasdotme, bp, corbet, dave.hansen, eranian, fenghua.yu, hpa,
	linux-doc, linux-kernel, mingo, tglx, tony.luck, x86

Hi Babu,

On 8/29/2022 4:25 PM, Babu Moger wrote:
> Hi Reinette,
>    Some reason this thread did not land in my mailbox. Replying using git sendmail to the thread 
> 
>> (snip modified links)
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
>> When you say "in this case", is there another case?
> 
> There is no other interface. It is only CXL memory device.
> 
>>
>> Should "Slow Memory Bandwidth Allocation" thus be considered to be "CXL.mem
>> Memory Bandwidth Allocation"? Why not call it "CXL(.mem?) Memory Bandwith
>> Allocation"?
> 
> Checked with our team here. The currently only supported slow memory is CXL.mem
> device. As for the naming, the "slow" memory landscape is still evolving.
> While CXL.mem is the only known type supported right now. The specs says
> "Slow Memory Bandwidth Allocation". So, we would prefer to keep it that way.

If you prefer to keep "Slow Memory Bandwidth Allocation" then please also
provide clear information to the user on what is managed via "Memory Bandwidth
Allocation" and what is managed via "Slow Memory Bandwidth Allocation". This
could be in the documentation.

>> I am not familiar with CXL so please correct me where I am
>> wrong. From what I understand CXL.mem is a protocol and devices that implement
>> it can have different memory types ... some faster than others. So, even if
>> SMBA supports "CXL.mem" devices, could a system have multiple CXL.mem devices,
>> some faster than others? Would all be configured the same with SMBA (they
>> would all be classified as "slow" and throttled the same)?
> 
> I have not tested the multiple devices with different memory speeds here.
> But checking with team here says it should just work the same way. It appears
> that the throttling logic groups all the slow sources together and applies
> the limit on them as a whole.

"the throttling logic groups all the slow sources together and applies
the limit on them as a whole.". This is valuable content for
the documentation about this feature. Could the changes to
Documentation/x86/resctrl.rst be updated to include a paragraph
describing SMBA and what is (or is not) considered a "slow resource"? 

>> I do not think these devices are invisible to the OS though (after
>> reading Documentation/driver-api/cxl/memory-devices.rst and
>> Documentation/ABI/testing/sysfs-class-cxl).
>>
>> Is there not a way to provide some more clarity to users on what
>> would be throttled? 
>>

I repeat the question you snipped from my email (please don't do that). Could
you please answer it?:
Would the "SMBA" resource be available only when CXL.mem devices are present
on the system? Since this is a CPU feature it is unclear to me whether
presence of CXL.mem devices would be known at the time "SMBA" is enumerated.
Could the "SMBA" resource thus exist without memory to throttle?

>> How does a user know which memory on the system is "slow memory"?
>>
>> It remains unclear to me how a user is intended to use this feature.
>>
>> How will a user know which devices/memory (if any) are being
>> throttled by "SMBA"?
>>
> This is a new technology. I am still learning. 
> 
> Currently, I have tested with CXL 1.1 type of device. CXL 1.1 uses a simple
> topology structure of direct attachment between host (such as a CPU or GPU)
> and CXL device.
> 
> #numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
> node 0 size: 63678 MB
> node 0 free: 59542 MB
> node 1 cpus:             (CPU list is emply. Node 1 have CXL memory)
> node 1 size: 16122 MB    (There is 16GB CXL memory) 
> node 1 free: 15627 MB    
> node distances:
> node   0   1
>   0:  10  50
>   1:  50  10
> 
> The cpu-cxl node distance is greater than cpu-to-cpu distances.
> 
> We can also verify using lsmem
>  
> #lsmem --output RANGE,SIZE,STATE,NODE,ZONES,BLOCK
> RANGE                                 SIZE  STATE NODE  ZONES BLOCK
> 0x0000000000000000-0x000000007fffffff   2G online    0   None     0
> 0x0000000080000000-0x00000000ffffffff   2G online    0  DMA32     1
> 0x0000000100000000-0x0000000fffffffff  60G online    0 Normal  2-31
> 0x0000001000000000-0x000000107fffffff   2G online    0   None    32
> 0x0000001080000000-0x000000147fffffff  16G online    1 Normal 33-40
> 
> Memory block size:         2G
> Total online memory:      82G
> Total offline memory:      0B
> 
> 
> We can also verify using ACPI SRAT table and memory maps.

I think that adding (in general terms) that "SMBA throttles CXL.mem
devices" to Documentation/x86/resctrl.rst may be sufficient for
a user to understand what will be throttled without needing to go into
details about CXL device discovery. 

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
       [not found]             ` <3aa991a8-ac08-297d-8328-5380897f6dd9@amd.com>
@ 2022-08-30 22:23               ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2022-08-30 22:23 UTC (permalink / raw)
  To: babu.moger
  Cc: bagasdotme, bp, corbet, dave.hansen, eranian, fenghua.yu, hpa,
	linux-doc, linux-kernel, mingo, tglx, tony.luck, x86

Hi Babu,

On 8/30/2022 3:10 PM, Moger, Babu wrote:
> On 8/30/2022 11:39 AM, Reinette Chatre wrote:

>> Would the "SMBA" resource be available only when CXL.mem devices are present
>> on the system? Since this is a CPU feature it is unclear to me whether
>> presence of CXL.mem devices would be known at the time "SMBA" is enumerated.
>> Could the "SMBA" resource thus exist without memory to throttle?
> 
> Yes.The presence of the SMBA feature(with CXL.mem) is independent of whether slow memory is actually present in the system.If there is no slow memory, then setting a SMBA limit will have no impact on the performance of the system.
> 

I think this is significant and should also be documented.

Reinette



^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-08-30 16:39           ` Reinette Chatre
       [not found]             ` <3aa991a8-ac08-297d-8328-5380897f6dd9@amd.com>
@ 2022-08-30 22:28             ` Moger, Babu
  1 sibling, 0 replies; 44+ messages in thread
From: Moger, Babu @ 2022-08-30 22:28 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: bagasdotme, bp, corbet, dave.hansen, eranian, fenghua.yu, hpa,
	linux-doc, linux-kernel, mingo, tglx, tony.luck, x86

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, August 30, 2022 11:40 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: bagasdotme@gmail.com; bp@alien8.de; corbet@lwn.net;
> dave.hansen@linux.intel.com; eranian@google.com; fenghua.yu@intel.com;
> hpa@zytor.com; linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> mingo@redhat.com; tglx@linutronix.de; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth
> Allocation feature flag
> 
> Hi Babu,
> 
> On 8/29/2022 4:25 PM, Babu Moger wrote:
> > Hi Reinette,
> >    Some reason this thread did not land in my mailbox. Replying using
> > git sendmail to the thread
> >
> >> (snip modified links)
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> > amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-
> quality
> > -service-
> extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7C5e1d3f7a
> >
> a30749a3841e08da8aa69bd0%7C3dd8961fe4884e608e11a82d994e183d%7C0
> %7C0%7C
> >
> 637974796714276452%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjo
> >
> iV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdat
> a=yGIo
> > q%2Fp9xD1i6IfrkPEUj8sg9Xz08r0jrNTvGK7khko%3D&amp;reserved=0
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> >
> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbab
> u.m
> >
> oger%40amd.com%7C5e1d3f7aa30749a3841e08da8aa69bd0%7C3dd8961fe48
> 84e608e
> >
> 11a82d994e183d%7C0%7C0%7C637974796714276452%7CUnknown%7CTWFpb
> GZsb3d8ey
> >
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C300
> >
> 0%7C%7C%7C&amp;sdata=qu1cxHp6nCdEFJbJv5QDD0tAHHaV4tJ63NKC9fIiIx0%
> 3D&am
> > p;reserved=0
> >
> >> When you say "in this case", is there another case?
> >
> > There is no other interface. It is only CXL memory device.
> >
> >>
> >> Should "Slow Memory Bandwidth Allocation" thus be considered to be
> >> "CXL.mem Memory Bandwidth Allocation"? Why not call it "CXL(.mem?)
> >> Memory Bandwith Allocation"?
> >
> > Checked with our team here. The currently only supported slow memory
> > is CXL.mem device. As for the naming, the "slow" memory landscape is still
> evolving.
> > While CXL.mem is the only known type supported right now. The specs
> > says "Slow Memory Bandwidth Allocation". So, we would prefer to keep it
> that way.
> 
> If you prefer to keep "Slow Memory Bandwidth Allocation" then please also
> provide clear information to the user on what is managed via "Memory
> Bandwidth Allocation" and what is managed via "Slow Memory Bandwidth
> Allocation". This could be in the documentation.

Sure.
> 
> >> I am not familiar with CXL so please correct me where I am wrong.
> >> From what I understand CXL.mem is a protocol and devices that
> >> implement it can have different memory types ... some faster than
> >> others. So, even if SMBA supports "CXL.mem" devices, could a system
> >> have multiple CXL.mem devices, some faster than others? Would all be
> >> configured the same with SMBA (they would all be classified as "slow" and
> throttled the same)?
> >
> > I have not tested the multiple devices with different memory speeds here.
> > But checking with team here says it should just work the same way. It
> > appears that the throttling logic groups all the slow sources together
> > and applies the limit on them as a whole.
> 
> "the throttling logic groups all the slow sources together and applies the limit
> on them as a whole.". This is valuable content for the documentation about this
> feature. Could the changes to Documentation/x86/resctrl.rst be updated to
> include a paragraph describing SMBA and what is (or is not) considered a "slow
> resource"?

Sure.
> 
> >> I do not think these devices are invisible to the OS though (after
> >> reading Documentation/driver-api/cxl/memory-devices.rst and
> >> Documentation/ABI/testing/sysfs-class-cxl).
> >>
> >> Is there not a way to provide some more clarity to users on what
> >> would be throttled?
> >>
> 
> I repeat the question you snipped from my email (please don't do that). Could
Sorry.. Not intentional.

> you please answer it?:
> Would the "SMBA" resource be available only when CXL.mem devices are
> present on the system? Since this is a CPU feature it is unclear to me whether
> presence of CXL.mem devices would be known at the time "SMBA" is
> enumerated.
> Could the "SMBA" resource thus exist without memory to throttle?

Yes.  The presence of the SMBA feature(with CXL.mem) is independent of whether slow memory is actually present in the system.  If there is no slow memory, then setting a SMBA limit will have no impact on the performance of the system.
> 
> >> How does a user know which memory on the system is "slow memory"?
> >>
> >> It remains unclear to me how a user is intended to use this feature.
> >>
> >> How will a user know which devices/memory (if any) are being
> >> throttled by "SMBA"?
> >>
> > This is a new technology. I am still learning.
> >
> > Currently, I have tested with CXL 1.1 type of device. CXL 1.1 uses a
> > simple topology structure of direct attachment between host (such as a
> > CPU or GPU) and CXL device.
> >
> > #numactl -H
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 node 0 size:
> > 63678 MB node 0 free: 59542 MB
> > node 1 cpus:             (CPU list is emply. Node 1 have CXL memory)
> > node 1 size: 16122 MB    (There is 16GB CXL memory)
> > node 1 free: 15627 MB
> > node distances:
> > node   0   1
> >   0:  10  50
> >   1:  50  10
> >
> > The cpu-cxl node distance is greater than cpu-to-cpu distances.
> >
> > We can also verify using lsmem
> >
> > #lsmem --output RANGE,SIZE,STATE,NODE,ZONES,BLOCK
> > RANGE                                 SIZE  STATE NODE  ZONES BLOCK
> > 0x0000000000000000-0x000000007fffffff   2G online    0   None     0
> > 0x0000000080000000-0x00000000ffffffff   2G online    0  DMA32     1
> > 0x0000000100000000-0x0000000fffffffff  60G online    0 Normal  2-31
> > 0x0000001000000000-0x000000107fffffff   2G online    0   None    32
> > 0x0000001080000000-0x000000147fffffff  16G online    1 Normal 33-40
> >
> > Memory block size:         2G
> > Total online memory:      82G
> > Total offline memory:      0B
> >
> >
> > We can also verify using ACPI SRAT table and memory maps.
> 
> I think that adding (in general terms) that "SMBA throttles CXL.mem devices" to
> Documentation/x86/resctrl.rst may be sufficient for a user to understand what
> will be throttled without needing to go into details about CXL device discovery.

Sure.
Thanks
Babu

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2022-08-30 22:31 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-22 13:42 [PATCH v3 00/10] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
2022-08-22 13:42 ` [PATCH v3 01/10] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
2022-08-23 20:56   ` Reinette Chatre
2022-08-24 15:58     ` Moger, Babu
2022-08-22 13:42 ` [PATCH v3 02/10] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
2022-08-23 22:47   ` Reinette Chatre
2022-08-25 22:42     ` Moger, Babu
2022-08-26 16:17       ` Reinette Chatre
2022-08-29 23:25         ` Babu Moger
2022-08-30 16:39           ` Reinette Chatre
     [not found]             ` <3aa991a8-ac08-297d-8328-5380897f6dd9@amd.com>
2022-08-30 22:23               ` Reinette Chatre
2022-08-30 22:28             ` Moger, Babu
2022-08-22 13:43 ` [PATCH v3 03/10] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
2022-08-24 17:39   ` Reinette Chatre
2022-08-26 14:59   ` Moger, Babu
2022-08-22 13:43 ` [PATCH v3 04/10] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
2022-08-23 22:47   ` Reinette Chatre
2022-08-24 16:48     ` Moger, Babu
2022-08-22 13:43 ` [PATCH v3 05/10] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
2022-08-22 13:43 ` [PATCH v3 06/10] x86/resctrl: Introduce mon_configurable to detect Bandwidth Monitoring Event Configuration Babu Moger
2022-08-24 21:15   ` Reinette Chatre
2022-08-25 15:11     ` Moger, Babu
2022-08-25 15:56       ` Reinette Chatre
2022-08-25 20:44         ` Moger, Babu
2022-08-25 21:24           ` Reinette Chatre
2022-08-26 14:30             ` Babu Moger
2022-08-22 13:43 ` [PATCH v3 07/10] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
2022-08-24 21:15   ` Reinette Chatre
2022-08-26 16:07     ` Moger, Babu
2022-08-26 16:35       ` Reinette Chatre
2022-08-26 16:57         ` Moger, Babu
2022-08-22 13:44 ` [PATCH v3 08/10] x86/resctrl: Add the sysfs interface to read the " Babu Moger
2022-08-22 13:47   ` Bagas Sanjaya
2022-08-22 13:50     ` Moger, Babu
2022-08-22 13:55     ` Moger, Babu
2022-08-23  1:55       ` Bagas Sanjaya
2022-08-24 21:16   ` Reinette Chatre
2022-08-26 16:49     ` Moger, Babu
2022-08-26 17:34       ` Reinette Chatre
2022-08-26 18:34         ` Moger, Babu
2022-08-22 13:44 ` [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write " Babu Moger
2022-08-24 21:16   ` Reinette Chatre
2022-08-26 18:17     ` Moger, Babu
2022-08-22 13:45 ` [PATCH v3 10/10] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).