linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix
@ 2022-09-07 17:59 Babu Moger
  2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
                   ` (12 more replies)
  0 siblings, 13 replies; 49+ messages in thread
From: Babu Moger @ 2022-09-07 17:59 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

New AMD processors can now support following QoS features.

1. Slow Memory Bandwidth Allocation (SMBA)
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

   Currently, CXL.memory is the only supported "slow" memory device. With
   the support of SMBA feature the hardware enables bandwidth allocation
   on the slow memory devices.

2. Bandwidth Monitoring Event Configuration (BMEC)
   The bandwidth monitoring events mbm_total_event and mbm_local_event 
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not enough
   to count all the different types are memory events. With the feature
   BMEC, the users have the option to configure mbm_total_event and
   mbm_local_event to count the specific type of events.

   Following are the bitmaps of events supported.
   Bits    Description
     6       Dirty Victims from the QOS domain to all types of memory
     5       Reads to slow memory in the non-local NUMA domain
     4       Reads to slow memory in the local NUMA domain
     3       Non-temporal writes to non-local NUMA domain
     2       Non-temporal writes to local NUMA domain
     1       Reads to memory in the non-local NUMA domain
     0       Reads to memory in the local NUMA domain

This series adds support for these features. Added a bug fix and a code cleanup.

Feature description is available in the specification, "AMD64 Technology Platform Quality
of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

---
4:
  Got numerios of comments from Reinette Chatre. Addressed most of them. 
  Summary of changes.
  1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.  
  2. Updated mon_features texts if the BMEC is supported.
  3. Added more explanation about the slow memory support.
  4. Replaced smp_call_function_many with on_each_cpu_mask call.
  5. Removed arch_has_empty_bitmaps
  6. Few other text changes.
  7. Removed Reviewed-by if the patch is modified.
  8. Rebased the patches to latest tip.

v3:
  https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit@bmoger-ubuntu/ 
  a. Rebased the patches to latest tip. Resolved some conflicts.
     https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  b. Taken care of feedback from Bagas Sanjaya.
  c. Added Reviewed by from Mingo.
  Note: I am still looking for comments from Reinette or Fenghua.

v2:
  https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
  b. Added the patch to fix CBM issue on AMD. This was originally discussed
     https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/

v1:
  https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/

Babu Moger (13):
      x86/resctrl: Fix min_cbm_bits for AMD
      x86/resctrl: Remove arch_has_empty_bitmaps
      x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
      x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
      x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
      x86/resctrl: Include new features in command line options
      x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
      x86/resctrl : Introduce data structure to support monitor configuration
      x86/resctrl: Add sysfs interface files to read/write event configuration
      x86/resctrl: Add the sysfs interface to read the event configuration
      x86/resctrl: Add sysfs interface to write the event configuration
      x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask
      Documentation/x86: Update resctrl_ui.rst for new features


 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/x86/resctrl.rst                 | 148 +++++++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  58 +++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   5 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  32 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c         |   9 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 272 +++++++++++++++---
 arch/x86/kernel/cpu/scattered.c               |   2 +
 include/linux/resctrl.h                       |   6 +-
 10 files changed, 479 insertions(+), 57 deletions(-)

--


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
@ 2022-09-07 17:59 ` Babu Moger
  2022-09-09 17:00   ` James Morse
  2022-09-16 15:53   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps Babu Moger
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Babu Moger @ 2022-09-07 17:59 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

AMD systems support zero CBM (capacity bit mask) for L3 allocation.
That is reflected in rdt_init_res_defs_amd() by:

	r->cache.arch_has_empty_bitmaps = true;

However given the unified code in cbm_validate(), checking for:
	val == 0 && !arch_has_empty_bitmaps

is not enough because of another check in cbm_validate():

	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)

The default value of r->cache.min_cbm_bits = 1.

Leading to:

	$ cd /sys/fs/resctrl
	$ mkdir foo
	$ cd foo
	$ echo L3:0=0 > schemata
          -bash: echo: write error: Invalid argument
	$ cat /sys/fs/resctrl/info/last_cmd_status
	  Need at least 1 bits in the mask

Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also,
remove the default setting of min_cbm_bits and initialize it separately.

After the fix
	$ cd /sys/fs/resctrl
	$ mkdir foo
	$ cd foo
	$ echo L3:0=0 > schemata
	$ cat /sys/fs/resctrl/info/last_cmd_status
	  ok

Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bb1c3f5f60c8..a5c51a14fbce 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L3,
 			.name			= "L3",
 			.cache_level		= 3,
-			.cache = {
-				.min_cbm_bits	= 1,
-			},
 			.domains		= domain_init(RDT_RESOURCE_L3),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
@@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid			= RDT_RESOURCE_L2,
 			.name			= "L2",
 			.cache_level		= 2,
-			.cache = {
-				.min_cbm_bits	= 1,
-			},
 			.domains		= domain_init(RDT_RESOURCE_L2),
 			.parse_ctrlval		= parse_cbm,
 			.format_str		= "%d=%0*x",
@@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
 			r->cache.arch_has_sparse_bitmaps = false;
 			r->cache.arch_has_empty_bitmaps = false;
 			r->cache.arch_has_per_cpu_cfg = false;
+			r->cache.min_cbm_bits = 1;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
 			hw_res->msr_update = mba_wrmsr_intel;
@@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
 			r->cache.arch_has_sparse_bitmaps = true;
 			r->cache.arch_has_empty_bitmaps = true;
 			r->cache.arch_has_per_cpu_cfg = true;
+			r->cache.min_cbm_bits = 0;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
  2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-16 15:53   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The field arch_has_empty_bitmaps is not required anymore. The field
min_cbm_bits is enough to validate the CBM (capacity bit mask) if the
architecture can support the zero CBM or not.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        |    2 --
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    3 +--
 include/linux/resctrl.h                   |    6 +++---
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index a5c51a14fbce..c2657754672e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -869,7 +869,6 @@ static __init void rdt_init_res_defs_intel(void)
 		if (r->rid == RDT_RESOURCE_L3 ||
 		    r->rid == RDT_RESOURCE_L2) {
 			r->cache.arch_has_sparse_bitmaps = false;
-			r->cache.arch_has_empty_bitmaps = false;
 			r->cache.arch_has_per_cpu_cfg = false;
 			r->cache.min_cbm_bits = 1;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
@@ -890,7 +889,6 @@ static __init void rdt_init_res_defs_amd(void)
 		if (r->rid == RDT_RESOURCE_L3 ||
 		    r->rid == RDT_RESOURCE_L2) {
 			r->cache.arch_has_sparse_bitmaps = true;
-			r->cache.arch_has_empty_bitmaps = true;
 			r->cache.arch_has_per_cpu_cfg = true;
 			r->cache.min_cbm_bits = 0;
 		} else if (r->rid == RDT_RESOURCE_MBA) {
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 87666275eed9..7f38c8bd8429 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -98,8 +98,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
 		return false;
 	}
 
-	if ((!r->cache.arch_has_empty_bitmaps && val == 0) ||
-	    val > r->default_ctrl) {
+	if ((r->cache.min_cbm_bits > 0 && val == 0) || val > r->default_ctrl) {
 		rdt_last_cmd_puts("Mask out of range\n");
 		return false;
 	}
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 21deb5212bbd..46ed8589857c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -72,11 +72,12 @@ struct rdt_domain {
 /**
  * struct resctrl_cache - Cache allocation related data
  * @cbm_len:		Length of the cache bit mask
- * @min_cbm_bits:	Minimum number of consecutive bits to be set
+ * @min_cbm_bits:	Minimum number of consecutive bits to be set.
+ *			The value 0 means the architecture can support
+ *			zero CBM.
  * @shareable_bits:	Bitmask of shareable resource with other
  *			executing entities
  * @arch_has_sparse_bitmaps:	True if a bitmap like f00f is valid.
- * @arch_has_empty_bitmaps:	True if the '0' bitmap is valid.
  * @arch_has_per_cpu_cfg:	True if QOS_CFG register for this cache
  *				level has CPU scope.
  */
@@ -85,7 +86,6 @@ struct resctrl_cache {
 	unsigned int	min_cbm_bits;
 	unsigned int	shareable_bits;
 	bool		arch_has_sparse_bitmaps;
-	bool		arch_has_empty_bitmaps;
 	bool		arch_has_per_cpu_cfg;
 };
 



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
  2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
  2022-09-07 18:00 ` [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-16 15:54   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
enforcement policies can be applied to external slow memory connected
to the host. QOS enforcement is accomplished by assigning a Class Of
Service (COS) to a processor and specifying allocations or limits for
that COS for each resource to be allocated.

This feature is identified by the CPUID Function 8000_0020_EBX_x0.

CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
Bits    Field Name      Description
2       L3SBE           L3 external slow memory bandwidth enforcement

Currently, CXL.memory is the only supported "slow" memory device. With the
support of SMBA feature the hardware enables bandwidth allocation on the
slow memory devices. If there are multiple slow memory devices in the system,
then the throttling logic groups all the slow sources together and applies
the limit on them as a whole.

The presence of the SMBA feature(with CXL.memory) is independent of whether
slow memory device is actually present in the system. If there is no slow
memory in the system, then setting a SMBA limit will have no impact on the
performance of the system.

Presence of CXL memory can be identified by numactl command.

$numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB node 0 free: 59542 MB
node 1 cpus:
node 1 size: 16122 MB
node 1 free: 15627 MB
node distances:
node   0   1
   0:  10  50
   1:  50  10

CPU list for CXL memory will be emply. The cpu-cxl node distance is greater
than cpu-to-cpu distances. Node 1 has the CXL memory in this case. CXL
memory can also be identified using ACPI SRAT table and memory maps.

Feature description is available in the specification, "AMD64 Technology Platform Quality
of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 235dc85c91c3..1815435c9c88 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -304,6 +304,7 @@
 #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
+#define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index fd44b54c90d5..885ecf46abb2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (2 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-16 15:54   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS
enforcement policies on the external slow memory.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c2657754672e..a7e9aabff8c8 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SMBA",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1d647188a43b..24a1dfeb6cb2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -418,6 +418,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (3 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-07 18:36   ` Daniel Sneddon
  2022-09-16 15:55   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 06/13] x86/resctrl: Include new features in command line options Babu Moger
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

Support of this feature is available via CPUID Fn8000_0020_EBX_x0 (ECX=0).
Bits    Field Name       Description
3        EVT_CFG         Bandwidth Monitoring Event Configuration (BMEC)

Currently, the bandwidth monitoring events mbm_total_bytes and
mbm_local_bytes are set to count all the total and local reads/writes
respectively. With the introduction of SLOW memory, the two counters are
not enough to count all the different types of memory events. With the
feature BMEC, the users have the option to configure mbm_total_bytes and
mbm_local_bytes to count the specific type of events.

Each BMEC event has a configuration MSR, QOS_EVT_CFG (0x000_0400h +
EventID) which contains one field for each Bandwidth Type that can be used
to configure the bandwidth event to track any combination of supported
bandwidth types. The event will count requests from every Bandwidth Type
bit that is set in the corresponding configuration register.

Following are the types of events supported:

====    ========================================================
Bits    Description
====    ========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1815435c9c88..a4ee02a56d54 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -305,6 +305,7 @@
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+18) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 885ecf46abb2..7981df0b910e 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,             CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 06/13] x86/resctrl: Include new features in command line options
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (4 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-16 15:55   ` Reinette Chatre
  2022-09-07 18:00 ` [PATCH v4 07/13] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the command line options to disable the new features.
smba : Slow Memory Bandwidth Allocation
mbec : Bandwidth Monitor Event Configuration.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d7f30902fda0..c109fecb93ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5169,7 +5169,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index a7e9aabff8c8..53fbc3acad81 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -700,6 +700,8 @@ enum {
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -723,6 +725,8 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 07/13] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (5 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 06/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-07 18:00 ` [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration Babu Moger
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        |   29 +++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    9 ++++++---
 4 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 53fbc3acad81..56c96607259c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -227,9 +227,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;
+
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
 
-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;
 
@@ -791,6 +797,19 @@ static __init bool get_mem_config(void)
 	return false;
 }
 
+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -821,6 +840,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;
 
+	if (get_slow_mem_config())
+		ret = true;
+
 	return ret;
 }
 
@@ -910,6 +932,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 7f38c8bd8429..480600b8e4cf 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -202,7 +202,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;
 
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 24a1dfeb6cb2..c049a274383c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -14,6 +14,7 @@
 #define MSR_IA32_L2_CBM_BASE		0xd10
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f276aff521e8..fc5286067201 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1218,7 +1218,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
@@ -1399,7 +1399,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 				ctrl = resctrl_arch_get_config(r, d,
 							       rdtgrp->closid,
 							       schema->conf_type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -2807,7 +2808,9 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
+
 			rdtgroup_init_mba(r);
 		} else {
 			ret = rdtgroup_init_cat(s, rdtgrp->closid);



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (6 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 07/13] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
@ 2022-09-07 18:00 ` Babu Moger
  2022-09-16 15:56   ` Reinette Chatre
  2022-09-07 18:01 ` [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add couple of fields in mon_evt to support Bandwidth Monitoring Event
Configuratio (BMEC) and also update the "mon_features".

The sysfs file "mon_features" will display the monitor configuration if
supported.

Before the change.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_local_bytes

After the change if BMEC is supported.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_total_config
	mbm_local_bytes
	mbm_local_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |    3 ++-
 arch/x86/kernel/cpu/resctrl/internal.h |    6 +++++-
 arch/x86/kernel/cpu/resctrl/monitor.c  |    9 ++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 56c96607259c..513e6a00f58e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -849,6 +849,7 @@ static __init bool get_rdt_alloc_resources(void)
 static __init bool get_rdt_mon_resources(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
 		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
@@ -860,7 +861,7 @@ static __init bool get_rdt_mon_resources(void)
 	if (!rdt_mon_features)
 		return false;
 
-	return !rdt_get_mon_l3_config(r);
+	return !rdt_get_mon_l3_config(r, mon_configurable);
 }
 
 static __init void __check_quirks_intel(void)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c049a274383c..45923eb4022f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -72,11 +72,15 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @configurable:	true if the event is configurable
+ * @config_name:	sysfs file name of the event if configurable
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	u32			evtid;
 	char			*name;
+	bool 			configurable;
+	char			*config_name;
 	struct list_head	list;
 };
 
@@ -529,7 +533,7 @@ int closids_supported(void);
 void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
-int rdt_get_mon_l3_config(struct rdt_resource *r);
+int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index eaf25a234ff5..dc97aa7a3b3d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -656,11 +656,13 @@ static struct mon_evt llc_occupancy_event = {
 static struct mon_evt mbm_total_event = {
 	.name		= "mbm_total_bytes",
 	.evtid		= QOS_L3_MBM_TOTAL_EVENT_ID,
+	.config_name	= "mbm_total_config",
 };
 
 static struct mon_evt mbm_local_event = {
 	.name		= "mbm_local_bytes",
 	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
+	.config_name	= "mbm_local_config",
 };
 
 /*
@@ -682,7 +684,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
-int rdt_get_mon_l3_config(struct rdt_resource *r)
+int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -714,6 +716,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (ret)
 		return ret;
 
+	if (configurable) {
+		mbm_total_event.configurable = true;
+		mbm_local_event.configurable = true;
+	}
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index fc5286067201..f55a693fa958 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list)
+	list_for_each_entry(mevt, &r->evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
+		if (mevt->configurable)
+			seq_printf(seq, "%s\n", mevt->config_name);
+	}
 
 	return 0;
 }



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (7 preceding siblings ...)
  2022-09-07 18:00 ` [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration Babu Moger
@ 2022-09-07 18:01 ` Babu Moger
  2022-09-16 15:58   ` Reinette Chatre
  2022-09-07 18:01 ` [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the " Babu Moger
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add two new sysfs files to read/write the event configuration if
the feature Bandwidth Monitoring Event Configuration (BMEC) is
supported. The file mbm_local_config is for the configuration
of the event mbm_local_bytes and the file mbm_total_config is
for the configuration of mbm_total_bytes.

$ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config

$ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
/sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   40 ++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f55a693fa958..da11fdad204d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
 	.seq_show		= rdtgroup_mondata_show,
 };
 
+static const struct kernfs_ops kf_mondata_config_ops = {
+	.atomic_write_len       = PAGE_SIZE,
+};
+
 static bool is_cpu_list(struct kernfs_open_file *of)
 {
 	struct rftype *rft = of->kn->priv;
@@ -2478,24 +2482,40 @@ static struct file_system_type rdt_fs_type = {
 	.kill_sb		= rdt_kill_sb,
 };
 
-static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
+static int mon_addfile(struct kernfs_node *parent_kn, struct mon_evt *mevt,
 		       void *priv)
 {
-	struct kernfs_node *kn;
+	struct kernfs_node *kn_evt, *kn_evt_config;
 	int ret = 0;
 
-	kn = __kernfs_create_file(parent_kn, name, 0444,
-				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
-				  &kf_mondata_ops, priv, NULL, NULL);
-	if (IS_ERR(kn))
-		return PTR_ERR(kn);
+	kn_evt = __kernfs_create_file(parent_kn, mevt->name, 0444,
+			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
+			&kf_mondata_ops, priv, NULL, NULL);
+	if (IS_ERR(kn_evt))
+		return PTR_ERR(kn_evt);
 
-	ret = rdtgroup_kn_set_ugid(kn);
+	ret = rdtgroup_kn_set_ugid(kn_evt);
 	if (ret) {
-		kernfs_remove(kn);
+		kernfs_remove(kn_evt);
 		return ret;
 	}
 
+	if (mevt->configurable) {
+		kn_evt_config = __kernfs_create_file(parent_kn,
+				mevt->config_name, 0644,
+				GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
+				&kf_mondata_config_ops, priv, NULL, NULL);
+		if (IS_ERR(kn_evt_config))
+			return PTR_ERR(kn_evt_config);
+
+		ret = rdtgroup_kn_set_ugid(kn_evt_config);
+		if (ret) {
+			kernfs_remove(kn_evt_config);
+			kernfs_remove(kn_evt);
+			return ret;
+		}
+	}
+
 	return ret;
 }
 
@@ -2550,7 +2570,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 	priv.u.domid = d->id;
 	list_for_each_entry(mevt, &r->evt_list, list) {
 		priv.u.evtid = mevt->evtid;
-		ret = mon_addfile(kn, mevt->name, priv.priv);
+		ret = mon_addfile(kn, mevt, priv.priv);
 		if (ret)
 			goto out_destroy;
 



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (8 preceding siblings ...)
  2022-09-07 18:01 ` [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
@ 2022-09-07 18:01 ` Babu Moger
  2022-09-16 15:59   ` Reinette Chatre
  2022-09-07 18:01 ` [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write " Babu Moger
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be viewed by the user by reading
the sysfs configuration file.

Following are the types of events supported:

====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

$cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
0x7f

$cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
0x15

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |   24 ++++++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   77 ++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 45923eb4022f..96f439324d78 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -15,6 +15,7 @@
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
@@ -50,6 +51,29 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM		BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM		BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM		BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM		BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
+
+/* Max event bits supported */
+#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index da11fdad204d..6f067c1ac7c1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -254,8 +254,85 @@ static const struct kernfs_ops kf_mondata_ops = {
 	.seq_show		= rdtgroup_mondata_show,
 };
 
+struct mon_config_info {
+	u32 evtid;
+	u32 mon_config;
+};
+
+/*
+ * This is called via IPI to read the CQM/MBM counters
+ * in a domain.
+ */
+void mon_event_config_read(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 h, msr_index;
+
+	switch (mon_info->evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		/* Not expected to come here */
+		return;
+	}
+
+	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
+}
+
+void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+}
+
+int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
+{
+	struct kernfs_open_file *of = m->private;
+	struct mon_config_info mon_info;
+	struct rdt_hw_resource *hw_res;
+	u32 resid, evtid, domid;
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	union mon_data_bits md;
+	struct rdt_domain *d;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	md.priv = of->kn->priv;
+	resid = md.u.rid;
+	domid = md.u.domid;
+	evtid = md.u.evtid;
+
+	hw_res = &rdt_resources_all[resid];
+	r = &hw_res->r_resctrl;
+
+	d = rdt_find_domain(r, domid, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	mon_info.evtid = evtid;
+	mondata_config_read(d, &mon_info);
+
+	seq_printf(m, "0x%x\n", mon_info.mon_config);
+
+out:
+	rdtgroup_kn_unlock(of->kn);
+	return ret;
+}
+
 static const struct kernfs_ops kf_mondata_config_ops = {
 	.atomic_write_len       = PAGE_SIZE,
+	.seq_show               = rdtgroup_mondata_config_show,
 };
 
 static bool is_cpu_list(struct kernfs_open_file *of)



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write the event configuration
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (9 preceding siblings ...)
  2022-09-07 18:01 ` [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the " Babu Moger
@ 2022-09-07 18:01 ` Babu Moger
  2022-09-16 16:17   ` Reinette Chatre
  2022-09-07 18:01 ` [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask Babu Moger
  2022-09-07 18:01 ` [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the sysfs interface to write the event configuration for the
MBM configurable events. The event configuration can be changed by
writing to the sysfs file for that specific event.

Following are the types of events supported:

====  ===========================================================
Bits    Description
====  ===========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

For example:
To change the mbm_total_bytes to count all the reads, run the command.
$echo  0x33 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config

To change the mbm_local_bytes to count all the slow memory reads, run
the command.
$echo  0x30 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  112 ++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6f067c1ac7c1..59b484eb1267 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -330,9 +330,121 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
 	return ret;
 }
 
+/*
+ * This is called via IPI to read the CQM/MBM counters
+ * in a domain.
+ */
+void mon_event_config_write(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 msr_index;
+
+	switch (mon_info->evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		/* Not expected to come here */
+		return;
+	}
+
+	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
+}
+
+ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
+				       char *buf, size_t nbytes, loff_t off)
+{
+	struct mon_config_info mon_info;
+	struct rdt_hw_resource *hw_res;
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	unsigned int mon_config;
+	cpumask_var_t cpu_mask;
+	union mon_data_bits md;
+	struct rdt_domain *d;
+	u32 resid, domid;
+	int ret = 0, cpu;
+
+	ret = kstrtouint(buf, 0, &mon_config);
+	if (ret)
+		return ret;
+
+	rdt_last_cmd_clear();
+
+	/* mon_config cannot be more than the supported set of events */
+	if (mon_config > MAX_EVT_CONFIG_BITS) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	cpus_read_lock();
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		return -ENOENT;
+		goto e_unlock;
+	}
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
+		ret = -ENOMEM;
+		goto e_unlock;
+	}
+
+	md.priv = of->kn->priv;
+	resid = md.u.rid;
+	domid = md.u.domid;
+
+	hw_res = &rdt_resources_all[resid];
+	r = &hw_res->r_resctrl;
+	d = rdt_find_domain(r, domid, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		ret = -ENOENT;
+		goto e_cpumask;
+	}
+
+	/*
+	 * Read the current config value first. If both are same
+	 * then we dont need to write again
+	 */
+	mon_info.evtid = md.u.evtid;
+	mondata_config_read(d, &mon_info);
+	if (mon_info.mon_config == mon_config)
+		goto e_cpumask;
+
+	mon_info.mon_config = mon_config;
+
+	/* Pick all the CPUs in the domain instance */
+	for_each_cpu(cpu, &d->cpu_mask)
+		cpumask_set_cpu(cpu, cpu_mask);
+
+	/* Update MSR_IA32_EVT_CFG_BASE MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared, and the U-bit for every
+	 * RMID will be set on the next read to any BwEvent for every RMID.
+	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
+	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+
+e_cpumask:
+	free_cpumask_var(cpu_mask);
+
+e_unlock:
+	rdtgroup_kn_unlock(of->kn);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 static const struct kernfs_ops kf_mondata_config_ops = {
 	.atomic_write_len       = PAGE_SIZE,
 	.seq_show               = rdtgroup_mondata_config_show,
+	.write                  = rdtgroup_mondata_config_write,
 };
 
 static bool is_cpu_list(struct kernfs_open_file *of)



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (10 preceding siblings ...)
  2022-09-07 18:01 ` [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write " Babu Moger
@ 2022-09-07 18:01 ` Babu Moger
  2022-09-16 16:17   ` Reinette Chatre
  2022-09-07 18:01 ` [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The call on_each_cpu_mask can run the function on each CPU specified by
cpumask, which may include the local processor. So, replace the call
smp_call_function_many with on_each_cpu_mask to simplify the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 59b484eb1267..fa0e1e8e29aa 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -518,12 +518,7 @@ static void update_cpu_closid_rmid(void *info)
 static void
 update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 {
-	int cpu = get_cpu();
-
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
-	put_cpu();
+	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
 }
 
 static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
@@ -2058,13 +2053,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 			/* Pick one CPU from each domain instance to update MSR */
 			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 	}
-	cpu = get_cpu();
-	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update(&enable);
-	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
-	put_cpu();
+
+	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, update, &enable, 1);
 
 	free_cpumask_var(cpu_mask);
 
@@ -2506,7 +2497,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int i, cpu;
+	int i;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
@@ -2527,13 +2518,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
 		for (i = 0; i < hw_res->num_closid; i++)
 			hw_dom->ctrl_val[i] = r->default_ctrl;
 	}
-	cpu = get_cpu();
-	/* Update CBM on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update CBM on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 
 	free_cpumask_var(cpu_mask);
 



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features
  2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
                   ` (11 preceding siblings ...)
  2022-09-07 18:01 ` [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask Babu Moger
@ 2022-09-07 18:01 ` Babu Moger
  2022-09-08  4:07   ` Bagas Sanjaya
  12 siblings, 1 reply; 49+ messages in thread
From: Babu Moger @ 2022-09-07 18:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Update the documentation for the new features:
1. Slow Memory Bandwidth allocation (SMBA).
   With this feature, the QOS  enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC).
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not
   enough to count all the different types are memory events. With the
   feature BMEC, the users have the option to configure mbm_total_bytes
   and mbm_local_bytes to count the specific type of events.

Also add configuration instructions with examples.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/x86/resctrl.rst |  148 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 71a531061e4e..56581587c1a3 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:
 
-=============================================	================================
+===============================================	================================
 RDT (Resource Director Technology) Allocation	"rdt_a"
 CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
-=============================================	================================
+SMBA (Slow Memory Bandwidth Allocation)		"smba"
+BMEC (Bandwidth Monitoring Event Configuration) "bmec"
+===============================================	================================
 
 To use the feature mount the file system::
 
@@ -161,6 +163,23 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+                Example output:
+
+                # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                llc_occupancy
+                mbm_total_bytes
+                mbm_local_bytes
+
+                If the system supports Bandwidth Monitoring Event
+                Configuration (BMEC), then the bandwidth events will
+                be configurable. Then the output will be.
+
+                # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                llc_occupancy
+                mbm_total_bytes
+                mbm_total_config
+                mbm_local_bytes
+                mbm_local_config
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
@@ -264,6 +283,32 @@ When monitoring is enabled all MON groups will also contain:
 	the sum for all tasks in the CTRL_MON group and all tasks in
 	MON groups. Please see example section for more details on usage.
 
+"mbm_total_config", "mbm_local_config":
+        This contains the current event configuration for the events
+        mbm_total_bytes and mbm_local_bytes, respectively, when the
+        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
+        These files are organized by L3 domains under the subdirectories
+        "mon_L3_00" and "mon_L3_01". When BMEC is supported, the events
+        mbm_local_bytes and mbm_total_bytes are configurable.
+
+        Following are the types of events supported:
+
+        ====    ========================================================
+        Bits    Description
+        ====    ========================================================
+        6       Dirty Victims from the QOS domain to all types of memory
+        5       Reads to slow memory in the non-local NUMA domain
+        4       Reads to slow memory in the local NUMA domain
+        3       Non-temporal writes to non-local NUMA domain
+        2       Non-temporal writes to local NUMA domain
+        1       Reads to memory in the non-local NUMA domain
+        0       Reads to memory in the local NUMA domain
+        ====    ========================================================
+
+        By default, the mbm_total_bytes configuration is set to 0x7F to count
+        all the event types and the mbm_local_bytes configuration is set to
+        0x15 to count all the local memory events.
+
 Resource allocation rules
 -------------------------
 
@@ -464,6 +509,19 @@ Memory bandwidth domain is L3 cache.
 
 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
 
+Slow Memory bandwidth Allocation (when supported)
+-------------------------------------------------
+Currently, CXL.memory is the only supported "slow" memory device.
+With the support of SMBA feature the hardware enables bandwidth
+allocation on the slow memory devices. If there are multiple slow
+memory devices in the system, then the throttling logic groups all
+the slow sources together and applies the limit on them as a whole.
+
+Slow Memory b/w domain is L3 cache.
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +537,44 @@ which you wish to change.  E.g.
   L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
   L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the state of all resources
+on all domains. When writing the memory bandwidth allocation you
+only need to specify those values in an absolute number expressed
+in 1/8 GB/s increments. To allocate bandwidth limit of 2GB, you
+need to specify the value 16 (16 * 1/8 = 2).  E.g.
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with slow memory
+-------------------------------------------------------------------
+Reading the schemata file will show the state of all resources
+on all domains. When writing the memory bandwidth allocation you
+only need to specify those values in an absolute number expressed
+in 1/8 GB/s increments. To allocate bandwidth limit of 8GB, you
+need to specify the value 64 (64 * 1/8 = 8).  E.g.
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1=  64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an
@@ -1210,6 +1306,54 @@ View the llc occupancy snapshot::
   # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
   11234000
 
+Example 5 (Configure and Monitor specific event types)
+------------------------------------------------------
+
+A single socket system which has real time tasks running on cores 0-4
+and non real time tasks on other CPUs. We want to monitor the memory
+bandwidth allocation for specific events.
+::
+
+  # mount -t resctrl resctrl /sys/fs/resctrl
+  # cd /sys/fs/resctrl
+  # mkdir p1
+
+Move the CPUs 0-4 over to p1::
+
+  # echo 0xf > p1/cpus
+
+View the current mbm_local_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_bytes
+  112501
+
+Change the mbm_local_bytes to count mon-temporal writes to both local
+and non-local NUMA domain. Refer to event supported bitmap under
+mbm_local_config::
+
+  # echo 0xc > /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_config
+
+View the updated mbm_local_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_local_bytes
+  12601
+
+Similar experiment on mbm_total_bytes. First view the current mbm_total_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes
+  1532501
+
+Change the mbm_total_bytes to count only reads to slow memory on both local
+and non-local NUMA domain. Refer to event supported bitmap under
+mbm_total_config::
+
+  # echo 0x30 > /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_config
+
+View the updated mbm_total_bytes::
+
+  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/mbm_total_bytes
+  104562
+
 Intel RDT Errata
 ================
 



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-09-07 18:00 ` [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-09-07 18:36   ` Daniel Sneddon
  2022-09-07 19:59     ` Moger, Babu
  2022-09-16 15:55   ` Reinette Chatre
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel Sneddon @ 2022-09-07 18:36 UTC (permalink / raw)
  To: Babu Moger, corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, sandipan.das,
	tony.luck, james.morse, linux-doc, linux-kernel, bagasdotme,
	eranian

On 9/7/22 11:00, Babu Moger wrote:
>  #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
> +#define X86_FEATURE_BMEC		(11*32+18) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */

Shouldn't this be +19 instead?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-09-07 18:36   ` Daniel Sneddon
@ 2022-09-07 19:59     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-07 19:59 UTC (permalink / raw)
  To: Daniel Sneddon, corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, sandipan.das,
	tony.luck, james.morse, linux-doc, linux-kernel, bagasdotme,
	eranian

Hi Daniel,

On 9/7/22 13:36, Daniel Sneddon wrote:
> On 9/7/22 11:00, Babu Moger wrote:
>>  #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
>> +#define X86_FEATURE_BMEC		(11*32+18) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
> Shouldn't this be +19 instead?

Sorry. Missed that. Will fix it. 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features
  2022-09-07 18:01 ` [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
@ 2022-09-08  4:07   ` Bagas Sanjaya
  2022-09-08  9:26     ` Bagas Sanjaya
  0 siblings, 1 reply; 49+ messages in thread
From: Bagas Sanjaya @ 2022-09-08  4:07 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, reinette.chatre, tglx, mingo, bp, fenghua.yu,
	dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	eranian

[-- Attachment #1: Type: text/plain, Size: 5840 bytes --]

On Wed, Sep 07, 2022 at 01:01:29PM -0500, Babu Moger wrote:
> Update the documentation for the new features:
> 1. Slow Memory Bandwidth allocation (SMBA).
>    With this feature, the QOS  enforcement policies can be applied
>    to the external slow memory connected to the host. QOS enforcement
>    is accomplished by assigning a Class Of Service (COS) to a processor
>    and specifying allocations or limits for that COS for each resource
>    to be allocated.
> 
> 2. Bandwidth Monitoring Event Configuration (BMEC).
>    The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
>    are set to count all the total and local reads/writes respectively.
>    With the introduction of slow memory, the two counters are not
>    enough to count all the different types are memory events. With the
>    feature BMEC, the users have the option to configure mbm_total_bytes
>    and mbm_local_bytes to count the specific type of events.
> 
> Also add configuration instructions with examples.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  Documentation/x86/resctrl.rst |  148 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 146 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 71a531061e4e..56581587c1a3 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
>  This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
>  flag bits:
>  
> -=============================================	================================
> +===============================================	================================
>  RDT (Resource Director Technology) Allocation	"rdt_a"
>  CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
>  CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
>  CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
>  MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
>  MBA (Memory Bandwidth Allocation)		"mba"
> -=============================================	================================
> +SMBA (Slow Memory Bandwidth Allocation)		"smba"
> +BMEC (Bandwidth Monitoring Event Configuration) "bmec"
> +===============================================	================================
>  
>  To use the feature mount the file system::
>  
> @@ -161,6 +163,23 @@ with the following files:
>  "mon_features":
>  		Lists the monitoring events if
>  		monitoring is enabled for the resource.
> +                Example output:
> +
> +                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                llc_occupancy
> +                mbm_total_bytes
> +                mbm_local_bytes
> +
> +                If the system supports Bandwidth Monitoring Event
> +                Configuration (BMEC), then the bandwidth events will
> +                be configurable. Then the output will be.
> +
> +                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                llc_occupancy
> +                mbm_total_bytes
> +                mbm_total_config
> +                mbm_local_bytes
> +                mbm_local_config
>  
Use code blocks for terminal output above:

---- >8 ----
t a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 56581587c1a331..6474cf655792bf 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -163,23 +163,23 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
-                Example output:
+                Example::
 
-                # cat /sys/fs/resctrl/info/L3_MON/mon_features
-                llc_occupancy
-                mbm_total_bytes
-                mbm_local_bytes
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_local_bytes
 
                 If the system supports Bandwidth Monitoring Event
                 Configuration (BMEC), then the bandwidth events will
-                be configurable. Then the output will be.
+                be configurable. The output will be::
 
-                # cat /sys/fs/resctrl/info/L3_MON/mon_features
-                llc_occupancy
-                mbm_total_bytes
-                mbm_total_config
-                mbm_local_bytes
-                mbm_local_config
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_total_config
+                   mbm_local_bytes
+                   mbm_local_config
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in

> +        Following are the types of events supported:
> +
> +        ====    ========================================================
> +        Bits    Description
> +        ====    ========================================================
> +        6       Dirty Victims from the QOS domain to all types of memory
> +        5       Reads to slow memory in the non-local NUMA domain
> +        4       Reads to slow memory in the local NUMA domain
> +        3       Non-temporal writes to non-local NUMA domain
> +        2       Non-temporal writes to local NUMA domain
> +        1       Reads to memory in the non-local NUMA domain
> +        0       Reads to memory in the local NUMA domain
> +        ====    ========================================================
> +

The table above looks OK.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features
  2022-09-08  4:07   ` Bagas Sanjaya
@ 2022-09-08  9:26     ` Bagas Sanjaya
  2022-09-08 13:40       ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Bagas Sanjaya @ 2022-09-08  9:26 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, reinette.chatre, tglx, mingo, bp, fenghua.yu,
	dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	eranian

[-- Attachment #1: Type: text/plain, Size: 3446 bytes --]

On Thu, Sep 08, 2022 at 11:07:59AM +0700, Bagas Sanjaya wrote:
> Use code blocks for terminal output above:
> 
> ---- >8 ----
> t a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 56581587c1a331..6474cf655792bf 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -163,23 +163,23 @@ with the following files:
>  "mon_features":
>  		Lists the monitoring events if
>  		monitoring is enabled for the resource.
> -                Example output:
> +                Example::
>  
> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> -                llc_occupancy
> -                mbm_total_bytes
> -                mbm_local_bytes
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_local_bytes
>  
>                  If the system supports Bandwidth Monitoring Event
>                  Configuration (BMEC), then the bandwidth events will
> -                be configurable. Then the output will be.
> +                be configurable. The output will be::
>  
> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> -                llc_occupancy
> -                mbm_total_bytes
> -                mbm_total_config
> -                mbm_local_bytes
> -                mbm_local_config
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_total_config
> +                   mbm_local_bytes
> +                   mbm_local_config

Hi Babu,

The suggestion diff above looks corrupted, so here is the proper one:

---- >8 ----
diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 56581587c1a331..6474cf655792bf 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -163,23 +163,23 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
-                Example output:
+                Example::
 
-                # cat /sys/fs/resctrl/info/L3_MON/mon_features
-                llc_occupancy
-                mbm_total_bytes
-                mbm_local_bytes
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_local_bytes
 
                 If the system supports Bandwidth Monitoring Event
                 Configuration (BMEC), then the bandwidth events will
-                be configurable. Then the output will be.
+                be configurable. The output will be::
 
-                # cat /sys/fs/resctrl/info/L3_MON/mon_features
-                llc_occupancy
-                mbm_total_bytes
-                mbm_total_config
-                mbm_local_bytes
-                mbm_local_config
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_total_config
+                   mbm_local_bytes
+                   mbm_local_config
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features
  2022-09-08  9:26     ` Bagas Sanjaya
@ 2022-09-08 13:40       ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-08 13:40 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: corbet, reinette.chatre, tglx, mingo, bp, fenghua.yu,
	dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	eranian


On 9/8/22 04:26, Bagas Sanjaya wrote:
> On Thu, Sep 08, 2022 at 11:07:59AM +0700, Bagas Sanjaya wrote:
>> Use code blocks for terminal output above:
>>
>> ---- >8 ----
>> t a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
>> index 56581587c1a331..6474cf655792bf 100644
>> --- a/Documentation/x86/resctrl.rst
>> +++ b/Documentation/x86/resctrl.rst
>> @@ -163,23 +163,23 @@ with the following files:
>>  "mon_features":
>>  		Lists the monitoring events if
>>  		monitoring is enabled for the resource.
>> -                Example output:
>> +                Example::
>>  
>> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> -                llc_occupancy
>> -                mbm_total_bytes
>> -                mbm_local_bytes
>> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> +                   llc_occupancy
>> +                   mbm_total_bytes
>> +                   mbm_local_bytes
>>  
>>                  If the system supports Bandwidth Monitoring Event
>>                  Configuration (BMEC), then the bandwidth events will
>> -                be configurable. Then the output will be.
>> +                be configurable. The output will be::
>>  
>> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> -                llc_occupancy
>> -                mbm_total_bytes
>> -                mbm_total_config
>> -                mbm_local_bytes
>> -                mbm_local_config
>> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
>> +                   llc_occupancy
>> +                   mbm_total_bytes
>> +                   mbm_total_config
>> +                   mbm_local_bytes
>> +                   mbm_local_config
> Hi Babu,
>
> The suggestion diff above looks corrupted, so here is the proper one:

Sanjaya,

Sure, Thank you.

Babu

>
> ---- >8 ----
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 56581587c1a331..6474cf655792bf 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -163,23 +163,23 @@ with the following files:
>  "mon_features":
>  		Lists the monitoring events if
>  		monitoring is enabled for the resource.
> -                Example output:
> +                Example::
>  
> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> -                llc_occupancy
> -                mbm_total_bytes
> -                mbm_local_bytes
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_local_bytes
>  
>                  If the system supports Bandwidth Monitoring Event
>                  Configuration (BMEC), then the bandwidth events will
> -                be configurable. Then the output will be.
> +                be configurable. The output will be::
>  
> -                # cat /sys/fs/resctrl/info/L3_MON/mon_features
> -                llc_occupancy
> -                mbm_total_bytes
> -                mbm_total_config
> -                mbm_local_bytes
> -                mbm_local_config
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_total_config
> +                   mbm_local_bytes
> +                   mbm_local_config
>  
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
>
> Thanks.
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
@ 2022-09-09 17:00   ` James Morse
  2022-09-12 14:54     ` Moger, Babu
  2022-09-16 15:53   ` Reinette Chatre
  1 sibling, 1 reply; 49+ messages in thread
From: James Morse @ 2022-09-09 17:00 UTC (permalink / raw)
  To: Babu Moger, corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian

Hi Babu,

On 07/09/2022 18:59, Babu Moger wrote:
> AMD systems support zero CBM (capacity bit mask) for L3 allocation.
> That is reflected in rdt_init_res_defs_amd() by:
> 
> 	r->cache.arch_has_empty_bitmaps = true;
> 
> However given the unified code in cbm_validate(), checking for:
> 	val == 0 && !arch_has_empty_bitmaps
> 
> is not enough because of another check in cbm_validate():
> 
> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)

Right, the Intel version had this, but the AMD didn't. I evidently only thought about this
the !arch_has_empty_bitmaps way round! Sorry about that.


> The default value of r->cache.min_cbm_bits = 1.
> 
> Leading to:
> 
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
>           -bash: echo: write error: Invalid argument
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  Need at least 1 bits in the mask
> 
> Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also,
> remove the default setting of min_cbm_bits and initialize it separately.
> 
> After the fix
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  ok
> 
> Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")

> Signed-off-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>

Er, who is the author if this patch? If you are reposting Stephane's patch then there
needs to be a 'From: ' at the top of the email so that git preserves the ownership. You
may need some incantation of "git commit --amend --author=" to fix this in your tree.

As its a fix, have you posted this separately? Mixing fixes and new-code makes it hard for
the maintainer to spot what needs to be taken for the next -rc.


> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index bb1c3f5f60c8..a5c51a14fbce 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.rid			= RDT_RESOURCE_L3,
>  			.name			= "L3",
>  			.cache_level		= 3,
> -			.cache = {
> -				.min_cbm_bits	= 1,
> -			},
>  			.domains		= domain_init(RDT_RESOURCE_L3),
>  			.parse_ctrlval		= parse_cbm,
>  			.format_str		= "%d=%0*x",
> @@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.rid			= RDT_RESOURCE_L2,
>  			.name			= "L2",
>  			.cache_level		= 2,
> -			.cache = {
> -				.min_cbm_bits	= 1,
> -			},
>  			.domains		= domain_init(RDT_RESOURCE_L2),
>  			.parse_ctrlval		= parse_cbm,
>  			.format_str		= "%d=%0*x",
> @@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
>  			r->cache.arch_has_sparse_bitmaps = false;
>  			r->cache.arch_has_empty_bitmaps = false;
>  			r->cache.arch_has_per_cpu_cfg = false;
> +			r->cache.min_cbm_bits = 1;
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
>  			hw_res->msr_update = mba_wrmsr_intel;
> @@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
>  			r->cache.arch_has_sparse_bitmaps = true;
>  			r->cache.arch_has_empty_bitmaps = true;
>  			r->cache.arch_has_per_cpu_cfg = true;
> +			r->cache.min_cbm_bits = 0;
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>  			hw_res->msr_update = mba_wrmsr_amd;
> 
> 


Reviewed-by: James Morse <james.morse@arm.com>

Thanks,

James

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-09 17:00   ` James Morse
@ 2022-09-12 14:54     ` Moger, Babu
  2022-09-16 15:52       ` Reinette Chatre
  0 siblings, 1 reply; 49+ messages in thread
From: Moger, Babu @ 2022-09-12 14:54 UTC (permalink / raw)
  To: James Morse, corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian

Hi James,

On 9/9/22 12:00, James Morse wrote:
> Hi Babu,
>
> On 07/09/2022 18:59, Babu Moger wrote:
>> AMD systems support zero CBM (capacity bit mask) for L3 allocation.
>> That is reflected in rdt_init_res_defs_amd() by:
>>
>> 	r->cache.arch_has_empty_bitmaps = true;
>>
>> However given the unified code in cbm_validate(), checking for:
>> 	val == 0 && !arch_has_empty_bitmaps
>>
>> is not enough because of another check in cbm_validate():
>>
>> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
> Right, the Intel version had this, but the AMD didn't. I evidently only thought about this
> the !arch_has_empty_bitmaps way round! Sorry about that.

>
>
>> The default value of r->cache.min_cbm_bits = 1.
>>
>> Leading to:
>>
>> 	$ cd /sys/fs/resctrl
>> 	$ mkdir foo
>> 	$ cd foo
>> 	$ echo L3:0=0 > schemata
>>           -bash: echo: write error: Invalid argument
>> 	$ cat /sys/fs/resctrl/info/last_cmd_status
>> 	  Need at least 1 bits in the mask
>>
>> Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also,
>> remove the default setting of min_cbm_bits and initialize it separately.
>>
>> After the fix
>> 	$ cd /sys/fs/resctrl
>> 	$ mkdir foo
>> 	$ cd foo
>> 	$ echo L3:0=0 > schemata
>> 	$ cat /sys/fs/resctrl/info/last_cmd_status
>> 	  ok
>>
>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20220517001234.3137157-1-eranian%40google.com%2F&amp;data=05%7C01%7Cbabu.moger%40amd.com%7Cc5b955e9726344c550f008da9284dedd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637983396695085653%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=wIVM5%2BrCwfWDpIrLkDtycgoCd4PWMC3D8y%2FAjshIW%2FQ%3D&amp;reserved=0
>> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Er, who is the author if this patch? If you are reposting Stephane's patch then there
> needs to be a 'From: ' at the top of the email so that git preserves the ownership. You

I can add Stephane's name in "From" if it is right thing to do. But this
patch was modified from the original version Stephane posted.

https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/


> may need some incantation of "git commit --amend --author=" to fix this in your tree.
>
> As its a fix, have you posted this separately? Mixing fixes and new-code makes it hard for
> the maintainer to spot what needs to be taken for the next -rc.

ok. I can send this separate in next version.


>
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index bb1c3f5f60c8..a5c51a14fbce 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -66,9 +66,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>>  			.rid			= RDT_RESOURCE_L3,
>>  			.name			= "L3",
>>  			.cache_level		= 3,
>> -			.cache = {
>> -				.min_cbm_bits	= 1,
>> -			},
>>  			.domains		= domain_init(RDT_RESOURCE_L3),
>>  			.parse_ctrlval		= parse_cbm,
>>  			.format_str		= "%d=%0*x",
>> @@ -83,9 +80,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
>>  			.rid			= RDT_RESOURCE_L2,
>>  			.name			= "L2",
>>  			.cache_level		= 2,
>> -			.cache = {
>> -				.min_cbm_bits	= 1,
>> -			},
>>  			.domains		= domain_init(RDT_RESOURCE_L2),
>>  			.parse_ctrlval		= parse_cbm,
>>  			.format_str		= "%d=%0*x",
>> @@ -877,6 +871,7 @@ static __init void rdt_init_res_defs_intel(void)
>>  			r->cache.arch_has_sparse_bitmaps = false;
>>  			r->cache.arch_has_empty_bitmaps = false;
>>  			r->cache.arch_has_per_cpu_cfg = false;
>> +			r->cache.min_cbm_bits = 1;
>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>  			hw_res->msr_base = MSR_IA32_MBA_THRTL_BASE;
>>  			hw_res->msr_update = mba_wrmsr_intel;
>> @@ -897,6 +892,7 @@ static __init void rdt_init_res_defs_amd(void)
>>  			r->cache.arch_has_sparse_bitmaps = true;
>>  			r->cache.arch_has_empty_bitmaps = true;
>>  			r->cache.arch_has_per_cpu_cfg = true;
>> +			r->cache.min_cbm_bits = 0;
>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>  			hw_res->msr_update = mba_wrmsr_amd;
>>
>>
>
> Reviewed-by: James Morse <james.morse@arm.com>


Thank You.

Babu


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-12 14:54     ` Moger, Babu
@ 2022-09-16 15:52       ` Reinette Chatre
  2022-09-16 18:28         ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:52 UTC (permalink / raw)
  To: babu.moger, James Morse, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian

Hi Babu,

On 9/12/2022 7:54 AM, Moger, Babu wrote:
> On 9/9/22 12:00, James Morse wrote:
>> On 07/09/2022 18:59, Babu Moger wrote:


>>> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
>>> Signed-off-by: Stephane Eranian <eranian@google.com>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> Er, who is the author if this patch? If you are reposting Stephane's patch then there
>> needs to be a 'From: ' at the top of the email so that git preserves the ownership. You
> 
> I can add Stephane's name in "From" if it is right thing to do. But this
> patch was modified from the original version Stephane posted.
> 
> https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/

True, but also please consider what Stephane posted originally:
https://lore.kernel.org/lkml/20220516055055.2734840-1-eranian@google.com/

A "Co-developed-by" may help with attribution: 

Co-developed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
  2022-09-09 17:00   ` James Morse
@ 2022-09-16 15:53   ` Reinette Chatre
  2022-09-16 18:31     ` Moger, Babu
  1 sibling, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:53 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 10:59 AM, Babu Moger wrote:
> AMD systems support zero CBM (capacity bit mask) for L3 allocation.

Above mentions "for L3 allocation", but the change impacts
L3 as well as L2 allocation. Perhaps just
"cache allocation"?

> That is reflected in rdt_init_res_defs_amd() by:
> 
> 	r->cache.arch_has_empty_bitmaps = true;
> 
> However given the unified code in cbm_validate(), checking for:
> 	val == 0 && !arch_has_empty_bitmaps
> 
> is not enough because of another check in cbm_validate():
> 
> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
> 
> The default value of r->cache.min_cbm_bits = 1.
> 
> Leading to:
> 
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
>           -bash: echo: write error: Invalid argument
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  Need at least 1 bits in the mask
> 
> Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also,
> remove the default setting of min_cbm_bits and initialize it separately.
> 
> After the fix
> 	$ cd /sys/fs/resctrl
> 	$ mkdir foo
> 	$ cd foo
> 	$ echo L3:0=0 > schemata
> 	$ cat /sys/fs/resctrl/info/last_cmd_status
> 	  ok
> 
> Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
> Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
> Signed-off-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>

(apart from the changelog nitpick)

Thank you for clarifying the way forward for this fix.

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps
  2022-09-07 18:00 ` [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps Babu Moger
@ 2022-09-16 15:53   ` Reinette Chatre
  2022-09-16 19:00     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:53 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:00 AM, Babu Moger wrote:
> The field arch_has_empty_bitmaps is not required anymore. The field
> min_cbm_bits is enough to validate the CBM (capacity bit mask) if the
> architecture can support the zero CBM or not.
> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Thank you.

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-09-07 18:00 ` [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-09-16 15:54   ` Reinette Chatre
  2022-09-16 19:02     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:54 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:00 AM, Babu Moger wrote:
> Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS

Adds -> Add

> enforcement policies can be applied to external slow memory connected
> to the host. QOS enforcement is accomplished by assigning a Class Of
> Service (COS) to a processor and specifying allocations or limits for
> that COS for each resource to be allocated.
> 
> This feature is identified by the CPUID Function 8000_0020_EBX_x0.
> 
> CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers (ECX=0)
> Bits    Field Name      Description
> 2       L3SBE           L3 external slow memory bandwidth enforcement
> 
> Currently, CXL.memory is the only supported "slow" memory device. With the
> support of SMBA feature the hardware enables bandwidth allocation on the
> slow memory devices. If there are multiple slow memory devices in the system,
> then the throttling logic groups all the slow sources together and applies
> the limit on them as a whole.

The above is a useful addition (made in patch 13/13) to the documentation ...

> 
> The presence of the SMBA feature(with CXL.memory) is independent of whether
> slow memory device is actually present in the system. If there is no slow
> memory in the system, then setting a SMBA limit will have no impact on the
> performance of the system.

... could the above snippet also please be added to the documentation?

> 
> Presence of CXL memory can be identified by numactl command.
> 
> $numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
> node 0 size: 63678 MB node 0 free: 59542 MB
> node 1 cpus:
> node 1 size: 16122 MB
> node 1 free: 15627 MB
> node distances:
> node   0   1
>    0:  10  50
>    1:  50  10
> 
> CPU list for CXL memory will be emply. The cpu-cxl node distance is greater

emply -> empty?

> than cpu-to-cpu distances. Node 1 has the CXL memory in this case. CXL
> memory can also be identified using ACPI SRAT table and memory maps.
> 
> Feature description is available in the specification, "AMD64 Technology Platform Quality
> of Service Extensions, Revision: 1.03 Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Please shorten these lines to the recommended 75 chars per line.

> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 235dc85c91c3..1815435c9c88 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -304,6 +304,7 @@
>  #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
>  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
>  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
> +#define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */

Why is "SLOW" in all caps? 

>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index fd44b54c90d5..885ecf46abb2 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> +	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };
> 
> 

Could you please follow the coding style (wrt tabs vs. spaces) of the area you
are contributing to here? Please do so in all patches in this series.

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-09-07 18:00 ` [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-09-16 15:54   ` Reinette Chatre
  2022-09-16 19:11     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:54 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:00 AM, Babu Moger wrote:
> Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS

(nitpick) Adds -> Add

> enforcement policies on the external slow memory.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Apart from nitpick this looks good.

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-09-07 18:00 ` [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
  2022-09-07 18:36   ` Daniel Sneddon
@ 2022-09-16 15:55   ` Reinette Chatre
  2022-09-16 20:19     ` Moger, Babu
  1 sibling, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:55 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:00 AM, Babu Moger wrote:
> Newer AMD processors support the new feature Bandwidth Monitoring Event
> Configuration (BMEC).
> 
> Support of this feature is available via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> Bits    Field Name       Description
> 3        EVT_CFG         Bandwidth Monitoring Event Configuration (BMEC)
> 
> Currently, the bandwidth monitoring events mbm_total_bytes and
> mbm_local_bytes are set to count all the total and local reads/writes
> respectively. With the introduction of SLOW memory, the two counters are

Why is SLOW in caps?

> not enough to count all the different types of memory events. With the
> feature BMEC, the users have the option to configure mbm_total_bytes and
> mbm_local_bytes to count the specific type of events.
> 
> Each BMEC event has a configuration MSR, QOS_EVT_CFG (0x000_0400h +
> EventID) which contains one field for each Bandwidth Type that can be used
> to configure the bandwidth event to track any combination of supported
> bandwidth types. The event will count requests from every Bandwidth Type
> bit that is set in the corresponding configuration register.
> 
> Following are the types of events supported:
> 
> ====    ========================================================
> Bits    Description
> ====    ========================================================
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> ====    ========================================================
> 
> By default, the mbm_total_bytes configuration is set to 0x7F to count
> all the event types and the mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> Feature description is available in the specification, "AMD64 Technology
> Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
> Revision: 1.03 Issue Date: February 2022".
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 1815435c9c88..a4ee02a56d54 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -305,6 +305,7 @@
>  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
>  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
>  #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory Bandwidth Allocation */
> +#define X86_FEATURE_BMEC		(11*32+18) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */

(numbering issue has already been discussed)

>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 885ecf46abb2..7981df0b910e 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
> +	{ X86_FEATURE_BMEC,             CPUID_EBX,  3, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };
> 
> 

Similar to previous - please use same coding style as area being changed.

Is there a feature dependency (cpuid_deps[]) on X86_FEATURE_CQM_LLC needed?

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 06/13] x86/resctrl: Include new features in command line options
  2022-09-07 18:00 ` [PATCH v4 06/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-09-16 15:55   ` Reinette Chatre
  2022-09-16 20:22     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:55 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:00 AM, Babu Moger wrote:
> Add the command line options to disable the new features.
> smba : Slow Memory Bandwidth Allocation
> mbec : Bandwidth Monitor Event Configuration.

mbec -> bmec ?

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration
  2022-09-07 18:00 ` [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration Babu Moger
@ 2022-09-16 15:56   ` Reinette Chatre
  2022-09-16 21:23     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:56 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

(Please watch for the stray space in the subject line before the ":")

On 9/7/2022 11:00 AM, Babu Moger wrote:
> Add couple of fields in mon_evt to support Bandwidth Monitoring Event
> Configuratio (BMEC) and also update the "mon_features".
> 
> The sysfs file "mon_features" will display the monitor configuration if
> supported.
> 
> Before the change.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_local_bytes
> 
> After the change if BMEC is supported.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_total_config
> 	mbm_local_bytes
> 	mbm_local_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

...

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index c049a274383c..45923eb4022f 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -72,11 +72,15 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> + * @configurable:	true if the event is configurable
> + * @config_name:	sysfs file name of the event if configurable
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	u32			evtid;
>  	char			*name;
> +	bool 			configurable;
> +	char			*config_name;
>  	struct list_head	list;
>  };

Please ensure there is no spaces before tabs - this is
a checkpatch failure. Running this series through checkpatch.pl
encounters several formatting issues. Could you please
run this series through "checkpatch.pl --strict --codespell"
before the next submission? The warnings related to code where you
are following the existing style need not be addressed, but the
"spaces before tabs" like above, unnecessary empty lines, 
alignment issues, spelling issues ... please address those.


Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-07 18:01 ` [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
@ 2022-09-16 15:58   ` Reinette Chatre
  2022-09-19 15:46     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:58 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:01 AM, Babu Moger wrote:
> Add two new sysfs files to read/write the event configuration if
> the feature Bandwidth Monitoring Event Configuration (BMEC) is
> supported. The file mbm_local_config is for the configuration
> of the event mbm_local_bytes and the file mbm_total_config is
> for the configuration of mbm_total_bytes.
> 
> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> 
> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> 

This patch makes the mbm*config files per monitor group. Looking
ahead at later patches how the configuration is set it is not clear
to me that this is the right place for these configuration files.

Looking ahead to patch 10 there is neither rmid nor closid within
the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes
the bits indicating what access types needs to be counted. Also
in patch 10 I understand that the scope of this register is per L3 cache
domain.

Considering this, why is the sysfs file associated with each
monitor group?

For example, consider the following scenario:
# cd /sys/fs/resctrl
# mkdir g2
# mkdir mon_groups/m1
# mkdir mon_groups/m2
# find . | grep mbm_local_config
./mon_data/mon_L3_00/mbm_local_config
./mon_data/mon_L3_01/mbm_local_config
./g2/mon_data/mon_L3_00/mbm_local_config
./g2/mon_data/mon_L3_01/mbm_local_config
./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config


From what I understand, the following sysfs files are
associated with cache domain #0 and thus writing to any of these
files would change the same configuration:
./mon_data/mon_L3_00/mbm_local_config
./g2/mon_data/mon_L3_00/mbm_local_config
./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config

Could you please correct me where I am wrong?


> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   40 ++++++++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f55a693fa958..da11fdad204d 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
>  	.seq_show		= rdtgroup_mondata_show,
>  };
>  
> +static const struct kernfs_ops kf_mondata_config_ops = {
> +	.atomic_write_len       = PAGE_SIZE,
> +};
> +

Please use coding style (tabs vs spaces) that is consistent with area
you are contributing to.

>  static bool is_cpu_list(struct kernfs_open_file *of)
>  {
>  	struct rftype *rft = of->kn->priv;
> @@ -2478,24 +2482,40 @@ static struct file_system_type rdt_fs_type = {
>  	.kill_sb		= rdt_kill_sb,
>  };
>  
> -static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
> +static int mon_addfile(struct kernfs_node *parent_kn, struct mon_evt *mevt,
>  		       void *priv)
>  {
> -	struct kernfs_node *kn;
> +	struct kernfs_node *kn_evt, *kn_evt_config;
>  	int ret = 0;
>  
> -	kn = __kernfs_create_file(parent_kn, name, 0444,
> -				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
> -				  &kf_mondata_ops, priv, NULL, NULL);
> -	if (IS_ERR(kn))
> -		return PTR_ERR(kn);
> +	kn_evt = __kernfs_create_file(parent_kn, mevt->name, 0444,
> +			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
> +			&kf_mondata_ops, priv, NULL, NULL);

Please run your series through checkpatch (alignment issue above)

> +	if (IS_ERR(kn_evt))
> +		return PTR_ERR(kn_evt);
>  
> -	ret = rdtgroup_kn_set_ugid(kn);
> +	ret = rdtgroup_kn_set_ugid(kn_evt);
>  	if (ret) {
> -		kernfs_remove(kn);
> +		kernfs_remove(kn_evt);
>  		return ret;
>  	}
>  
> +	if (mevt->configurable) {
> +		kn_evt_config = __kernfs_create_file(parent_kn,
> +				mevt->config_name, 0644,
> +				GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
> +				&kf_mondata_config_ops, priv, NULL, NULL);
> +		if (IS_ERR(kn_evt_config))
> +			return PTR_ERR(kn_evt_config);
> +

Since an error is returned here it seems that some cleanup (kn_evt) is missing?


> +		ret = rdtgroup_kn_set_ugid(kn_evt_config);
> +		if (ret) {
> +			kernfs_remove(kn_evt_config);
> +			kernfs_remove(kn_evt);
> +			return ret;
> +		}
> +	}
> +
>  	return ret;
>  }
>  

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-09-07 18:01 ` [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the " Babu Moger
@ 2022-09-16 15:59   ` Reinette Chatre
  2022-09-19 16:07     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 15:59 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:01 AM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the sysfs configuration file.
> 
> Following are the types of events supported:
> 
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 

...

> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 45923eb4022f..96f439324d78 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -15,6 +15,7 @@
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
>  #define MSR_IA32_QM_CTR			0x0c8e
>  #define MSR_IA32_QM_EVTSEL		0x0c8d
> @@ -50,6 +51,29 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>  
> +/* Reads to Local DRAM Memory */

What prompted the terminology switch between the
changelog ("local NUMA")  and the comments ("DRAM")?

> +#define READS_TO_LOCAL_MEM		BIT(0)
> +
> +/* Reads to Remote DRAM Memory */
> +#define READS_TO_REMOTE_MEM		BIT(1)
> +
> +/* Non-Temporal Writes to Local Memory */
> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
> +
> +/* Non-Temporal Writes to Remote Memory */
> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
> +
> +/* Reads to Local Memory the system identifies as "Slow Memory" */
> +#define READS_TO_LOCAL_S_MEM		BIT(4)
> +
> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
> +#define READS_TO_REMOTE_S_MEM		BIT(5)
> +
> +/* Dirty Victims to All Types of Memory */
> +#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
> +

Could you please fixup the comments to only capitalize 
the first word of each sentence (unless it is an acronym
or required for some other reason)?

> +/* Max event bits supported */
> +#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
>  
>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index da11fdad204d..6f067c1ac7c1 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -254,8 +254,85 @@ static const struct kernfs_ops kf_mondata_ops = {
>  	.seq_show		= rdtgroup_mondata_show,
>  };
>  
> +struct mon_config_info {
> +	u32 evtid;
> +	u32 mon_config;
> +};
> +
> +/*
> + * This is called via IPI to read the CQM/MBM counters
> + * in a domain.

This comment does not seem accurate - it is not reading the
actual counters but the configuration of the counters?

> + */
> +void mon_event_config_read(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 h, msr_index;
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		/* Not expected to come here */
> +		return;
> +	}
> +
> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
> +}
> +
> +void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> +{
> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> +}
> +
> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
> +{
> +	struct kernfs_open_file *of = m->private;
> +	struct mon_config_info mon_info;

Could you please initialize this struct? I think this is important considering
that there is an (albeit unlikely) chance that uninitialized data can be returned
to user space.

> +	struct rdt_hw_resource *hw_res;
> +	u32 resid, evtid, domid;
> +	struct rdtgroup *rdtgrp;
> +	struct rdt_resource *r;
> +	union mon_data_bits md;
> +	struct rdt_domain *d;
> +	int ret = 0;
> +
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	md.priv = of->kn->priv;
> +	resid = md.u.rid;
> +	domid = md.u.domid;
> +	evtid = md.u.evtid;
> +
> +	hw_res = &rdt_resources_all[resid];
> +	r = &hw_res->r_resctrl;
> +
> +	d = rdt_find_domain(r, domid, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +
> +	mon_info.evtid = evtid;
> +	mondata_config_read(d, &mon_info);
> +
> +	seq_printf(m, "0x%x\n", mon_info.mon_config);
> +
> +out:
> +	rdtgroup_kn_unlock(of->kn);
> +	return ret;
> +}
> +
>  static const struct kernfs_ops kf_mondata_config_ops = {
>  	.atomic_write_len       = PAGE_SIZE,
> +	.seq_show               = rdtgroup_mondata_config_show,
>  };
  
>  static bool is_cpu_list(struct kernfs_open_file *of)
> 
> 

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write the event configuration
  2022-09-07 18:01 ` [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write " Babu Moger
@ 2022-09-16 16:17   ` Reinette Chatre
  2022-09-19 16:50     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 16:17 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:01 AM, Babu Moger wrote:

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6f067c1ac7c1..59b484eb1267 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -330,9 +330,121 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>  	return ret;
>  }
>  
> +/*
> + * This is called via IPI to read the CQM/MBM counters
> + * in a domain.

copy&paste from previous patch?

> + */
> +void mon_event_config_write(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 msr_index;
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		/* Not expected to come here */
> +		return;
> +	}
> +
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
> +}
> +
> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct mon_config_info mon_info;
> +	struct rdt_hw_resource *hw_res;
> +	struct rdtgroup *rdtgrp;
> +	struct rdt_resource *r;
> +	unsigned int mon_config;
> +	cpumask_var_t cpu_mask;
> +	union mon_data_bits md;
> +	struct rdt_domain *d;
> +	u32 resid, domid;
> +	int ret = 0, cpu;
> +
> +	ret = kstrtouint(buf, 0, &mon_config);
> +	if (ret)
> +		return ret;
> +
> +	rdt_last_cmd_clear();
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (mon_config > MAX_EVT_CONFIG_BITS) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	cpus_read_lock();
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		return -ENOENT;
> +		goto e_unlock;
> +	}
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		ret = -ENOMEM;
> +		goto e_unlock;
> +	}
> +
> +	md.priv = of->kn->priv;
> +	resid = md.u.rid;
> +	domid = md.u.domid;
> +
> +	hw_res = &rdt_resources_all[resid];
> +	r = &hw_res->r_resctrl;
> +	d = rdt_find_domain(r, domid, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		ret = -ENOENT;
> +		goto e_cpumask;
> +	}
> +
> +	/*
> +	 * Read the current config value first. If both are same
> +	 * then we dont need to write again
> +	 */
> +	mon_info.evtid = md.u.evtid;
> +	mondata_config_read(d, &mon_info);
> +	if (mon_info.mon_config == mon_config)
> +		goto e_cpumask;
> +
> +	mon_info.mon_config = mon_config;
> +
> +	/* Pick all the CPUs in the domain instance */
> +	for_each_cpu(cpu, &d->cpu_mask)
> +		cpumask_set_cpu(cpu, cpu_mask);
> +
> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all the CPUs in cpu_mask */
> +	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);

If this is required then could you please add a comment why every CPU in
the domain needs to be updated? Until now configuration changes
only needed to be made on one CPU per domain. 
Even in the previous patch when the user reads the current configuration
value it is only done on one CPU in the domain ... to me that implies
that the scope is per domain and only one CPU in the domain needs to be
changed.

> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
> +	 * RMID will be set on the next read to any BwEvent for every RMID.
> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
> +	 */

This is a snippet that was copied from the hardware spec and since it is
inserted into the kernel driver the context makes it hard to understand.
Could it be translated into what it means in this context?
Perhaps something like:

	/*
	 * When an Event Configuration is changed, the bandwidth counters
	 * for all RMIDs and Events will be cleared by the hardware. The
         * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for every
	 * RMID on the next read to any event for every RMID. Subsequent
	 * reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) cleared
	 * while it is tracked by the hardware.
	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
	 */

Please fixup where I got it wrong.

> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask
  2022-09-07 18:01 ` [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask Babu Moger
@ 2022-09-16 16:17   ` Reinette Chatre
  2022-09-16 20:38     ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-16 16:17 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/7/2022 11:01 AM, Babu Moger wrote:
> The call on_each_cpu_mask can run the function on each CPU specified by

parenthesis are used to indicate functions, in this case, please write
"on_each_cpu_mask()" and "smp_call_function_many()" instead.

> cpumask, which may include the local processor. So, replace the call
> smp_call_function_many with on_each_cpu_mask to simplify the code.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-16 15:52       ` Reinette Chatre
@ 2022-09-16 18:28         ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 18:28 UTC (permalink / raw)
  To: Reinette Chatre, James Morse, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:53 AM
> To: Moger, Babu <Babu.Moger@amd.com>; James Morse
> <james.morse@arm.com>; corbet@lwn.net; tglx@linutronix.de;
> mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
> 
> Hi Babu,
> 
> On 9/12/2022 7:54 AM, Moger, Babu wrote:
> > On 9/9/22 12:00, James Morse wrote:
> >> On 07/09/2022 18:59, Babu Moger wrote:
> 
> 
> >>> Fixes: 316e7f901f5a ("x86/resctrl: Add struct
> >>> rdt_cache::arch_has_{sparse, empty}_bitmaps")
> >>> Signed-off-by: Stephane Eranian <eranian@google.com>
> >>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >> Er, who is the author if this patch? If you are reposting Stephane's
> >> patch then there needs to be a 'From: ' at the top of the email so
> >> that git preserves the ownership. You
> >
> > I can add Stephane's name in "From" if it is right thing to do. But
> > this patch was modified from the original version Stephane posted.
> >
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Flkml%2F20220517001234.3137157-1-
> eranian%40google.com%2F&
> >
> amp;data=05%7C01%7Cbabu.moger%40amd.com%7C69a28ad3fcfe444404a50
> 8da97fb
> >
> 82ee%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C6379894037788
> 16201%7
> >
> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI
> 6Ik1
> >
> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=OnLKz6cBiypVvv1x
> 8PSv1JUz
> > ugilG1Gpwgkcz55ydqI%3D&amp;reserved=0
> 
> True, but also please consider what Stephane posted originally:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2F20220516055055.2734840-1-
> eranian%40google.com%2F&amp;data=05%7C01%7Cbabu.moger%40amd.com
> %7C69a28ad3fcfe444404a508da97fb82ee%7C3dd8961fe4884e608e11a82d994
> e183d%7C0%7C0%7C637989403778816201%7CUnknown%7CTWFpbGZsb3d8ey
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C3000%7C%7C%7C&amp;sdata=2mTjDYC9B%2Fr6HR6SlwToWXewsWub2rfPQp
> c9JIkETus%3D&amp;reserved=0
> 
> A "Co-developed-by" may help with attribution:
> 
> Co-developed-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> 
Yes, Sounds good to me.
Thanks
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
  2022-09-16 15:53   ` Reinette Chatre
@ 2022-09-16 18:31     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 18:31 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:53 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD
> 
> Hi Babu,
> 
> On 9/7/2022 10:59 AM, Babu Moger wrote:
> > AMD systems support zero CBM (capacity bit mask) for L3 allocation.
> 
> Above mentions "for L3 allocation", but the change impacts
> L3 as well as L2 allocation. Perhaps just "cache allocation"?
> 
> > That is reflected in rdt_init_res_defs_amd() by:
> >
> > 	r->cache.arch_has_empty_bitmaps = true;
> >
> > However given the unified code in cbm_validate(), checking for:
> > 	val == 0 && !arch_has_empty_bitmaps
> >
> > is not enough because of another check in cbm_validate():
> >
> > 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
> >
> > The default value of r->cache.min_cbm_bits = 1.
> >
> > Leading to:
> >
> > 	$ cd /sys/fs/resctrl
> > 	$ mkdir foo
> > 	$ cd foo
> > 	$ echo L3:0=0 > schemata
> >           -bash: echo: write error: Invalid argument
> > 	$ cat /sys/fs/resctrl/info/last_cmd_status
> > 	  Need at least 1 bits in the mask
> >
> > Fix the issue by initializing the min_cbm_bits to 0 for AMD. Also,
> > remove the default setting of min_cbm_bits and initialize it separately.
> >
> > After the fix
> > 	$ cd /sys/fs/resctrl
> > 	$ mkdir foo
> > 	$ cd foo
> > 	$ echo L3:0=0 > schemata
> > 	$ cat /sys/fs/resctrl/info/last_cmd_status
> > 	  ok
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Flkml%2F20220517001234.3137157-1-
> eranian%40google.com%2F&
> >
> amp;data=05%7C01%7Cbabu.moger%40amd.com%7Cfdccc20e6a234fb3872a0
> 8da97fb
> >
> 94fc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C6379894040498
> 44076%7
> >
> CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI
> 6Ik1
> >
> haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=DIpkBAkI7lAjj5QQ4
> 5VahssT
> > dWGj%2FcUwGJDiHXNYzz8%3D&amp;reserved=0
> > Fixes: 316e7f901f5a ("x86/resctrl: Add struct
> > rdt_cache::arch_has_{sparse, empty}_bitmaps")
> > Signed-off-by: Stephane Eranian <eranian@google.com>
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Reviewed-by: Ingo Molnar <mingo@kernel.org>
> 
> (apart from the changelog nitpick)
> 
> Thank you for clarifying the way forward for this fix.
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thank you
Babu Moger

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps
  2022-09-16 15:53   ` Reinette Chatre
@ 2022-09-16 19:00     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 19:00 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:54 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps
> 
> Hi Babu,
> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > The field arch_has_empty_bitmaps is not required anymore. The field
> > min_cbm_bits is enough to validate the CBM (capacity bit mask) if the
> > architecture can support the zero CBM or not.
> >
> > Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> 
> Thank you.
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thank You.
Babu Moger

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-09-16 15:54   ` Reinette Chatre
@ 2022-09-16 19:02     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 19:02 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:54 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth
> Allocation feature flag
> 
> Hi Babu,
> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > Adds the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
> 
> Adds -> Add

Sure

> 
> > enforcement policies can be applied to external slow memory connected
> > to the host. QOS enforcement is accomplished by assigning a Class Of
> > Service (COS) to a processor and specifying allocations or limits for
> > that COS for each resource to be allocated.
> >
> > This feature is identified by the CPUID Function 8000_0020_EBX_x0.
> >
> > CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature
> Identifiers (ECX=0)
> > Bits    Field Name      Description
> > 2       L3SBE           L3 external slow memory bandwidth enforcement
> >
> > Currently, CXL.memory is the only supported "slow" memory device. With
> > the support of SMBA feature the hardware enables bandwidth allocation
> > on the slow memory devices. If there are multiple slow memory devices
> > in the system, then the throttling logic groups all the slow sources
> > together and applies the limit on them as a whole.
> 
> The above is a useful addition (made in patch 13/13) to the documentation ...
> 
> >
> > The presence of the SMBA feature(with CXL.memory) is independent of
> > whether slow memory device is actually present in the system. If there
> > is no slow memory in the system, then setting a SMBA limit will have
> > no impact on the performance of the system.
> 
> ... could the above snippet also please be added to the documentation?
> 

Ok Sure.

> >
> > Presence of CXL memory can be identified by numactl command.
> >
> > $numactl -H
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 node 0 size:
> > 63678 MB node 0 free: 59542 MB node 1 cpus:
> > node 1 size: 16122 MB
> > node 1 free: 15627 MB
> > node distances:
> > node   0   1
> >    0:  10  50
> >    1:  50  10
> >
> > CPU list for CXL memory will be emply. The cpu-cxl node distance is
> > greater
> 
> emply -> empty?

ok
> 
> > than cpu-to-cpu distances. Node 1 has the CXL memory in this case. CXL
> > memory can also be identified using ACPI SRAT table and memory maps.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03 Publication
> # 56375 Revision: 1.03 Issue Date: February 2022".
> 
> Please shorten these lines to the recommended 75 chars per line.

Sure
> 
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> > amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-
> quality
> > -service-
> extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7C60553f32
> >
> e9ab4ed1219c08da97fbc048%7C3dd8961fe4884e608e11a82d994e183d%7C0%
> 7C0%7C
> >
> 637989404770663129%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjo
> >
> iV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdat
> a=WY4H
> > EzWHdMpMUUR%2FBnBupwdHuQ6O2RfdrfcGx4TkXfI%3D&amp;reserved=0
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> >
> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbab
> u.m
> >
> oger%40amd.com%7C60553f32e9ab4ed1219c08da97fbc048%7C3dd8961fe488
> 4e608e
> >
> 11a82d994e183d%7C0%7C0%7C637989404770663129%7CUnknown%7CTWFpb
> GZsb3d8ey
> >
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C300
> >
> 0%7C%7C%7C&amp;sdata=7e7sGH0iaEW8mlst5mK3fn9wy%2FYRhDU%2BbBm
> PWzSrGL4%3
> > D&amp;reserved=0
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h
> > b/arch/x86/include/asm/cpufeatures.h
> > index 235dc85c91c3..1815435c9c88 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -304,6 +304,7 @@
> >  #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB
> untrain return */
> >  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB
> during runtime firmware calls */
> >  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on
> VM exit when EIBRS is enabled */
> > +#define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory
> Bandwidth Allocation */
> 
> Why is "SLOW" in all caps?

Will correct it.

> 
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI
> instructions */
> > diff --git a/arch/x86/kernel/cpu/scattered.c
> > b/arch/x86/kernel/cpu/scattered.c index fd44b54c90d5..885ecf46abb2
> > 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> > +	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >  };
> >
> >
> 
> Could you please follow the coding style (wrt tabs vs. spaces) of the area you
> are contributing to here? Please do so in all patches in this series.

Sure. Thanks
Babu Moger

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-09-16 15:54   ` Reinette Chatre
@ 2022-09-16 19:11     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 19:11 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:55 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 04/13] x86/resctrl: Add a new resource type
> RDT_RESOURCE_SMBA
> 
> Hi Babu,
> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > Adds a new resource type RDT_RESOURCE_SMBA to handle the QoS
> 
> (nitpick) Adds -> Add
> 
> > enforcement policies on the external slow memory.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> 
> Apart from nitpick this looks good.

Thank you
Babu Moger

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-09-16 15:55   ` Reinette Chatre
@ 2022-09-16 20:19     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 20:19 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:55 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring
> Event Configuration feature flag
> 
> Hi Babu,
> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > Newer AMD processors support the new feature Bandwidth Monitoring
> > Event Configuration (BMEC).
> >
> > Support of this feature is available via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> > Bits    Field Name       Description
> > 3        EVT_CFG         Bandwidth Monitoring Event Configuration (BMEC)
> >
> > Currently, the bandwidth monitoring events mbm_total_bytes and
> > mbm_local_bytes are set to count all the total and local reads/writes
> > respectively. With the introduction of SLOW memory, the two counters
> > are
> 
> Why is SLOW in caps?

Will fix it.
> 
> > not enough to count all the different types of memory events. With the
> > feature BMEC, the users have the option to configure mbm_total_bytes
> > and mbm_local_bytes to count the specific type of events.
> >
> > Each BMEC event has a configuration MSR, QOS_EVT_CFG (0x000_0400h +
> > EventID) which contains one field for each Bandwidth Type that can be
> > used to configure the bandwidth event to track any combination of
> > supported bandwidth types. The event will count requests from every
> > Bandwidth Type bit that is set in the corresponding configuration register.
> >
> > Following are the types of events supported:
> >
> > ====    ========================================================
> > Bits    Description
> > ====    ========================================================
> > 6       Dirty Victims from the QOS domain to all types of memory
> > 5       Reads to slow memory in the non-local NUMA domain
> > 4       Reads to slow memory in the local NUMA domain
> > 3       Non-temporal writes to non-local NUMA domain
> > 2       Non-temporal writes to local NUMA domain
> > 1       Reads to memory in the non-local NUMA domain
> > 0       Reads to memory in the local NUMA domain
> > ====    ========================================================
> >
> > By default, the mbm_total_bytes configuration is set to 0x7F to count
> > all the event types and the mbm_local_bytes configuration is set to
> > 0x15 to count all the local memory events.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03
> > Publication # 56375
> > Revision: 1.03 Issue Date: February 2022".
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> > amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-
> quality
> > -service-
> extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7C5ba4a608
> >
> edb04202be2708da97fbe2df%7C3dd8961fe4884e608e11a82d994e183d%7C0%
> 7C0%7C
> >
> 637989405343884737%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjo
> >
> iV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdat
> a=hoyj
> > zeINRtp1n%2FWsG0MHxzmQ8aLdDV4V03xlJVsNRv8%3D&amp;reserved=0
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> >
> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbab
> u.m
> >
> oger%40amd.com%7C5ba4a608edb04202be2708da97fbe2df%7C3dd8961fe488
> 4e608e
> >
> 11a82d994e183d%7C0%7C0%7C637989405343884737%7CUnknown%7CTWFpb
> GZsb3d8ey
> >
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C300
> >
> 0%7C%7C%7C&amp;sdata=n0gR1mooP9QuEzhrvXrvPaateof4XrBABrAoK5N8e8
> o%3D&am
> > p;reserved=0
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Reviewed-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h
> > b/arch/x86/include/asm/cpufeatures.h
> > index 1815435c9c88..a4ee02a56d54 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -305,6 +305,7 @@
> >  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB
> during runtime firmware calls */
> >  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on
> VM exit when EIBRS is enabled */
> >  #define X86_FEATURE_SMBA		(11*32+18) /* SLOW Memory
> Bandwidth Allocation */
> > +#define X86_FEATURE_BMEC		(11*32+18) /* AMD
> Bandwidth Monitoring Event Configuration (BMEC) */
> 
> (numbering issue has already been discussed)

Yes. 
> 
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI
> instructions */
> > diff --git a/arch/x86/kernel/cpu/scattered.c
> > b/arch/x86/kernel/cpu/scattered.c index 885ecf46abb2..7981df0b910e
> > 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> >  	{ X86_FEATURE_SMBA,             CPUID_EBX,  2, 0x80000020, 0 },
> > +	{ X86_FEATURE_BMEC,             CPUID_EBX,  3, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >  };
> >
> >
> 
> Similar to previous - please use same coding style as area being changed.
Sure.
> 
> Is there a feature dependency (cpuid_deps[]) on X86_FEATURE_CQM_LLC
> needed?

Good catch.. Yes. Its needed. Will fix it.
Thanks
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 06/13] x86/resctrl: Include new features in command line options
  2022-09-16 15:55   ` Reinette Chatre
@ 2022-09-16 20:22     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 20:22 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:55 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 06/13] x86/resctrl: Include new features in command
> line options
> 
> Hi Babu,
> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > Add the command line options to disable the new features.
> > smba : Slow Memory Bandwidth Allocation mbec : Bandwidth Monitor Event
> > Configuration.
> 
> mbec -> bmec ?

Yes. Thanks
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask
  2022-09-16 16:17   ` Reinette Chatre
@ 2022-09-16 20:38     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 20:38 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 11:18 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many
> with on_each_cpu_mask
> 
> Hi Babu,
> 
> On 9/7/2022 11:01 AM, Babu Moger wrote:
> > The call on_each_cpu_mask can run the function on each CPU specified
> > by
> 
> parenthesis are used to indicate functions, in this case, please write
> "on_each_cpu_mask()" and "smp_call_function_many()" instead.
> 
Sure. Thanks 
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration
  2022-09-16 15:56   ` Reinette Chatre
@ 2022-09-16 21:23     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-16 21:23 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, September 16, 2022 10:56 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 08/13] x86/resctrl : Introduce data structure to support
> monitor configuration
> 
> Hi Babu,
> 
> (Please watch for the stray space in the subject line before the ":")

Oh. Ok

> 
> On 9/7/2022 11:00 AM, Babu Moger wrote:
> > Add couple of fields in mon_evt to support Bandwidth Monitoring Event
> > Configuratio (BMEC) and also update the "mon_features".
> >
> > The sysfs file "mon_features" will display the monitor configuration
> > if supported.
> >
> > Before the change.
> > 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> > 	llc_occupancy
> > 	mbm_total_bytes
> > 	mbm_local_bytes
> >
> > After the change if BMEC is supported.
> > 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> > 	llc_occupancy
> > 	mbm_total_bytes
> > 	mbm_total_config
> > 	mbm_local_bytes
> > 	mbm_local_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> > b/arch/x86/kernel/cpu/resctrl/internal.h
> > index c049a274383c..45923eb4022f 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -72,11 +72,15 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
> >   * struct mon_evt - Entry in the event list of a resource
> >   * @evtid:		event id
> >   * @name:		name of the event
> > + * @configurable:	true if the event is configurable
> > + * @config_name:	sysfs file name of the event if configurable
> >   * @list:		entry in &rdt_resource->evt_list
> >   */
> >  struct mon_evt {
> >  	u32			evtid;
> >  	char			*name;
> > +	bool 			configurable;
> > +	char			*config_name;
> >  	struct list_head	list;
> >  };
> 
> Please ensure there is no spaces before tabs - this is a checkpatch failure.
> Running this series through checkpatch.pl encounters several formatting issues.
> Could you please run this series through "checkpatch.pl --strict --codespell"
> before the next submission? The warnings related to code where you are
> following the existing style need not be addressed, but the "spaces before tabs"
> like above, unnecessary empty lines, alignment issues, spelling issues ... please
> address those.

Sure. Will do.
Thanks
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-16 15:58   ` Reinette Chatre
@ 2022-09-19 15:46     ` Moger, Babu
  2022-09-19 16:42       ` Reinette Chatre
  0 siblings, 1 reply; 49+ messages in thread
From: Moger, Babu @ 2022-09-19 15:46 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 9/16/22 10:58, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/7/2022 11:01 AM, Babu Moger wrote:
>> Add two new sysfs files to read/write the event configuration if
>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
>> supported. The file mbm_local_config is for the configuration
>> of the event mbm_local_bytes and the file mbm_total_config is
>> for the configuration of mbm_total_bytes.
>>
>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>
>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>
> This patch makes the mbm*config files per monitor group. Looking
> ahead at later patches how the configuration is set it is not clear
> to me that this is the right place for these configuration files.
>
> Looking ahead to patch 10 there is neither rmid nor closid within
> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes
> the bits indicating what access types needs to be counted. Also
> in patch 10 I understand that the scope of this register is per L3 cache
> domain.
Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
>
> Considering this, why is the sysfs file associated with each
> monitor group?
Please see the response below.
>
> For example, consider the following scenario:
> # cd /sys/fs/resctrl
> # mkdir g2
> # mkdir mon_groups/m1
> # mkdir mon_groups/m2
> # find . | grep mbm_local_config
> ./mon_data/mon_L3_00/mbm_local_config
> ./mon_data/mon_L3_01/mbm_local_config
> ./g2/mon_data/mon_L3_00/mbm_local_config
> ./g2/mon_data/mon_L3_01/mbm_local_config
> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
>
>
> From what I understand, the following sysfs files are
> associated with cache domain #0 and thus writing to any of these
> files would change the same configuration:
> ./mon_data/mon_L3_00/mbm_local_config
> ./g2/mon_data/mon_L3_00/mbm_local_config
> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>
> Could you please correct me where I am wrong?

For example, we have CPUs 0-7 in domain 0. We have two counters which are
configurable.

Lets consider same example as your mentioned about.

g2 is a control group.

m1 and m2 are monitor group.

We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth (or
memory bandwidth with required schemata setting).

We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.

We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.

Each group is independently, monitoring two separate thing. Without having
sysfs file (mbm_local_config and mbm_total_config) in each monitor group,
we wont be able to configure the above configuration.


>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   40 ++++++++++++++++++++++++--------
>>  1 file changed, 30 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index f55a693fa958..da11fdad204d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -254,6 +254,10 @@ static const struct kernfs_ops kf_mondata_ops = {
>>  	.seq_show		= rdtgroup_mondata_show,
>>  };
>>  
>> +static const struct kernfs_ops kf_mondata_config_ops = {
>> +	.atomic_write_len       = PAGE_SIZE,
>> +};
>> +
> Please use coding style (tabs vs spaces) that is consistent with area
> you are contributing to.
Sure
>
>>  static bool is_cpu_list(struct kernfs_open_file *of)
>>  {
>>  	struct rftype *rft = of->kn->priv;
>> @@ -2478,24 +2482,40 @@ static struct file_system_type rdt_fs_type = {
>>  	.kill_sb		= rdt_kill_sb,
>>  };
>>  
>> -static int mon_addfile(struct kernfs_node *parent_kn, const char *name,
>> +static int mon_addfile(struct kernfs_node *parent_kn, struct mon_evt *mevt,
>>  		       void *priv)
>>  {
>> -	struct kernfs_node *kn;
>> +	struct kernfs_node *kn_evt, *kn_evt_config;
>>  	int ret = 0;
>>  
>> -	kn = __kernfs_create_file(parent_kn, name, 0444,
>> -				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
>> -				  &kf_mondata_ops, priv, NULL, NULL);
>> -	if (IS_ERR(kn))
>> -		return PTR_ERR(kn);
>> +	kn_evt = __kernfs_create_file(parent_kn, mevt->name, 0444,
>> +			GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
>> +			&kf_mondata_ops, priv, NULL, NULL);
> Please run your series through checkpatch (alignment issue above)
Sure
>
>> +	if (IS_ERR(kn_evt))
>> +		return PTR_ERR(kn_evt);
>>  
>> -	ret = rdtgroup_kn_set_ugid(kn);
>> +	ret = rdtgroup_kn_set_ugid(kn_evt);
>>  	if (ret) {
>> -		kernfs_remove(kn);
>> +		kernfs_remove(kn_evt);
>>  		return ret;
>>  	}
>>  
>> +	if (mevt->configurable) {
>> +		kn_evt_config = __kernfs_create_file(parent_kn,
>> +				mevt->config_name, 0644,
>> +				GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
>> +				&kf_mondata_config_ops, priv, NULL, NULL);
>> +		if (IS_ERR(kn_evt_config))
>> +			return PTR_ERR(kn_evt_config);
>> +
> Since an error is returned here it seems that some cleanup (kn_evt) is missing?

Yes. That is correct.  Will fix it.

Thanks

Babu

>
>
>> +		ret = rdtgroup_kn_set_ugid(kn_evt_config);
>> +		if (ret) {
>> +			kernfs_remove(kn_evt_config);
>> +			kernfs_remove(kn_evt);
>> +			return ret;
>> +		}
>> +	}
>> +
>>  	return ret;
>>  }
>>  
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the event configuration
  2022-09-16 15:59   ` Reinette Chatre
@ 2022-09-19 16:07     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-19 16:07 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,


On 9/16/22 10:59, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/7/2022 11:01 AM, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the sysfs configuration file.
>>
>> Following are the types of events supported:
>>
>> ====  ===========================================================
>> Bits   Description
>> ====  ===========================================================
>> 6      Dirty Victims from the QOS domain to all types of memory
>> 5      Reads to slow memory in the non-local NUMA domain
>> 4      Reads to slow memory in the local NUMA domain
>> 3      Non-temporal writes to non-local NUMA domain
>> 2      Non-temporal writes to local NUMA domain
>> 1      Reads to memory in the non-local NUMA domain
>> 0      Reads to memory in the local NUMA domain
>> ====  ===========================================================
>>
> ...
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 45923eb4022f..96f439324d78 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -15,6 +15,7 @@
>>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  
>>  #define MSR_IA32_QM_CTR			0x0c8e
>>  #define MSR_IA32_QM_EVTSEL		0x0c8d
>> @@ -50,6 +51,29 @@
>>   */
>>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>>  
>> +/* Reads to Local DRAM Memory */
> What prompted the terminology switch between the
> changelog ("local NUMA")  and the comments ("DRAM")?
oh. ok. Will change it.
>
>> +#define READS_TO_LOCAL_MEM		BIT(0)
>> +
>> +/* Reads to Remote DRAM Memory */
>> +#define READS_TO_REMOTE_MEM		BIT(1)
>> +
>> +/* Non-Temporal Writes to Local Memory */
>> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
>> +
>> +/* Non-Temporal Writes to Remote Memory */
>> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
>> +
>> +/* Reads to Local Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_LOCAL_S_MEM		BIT(4)
>> +
>> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_REMOTE_S_MEM		BIT(5)
>> +
>> +/* Dirty Victims to All Types of Memory */
>> +#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
>> +
> Could you please fixup the comments to only capitalize 
> the first word of each sentence (unless it is an acronym
> or required for some other reason)?
Sure.
>
>> +/* Max event bits supported */
>> +#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
>>  
>>  struct rdt_fs_context {
>>  	struct kernfs_fs_context	kfc;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index da11fdad204d..6f067c1ac7c1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -254,8 +254,85 @@ static const struct kernfs_ops kf_mondata_ops = {
>>  	.seq_show		= rdtgroup_mondata_show,
>>  };
>>  
>> +struct mon_config_info {
>> +	u32 evtid;
>> +	u32 mon_config;
>> +};
>> +
>> +/*
>> + * This is called via IPI to read the CQM/MBM counters
>> + * in a domain.
> This comment does not seem accurate - it is not reading the
> actual counters but the configuration of the counters?
Yes, That is correct.
>
>> + */
>> +void mon_event_config_read(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 h, msr_index;
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		/* Not expected to come here */
>> +		return;
>> +	}
>> +
>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
>> +}
>> +
>> +void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
>> +{
>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
>> +}
>> +
>> +int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>> +{
>> +	struct kernfs_open_file *of = m->private;
>> +	struct mon_config_info mon_info;
> Could you please initialize this struct? I think this is important considering
> that there is an (albeit unlikely) chance that uninitialized data can be returned
> to user space.

Sure.

Thanks

Babu

>> +	struct rdt_hw_resource *hw_res;
>> +	u32 resid, evtid, domid;
>> +	struct rdtgroup *rdtgrp;
>> +	struct rdt_resource *r;
>> +	union mon_data_bits md;
>> +	struct rdt_domain *d;
>> +	int ret = 0;
>> +
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	md.priv = of->kn->priv;
>> +	resid = md.u.rid;
>> +	domid = md.u.domid;
>> +	evtid = md.u.evtid;
>> +
>> +	hw_res = &rdt_resources_all[resid];
>> +	r = &hw_res->r_resctrl;
>> +
>> +	d = rdt_find_domain(r, domid, NULL);
>> +	if (IS_ERR_OR_NULL(d)) {
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
>> +
>> +	mon_info.evtid = evtid;
>> +	mondata_config_read(d, &mon_info);
>> +
>> +	seq_printf(m, "0x%x\n", mon_info.mon_config);
>> +
>> +out:
>> +	rdtgroup_kn_unlock(of->kn);
>> +	return ret;
>> +}
>> +
>>  static const struct kernfs_ops kf_mondata_config_ops = {
>>  	.atomic_write_len       = PAGE_SIZE,
>> +	.seq_show               = rdtgroup_mondata_config_show,
>>  };
>   
>>  static bool is_cpu_list(struct kernfs_open_file *of)
>>
>>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-19 15:46     ` Moger, Babu
@ 2022-09-19 16:42       ` Reinette Chatre
  2022-09-19 20:26         ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-19 16:42 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/19/2022 8:46 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 9/16/22 10:58, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 9/7/2022 11:01 AM, Babu Moger wrote:
>>> Add two new sysfs files to read/write the event configuration if
>>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
>>> supported. The file mbm_local_config is for the configuration
>>> of the event mbm_local_bytes and the file mbm_total_config is
>>> for the configuration of mbm_total_bytes.
>>>
>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>>
>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>>
>> This patch makes the mbm*config files per monitor group. Looking
>> ahead at later patches how the configuration is set it is not clear
>> to me that this is the right place for these configuration files.
>>
>> Looking ahead to patch 10 there is neither rmid nor closid within
>> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes
>> the bits indicating what access types needs to be counted. Also
>> in patch 10 I understand that the scope of this register is per L3 cache
>> domain.
> Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
>>
>> Considering this, why is the sysfs file associated with each
>> monitor group?
> Please see the response below.
>>
>> For example, consider the following scenario:
>> # cd /sys/fs/resctrl
>> # mkdir g2
>> # mkdir mon_groups/m1
>> # mkdir mon_groups/m2
>> # find . | grep mbm_local_config
>> ./mon_data/mon_L3_00/mbm_local_config
>> ./mon_data/mon_L3_01/mbm_local_config
>> ./g2/mon_data/mon_L3_00/mbm_local_config
>> ./g2/mon_data/mon_L3_01/mbm_local_config
>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
>>
>>
>> From what I understand, the following sysfs files are
>> associated with cache domain #0 and thus writing to any of these
>> files would change the same configuration:
>> ./mon_data/mon_L3_00/mbm_local_config
>> ./g2/mon_data/mon_L3_00/mbm_local_config
>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>>
>> Could you please correct me where I am wrong?
> 
> For example, we have CPUs 0-7 in domain 0. We have two counters which are
> configurable.
> 
> Lets consider same example as your mentioned about.
> 
> g2 is a control group.
> 
> m1 and m2 are monitor group.
> 
> We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth (or
> memory bandwidth with required schemata setting).
> 
> We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.
> 
> We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.
> 
> Each group is independently, monitoring two separate thing. Without having

Right, because monitoring, the actual counting of the events, is per monitor
group. When a monitor group is created a new RMID is created and when the
counter is read it is per-RMID. 

The event configuration is independent from the RMID using the counter.

> sysfs file (mbm_local_config and mbm_total_config) in each monitor group,
> we wont be able to configure the above configuration.

I do not understand this reasoning. From what I understand the
event configuration is independent from the monitoring group. Thus, changing
an event configuration for one monitoring group would impact all
monitoring groups using that event counter. This implementation associates
an event configuration with each monitoring group and by doing so it
implies that it is unique to the monitoring group, but that is not
how it works.

Reinette









^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write the event configuration
  2022-09-16 16:17   ` Reinette Chatre
@ 2022-09-19 16:50     ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-19 16:50 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian


On 9/16/22 11:17, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/7/2022 11:01 AM, Babu Moger wrote:
>
> ...
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6f067c1ac7c1..59b484eb1267 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -330,9 +330,121 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * This is called via IPI to read the CQM/MBM counters
>> + * in a domain.
> copy&paste from previous patch?
Will correct it.
>
>> + */
>> +void mon_event_config_write(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 msr_index;
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		/* Not expected to come here */
>> +		return;
>> +	}
>> +
>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
>> +}
>> +
>> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
>> +				       char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct mon_config_info mon_info;
>> +	struct rdt_hw_resource *hw_res;
>> +	struct rdtgroup *rdtgrp;
>> +	struct rdt_resource *r;
>> +	unsigned int mon_config;
>> +	cpumask_var_t cpu_mask;
>> +	union mon_data_bits md;
>> +	struct rdt_domain *d;
>> +	u32 resid, domid;
>> +	int ret = 0, cpu;
>> +
>> +	ret = kstrtouint(buf, 0, &mon_config);
>> +	if (ret)
>> +		return ret;
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	/* mon_config cannot be more than the supported set of events */
>> +	if (mon_config > MAX_EVT_CONFIG_BITS) {
>> +		rdt_last_cmd_puts("Invalid event configuration\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	cpus_read_lock();
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp) {
>> +		return -ENOENT;
>> +		goto e_unlock;
>> +	}
>> +
>> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
>> +		ret = -ENOMEM;
>> +		goto e_unlock;
>> +	}
>> +
>> +	md.priv = of->kn->priv;
>> +	resid = md.u.rid;
>> +	domid = md.u.domid;
>> +
>> +	hw_res = &rdt_resources_all[resid];
>> +	r = &hw_res->r_resctrl;
>> +	d = rdt_find_domain(r, domid, NULL);
>> +	if (IS_ERR_OR_NULL(d)) {
>> +		ret = -ENOENT;
>> +		goto e_cpumask;
>> +	}
>> +
>> +	/*
>> +	 * Read the current config value first. If both are same
>> +	 * then we dont need to write again
>> +	 */
>> +	mon_info.evtid = md.u.evtid;
>> +	mondata_config_read(d, &mon_info);
>> +	if (mon_info.mon_config == mon_config)
>> +		goto e_cpumask;
>> +
>> +	mon_info.mon_config = mon_config;
>> +
>> +	/* Pick all the CPUs in the domain instance */
>> +	for_each_cpu(cpu, &d->cpu_mask)
>> +		cpumask_set_cpu(cpu, cpu_mask);
>> +
>> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all the CPUs in cpu_mask */
>> +	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
> If this is required then could you please add a comment why every CPU in
> the domain needs to be updated? Until now configuration changes
> only needed to be made on one CPU per domain. 
> Even in the previous patch when the user reads the current configuration
> value it is only done on one CPU in the domain ... to me that implies
> that the scope is per domain and only one CPU in the domain needs to be
> changed.

Yes, This register is domain specific. However the hardware team
recommends it to update on all the CPU threads. It is not clear in the
specs right now.

I have asked them to make that clear in the specs next time.

>
>> +
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
>> +	 * RMID will be set on the next read to any BwEvent for every RMID.
>> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
> This is a snippet that was copied from the hardware spec and since it is
> inserted into the kernel driver the context makes it hard to understand.
> Could it be translated into what it means in this context?
> Perhaps something like:
>
> 	/*
> 	 * When an Event Configuration is changed, the bandwidth counters
> 	 * for all RMIDs and Events will be cleared by the hardware. The
>          * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for every
> 	 * RMID on the next read to any event for every RMID. Subsequent
> 	 * reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) cleared
> 	 * while it is tracked by the hardware.
> 	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
> 	 */
>
> Please fixup where I got it wrong.

Looks good.

Thanks Babu

>
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-19 16:42       ` Reinette Chatre
@ 2022-09-19 20:26         ` Moger, Babu
  2022-09-19 21:07           ` Reinette Chatre
  0 siblings, 1 reply; 49+ messages in thread
From: Moger, Babu @ 2022-09-19 20:26 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 9/19/22 11:42, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/19/2022 8:46 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 9/16/22 10:58, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 9/7/2022 11:01 AM, Babu Moger wrote:
>>>> Add two new sysfs files to read/write the event configuration if
>>>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
>>>> supported. The file mbm_local_config is for the configuration
>>>> of the event mbm_local_bytes and the file mbm_total_config is
>>>> for the configuration of mbm_total_bytes.
>>>>
>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>>>
>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>>>
>>> This patch makes the mbm*config files per monitor group. Looking
>>> ahead at later patches how the configuration is set it is not clear
>>> to me that this is the right place for these configuration files.
>>>
>>> Looking ahead to patch 10 there is neither rmid nor closid within
>>> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes
>>> the bits indicating what access types needs to be counted. Also
>>> in patch 10 I understand that the scope of this register is per L3 cache
>>> domain.
>> Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
>>> Considering this, why is the sysfs file associated with each
>>> monitor group?
>> Please see the response below.
>>> For example, consider the following scenario:
>>> # cd /sys/fs/resctrl
>>> # mkdir g2
>>> # mkdir mon_groups/m1
>>> # mkdir mon_groups/m2
>>> # find . | grep mbm_local_config
>>> ./mon_data/mon_L3_00/mbm_local_config
>>> ./mon_data/mon_L3_01/mbm_local_config
>>> ./g2/mon_data/mon_L3_00/mbm_local_config
>>> ./g2/mon_data/mon_L3_01/mbm_local_config
>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>>> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>>> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
>>>
>>>
>>> From what I understand, the following sysfs files are
>>> associated with cache domain #0 and thus writing to any of these
>>> files would change the same configuration:
>>> ./mon_data/mon_L3_00/mbm_local_config
>>> ./g2/mon_data/mon_L3_00/mbm_local_config
>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>>>
>>> Could you please correct me where I am wrong?
>> For example, we have CPUs 0-7 in domain 0. We have two counters which are
>> configurable.
>>
>> Lets consider same example as your mentioned about.
>>
>> g2 is a control group.
>>
>> m1 and m2 are monitor group.
>>
>> We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth (or
>> memory bandwidth with required schemata setting).
>>
>> We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.
>>
>> We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.
>>
>> Each group is independently, monitoring two separate thing. Without having
> Right, because monitoring, the actual counting of the events, is per monitor
> group. When a monitor group is created a new RMID is created and when the
> counter is read it is per-RMID. 
>
> The event configuration is independent from the RMID using the counter.
>
>> sysfs file (mbm_local_config and mbm_total_config) in each monitor group,
>> we wont be able to configure the above configuration.
> I do not understand this reasoning. From what I understand the
> event configuration is independent from the monitoring group. Thus, changing
> an event configuration for one monitoring group would impact all
> monitoring groups using that event counter. This implementation associates
> an event configuration with each monitoring group and by doing so it
> implies that it is unique to the monitoring group, but that is not
> how it works.

The event configuration is designed per L3 domain. The mon_data is also
per domain (like mon_L3_00.. mon_L3_01 etc). So, added the event
configuration file inside each domain. We have all the information inside
the domain. Thought, that is right place. I am open for suggestions.

Thanks

Babu



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-19 20:26         ` Moger, Babu
@ 2022-09-19 21:07           ` Reinette Chatre
  2022-09-20 17:09             ` Moger, Babu
  0 siblings, 1 reply; 49+ messages in thread
From: Reinette Chatre @ 2022-09-19 21:07 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 9/19/2022 1:26 PM, Moger, Babu wrote:
> On 9/19/22 11:42, Reinette Chatre wrote:
>> On 9/19/2022 8:46 AM, Moger, Babu wrote:
>>> On 9/16/22 10:58, Reinette Chatre wrote:
>>>> On 9/7/2022 11:01 AM, Babu Moger wrote:
>>>>> Add two new sysfs files to read/write the event configuration if
>>>>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
>>>>> supported. The file mbm_local_config is for the configuration
>>>>> of the event mbm_local_bytes and the file mbm_total_config is
>>>>> for the configuration of mbm_total_bytes.
>>>>>
>>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
>>>>>
>>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
>>>>>
>>>> This patch makes the mbm*config files per monitor group. Looking
>>>> ahead at later patches how the configuration is set it is not clear
>>>> to me that this is the right place for these configuration files.
>>>>
>>>> Looking ahead to patch 10 there is neither rmid nor closid within
>>>> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes
>>>> the bits indicating what access types needs to be counted. Also
>>>> in patch 10 I understand that the scope of this register is per L3 cache
>>>> domain.
>>> Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
>>>> Considering this, why is the sysfs file associated with each
>>>> monitor group?
>>> Please see the response below.
>>>> For example, consider the following scenario:
>>>> # cd /sys/fs/resctrl
>>>> # mkdir g2
>>>> # mkdir mon_groups/m1
>>>> # mkdir mon_groups/m2
>>>> # find . | grep mbm_local_config
>>>> ./mon_data/mon_L3_00/mbm_local_config
>>>> ./mon_data/mon_L3_01/mbm_local_config
>>>> ./g2/mon_data/mon_L3_00/mbm_local_config
>>>> ./g2/mon_data/mon_L3_01/mbm_local_config
>>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>>>> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
>>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>>>> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
>>>>
>>>>
>>>> From what I understand, the following sysfs files are
>>>> associated with cache domain #0 and thus writing to any of these
>>>> files would change the same configuration:
>>>> ./mon_data/mon_L3_00/mbm_local_config
>>>> ./g2/mon_data/mon_L3_00/mbm_local_config
>>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
>>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
>>>>
>>>> Could you please correct me where I am wrong?
>>> For example, we have CPUs 0-7 in domain 0. We have two counters which are
>>> configurable.
>>>
>>> Lets consider same example as your mentioned about.
>>>
>>> g2 is a control group.
>>>
>>> m1 and m2 are monitor group.
>>>
>>> We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth (or
>>> memory bandwidth with required schemata setting).
>>>
>>> We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.
>>>
>>> We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.
>>>
>>> Each group is independently, monitoring two separate thing. Without having
>> Right, because monitoring, the actual counting of the events, is per monitor
>> group. When a monitor group is created a new RMID is created and when the
>> counter is read it is per-RMID. 
>>
>> The event configuration is independent from the RMID using the counter.
>>
>>> sysfs file (mbm_local_config and mbm_total_config) in each monitor group,
>>> we wont be able to configure the above configuration.
>> I do not understand this reasoning. From what I understand the
>> event configuration is independent from the monitoring group. Thus, changing
>> an event configuration for one monitoring group would impact all
>> monitoring groups using that event counter. This implementation associates
>> an event configuration with each monitoring group and by doing so it
>> implies that it is unique to the monitoring group, but that is not
>> how it works.
> 
> The event configuration is designed per L3 domain. The mon_data is also
> per domain (like mon_L3_00.. mon_L3_01 etc). So, added the event
> configuration file inside each domain. We have all the information inside
> the domain. Thought, that is right place. I am open for suggestions.

It is not clear to me if you are also seeing all the duplication that
accompanies this implementation. As you can see in the example I provided in
https://lore.kernel.org/lkml/13294a8f-e76f-a6a9-284c-67adbc80ec7c@intel.com/,
if I understand the implementation correctly, there will be several
configuration files scattered through resctrl that all configure the same
value. I asked you to correct me where I am wrong but you did not correct me.
Instead you keep repeating that placing the files in the duplicate locations
is convenient. I can see how this is convenient for you but please do consider
that having these duplicate configuration files scattered through resctrl
makes for a very confusing user interface and unexpected behavior. Users
would expect that a configuration associated with a monitor group impacts
that monitor group only - not all monitor groups associated with that
domain.

User API is hard so this does need careful thought. Perhaps the architects
can chime in here.

One option could be:
# cd /sys/fs/resctrl/info/L3_MON
# cat mbm_total_config
0=7f;1=7f
# cat mbm_local_config
0=15;1=15

It would be clear when changing mem_total_config or mbm_local_config that
it would impact all monitoring groups within all resource groups. What do
you think?

Reinette

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration
  2022-09-19 21:07           ` Reinette Chatre
@ 2022-09-20 17:09             ` Moger, Babu
  0 siblings, 0 replies; 49+ messages in thread
From: Moger, Babu @ 2022-09-20 17:09 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Monday, September 19, 2022 4:07 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to
> read/write event configuration
> 
> Hi Babu,
> 
> On 9/19/2022 1:26 PM, Moger, Babu wrote:
> > On 9/19/22 11:42, Reinette Chatre wrote:
> >> On 9/19/2022 8:46 AM, Moger, Babu wrote:
> >>> On 9/16/22 10:58, Reinette Chatre wrote:
> >>>> On 9/7/2022 11:01 AM, Babu Moger wrote:
> >>>>> Add two new sysfs files to read/write the event configuration if
> >>>>> the feature Bandwidth Monitoring Event Configuration (BMEC) is
> >>>>> supported. The file mbm_local_config is for the configuration of
> >>>>> the event mbm_local_bytes and the file mbm_total_config is for the
> >>>>> configuration of mbm_total_bytes.
> >>>>>
> >>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local*
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> >>>>>
> >>>>> $ls /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total*
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> >>>>> /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> >>>>>
> >>>> This patch makes the mbm*config files per monitor group. Looking
> >>>> ahead at later patches how the configuration is set it is not clear
> >>>> to me that this is the right place for these configuration files.
> >>>>
> >>>> Looking ahead to patch 10 there is neither rmid nor closid within
> >>>> the (MSR_IA32_EVT_CFG_BASE + index) register - it only takes the
> >>>> bits indicating what access types needs to be counted. Also in
> >>>> patch 10 I understand that the scope of this register is per L3
> >>>> cache domain.
> >>> Yes. Scope of  MSR_IA32_EVT_CFG_BASE per L3 domain.
> >>>> Considering this, why is the sysfs file associated with each
> >>>> monitor group?
> >>> Please see the response below.
> >>>> For example, consider the following scenario:
> >>>> # cd /sys/fs/resctrl
> >>>> # mkdir g2
> >>>> # mkdir mon_groups/m1
> >>>> # mkdir mon_groups/m2
> >>>> # find . | grep mbm_local_config
> >>>> ./mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_data/mon_L3_01/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_01/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_01/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_01/mbm_local_config
> >>>>
> >>>>
> >>>> From what I understand, the following sysfs files are associated
> >>>> with cache domain #0 and thus writing to any of these files would
> >>>> change the same configuration:
> >>>> ./mon_data/mon_L3_00/mbm_local_config
> >>>> ./g2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m2/mon_data/mon_L3_00/mbm_local_config
> >>>> ./mon_groups/m1/mon_data/mon_L3_00/mbm_local_config
> >>>>
> >>>> Could you please correct me where I am wrong?
> >>> For example, we have CPUs 0-7 in domain 0. We have two counters
> >>> which are configurable.
> >>>
> >>> Lets consider same example as your mentioned about.
> >>>
> >>> g2 is a control group.
> >>>
> >>> m1 and m2 are monitor group.
> >>>
> >>> We can have control group g2 with CPUs 0-7 to limit the L3 bandwidth
> >>> (or memory bandwidth with required schemata setting).
> >>>
> >>> We can have mon group m1 with cpus 0-3 to monitor mbm_local_bytes.
> >>>
> >>> We can have mon group m2 with cpus  4-7 to monitor mbm_total_bytes.
> >>>
> >>> Each group is independently, monitoring two separate thing. Without
> >>> having
> >> Right, because monitoring, the actual counting of the events, is per
> >> monitor group. When a monitor group is created a new RMID is created
> >> and when the counter is read it is per-RMID.
> >>
> >> The event configuration is independent from the RMID using the counter.
> >>
> >>> sysfs file (mbm_local_config and mbm_total_config) in each monitor
> >>> group, we wont be able to configure the above configuration.
> >> I do not understand this reasoning. From what I understand the event
> >> configuration is independent from the monitoring group. Thus,
> >> changing an event configuration for one monitoring group would impact
> >> all monitoring groups using that event counter. This implementation
> >> associates an event configuration with each monitoring group and by
> >> doing so it implies that it is unique to the monitoring group, but
> >> that is not how it works.
> >
> > The event configuration is designed per L3 domain. The mon_data is
> > also per domain (like mon_L3_00.. mon_L3_01 etc). So, added the event
> > configuration file inside each domain. We have all the information
> > inside the domain. Thought, that is right place. I am open for suggestions.
> 
> It is not clear to me if you are also seeing all the duplication that accompanies
> this implementation. As you can see in the example I provided in
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2F13294a8f-e76f-a6a9-284c-
> 67adbc80ec7c%40intel.com%2F&amp;data=05%7C01%7Cbabu.moger%40amd.
> com%7Cc22190a25ac044ec5f5408da9a82f5b7%7C3dd8961fe4884e608e11a82
> d994e183d%7C0%7C0%7C637992184504699692%7CUnknown%7CTWFpbGZsb3
> d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> D%7C3000%7C%7C%7C&amp;sdata=uPuGOlwaIgwJ9VnwNOS%2B4mUrqJnS65
> OdrEsEXtztUbU%3D&amp;reserved=0,
> if I understand the implementation correctly, there will be several
> configuration files scattered through resctrl that all configure the same value. I
> asked you to correct me where I am wrong but you did not correct me.
> Instead you keep repeating that placing the files in the duplicate locations is
> convenient. I can see how this is convenient for you but please do consider that
> having these duplicate configuration files scattered through resctrl makes for a
> very confusing user interface and unexpected behavior. Users would expect
> that a configuration associated with a monitor group impacts that monitor
> group only - not all monitor groups associated with that domain.
> 
> User API is hard so this does need careful thought. Perhaps the architects can
> chime in here.
> 
> One option could be:
> # cd /sys/fs/resctrl/info/L3_MON
> # cat mbm_total_config
> 0=7f;1=7f
> # cat mbm_local_config
> 0=15;1=15

I think this should work. 
# cat mbm_total_config
0=7f;1=7f
I would think 0 and 1 are domain ids here.

We have to provide interface to write also.
#echo "0=0x70" > mbm_total_config  (update mbm_total_config  for domain 0)
#echo 1=0x10  > mbm_local_config     (update mbm_local_config  for domain 1)

We will have to parse the string and update the specific domains.

> 
> It would be clear when changing mem_total_config or mbm_local_config that
> it would impact all monitoring groups within all resource groups. What do you
> think?

Yes. Thank you. It should work. As long and we have the ways to modify(and read) the specific L3 domains then it should be fine. Let me start on that. Will reply if I see any major issues.
Thanks
Babu

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2022-09-20 17:09 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-07 17:59 [PATCH v4 00/13] x86/resctrl: Support for AMD QoS new features and bug fix Babu Moger
2022-09-07 17:59 ` [PATCH v4 01/13] x86/resctrl: Fix min_cbm_bits for AMD Babu Moger
2022-09-09 17:00   ` James Morse
2022-09-12 14:54     ` Moger, Babu
2022-09-16 15:52       ` Reinette Chatre
2022-09-16 18:28         ` Moger, Babu
2022-09-16 15:53   ` Reinette Chatre
2022-09-16 18:31     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 02/13] x86/resctrl: Remove arch_has_empty_bitmaps Babu Moger
2022-09-16 15:53   ` Reinette Chatre
2022-09-16 19:00     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 03/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
2022-09-16 15:54   ` Reinette Chatre
2022-09-16 19:02     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 04/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
2022-09-16 15:54   ` Reinette Chatre
2022-09-16 19:11     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 05/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
2022-09-07 18:36   ` Daniel Sneddon
2022-09-07 19:59     ` Moger, Babu
2022-09-16 15:55   ` Reinette Chatre
2022-09-16 20:19     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 06/13] x86/resctrl: Include new features in command line options Babu Moger
2022-09-16 15:55   ` Reinette Chatre
2022-09-16 20:22     ` Moger, Babu
2022-09-07 18:00 ` [PATCH v4 07/13] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
2022-09-07 18:00 ` [PATCH v4 08/13] x86/resctrl : Introduce data structure to support monitor configuration Babu Moger
2022-09-16 15:56   ` Reinette Chatre
2022-09-16 21:23     ` Moger, Babu
2022-09-07 18:01 ` [PATCH v4 09/13] x86/resctrl: Add sysfs interface files to read/write event configuration Babu Moger
2022-09-16 15:58   ` Reinette Chatre
2022-09-19 15:46     ` Moger, Babu
2022-09-19 16:42       ` Reinette Chatre
2022-09-19 20:26         ` Moger, Babu
2022-09-19 21:07           ` Reinette Chatre
2022-09-20 17:09             ` Moger, Babu
2022-09-07 18:01 ` [PATCH v4 10/13] x86/resctrl: Add the sysfs interface to read the " Babu Moger
2022-09-16 15:59   ` Reinette Chatre
2022-09-19 16:07     ` Moger, Babu
2022-09-07 18:01 ` [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write " Babu Moger
2022-09-16 16:17   ` Reinette Chatre
2022-09-19 16:50     ` Moger, Babu
2022-09-07 18:01 ` [PATCH v4 12/13] x86/resctrl: Replace smp_call_function_many with on_each_cpu_mask Babu Moger
2022-09-16 16:17   ` Reinette Chatre
2022-09-16 20:38     ` Moger, Babu
2022-09-07 18:01 ` [PATCH v4 13/13] Documentation/x86: Update resctrl_ui.rst for new features Babu Moger
2022-09-08  4:07   ` Bagas Sanjaya
2022-09-08  9:26     ` Bagas Sanjaya
2022-09-08 13:40       ` Moger, Babu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).