linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features
@ 2022-10-17 22:25 Babu Moger
  2022-10-17 22:26 ` [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:25 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

New AMD processors can now support following QoS features.

1. Slow Memory Bandwidth Allocation (SMBA)
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

   Currently, CXL.memory is the only supported "slow" memory device. With
   the support of SMBA feature the hardware enables bandwidth allocation
   on the slow memory devices.

2. Bandwidth Monitoring Event Configuration (BMEC)
   The bandwidth monitoring events mbm_total_event and mbm_local_event 
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not enough
   to count all the different types are memory events. With the feature
   BMEC, the users have the option to configure mbm_total_event and
   mbm_local_event to count the specific type of events.

   Following are the bitmaps of events supported.
   Bits    Description
     6       Dirty Victims from the QOS domain to all types of memory
     5       Reads to slow memory in the non-local NUMA domain
     4       Reads to slow memory in the local NUMA domain
     3       Non-temporal writes to non-local NUMA domain
     2       Non-temporal writes to local NUMA domain
     1       Reads to memory in the non-local NUMA domain
     0       Reads to memory in the local NUMA domain

This series adds support for these features.

Feature description is available in the specification, "AMD64 Technology Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v7:
 Changes:
 Not much of a change. Missed one comment from Reinette from v5. Corrected it now.
 Few format corrections from Sanjaya.

v6:
 https://lore.kernel.org/lkml/166543345606.23830.3120625408601531368.stgit@bmoger-ubuntu/
 Summary of changes:
 1. Rebased on top of lastest tip tree. Fixed few minor conflicts.
 2. Fixed format issue with scattered.c.
 3. Removed config_name from the structure mon_evt. It is not required.
 4. The read/write format for mbm_total_config and mbm_local_config will be same
    as schemata format "id0=val0;id1=val1;...". This is comment from Fenghua.
 5. Added more comments MSR_IA32_EVT_CFG_BASE writng.
 5. Few text changes in resctrl.rst 
 
v5:
  https://lore.kernel.org/lkml/166431016617.373387.1968875281081252467.stgit@bmoger-ubuntu/
  Summary of changes.
  1. Split the series into two. The first two patches are bug fixes. So, sent them separate.
  2. The config files mbm_total_config and mbm_local_config are now under
     /sys/fs/resctrl/info/L3_MON/. Removed these config files from mon groups.
  3. Ran "checkpatch --strict --codespell" on all the patches. Looks good with few known exceptions.
  4. Few minor text changes in resctrl.rst file. 

v4:
  https://lore.kernel.org/lkml/166257348081.1043018.11227924488792315932.stgit@bmoger-ubuntu/
  Got numerios of comments from Reinette Chatre. Addressed most of them. 
  Summary of changes.
  1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.  
  2. Updated mon_features texts if the BMEC is supported.
  3. Added more explanation about the slow memory support.
  4. Replaced smp_call_function_many with on_each_cpu_mask call.
  5. Removed arch_has_empty_bitmaps
  6. Few other text changes.
  7. Removed Reviewed-by if the patch is modified.
  8. Rebased the patches to latest tip.

v3:
  https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest tip. Resolved some conflicts.
     https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  b. Taken care of feedback from Bagas Sanjaya.
  c. Added Reviewed by from Mingo.
  Note: I am still looking for comments from Reinette or Fenghua.

v2:
  https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
  b. Added the patch to fix CBM issue on AMD. This was originally discussed
     https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/

v1:
  https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/

Babu Moger (12):
      x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
      x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
      x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
      x86/resctrl: Include new features in command line options
      x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
      x86/resctrl: Introduce data structure to support monitor configuration
      x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration
      x86/resctrl: Add sysfs interface to read mbm_local_bytes event configuration
      x86/resctrl: Add sysfs interface to write mbm_total_bytes event configuration
      x86/resctrl: Add sysfs interface to write mbm_local_bytes event configuration
      x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
      Documentation/x86: Update resctrl.rst for new features


 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/x86/resctrl.rst                 | 139 +++++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/cpu/resctrl/core.c            |  51 ++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  31 +-
 arch/x86/kernel/cpu/resctrl/monitor.c         |   7 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 313 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   2 +
 10 files changed, 513 insertions(+), 37 deletions(-)

--


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-27 18:49   ` Reinette Chatre
  2022-10-17 22:26 ` [PATCH v7 02/12] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
enforcement policies can be applied to external slow memory connected
to the host. QOS enforcement is accomplished by assigning a Class Of
Service (COS) to a processor and specifying allocations or limits for
that COS for each resource to be allocated.

This feature is identified by the CPUID Function 8000_0020_EBX_x0.

CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers
(ECX=0)

Bits    Field Name      Description
2       L3SBE           L3 external slow memory bandwidth enforcement

Currently, CXL.memory is the only supported "slow" memory device. With
the support of SMBA feature, the hardware enables bandwidth allocation
on the slow memory devices. If there are multiple slow memory devices
in the system, then the throttling logic groups all the slow sources
together and applies the limit on them as a whole.

The presence of the SMBA feature(with CXL.memory) is independent of
whether slow memory device is actually present in the system. If there
is no slow memory in the system, then setting a SMBA limit will have no
impact on the performance of the system.

Presence of CXL memory can be identified by numactl command.

$numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB node 0 free: 59542 MB
node 1 cpus:
node 1 size: 16122 MB
node 1 free: 15627 MB
node distances:
node   0   1
   0:  10  50
   1:  50  10

CPU list for CXL memory will be empty. The cpu-cxl node distance is
greater than cpu-to-cpu distances. Node 1 has the CXL memory in this
case. CXL memory can also be identified using ACPI SRAT table and
memory maps.

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Revision: 1.03
Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..583b88bcc7e1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -304,6 +304,7 @@
 #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
+#define X86_FEATURE_SMBA		(11*32+18) /* Slow Memory Bandwidth Allocation */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index fc01f81f6e2a..5a5f17ed69a2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 02/12] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
  2022-10-17 22:26 ` [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-17 22:26 ` [PATCH v7 03/12] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
enforcement policies on the external slow memory.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index de62b0b87ced..efffce716f3a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -106,6 +106,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SMBA",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5f7128686cfd..43d9f6f5a931 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -419,6 +419,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 03/12] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
  2022-10-17 22:26 ` [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
  2022-10-17 22:26 ` [PATCH v7 02/12] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-17 22:26 ` [PATCH v7 04/12] x86/resctrl: Include new features in command line options Babu Moger
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
Bits    Field Name    Description
3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)

Currently, the bandwidth monitoring events mbm_total_bytes and
mbm_local_bytes are set to count all the total and local reads/writes
respectively. With the introduction of slow memory, the two counters
are not enough to count all the different types of memory events. With
the feature BMEC, the users have the option to configure
mbm_total_bytes and mbm_local_bytes to count the specific type of
events.

Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400h +
EventID) which contains one field for each bandwidth type that can be
used to configure the bandwidth event to track any combination of
supported bandwidth types. The event will count requests from every
bandwidth type bit that is set in the corresponding configuration
register.

Following are the types of events supported:

====    ========================================================
Bits    Description
====    ========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Revision: 1.03
Publication

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 583b88bcc7e1..bf0fd022e80a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -305,6 +305,7 @@
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_SMBA		(11*32+18) /* Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+19) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..4555f9596ccf 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 5a5f17ed69a2..67c4d24e06ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 04/12] x86/resctrl: Include new features in command line options
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (2 preceding siblings ...)
  2022-10-17 22:26 ` [PATCH v7 03/12] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-17 22:26 ` [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the command line options to disable the new features.
smba : Slow Memory Bandwidth Allocation
bmec : Bandwidth Monitor Event Configuration.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a465d5242774..f3f0870144fb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5190,7 +5190,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index efffce716f3a..c7561c613209 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -665,6 +665,8 @@ enum {
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -688,6 +690,8 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (3 preceding siblings ...)
  2022-10-17 22:26 ` [PATCH v7 04/12] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-25 23:43   ` Reinette Chatre
  2022-10-17 22:26 ` [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        |   29 +++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |   16 ++++++++++------
 4 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c7561c613209..d79f494a4e91 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -231,9 +231,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;
+
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
 
-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;
 
@@ -756,6 +762,19 @@ static __init bool get_mem_config(void)
 	return false;
 }
 
+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -786,6 +805,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;
 
+	if (get_slow_mem_config())
+		ret = true;
+
 	return ret;
 }
 
@@ -875,6 +897,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 1dafbdc5ac31..42e2ef6fbdb8 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -210,7 +210,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;
 
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 43d9f6f5a931..16e3c6e03c79 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -14,6 +14,7 @@
 #define MSR_IA32_L2_CBM_BASE		0xd10
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e5a48f05e787..1271fd1ae2f3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1213,7 +1213,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
@@ -1402,7 +1402,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 					ctrl = resctrl_arch_get_config(r, d,
 								       closid,
 								       type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
 			rdtgroup_init_mba(r, rdtgrp->closid);
 			if (is_mba_sc(r))
 				continue;
@@ -3287,7 +3289,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
 
-	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
+	if (supports_mba_mbps() &&
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA))
 		mba_sc_domain_destroy(r, d);
 
 	if (!r->mon_capable)
@@ -3354,8 +3357,9 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
-	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
-		/* RDT_RESOURCE_MBA is never mon_capable */
+	if (supports_mba_mbps() &&
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_MBA))
+		/* RDT_RESOURCE_MBA (or SMBA) is never mon_capable */
 		return mba_sc_domain_allocate(r, d);
 
 	if (!r->mon_capable)



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (4 preceding siblings ...)
  2022-10-17 22:26 ` [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-25 23:45   ` Reinette Chatre
  2022-10-17 22:26 ` [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration Babu Moger
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add a new field in mon_evt to support Bandwidth Monitoring Event
Configuration(BMEC) and also update the "mon_features" display.

The sysfs file "mon_features" will display the monitor configuration
if supported.

Before the change.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_local_bytes

After the change if BMEC is supported.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_total_config
	mbm_local_bytes
	mbm_local_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |    3 ++-
 arch/x86/kernel/cpu/resctrl/internal.h |    4 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c  |    7 ++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index d79f494a4e91..46813b1c50c2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -814,6 +814,7 @@ static __init bool get_rdt_alloc_resources(void)
 static __init bool get_rdt_mon_resources(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
 		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
@@ -825,7 +826,7 @@ static __init bool get_rdt_mon_resources(void)
 	if (!rdt_mon_features)
 		return false;
 
-	return !rdt_get_mon_l3_config(r);
+	return !rdt_get_mon_l3_config(r, mon_configurable);
 }
 
 static __init void __check_quirks_intel(void)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 16e3c6e03c79..b458f768f30c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @configurable:	true if the event is configurable
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
 	char			*name;
+	bool			configurable;
 	struct list_head	list;
 };
 
@@ -522,7 +524,7 @@ int closids_supported(void);
 void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
-int rdt_get_mon_l3_config(struct rdt_resource *r);
+int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index efe0c30d3a12..4b8adb7f1c5c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
-int rdt_get_mon_l3_config(struct rdt_resource *r)
+int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -783,6 +783,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (ret)
 		return ret;
 
+	if (configurable) {
+		mbm_total_event.configurable = true;
+		mbm_local_event.configurable = true;
+	}
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1271fd1ae2f3..5f0ef1bf4c78 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list)
+	list_for_each_entry(mevt, &r->evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
+		if (mevt->configurable)
+			seq_printf(seq, "%s_config\n", mevt->name);
+	}
 
 	return 0;
 }



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (5 preceding siblings ...)
  2022-10-17 22:26 ` [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-10-17 22:26 ` Babu Moger
  2022-10-25 23:47   ` Reinette Chatre
  2022-10-17 22:27 ` [PATCH v7 08/12] x86/resctrl: Add sysfs interface to read mbm_local_bytes " Babu Moger
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:26 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_total_bytes configuration is set to 0x7f to count
all the event types.

For example:
    $cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
    0=0x7f;1=0x7f;2=0x7f;3=0x7f

    In this case, the event mbm_total_bytes is currently configured
    with 0x7f on domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |    3 +
 arch/x86/kernel/cpu/resctrl/internal.h |    2 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   76 ++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 46813b1c50c2..758c5d7553a4 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -826,6 +826,9 @@ static __init bool get_rdt_mon_resources(void)
 	if (!rdt_mon_features)
 		return false;
 
+	if (mon_configurable)
+		mbm_config_rftype_init();
+
 	return !rdt_get_mon_l3_config(r, mon_configurable);
 }
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index b458f768f30c..326a1b582f38 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -15,6 +15,7 @@
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
@@ -541,5 +542,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
+void __init mbm_config_rftype_init(void);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5f0ef1bf4c78..0982845594d0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1423,6 +1423,67 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+struct mon_config_info {
+	u32 evtid;
+	u32 mon_config;
+};
+
+static void mon_event_config_read(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 h, msr_index;
+
+	switch (mon_info->evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		/* Not expected to come here */
+		return;
+	}
+
+	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
+}
+
+static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+}
+
+static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
+{
+	struct mon_config_info mon_info = {0};
+	struct rdt_domain *dom;
+	bool sep = false;
+
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(s, ";");
+
+		mon_info.evtid = evtid;
+		mondata_config_read(dom, &mon_info);
+
+		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
+		sep = true;
+	}
+	seq_puts(s, "\n");
+
+	return 0;
+}
+
+static int mbm_total_config_show(struct kernfs_open_file *of,
+				 struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1521,6 +1582,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= max_threshold_occ_show,
 		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
 	},
+	{
+		.name		= "mbm_total_config",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_total_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
@@ -1627,6 +1694,15 @@ void __init thread_throttle_mode_init(void)
 	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
 }
 
+void __init mbm_config_rftype_init(void)
+{
+	struct rftype *rft;
+
+	rft = rdtgroup_get_rftype_by_name("mbm_total_config");
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
+}
+
 /**
  * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
  * @r: The resource group with which the file is associated.



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 08/12] x86/resctrl: Add sysfs interface to read mbm_local_bytes event configuration
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (6 preceding siblings ...)
  2022-10-17 22:26 ` [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration Babu Moger
@ 2022-10-17 22:27 ` Babu Moger
  2022-10-17 22:27 ` [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes " Babu Moger
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:27 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_config.
The event configuration settings are domain specific and will affect
all the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_local_bytes configuration is set to 0x15 to count
all the local event types.

For example:
    $cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
    0=0x15;1=0x15;2=0x15;3=0x15

    In this case the event mbm_local_bytes is currently configured with
    0x15 on domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0982845594d0..305fb0475970 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1484,6 +1484,16 @@ static int mbm_total_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int mbm_local_config_show(struct kernfs_open_file *of,
+				 struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1588,6 +1598,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_config_show,
 	},
+	{
+		.name		= "mbm_local_config",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_local_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
@@ -1701,6 +1717,10 @@ void __init mbm_config_rftype_init(void)
 	rft = rdtgroup_get_rftype_by_name("mbm_total_config");
 	if (rft)
 		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
+
+	rft = rdtgroup_get_rftype_by_name("mbm_local_config");
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
 }
 
 /**



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes event configuration
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (7 preceding siblings ...)
  2022-10-17 22:27 ` [PATCH v7 08/12] x86/resctrl: Add sysfs interface to read mbm_local_bytes " Babu Moger
@ 2022-10-17 22:27 ` Babu Moger
  2022-10-25 23:48   ` Reinette Chatre
  2022-10-17 22:27 ` [PATCH v7 10/12] x86/resctrl: Add sysfs interface to write mbm_local_bytes " Babu Moger
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:27 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be changed by the user by writing
to the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
command.
	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_config

To change the mbm_total_bytes to count all the slow memory reads on
domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command.
	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |   23 +++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  146 ++++++++++++++++++++++++++++++++
 2 files changed, 169 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 326a1b582f38..c42b12934a0e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -42,6 +42,29 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM             BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM            BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM    BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM   BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM           BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM          BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define  DIRTY_VICTIMS_TO_ALL_MEM      BIT(6)
+
+/* Max event bits supported */
+#define MAX_EVT_CONFIG_BITS            GENMASK(6, 0)
 
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 305fb0475970..25ff56ecb817 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1494,6 +1494,151 @@ static int mbm_local_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void mon_event_config_write(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 msr_index;
+
+	switch (mon_info->evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		msr_index = 0;
+		break;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		msr_index = 1;
+		break;
+	default:
+		/* Not expected to come here */
+		return;
+	}
+
+	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
+}
+
+static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 evtid, u32 val)
+{
+	struct mon_config_info mon_info = {0};
+	cpumask_var_t cpu_mask;
+	int ret = 0, cpu;
+
+	rdt_last_cmd_clear();
+
+	/* mon_config cannot be more than the supported set of events */
+	if (val > MAX_EVT_CONFIG_BITS) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	cpus_read_lock();
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
+		rdt_last_cmd_puts("cpu_mask allocation failed\n");
+		ret = -ENOMEM;
+		goto e_unlock;
+	}
+
+	/*
+	 * Read the current config value first. If both are same then
+	 * we dont need to write it again.
+	 */
+	mon_info.evtid = evtid;
+	mondata_config_read(d, &mon_info);
+	if (mon_info.mon_config == val)
+		goto e_cpumask;
+
+	mon_info.mon_config = val;
+
+	/* Pick all the CPUs in the domain instance */
+	for_each_cpu(cpu, &d->cpu_mask)
+		cpumask_set_cpu(cpu, cpu_mask);
+
+	/*
+	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in
+	 * cpu_mask. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
+	 * are scoped at the domain level. Writing any of these MSRs
+	 * on one CPU is supposed to be observed by all CPUs in the
+	 * domain. However, the hardware team recommends to update
+	 * these MSRs on all the CPUs in the domain.
+	 */
+	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared by the hardware. The
+	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
+	 * every RMID on the next read to any event for every RMID.
+	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
+	 * cleared while it is tracked by the hardware. Clear the
+	 * mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
+	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+
+e_cpumask:
+	free_cpumask_var(cpu_mask);
+
+e_unlock:
+	cpus_read_unlock();
+
+	return ret;
+}
+
+static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
+{
+	char *dom_str = NULL, *id_str;
+	struct rdt_domain *d;
+	unsigned long dom_id, val;
+	int ret = 0;
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+	id_str = strsep(&dom_str, "=");
+
+	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
+		return -EINVAL;
+	}
+
+	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
+		return -EINVAL;
+	}
+
+	list_for_each_entry(d, &r->domains, list) {
+		if (d->id == dom_id) {
+			ret = mbm_config_write(r, d, evtid, val);
+			if (ret)
+				return -EINVAL;
+			goto next;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t mbm_total_config_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes, loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1597,6 +1742,7 @@ static struct rftype res_common_files[] = {
 		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_config_show,
+		.write		= mbm_total_config_write,
 	},
 	{
 		.name		= "mbm_local_config",



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 10/12] x86/resctrl: Add sysfs interface to write mbm_local_bytes event configuration
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (8 preceding siblings ...)
  2022-10-17 22:27 ` [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes " Babu Moger
@ 2022-10-17 22:27 ` Babu Moger
  2022-10-17 22:27 ` [PATCH v7 11/12] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
  2022-10-17 22:27 ` [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features Babu Moger
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:27 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be changed by the user by writing
to the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change the mbm_local_bytes to count all the non-temporal writes on
domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex 0xc).
Run the command.
    $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_config

To change the mbm_local_bytes to count only reads to local NUMA domain 1,
the bit 0 needs to be set which 1b (in hex 0x1). Run the command.
    $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 25ff56ecb817..e744db5dc057 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1639,6 +1639,26 @@ static ssize_t mbm_total_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static ssize_t mbm_local_config_write(struct kernfs_open_file *of,
+				      char *buf, size_t nbytes,
+				      loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_parse(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1749,6 +1769,7 @@ static struct rftype res_common_files[] = {
 		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_local_config_show,
+		.write		= mbm_local_config_write,
 	},
 	{
 		.name		= "cpus",



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 11/12] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (9 preceding siblings ...)
  2022-10-17 22:27 ` [PATCH v7 10/12] x86/resctrl: Add sysfs interface to write mbm_local_bytes " Babu Moger
@ 2022-10-17 22:27 ` Babu Moger
  2022-10-17 22:27 ` [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features Babu Moger
  11 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:27 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The call on_each_cpu_mask() can run the function on each CPU specified
by cpumask, which may include the local processor. So, replace the call
smp_call_function_many() with on_each_cpu_mask() to simplify the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e744db5dc057..4f8a1c9c3e7a 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -325,12 +325,7 @@ static void update_cpu_closid_rmid(void *info)
 static void
 update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 {
-	int cpu = get_cpu();
-
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
-	put_cpu();
+	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
 }
 
 static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
@@ -2131,13 +2126,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 			/* Pick one CPU from each domain instance to update MSR */
 			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 	}
-	cpu = get_cpu();
-	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update(&enable);
-	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
-	put_cpu();
+
+	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, update, &enable, 1);
 
 	free_cpumask_var(cpu_mask);
 
@@ -2614,7 +2605,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int i, cpu;
+	int i;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
@@ -2635,13 +2626,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
 		for (i = 0; i < hw_res->num_closid; i++)
 			hw_dom->ctrl_val[i] = r->default_ctrl;
 	}
-	cpu = get_cpu();
-	/* Update CBM on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update CBM on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 
 	free_cpumask_var(cpu_mask);
 



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features
  2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
                   ` (10 preceding siblings ...)
  2022-10-17 22:27 ` [PATCH v7 11/12] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
@ 2022-10-17 22:27 ` Babu Moger
  2022-10-18  3:24   ` Bagas Sanjaya
  2022-10-25 23:50   ` Reinette Chatre
  11 siblings, 2 replies; 30+ messages in thread
From: Babu Moger @ 2022-10-17 22:27 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Update the documentation for the new features:
1. Slow Memory Bandwidth allocation (SMBA).
   With this feature, the QOS  enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC).
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not
   enough to count all the different types of memory events. With the
   feature BMEC, the users have the option to configure mbm_total_bytes
   and mbm_local_bytes to count the specific type of events.

Also add configuration instructions with examples.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/x86/resctrl.rst |  139 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 71a531061e4e..d0b4e1a2cb8d 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:
 
-=============================================	================================
+===============================================	================================
 RDT (Resource Director Technology) Allocation	"rdt_a"
 CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
-=============================================	================================
+SMBA (Slow Memory Bandwidth Allocation)         "smba"
+BMEC (Bandwidth Monitoring Event Configuration) "bmec"
+===============================================	================================
 
 To use the feature mount the file system::
 
@@ -161,6 +163,79 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+                Example::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_local_bytes
+
+                If the system supports Bandwidth Monitoring Event
+                Configuration (BMEC), then the bandwidth events will
+                be configurable. The output will be::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_total_config
+                   mbm_local_bytes
+                   mbm_local_config
+
+"mbm_total_config", "mbm_local_config":
+        These files contain the current event configuration for the events
+        mbm_total_bytes and mbm_local_bytes, respectively, when the
+        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
+        The event configuration settings are domain specific and will affect
+        all the CPUs in the domain.
+
+        Following are the types of events supported:
+
+        ====    ========================================================
+        Bits    Description
+        ====    ========================================================
+        6       Dirty Victims from the QOS domain to all types of memory
+        5       Reads to slow memory in the non-local NUMA domain
+        4       Reads to slow memory in the local NUMA domain
+        3       Non-temporal writes to non-local NUMA domain
+        2       Non-temporal writes to local NUMA domain
+        1       Reads to memory in the non-local NUMA domain
+        0       Reads to memory in the local NUMA domain
+        ====    ========================================================
+
+        By default, the mbm_total_bytes configuration is set to 0x7f to count
+        all the event types and the mbm_local_bytes configuration is set to
+        0x15 to count all the local memory events.
+
+        Examples:
+
+        * To view the current configuration::
+          ::
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
+            0=0x7f;1=0x7f;2=0x7f;3=0x7f
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
+            0=0x15;1=0x15;3=0x15;4=0x15
+
+        * To change the mbm_total_bytes to count only reads on domain 0,
+          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
+          (in hexadecimal 0x33):
+          ::
+
+            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
+            0=0x33;1=0x7f;2=0x7f;3=0x7f
+
+        * To change the mbm_local_bytes to count all the slow memory reads
+          on domain 1, the bits 4 and 5 needs to be set, which is 110000b
+          in binary (in hexadecimal 0x30):
+          ::
+
+            # echo  "1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
+            0=0x15;1=0x30;3=0x15;4=0x15
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
@@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
 
 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
 
+Slow Memory bandwidth Allocation (SMBA)
+---------------------------------------
+AMD hardwares support Slow Memory bandwidth Allocation (SMBA) feature.
+Currently, CXL.memory is the only supported "slow" memory device.
+With the support of SMBA, the hardware enables bandwidth allocation
+on the slow memory devices. If there are multiple such devices in the
+system, the throttling logic groups all the slow sources together
+and applies the limit on them as a whole.
+
+The presence of SMBA (with CXL.memory) is independent of slow memory
+devices presence. If there is no such devices on the system, then
+setting the configuring SMBA will have no impact on the performance
+of the system.
+
+The bandwidth domain for slow memory is L3 cache. Its schemata file
+is formatted as:
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +574,46 @@ which you wish to change.  E.g.
   L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
   L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the current bandwidth limit on all
+domains. The allocated resources are in multiples of one eighth GB/s.
+When writing to the file, you need to specify what cache id you wish to
+configure the bandwidth limit.
+
+For example, to allocate 2GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with SMBA feature
+--------------------------------------------------------------------
+Reading and writing the schemata file is the same as without SMBA in
+above section.
+
+For example, to allocate 8GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1=  64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features
  2022-10-17 22:27 ` [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features Babu Moger
@ 2022-10-18  3:24   ` Bagas Sanjaya
  2022-10-25 23:50   ` Reinette Chatre
  1 sibling, 0 replies; 30+ messages in thread
From: Bagas Sanjaya @ 2022-10-18  3:24 UTC (permalink / raw)
  To: Babu Moger
  Cc: corbet, reinette.chatre, tglx, mingo, bp, fenghua.yu,
	dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	eranian

[-- Attachment #1: Type: text/plain, Size: 8964 bytes --]

On Mon, Oct 17, 2022 at 05:27:34PM -0500, Babu Moger wrote:
> Update the documentation for the new features:
> 1. Slow Memory Bandwidth allocation (SMBA).
>    With this feature, the QOS  enforcement policies can be applied
>    to the external slow memory connected to the host. QOS enforcement
>    is accomplished by assigning a Class Of Service (COS) to a processor
>    and specifying allocations or limits for that COS for each resource
>    to be allocated.
> 
> 2. Bandwidth Monitoring Event Configuration (BMEC).
>    The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
>    are set to count all the total and local reads/writes respectively.
>    With the introduction of slow memory, the two counters are not
>    enough to count all the different types of memory events. With the
>    feature BMEC, the users have the option to configure mbm_total_bytes
>    and mbm_local_bytes to count the specific type of events.
> 
> Also add configuration instructions with examples.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  Documentation/x86/resctrl.rst |  139 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 71a531061e4e..d0b4e1a2cb8d 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
>  This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
>  flag bits:
>  
> -=============================================	================================
> +===============================================	================================
>  RDT (Resource Director Technology) Allocation	"rdt_a"
>  CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
>  CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
>  CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
>  MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
>  MBA (Memory Bandwidth Allocation)		"mba"
> -=============================================	================================
> +SMBA (Slow Memory Bandwidth Allocation)         "smba"
> +BMEC (Bandwidth Monitoring Event Configuration) "bmec"
> +===============================================	================================
>  
>  To use the feature mount the file system::
>  
> @@ -161,6 +163,79 @@ with the following files:
>  "mon_features":
>  		Lists the monitoring events if
>  		monitoring is enabled for the resource.
> +                Example::
> +
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_local_bytes
> +
> +                If the system supports Bandwidth Monitoring Event
> +                Configuration (BMEC), then the bandwidth events will
> +                be configurable. The output will be::
> +
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_total_config
> +                   mbm_local_bytes
> +                   mbm_local_config
> +
> +"mbm_total_config", "mbm_local_config":
> +        These files contain the current event configuration for the events
> +        mbm_total_bytes and mbm_local_bytes, respectively, when the
> +        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
> +        The event configuration settings are domain specific and will affect
> +        all the CPUs in the domain.
> +
> +        Following are the types of events supported:
> +
> +        ====    ========================================================
> +        Bits    Description
> +        ====    ========================================================
> +        6       Dirty Victims from the QOS domain to all types of memory
> +        5       Reads to slow memory in the non-local NUMA domain
> +        4       Reads to slow memory in the local NUMA domain
> +        3       Non-temporal writes to non-local NUMA domain
> +        2       Non-temporal writes to local NUMA domain
> +        1       Reads to memory in the non-local NUMA domain
> +        0       Reads to memory in the local NUMA domain
> +        ====    ========================================================
> +
> +        By default, the mbm_total_bytes configuration is set to 0x7f to count
> +        all the event types and the mbm_local_bytes configuration is set to
> +        0x15 to count all the local memory events.
> +
> +        Examples:
> +
> +        * To view the current configuration::
> +          ::
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +            0=0x7f;1=0x7f;2=0x7f;3=0x7f
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +            0=0x15;1=0x15;3=0x15;4=0x15
> +
> +        * To change the mbm_total_bytes to count only reads on domain 0,
> +          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
> +          (in hexadecimal 0x33):
> +          ::
> +
> +            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +            0=0x33;1=0x7f;2=0x7f;3=0x7f
> +
> +        * To change the mbm_local_bytes to count all the slow memory reads
> +          on domain 1, the bits 4 and 5 needs to be set, which is 110000b
> +          in binary (in hexadecimal 0x30):
> +          ::
> +
> +            # echo  "1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +            0=0x15;1=0x30;3=0x15;4=0x15
>  
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
> @@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
>  
>  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>  
> +Slow Memory bandwidth Allocation (SMBA)
> +---------------------------------------
> +AMD hardwares support Slow Memory bandwidth Allocation (SMBA) feature.
> +Currently, CXL.memory is the only supported "slow" memory device.
> +With the support of SMBA, the hardware enables bandwidth allocation
> +on the slow memory devices. If there are multiple such devices in the
> +system, the throttling logic groups all the slow sources together
> +and applies the limit on them as a whole.
> +
> +The presence of SMBA (with CXL.memory) is independent of slow memory
> +devices presence. If there is no such devices on the system, then
> +setting the configuring SMBA will have no impact on the performance
> +of the system.
> +
> +The bandwidth domain for slow memory is L3 cache. Its schemata file
> +is formatted as:
> +::
> +
> +	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
> +
>  Reading/writing the schemata file
>  ---------------------------------
>  Reading the schemata file will show the state of all resources
> @@ -479,6 +574,46 @@ which you wish to change.  E.g.
>    L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
>    L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
>  
> +Reading/writing the schemata file (on AMD systems)
> +--------------------------------------------------
> +Reading the schemata file will show the current bandwidth limit on all
> +domains. The allocated resources are in multiples of one eighth GB/s.
> +When writing to the file, you need to specify what cache id you wish to
> +configure the bandwidth limit.
> +
> +For example, to allocate 2GB/s limit on the first cache id:
> +
> +::
> +
> +  # cat schemata
> +    MB:0=2048;1=2048;2=2048;3=2048
> +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +  # echo "MB:1=16" > schemata
> +  # cat schemata
> +    MB:0=2048;1=  16;2=2048;3=2048
> +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +Reading/writing the schemata file (on AMD systems) with SMBA feature
> +--------------------------------------------------------------------
> +Reading and writing the schemata file is the same as without SMBA in
> +above section.
> +
> +For example, to allocate 8GB/s limit on the first cache id:
> +
> +::
> +
> +  # cat schemata
> +    SMBA:0=2048;1=2048;2=2048;3=2048
> +      MB:0=2048;1=2048;2=2048;3=2048
> +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +  # echo "SMBA:1=64" > schemata
> +  # cat schemata
> +    SMBA:0=2048;1=  64;2=2048;3=2048
> +      MB:0=2048;1=2048;2=2048;3=2048
> +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
>  Cache Pseudo-Locking
>  ====================
>  CAT enables a user to specify the amount of cache space that an
> 
> 

Finally LGTM, thanks.

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-17 22:26 ` [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
@ 2022-10-25 23:43   ` Reinette Chatre
  2022-10-26 19:07     ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-25 23:43 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

Nitpick in Subject ... "allocation" -> "Allocation"?

On 10/17/2022 3:26 PM, Babu Moger wrote:

...

> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>  
>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>  		r = s->res;
> -		if (r->rid == RDT_RESOURCE_MBA) {
> +		if (r->rid == RDT_RESOURCE_MBA ||
> +		    r->rid == RDT_RESOURCE_SMBA) {
>  			rdtgroup_init_mba(r, rdtgrp->closid);
>  			if (is_mba_sc(r))
>  				continue;

The above hunk and the ones that follow are unexpected.

Note that the software controller, when resctrl is mounted with "mba_MBps", is 
only supported by RDT_RESOURCE_MBA. At this time this really is hard coded all
over the place, for example:

static int set_mba_sc(bool mba_sc)
{
	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
	...

}

Since SMBA hardcodes "delay_linear = false" I do not expect it to support the software
controller ... but these hunks appear to treat SMBA as though it does. It is the "MBA software
controller", not "SMBA software controller". Why does it check above if the MBA software
controller is enabled on SMBA?			

> @@ -3287,7 +3289,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>  {
>  	lockdep_assert_held(&rdtgroup_mutex);
>  
> -	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
> +	if (supports_mba_mbps() &&
> +	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA))
>  		mba_sc_domain_destroy(r, d);
>  
>  	if (!r->mon_capable)
> @@ -3354,8 +3357,9 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>  
>  	lockdep_assert_held(&rdtgroup_mutex);
>  
> -	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
> -		/* RDT_RESOURCE_MBA is never mon_capable */
> +	if (supports_mba_mbps() &&
> +	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_MBA))
> +		/* RDT_RESOURCE_MBA (or SMBA) is never mon_capable */

What does this change do? Did you mean to add a r->rid == RDT_RESOURCE_SMBA check?

>  		return mba_sc_domain_allocate(r, d);
>  
>  	if (!r->mon_capable)
> 
> 

Why are the MBA software controller resources allocated/destroyed for a SMBA resource? If
you want to support the software controller for SMBA then there are a lot of other changes
missing.

Reinette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration
  2022-10-17 22:26 ` [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-10-25 23:45   ` Reinette Chatre
  2022-10-26 19:25     ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-25 23:45 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/17/2022 3:26 PM, Babu Moger wrote:
> Add a new field in mon_evt to support Bandwidth Monitoring Event
> Configuration(BMEC) and also update the "mon_features" display.
> 
> The sysfs file "mon_features" will display the monitor configuration
> if supported.
> 
> Before the change.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_local_bytes
> 
> After the change if BMEC is supported.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_total_config
> 	mbm_local_bytes
> 	mbm_local_config

This does not seem to be what the code in this patch does.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |    3 ++-
>  arch/x86/kernel/cpu/resctrl/internal.h |    4 +++-
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    7 ++++++-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
>  4 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index d79f494a4e91..46813b1c50c2 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -814,6 +814,7 @@ static __init bool get_rdt_alloc_resources(void)
>  static __init bool get_rdt_mon_resources(void)
>  {
>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
>  
>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
>  		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
> @@ -825,7 +826,7 @@ static __init bool get_rdt_mon_resources(void)
>  	if (!rdt_mon_features)
>  		return false;
>  
> -	return !rdt_get_mon_l3_config(r);
> +	return !rdt_get_mon_l3_config(r, mon_configurable);
>  }

This seems to do a portion of configuration in the calling function, pass the
results of to the actual configuration function where the rest of the configuration is
done. Determining "mon_configurable" really looks like it belongs in 
rdt_get_mon_l3_config(). Is it availability of rdt_cpu_has() that prevented
that change? Why not make it available internally to all resctrl code?

Patch 7's mbm_config_rftype_init() can also be moved to rdt_get_mon_l3_config()
to match how other related configs (thread_throttle_mode_init()) are done.

>  
>  static __init void __check_quirks_intel(void)
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 16e3c6e03c79..b458f768f30c 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> + * @configurable:	true if the event is configurable
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	enum resctrl_event_id	evtid;
>  	char			*name;
> +	bool			configurable;
>  	struct list_head	list;
>  };
>  
> @@ -522,7 +524,7 @@ int closids_supported(void);
>  void closid_free(int closid);
>  int alloc_rmid(void);
>  void free_rmid(u32 rmid);
> -int rdt_get_mon_l3_config(struct rdt_resource *r);
> +int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable);
>  void mon_event_count(void *info);
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index efe0c30d3a12..4b8adb7f1c5c 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>  		list_add_tail(&mbm_local_event.list, &r->evt_list);
>  }
>  
> -int rdt_get_mon_l3_config(struct rdt_resource *r)
> +int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable)
>  {
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> @@ -783,6 +783,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  	if (ret)
>  		return ret;
>  
> +	if (configurable) {
> +		mbm_total_event.configurable = true;
> +		mbm_local_event.configurable = true;
> +	}
> +
>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 1271fd1ae2f3..5f0ef1bf4c78 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>  	struct rdt_resource *r = of->kn->parent->priv;
>  	struct mon_evt *mevt;
>  
> -	list_for_each_entry(mevt, &r->evt_list, list)
> +	list_for_each_entry(mevt, &r->evt_list, list) {
>  		seq_printf(seq, "%s\n", mevt->name);
> +		if (mevt->configurable)
> +			seq_printf(seq, "%s_config\n", mevt->name);
> +	}
>  
>  	return 0;
>  }
> 

If mevt->name is "mbm_total_bytes", then would this not
print "mbm_total_bytes_config"? This is different from "mbm_total_config"
in the changelog and does not match the actual files created in later
patches..

Reinette


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration
  2022-10-17 22:26 ` [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration Babu Moger
@ 2022-10-25 23:47   ` Reinette Chatre
  2022-10-26 19:36     ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-25 23:47 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/17/2022 3:26 PM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> By default, the mbm_total_bytes configuration is set to 0x7f to count
> all the event types.
> 
> For example:
>     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
>     0=0x7f;1=0x7f;2=0x7f;3=0x7f
> 
>     In this case, the event mbm_total_bytes is currently configured
>     with 0x7f on domains 0 to 3.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |    3 +
>  arch/x86/kernel/cpu/resctrl/internal.h |    2 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   76 ++++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 46813b1c50c2..758c5d7553a4 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -826,6 +826,9 @@ static __init bool get_rdt_mon_resources(void)
>  	if (!rdt_mon_features)
>  		return false;
>  
> +	if (mon_configurable)
> +		mbm_config_rftype_init();
> +

Please move this to be within rdt_get_mon_l3_config().

>  	return !rdt_get_mon_l3_config(r, mon_configurable);
>  }
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index b458f768f30c..326a1b582f38 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -15,6 +15,7 @@
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
>  #define MSR_IA32_QM_CTR			0x0c8e
>  #define MSR_IA32_QM_EVTSEL		0x0c8d
> @@ -541,5 +542,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
>  void __check_limbo(struct rdt_domain *d, bool force_free);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  void __init thread_throttle_mode_init(void);
> +void __init mbm_config_rftype_init(void);
>  
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 5f0ef1bf4c78..0982845594d0 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1423,6 +1423,67 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +struct mon_config_info {
> +	u32 evtid;
> +	u32 mon_config;
> +};
> +
> +static void mon_event_config_read(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 h, msr_index;
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		/* Not expected to come here */
> +		return;
> +	}
> +
> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
> +}
> +
> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> +{
> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> +}
> +
> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> +{
> +	struct mon_config_info mon_info = {0};
> +	struct rdt_domain *dom;
> +	bool sep = false;
> +
> +	list_for_each_entry(dom, &r->domains, list) {

This is done with no protection ... I do not see anything preventing last CPU of
a domain going offline around this time taking this domain with it while this code
still tries to access its members.

I think all of this needs protection with rdtgroup_mutex.

> +		if (sep)
> +			seq_puts(s, ";");
> +
> +		mon_info.evtid = evtid;
> +		mondata_config_read(dom, &mon_info);
> +
> +		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);

It does not seem robust to me to use printf field width to manage the data
returned from hardware. At this time bits 0 - 6 are supported by hardware ...
what happens if hardware starts using bit 7 for something? 
It seems to me that the mask introduced later needs to be pulled here to
ensure that only the valid bits are handled.

> +		sep = true;
> +	}
> +	seq_puts(s, "\n");
> +
> +	return 0;
> +}
> +
> +static int mbm_total_config_show(struct kernfs_open_file *of,
> +				 struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	return 0;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1521,6 +1582,12 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= max_threshold_occ_show,
>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
>  	},
> +	{
> +		.name		= "mbm_total_config",
> +		.mode		= 0644,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= mbm_total_config_show,
> +	},

Please only make mode writable when there is support for it.

>  	{
>  		.name		= "cpus",
>  		.mode		= 0644,
> @@ -1627,6 +1694,15 @@ void __init thread_throttle_mode_init(void)
>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
>  }
>  
> +void __init mbm_config_rftype_init(void)
> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_get_rftype_by_name("mbm_total_config");
> +	if (rft)
> +		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
> +}
> +
>  /**
>   * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
>   * @r: The resource group with which the file is associated.
> 
> 


Reinette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes event configuration
  2022-10-17 22:27 ` [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes " Babu Moger
@ 2022-10-25 23:48   ` Reinette Chatre
  2022-10-26 19:52     ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-25 23:48 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/17/2022 3:27 PM, Babu Moger wrote:
> The current event configuration can be changed by the user by writing
> to the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> 
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> For example:
> To change the mbm_total_bytes to count only reads on domain 0, the bits
> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
> command.
> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_config
> 
> To change the mbm_total_bytes to count all the slow memory reads on
> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> Run the command.
> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |   23 +++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  146 ++++++++++++++++++++++++++++++++
>  2 files changed, 169 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 326a1b582f38..c42b12934a0e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -42,6 +42,29 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>  
> +/* Reads to Local DRAM Memory */
> +#define READS_TO_LOCAL_MEM             BIT(0)
> +
> +/* Reads to Remote DRAM Memory */
> +#define READS_TO_REMOTE_MEM            BIT(1)
> +
> +/* Non-Temporal Writes to Local Memory */
> +#define NON_TEMP_WRITE_TO_LOCAL_MEM    BIT(2)
> +
> +/* Non-Temporal Writes to Remote Memory */
> +#define NON_TEMP_WRITE_TO_REMOTE_MEM   BIT(3)
> +
> +/* Reads to Local Memory the system identifies as "Slow Memory" */
> +#define READS_TO_LOCAL_S_MEM           BIT(4)
> +
> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
> +#define READS_TO_REMOTE_S_MEM          BIT(5)
> +
> +/* Dirty Victims to All Types of Memory */
> +#define  DIRTY_VICTIMS_TO_ALL_MEM      BIT(6)
> +
> +/* Max event bits supported */
> +#define MAX_EVT_CONFIG_BITS            GENMASK(6, 0)
>  
>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 305fb0475970..25ff56ecb817 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1494,6 +1494,151 @@ static int mbm_local_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void mon_event_config_write(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 msr_index;
> +
> +	switch (mon_info->evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		/* Not expected to come here */
> +		return;
> +	}
> +
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
> +}

This duplicates most of mon_event_config_read() from earlier patch. Would a
utility like "mon_event_config_index_get()" help here?

> +
> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 evtid, u32 val)
> +{
> +	struct mon_config_info mon_info = {0};
> +	cpumask_var_t cpu_mask;
> +	int ret = 0, cpu;
> +
> +	rdt_last_cmd_clear();
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (val > MAX_EVT_CONFIG_BITS) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	cpus_read_lock();
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		rdt_last_cmd_puts("cpu_mask allocation failed\n");
> +		ret = -ENOMEM;
> +		goto e_unlock;
> +	}
> +
> +	/*
> +	 * Read the current config value first. If both are same then
> +	 * we dont need to write it again.

"we dont need" -> "we don't need"

> +	 */
> +	mon_info.evtid = evtid;
> +	mondata_config_read(d, &mon_info);
> +	if (mon_info.mon_config == val)
> +		goto e_cpumask;
> +
> +	mon_info.mon_config = val;
> +
> +	/* Pick all the CPUs in the domain instance */
> +	for_each_cpu(cpu, &d->cpu_mask)
> +		cpumask_set_cpu(cpu, cpu_mask);
> +

Why is it necessary to create a new CPU mask just to populate it
with all CPUs in another CPU mask? Why not just use the original CPU mask?

> +	/*
> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in
> +	 * cpu_mask. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> +	 * are scoped at the domain level. Writing any of these MSRs
> +	 * on one CPU is supposed to be observed by all CPUs in the
> +	 * domain. However, the hardware team recommends to update
> +	 * these MSRs on all the CPUs in the domain.
> +	 */
> +	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared by the hardware. The
> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> +	 * every RMID on the next read to any event for every RMID.
> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> +	 * cleared while it is tracked by the hardware. Clear the
> +	 * mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +
> +e_cpumask:
> +	free_cpumask_var(cpu_mask);
> +
> +e_unlock:
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> +{
> +	char *dom_str = NULL, *id_str;
> +	struct rdt_domain *d;
> +	unsigned long dom_id, val;

Please use reverse fir tree order.

> +	int ret = 0;
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +	id_str = strsep(&dom_str, "=");
> +
> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> +		return -EINVAL;
> +	}
> +
> +	list_for_each_entry(d, &r->domains, list) {
> +		if (d->id == dom_id) {
> +			ret = mbm_config_write(r, d, evtid, val);
> +			if (ret)
> +				return -EINVAL;
> +			goto next;
> +		}
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t mbm_total_config_write(struct kernfs_open_file *of,
> +				      char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	rdt_last_cmd_clear();
> +

How is this code protected from interference by other actions? Needs rdtgroup_mutex?

> +	buf[nbytes - 1] = '\0';
> +
> +	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1597,6 +1742,7 @@ static struct rftype res_common_files[] = {
>  		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= mbm_total_config_show,
> +		.write		= mbm_total_config_write,
>  	},
>  	{
>  		.name		= "mbm_local_config",
> 
> 

Reinette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features
  2022-10-17 22:27 ` [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features Babu Moger
  2022-10-18  3:24   ` Bagas Sanjaya
@ 2022-10-25 23:50   ` Reinette Chatre
  2022-10-26 20:00     ` Moger, Babu
  1 sibling, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-25 23:50 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/17/2022 3:27 PM, Babu Moger wrote:

...

> +"mbm_total_config", "mbm_local_config":
> +        These files contain the current event configuration for the events
> +        mbm_total_bytes and mbm_local_bytes, respectively, when the
> +        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
> +        The event configuration settings are domain specific and will affect
> +        all the CPUs in the domain.
> +
> +        Following are the types of events supported:
> +
> +        ====    ========================================================
> +        Bits    Description
> +        ====    ========================================================
> +        6       Dirty Victims from the QOS domain to all types of memory
> +        5       Reads to slow memory in the non-local NUMA domain
> +        4       Reads to slow memory in the local NUMA domain
> +        3       Non-temporal writes to non-local NUMA domain
> +        2       Non-temporal writes to local NUMA domain
> +        1       Reads to memory in the non-local NUMA domain
> +        0       Reads to memory in the local NUMA domain
> +        ====    ========================================================
> +
> +        By default, the mbm_total_bytes configuration is set to 0x7f to count
> +        all the event types and the mbm_local_bytes configuration is set to
> +        0x15 to count all the local memory events.
> +
> +        Examples:
> +
> +        * To view the current configuration::
> +          ::
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +            0=0x7f;1=0x7f;2=0x7f;3=0x7f
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +            0=0x15;1=0x15;3=0x15;4=0x15
> +
> +        * To change the mbm_total_bytes to count only reads on domain 0,
> +          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
> +          (in hexadecimal 0x33):
> +          ::
> +
> +            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
> +            0=0x33;1=0x7f;2=0x7f;3=0x7f
> +
> +        * To change the mbm_local_bytes to count all the slow memory reads
> +          on domain 1, the bits 4 and 5 needs to be set, which is 110000b
> +          in binary (in hexadecimal 0x30):
> +          ::
> +
> +            # echo  "1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
> +            0=0x15;1=0x30;3=0x15;4=0x15
>  

The code supports modifying multiple domains with one write. To help users
understand this it may be useful to have one of these examples do so to show
that and how it can be done.

>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
> @@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
>  
>  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>  
> +Slow Memory bandwidth Allocation (SMBA)

bandwidth -> Bandwidth?

> +---------------------------------------
> +AMD hardwares support Slow Memory bandwidth Allocation (SMBA) feature.

hardware -> hardware?
bandwidth -> Bandwidth?

Reinette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-25 23:43   ` Reinette Chatre
@ 2022-10-26 19:07     ` Moger, Babu
  2022-10-26 20:23       ` Reinette Chatre
  0 siblings, 1 reply; 30+ messages in thread
From: Moger, Babu @ 2022-10-26 19:07 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 10/25/22 18:43, Reinette Chatre wrote:
> Hi Babu,
>
> Nitpick in Subject ... "allocation" -> "Allocation"?
Sure.
>
> On 10/17/2022 3:26 PM, Babu Moger wrote:
>
> ...
>
>> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>>  
>>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>>  		r = s->res;
>> -		if (r->rid == RDT_RESOURCE_MBA) {
>> +		if (r->rid == RDT_RESOURCE_MBA ||
>> +		    r->rid == RDT_RESOURCE_SMBA) {
>>  			rdtgroup_init_mba(r, rdtgrp->closid);
>>  			if (is_mba_sc(r))
>>  				continue;
> The above hunk and the ones that follow are unexpected.

I am thinking the above check is required, It is updating the
staged_config with default values. Right now, the default value for SMBA
is same as MBA default value. So, I used this code to initialize.

Did I miss something?

>
> Note that the software controller, when resctrl is mounted with "mba_MBps", is 
> only supported by RDT_RESOURCE_MBA. At this time this really is hard coded all
> over the place, for example:
>
> static int set_mba_sc(bool mba_sc)
> {
> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
> 	...
>
> }
>
> Since SMBA hardcodes "delay_linear = false" I do not expect it to support the software
> controller ... but these hunks appear to treat SMBA as though it does. It is the "MBA software
> controller", not "SMBA software controller". Why does it check above if the MBA software
> controller is enabled on SMBA?

There is no plan to support SMBA software controller. Yes, I think below
checks are not required.


> 			
>
>> @@ -3287,7 +3289,8 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  {
>>  	lockdep_assert_held(&rdtgroup_mutex);
>>  
>> -	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
>> +	if (supports_mba_mbps() &&
>> +	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA))
>>  		mba_sc_domain_destroy(r, d);
This check is not required.
>>  
>>  	if (!r->mon_capable)
>> @@ -3354,8 +3357,9 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
>>  
>>  	lockdep_assert_held(&rdtgroup_mutex);
>>  
>> -	if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA)
>> -		/* RDT_RESOURCE_MBA is never mon_capable */
>> +	if (supports_mba_mbps() &&
>> +	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_MBA))
>> +		/* RDT_RESOURCE_MBA (or SMBA) is never mon_capable */
> What does this change do? Did you mean to add a r->rid == RDT_RESOURCE_SMBA check?

Good catch. I meant  r->rid == RDT_RESOURCE_SMBA.

But this check is not required at all.

>
>>  		return mba_sc_domain_allocate(r, d);
>>  
>>  	if (!r->mon_capable)
>>
>>
> Why are the MBA software controller resources allocated/destroyed for a SMBA resource? If
> you want to support the software controller for SMBA then there are a lot of other changes

No..There is no plan to support software controller for SMBA.

Thanks

Babu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration
  2022-10-25 23:45   ` Reinette Chatre
@ 2022-10-26 19:25     ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-26 19:25 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 10/25/22 18:45, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/17/2022 3:26 PM, Babu Moger wrote:
>> Add a new field in mon_evt to support Bandwidth Monitoring Event
>> Configuration(BMEC) and also update the "mon_features" display.
>>
>> The sysfs file "mon_features" will display the monitor configuration
>> if supported.
>>
>> Before the change.
>> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_local_bytes
>>
>> After the change if BMEC is supported.
>> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_total_config
>> 	mbm_local_bytes
>> 	mbm_local_config
> This does not seem to be what the code in this patch does.

You are right. it is

# cat /sys/fs/resctrl/info/L3_MON/mon_features
llc_occupancy
mbm_total_bytes
mbm_total_bytes_config
mbm_local_bytes
mbm_local_bytes_config

>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/core.c     |    3 ++-
>>  arch/x86/kernel/cpu/resctrl/internal.h |    4 +++-
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |    7 ++++++-
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
>>  4 files changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index d79f494a4e91..46813b1c50c2 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -814,6 +814,7 @@ static __init bool get_rdt_alloc_resources(void)
>>  static __init bool get_rdt_mon_resources(void)
>>  {
>>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
>>  
>>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
>>  		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
>> @@ -825,7 +826,7 @@ static __init bool get_rdt_mon_resources(void)
>>  	if (!rdt_mon_features)
>>  		return false;
>>  
>> -	return !rdt_get_mon_l3_config(r);
>> +	return !rdt_get_mon_l3_config(r, mon_configurable);
>>  }
> This seems to do a portion of configuration in the calling function, pass the
> results of to the actual configuration function where the rest of the configuration is
> done. Determining "mon_configurable" really looks like it belongs in 
> rdt_get_mon_l3_config(). Is it availability of rdt_cpu_has() that prevented
> that change? Why not make it available internally to all resctrl code?

Yes. It is because of rdt_cpu_has availability. Also, it is __init routine.

Yes. I can make it available.

>
> Patch 7's mbm_config_rftype_init() can also be moved to rdt_get_mon_l3_config()
> to match how other related configs (thread_throttle_mode_init()) are done.
Sure. Will do.
>
>>  
>>  static __init void __check_quirks_intel(void)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 16e3c6e03c79..b458f768f30c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>>   * struct mon_evt - Entry in the event list of a resource
>>   * @evtid:		event id
>>   * @name:		name of the event
>> + * @configurable:	true if the event is configurable
>>   * @list:		entry in &rdt_resource->evt_list
>>   */
>>  struct mon_evt {
>>  	enum resctrl_event_id	evtid;
>>  	char			*name;
>> +	bool			configurable;
>>  	struct list_head	list;
>>  };
>>  
>> @@ -522,7 +524,7 @@ int closids_supported(void);
>>  void closid_free(int closid);
>>  int alloc_rmid(void);
>>  void free_rmid(u32 rmid);
>> -int rdt_get_mon_l3_config(struct rdt_resource *r);
>> +int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable);
>>  void mon_event_count(void *info);
>>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index efe0c30d3a12..4b8adb7f1c5c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>>  		list_add_tail(&mbm_local_event.list, &r->evt_list);
>>  }
>>  
>> -int rdt_get_mon_l3_config(struct rdt_resource *r)
>> +int rdt_get_mon_l3_config(struct rdt_resource *r, bool configurable)
>>  {
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> @@ -783,6 +783,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (configurable) {
>> +		mbm_total_event.configurable = true;
>> +		mbm_local_event.configurable = true;
>> +	}
>> +
>>  	l3_mon_evt_init(r);
>>  
>>  	r->mon_capable = true;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1271fd1ae2f3..5f0ef1bf4c78 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>>  	struct rdt_resource *r = of->kn->parent->priv;
>>  	struct mon_evt *mevt;
>>  
>> -	list_for_each_entry(mevt, &r->evt_list, list)
>> +	list_for_each_entry(mevt, &r->evt_list, list) {
>>  		seq_printf(seq, "%s\n", mevt->name);
>> +		if (mevt->configurable)
>> +			seq_printf(seq, "%s_config\n", mevt->name);
>> +	}
>>  
>>  	return 0;
>>  }
>>
> If mevt->name is "mbm_total_bytes", then would this not
> print "mbm_total_bytes_config"? This is different from "mbm_total_config"
> in the changelog and does not match the actual files created in later
> patches..

Sure. Will update the changelog,

Thanks

Babu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration
  2022-10-25 23:47   ` Reinette Chatre
@ 2022-10-26 19:36     ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-26 19:36 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 10/25/22 18:47, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/17/2022 3:26 PM, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
>> The event configuration settings are domain specific and will affect all
>> the CPUs in the domain.
>>
>> Following are the types of events supported:
>> ====  ===========================================================
>> Bits   Description
>> ====  ===========================================================
>> 6      Dirty Victims from the QOS domain to all types of memory
>> 5      Reads to slow memory in the non-local NUMA domain
>> 4      Reads to slow memory in the local NUMA domain
>> 3      Non-temporal writes to non-local NUMA domain
>> 2      Non-temporal writes to local NUMA domain
>> 1      Reads to memory in the non-local NUMA domain
>> 0      Reads to memory in the local NUMA domain
>> ====  ===========================================================
>>
>> By default, the mbm_total_bytes configuration is set to 0x7f to count
>> all the event types.
>>
>> For example:
>>     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
>>     0=0x7f;1=0x7f;2=0x7f;3=0x7f
>>
>>     In this case, the event mbm_total_bytes is currently configured
>>     with 0x7f on domains 0 to 3.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/core.c     |    3 +
>>  arch/x86/kernel/cpu/resctrl/internal.h |    2 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   76 ++++++++++++++++++++++++++++++++
>>  3 files changed, 81 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 46813b1c50c2..758c5d7553a4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -826,6 +826,9 @@ static __init bool get_rdt_mon_resources(void)
>>  	if (!rdt_mon_features)
>>  		return false;
>>  
>> +	if (mon_configurable)
>> +		mbm_config_rftype_init();
>> +
> Please move this to be within rdt_get_mon_l3_config().
Sure.
>
>>  	return !rdt_get_mon_l3_config(r, mon_configurable);
>>  }
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index b458f768f30c..326a1b582f38 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -15,6 +15,7 @@
>>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  
>>  #define MSR_IA32_QM_CTR			0x0c8e
>>  #define MSR_IA32_QM_EVTSEL		0x0c8d
>> @@ -541,5 +542,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
>>  void __check_limbo(struct rdt_domain *d, bool force_free);
>>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>  void __init thread_throttle_mode_init(void);
>> +void __init mbm_config_rftype_init(void);
>>  
>>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 5f0ef1bf4c78..0982845594d0 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1423,6 +1423,67 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +struct mon_config_info {
>> +	u32 evtid;
>> +	u32 mon_config;
>> +};
>> +
>> +static void mon_event_config_read(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 h, msr_index;
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		/* Not expected to come here */
>> +		return;
>> +	}
>> +
>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, h);
>> +}
>> +
>> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
>> +{
>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
>> +}
>> +
>> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>> +{
>> +	struct mon_config_info mon_info = {0};
>> +	struct rdt_domain *dom;
>> +	bool sep = false;
>> +
>> +	list_for_each_entry(dom, &r->domains, list) {
> This is done with no protection ... I do not see anything preventing last CPU of
> a domain going offline around this time taking this domain with it while this code
> still tries to access its members.
>
> I think all of this needs protection with rdtgroup_mutex.
Yes. I agree. Will change it.
>
>> +		if (sep)
>> +			seq_puts(s, ";");
>> +
>> +		mon_info.evtid = evtid;
>> +		mondata_config_read(dom, &mon_info);
>> +
>> +		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
> It does not seem robust to me to use printf field width to manage the data
> returned from hardware. At this time bits 0 - 6 are supported by hardware ...
> what happens if hardware starts using bit 7 for something? 
> It seems to me that the mask introduced later needs to be pulled here to
> ensure that only the valid bits are handled.
Ok. Sure.
>
>> +		sep = true;
>> +	}
>> +	seq_puts(s, "\n");
>> +
>> +	return 0;
>> +}
>> +
>> +static int mbm_total_config_show(struct kernfs_open_file *of,
>> +				 struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	return 0;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1521,6 +1582,12 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= max_threshold_occ_show,
>>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
>>  	},
>> +	{
>> +		.name		= "mbm_total_config",
>> +		.mode		= 0644,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= mbm_total_config_show,
>> +	},
> Please only make mode writable when there is support for it.

Ok.

Thanks

Babu

>
>>  	{
>>  		.name		= "cpus",
>>  		.mode		= 0644,
>> @@ -1627,6 +1694,15 @@ void __init thread_throttle_mode_init(void)
>>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
>>  }
>>  
>> +void __init mbm_config_rftype_init(void)
>> +{
>> +	struct rftype *rft;
>> +
>> +	rft = rdtgroup_get_rftype_by_name("mbm_total_config");
>> +	if (rft)
>> +		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
>> +}
>> +
>>  /**
>>   * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
>>   * @r: The resource group with which the file is associated.
>>
>>
>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes event configuration
  2022-10-25 23:48   ` Reinette Chatre
@ 2022-10-26 19:52     ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-26 19:52 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,


On 10/25/22 18:48, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/17/2022 3:27 PM, Babu Moger wrote:
>> The current event configuration can be changed by the user by writing
>> to the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_config.
>> The event configuration settings are domain specific and will affect all
>> the CPUs in the domain.
>>
>> Following are the types of events supported:
>>
>> ====  ===========================================================
>> Bits   Description
>> ====  ===========================================================
>> 6      Dirty Victims from the QOS domain to all types of memory
>> 5      Reads to slow memory in the non-local NUMA domain
>> 4      Reads to slow memory in the local NUMA domain
>> 3      Non-temporal writes to non-local NUMA domain
>> 2      Non-temporal writes to local NUMA domain
>> 1      Reads to memory in the non-local NUMA domain
>> 0      Reads to memory in the local NUMA domain
>> ====  ===========================================================
>>
>> For example:
>> To change the mbm_total_bytes to count only reads on domain 0, the bits
>> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
>> command.
>> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_config
>>
>> To change the mbm_total_bytes to count all the slow memory reads on
>> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
>> Run the command.
>> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_config
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |   23 +++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  146 ++++++++++++++++++++++++++++++++
>>  2 files changed, 169 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 326a1b582f38..c42b12934a0e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -42,6 +42,29 @@
>>   */
>>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>>  
>> +/* Reads to Local DRAM Memory */
>> +#define READS_TO_LOCAL_MEM             BIT(0)
>> +
>> +/* Reads to Remote DRAM Memory */
>> +#define READS_TO_REMOTE_MEM            BIT(1)
>> +
>> +/* Non-Temporal Writes to Local Memory */
>> +#define NON_TEMP_WRITE_TO_LOCAL_MEM    BIT(2)
>> +
>> +/* Non-Temporal Writes to Remote Memory */
>> +#define NON_TEMP_WRITE_TO_REMOTE_MEM   BIT(3)
>> +
>> +/* Reads to Local Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_LOCAL_S_MEM           BIT(4)
>> +
>> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_REMOTE_S_MEM          BIT(5)
>> +
>> +/* Dirty Victims to All Types of Memory */
>> +#define  DIRTY_VICTIMS_TO_ALL_MEM      BIT(6)
>> +
>> +/* Max event bits supported */
>> +#define MAX_EVT_CONFIG_BITS            GENMASK(6, 0)
>>  
>>  struct rdt_fs_context {
>>  	struct kernfs_fs_context	kfc;
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 305fb0475970..25ff56ecb817 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1494,6 +1494,151 @@ static int mbm_local_config_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +static void mon_event_config_write(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 msr_index;
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		/* Not expected to come here */
>> +		return;
>> +	}
>> +
>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
>> +}
> This duplicates most of mon_event_config_read() from earlier patch. Would a
> utility like "mon_event_config_index_get()" help here?
Yes. We can do that.
>
>> +
>> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
>> +			    u32 evtid, u32 val)
>> +{
>> +	struct mon_config_info mon_info = {0};
>> +	cpumask_var_t cpu_mask;
>> +	int ret = 0, cpu;
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	/* mon_config cannot be more than the supported set of events */
>> +	if (val > MAX_EVT_CONFIG_BITS) {
>> +		rdt_last_cmd_puts("Invalid event configuration\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	cpus_read_lock();
>> +
>> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
>> +		rdt_last_cmd_puts("cpu_mask allocation failed\n");
>> +		ret = -ENOMEM;
>> +		goto e_unlock;
>> +	}
>> +
>> +	/*
>> +	 * Read the current config value first. If both are same then
>> +	 * we dont need to write it again.
> "we dont need" -> "we don't need"
Sure.
>
>> +	 */
>> +	mon_info.evtid = evtid;
>> +	mondata_config_read(d, &mon_info);
>> +	if (mon_info.mon_config == val)
>> +		goto e_cpumask;
>> +
>> +	mon_info.mon_config = val;
>> +
>> +	/* Pick all the CPUs in the domain instance */
>> +	for_each_cpu(cpu, &d->cpu_mask)
>> +		cpumask_set_cpu(cpu, cpu_mask);
>> +
> Why is it necessary to create a new CPU mask just to populate it
> with all CPUs in another CPU mask? Why not just use the original CPU mask?

I thought convention was to create a new mask when calling on_each_cpu_mask().

But, I dont see that requirement. I can use d->cpu_mask for this directly.

>
>> +	/*
>> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in
>> +	 * cpu_mask. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
>> +	 * are scoped at the domain level. Writing any of these MSRs
>> +	 * on one CPU is supposed to be observed by all CPUs in the
>> +	 * domain. However, the hardware team recommends to update
>> +	 * these MSRs on all the CPUs in the domain.
>> +	 */
>> +	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
>> +
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>> +	 * every RMID on the next read to any event for every RMID.
>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>> +	 * cleared while it is tracked by the hardware. Clear the
>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +
>> +e_cpumask:
>> +	free_cpumask_var(cpu_mask);
>> +
>> +e_unlock:
>> +	cpus_read_unlock();
>> +
>> +	return ret;
>> +}
>> +
>> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
>> +{
>> +	char *dom_str = NULL, *id_str;
>> +	struct rdt_domain *d;
>> +	unsigned long dom_id, val;
> Please use reverse fir tree order.
Sure.
>
>> +	int ret = 0;
>> +
>> +next:
>> +	if (!tok || tok[0] == '\0')
>> +		return 0;
>> +
>> +	/* Start processing the strings for each domain */
>> +	dom_str = strim(strsep(&tok, ";"));
>> +	id_str = strsep(&dom_str, "=");
>> +
>> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
>> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
>> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	list_for_each_entry(d, &r->domains, list) {
>> +		if (d->id == dom_id) {
>> +			ret = mbm_config_write(r, d, evtid, val);
>> +			if (ret)
>> +				return -EINVAL;
>> +			goto next;
>> +		}
>> +	}
>> +
>> +	return -EINVAL;
>> +}
>> +
>> +static ssize_t mbm_total_config_write(struct kernfs_open_file *of,
>> +				      char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	int ret;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	rdt_last_cmd_clear();
>> +
> How is this code protected from interference by other actions? Needs rdtgroup_mutex?

Yes. Sure.

Thanks

Babu

>
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1597,6 +1742,7 @@ static struct rftype res_common_files[] = {
>>  		.mode		= 0644,
>>  		.kf_ops		= &rdtgroup_kf_single_ops,
>>  		.seq_show	= mbm_total_config_show,
>> +		.write		= mbm_total_config_write,
>>  	},
>>  	{
>>  		.name		= "mbm_local_config",
>>
>>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features
  2022-10-25 23:50   ` Reinette Chatre
@ 2022-10-26 20:00     ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-26 20:00 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,


On 10/25/22 18:50, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/17/2022 3:27 PM, Babu Moger wrote:
>
> ...
>
>> +"mbm_total_config", "mbm_local_config":
>> +        These files contain the current event configuration for the events
>> +        mbm_total_bytes and mbm_local_bytes, respectively, when the
>> +        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
>> +        The event configuration settings are domain specific and will affect
>> +        all the CPUs in the domain.
>> +
>> +        Following are the types of events supported:
>> +
>> +        ====    ========================================================
>> +        Bits    Description
>> +        ====    ========================================================
>> +        6       Dirty Victims from the QOS domain to all types of memory
>> +        5       Reads to slow memory in the non-local NUMA domain
>> +        4       Reads to slow memory in the local NUMA domain
>> +        3       Non-temporal writes to non-local NUMA domain
>> +        2       Non-temporal writes to local NUMA domain
>> +        1       Reads to memory in the non-local NUMA domain
>> +        0       Reads to memory in the local NUMA domain
>> +        ====    ========================================================
>> +
>> +        By default, the mbm_total_bytes configuration is set to 0x7f to count
>> +        all the event types and the mbm_local_bytes configuration is set to
>> +        0x15 to count all the local memory events.
>> +
>> +        Examples:
>> +
>> +        * To view the current configuration::
>> +          ::
>> +
>> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
>> +            0=0x7f;1=0x7f;2=0x7f;3=0x7f
>> +
>> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
>> +            0=0x15;1=0x15;3=0x15;4=0x15
>> +
>> +        * To change the mbm_total_bytes to count only reads on domain 0,
>> +          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
>> +          (in hexadecimal 0x33):
>> +          ::
>> +
>> +            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_config
>> +
>> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_config
>> +            0=0x33;1=0x7f;2=0x7f;3=0x7f
>> +
>> +        * To change the mbm_local_bytes to count all the slow memory reads
>> +          on domain 1, the bits 4 and 5 needs to be set, which is 110000b
>> +          in binary (in hexadecimal 0x30):
>> +          ::
>> +
>> +            # echo  "1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_config
>> +
>> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_config
>> +            0=0x15;1=0x30;3=0x15;4=0x15
>>  
> The code supports modifying multiple domains with one write. To help users
> understand this it may be useful to have one of these examples do so to show
> that and how it can be done.
Sure.
>
>>  "max_threshold_occupancy":
>>  		Read/write file provides the largest value (in
>> @@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
>>  
>>  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>>  
>> +Slow Memory bandwidth Allocation (SMBA)
> bandwidth -> Bandwidth?
ok
>
>> +---------------------------------------
>> +AMD hardwares support Slow Memory bandwidth Allocation (SMBA) feature.
> hardware -> hardware?
> bandwidth -> Bandwidth?

Sure. Thanks

Babu



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-26 19:07     ` Moger, Babu
@ 2022-10-26 20:23       ` Reinette Chatre
  2022-10-27 15:30         ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-26 20:23 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/26/2022 12:07 PM, Moger, Babu wrote:
> On 10/25/22 18:43, Reinette Chatre wrote:
>> On 10/17/2022 3:26 PM, Babu Moger wrote:
>>
>> ...
>>
>>> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>>>  
>>>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>>>  		r = s->res;
>>> -		if (r->rid == RDT_RESOURCE_MBA) {
>>> +		if (r->rid == RDT_RESOURCE_MBA ||
>>> +		    r->rid == RDT_RESOURCE_SMBA) {
>>>  			rdtgroup_init_mba(r, rdtgrp->closid);
>>>  			if (is_mba_sc(r))
>>>  				continue;
>> The above hunk and the ones that follow are unexpected.
> 
> I am thinking the above check is required, It is updating the
> staged_config with default values. Right now, the default value for SMBA
> is same as MBA default value. So, I used this code to initialize.
> 
> Did I miss something?

As I described in the following comments my concern is related to all the
software controller code still executing for SMBA. Yes, in the above hunk
SMBA would need (some of) rdtgroup_init_mba() ... but note that it contains
software controller checks and in the above hunk its call is also followed
by another software controller check.

The software controller is just applicable to MBA and these checks have been
isolated to the MBA resource. Using it for SMBA that does not support
software controller at all is making the code harder to follow and sets this
code up for future mistakes. I think it would make the code easier to understand
if this is made very clear that software controller is not applicable to SMBA at
all instead of repurposing these flows.

>> Note that the software controller, when resctrl is mounted with "mba_MBps", is 
>> only supported by RDT_RESOURCE_MBA. At this time this really is hard coded all
>> over the place, for example:
>>
>> static int set_mba_sc(bool mba_sc)
>> {
>> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>> 	...
>>
>> }
>>
>> Since SMBA hardcodes "delay_linear = false" I do not expect it to support the software
>> controller ... but these hunks appear to treat SMBA as though it does. It is the "MBA software
>> controller", not "SMBA software controller". Why does it check above if the MBA software
>> controller is enabled on SMBA?
> 
> There is no plan to support SMBA software controller. Yes, I think below
> checks are not required.

Thank you for clarifying. Having the code reflect that clearly would make everything
easier to understand and maintain.

Reinette
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-26 20:23       ` Reinette Chatre
@ 2022-10-27 15:30         ` Moger, Babu
  2022-10-27 18:37           ` Reinette Chatre
  0 siblings, 1 reply; 30+ messages in thread
From: Moger, Babu @ 2022-10-27 15:30 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 10/26/22 15:23, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/26/2022 12:07 PM, Moger, Babu wrote:
>> On 10/25/22 18:43, Reinette Chatre wrote:
>>> On 10/17/2022 3:26 PM, Babu Moger wrote:
>>>
>>> ...
>>>
>>>> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>>>>  
>>>>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>>>>  		r = s->res;
>>>> -		if (r->rid == RDT_RESOURCE_MBA) {
>>>> +		if (r->rid == RDT_RESOURCE_MBA ||
>>>> +		    r->rid == RDT_RESOURCE_SMBA) {
>>>>  			rdtgroup_init_mba(r, rdtgrp->closid);
>>>>  			if (is_mba_sc(r))
>>>>  				continue;
>>> The above hunk and the ones that follow are unexpected.
>> I am thinking the above check is required, It is updating the
>> staged_config with default values. Right now, the default value for SMBA
>> is same as MBA default value. So, I used this code to initialize.
>>
>> Did I miss something?
> As I described in the following comments my concern is related to all the
> software controller code still executing for SMBA. Yes, in the above hunk
> SMBA would need (some of) rdtgroup_init_mba() ... but note that it contains
> software controller checks and in the above hunk its call is also followed
> by another software controller check.
>
> The software controller is just applicable to MBA and these checks have been
> isolated to the MBA resource. Using it for SMBA that does not support
> software controller at all is making the code harder to follow and sets this
> code up for future mistakes. I think it would make the code easier to understand
> if this is made very clear that software controller is not applicable to SMBA at
> all instead of repurposing these flows.

Yes. Understood.  How about this? I feel this is much more cleaner.

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e5a48f05e787..d91a6a513681 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2845,16 +2845,18 @@ static int rdtgroup_init_alloc(struct rdtgroup
*rdtgrp)
 
        list_for_each_entry(s, &resctrl_schema_all, list) {
                r = s->res;
-               if (r->rid == RDT_RESOURCE_MBA) {
+               if (r->rid == RDT_RESOURCE_MBA ||
+                   r->rid == RDT_RESOURCE_SMBA) {
                        rdtgroup_init_mba(r, rdtgrp->closid);
-                       if (is_mba_sc(r))
-                               continue;
                } else {
                        ret = rdtgroup_init_cat(s, rdtgrp->closid);
                        if (ret < 0)
                                return ret;
                }
 
+               if (is_mba_sc(r))
+                       continue;
+
                ret = resctrl_arch_update_domains(r, rdtgrp->closid);
                if (ret < 0) {
                        rdt_last_cmd_puts("Failed to initialize
allocations\n");

Thanks

Babu

>
>>> Note that the software controller, when resctrl is mounted with "mba_MBps", is 
>>> only supported by RDT_RESOURCE_MBA. At this time this really is hard coded all
>>> over the place, for example:
>>>
>>> static int set_mba_sc(bool mba_sc)
>>> {
>>> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
>>> 	...
>>>
>>> }
>>>
>>> Since SMBA hardcodes "delay_linear = false" I do not expect it to support the software
>>> controller ... but these hunks appear to treat SMBA as though it does. It is the "MBA software
>>> controller", not "SMBA software controller". Why does it check above if the MBA software
>>> controller is enabled on SMBA?
>> There is no plan to support SMBA software controller. Yes, I think below
>> checks are not required.
> Thank you for clarifying. Having the code reflect that clearly would make everything
> easier to understand and maintain.
>
> Reinette
>  

-- 
Thanks
Babu Moger


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-27 15:30         ` Moger, Babu
@ 2022-10-27 18:37           ` Reinette Chatre
  2022-10-28 15:16             ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-27 18:37 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/27/2022 8:30 AM, Moger, Babu wrote:
> On 10/26/22 15:23, Reinette Chatre wrote:
>> On 10/26/2022 12:07 PM, Moger, Babu wrote:
>>> On 10/25/22 18:43, Reinette Chatre wrote:
>>>> On 10/17/2022 3:26 PM, Babu Moger wrote:
>>>>
>>>> ...
>>>>
>>>>> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>>>>>  
>>>>>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>>>>>  		r = s->res;
>>>>> -		if (r->rid == RDT_RESOURCE_MBA) {
>>>>> +		if (r->rid == RDT_RESOURCE_MBA ||
>>>>> +		    r->rid == RDT_RESOURCE_SMBA) {
>>>>>  			rdtgroup_init_mba(r, rdtgrp->closid);
>>>>>  			if (is_mba_sc(r))
>>>>>  				continue;
>>>> The above hunk and the ones that follow are unexpected.
>>> I am thinking the above check is required, It is updating the
>>> staged_config with default values. Right now, the default value for SMBA
>>> is same as MBA default value. So, I used this code to initialize.
>>>
>>> Did I miss something?
>> As I described in the following comments my concern is related to all the
>> software controller code still executing for SMBA. Yes, in the above hunk
>> SMBA would need (some of) rdtgroup_init_mba() ... but note that it contains
>> software controller checks and in the above hunk its call is also followed
>> by another software controller check.
>>
>> The software controller is just applicable to MBA and these checks have been
>> isolated to the MBA resource. Using it for SMBA that does not support
>> software controller at all is making the code harder to follow and sets this
>> code up for future mistakes. I think it would make the code easier to understand
>> if this is made very clear that software controller is not applicable to SMBA at
>> all instead of repurposing these flows.
> 
> Yes. Understood.  How about this? I feel this is much more cleaner.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e5a48f05e787..d91a6a513681 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2845,16 +2845,18 @@ static int rdtgroup_init_alloc(struct rdtgroup
> *rdtgrp)
>  
>         list_for_each_entry(s, &resctrl_schema_all, list) {
>                 r = s->res;
> -               if (r->rid == RDT_RESOURCE_MBA) {
> +               if (r->rid == RDT_RESOURCE_MBA ||
> +                   r->rid == RDT_RESOURCE_SMBA) {
>                         rdtgroup_init_mba(r, rdtgrp->closid);
> -                       if (is_mba_sc(r))
> -                               continue;
>                 } else {
>                         ret = rdtgroup_init_cat(s, rdtgrp->closid);
>                         if (ret < 0)
>                                 return ret;
>                 }
>  
> +               if (is_mba_sc(r))
> +                       continue;
> +
>                 ret = resctrl_arch_update_domains(r, rdtgrp->closid);
>                 if (ret < 0) {
>                         rdt_last_cmd_puts("Failed to initialize
> allocations\n");
> 

I do not see how that move changes what is run in the SMBA case and it ignores the
is_mba_sc() call within rdtgroup_init_mba(). How about making is_mba_sc() more
robust in support of your original snippet?

Something like:

bool is_mba_sc(struct rdt_resource *r)
{
	if (!r)
		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;

	if (r->rid != RDT_RESOURCE_MBA)
		return false;

	return r->membw.mba_sc;
}

Reinette

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-10-17 22:26 ` [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-10-27 18:49   ` Reinette Chatre
  2022-10-27 19:17     ` Moger, Babu
  0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2022-10-27 18:49 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 10/17/2022 3:26 PM, Babu Moger wrote:
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  2 files changed, 2 insertions(+)
> 

I see arch/x86/include/asm/cpufeatures.h patches that also include
a change to tools/arch/x86/include/asm/cpufeatures.h in support
of perf. I do not know the customs here but I see others
change both in the same patch and if it is not done together a separate
patch to tools/arch/x86/include/asm/cpufeatures.h would be required
anyway to address the inevitable perf build warning.

Reinette


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-10-27 18:49   ` Reinette Chatre
@ 2022-10-27 19:17     ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-27 19:17 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 10/27/22 13:49, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/17/2022 3:26 PM, Babu Moger wrote:
>> ---
>>  arch/x86/include/asm/cpufeatures.h |    1 +
>>  arch/x86/kernel/cpu/scattered.c    |    1 +
>>  2 files changed, 2 insertions(+)
>>
> I see arch/x86/include/asm/cpufeatures.h patches that also include
> a change to tools/arch/x86/include/asm/cpufeatures.h in support
> of perf. I do not know the customs here but I see others
> change both in the same patch and if it is not done together a separate
> patch to tools/arch/x86/include/asm/cpufeatures.h would be required
> anyway to address the inevitable perf build warning.

Lately, I see that these two files are synced after the feature bits are
committed. I can send the separate once these changes are committed.

Thanks

Babu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation
  2022-10-27 18:37           ` Reinette Chatre
@ 2022-10-28 15:16             ` Moger, Babu
  0 siblings, 0 replies; 30+ messages in thread
From: Moger, Babu @ 2022-10-28 15:16 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, October 27, 2022 1:38 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory
> Bandwidth allocation
> 
> Hi Babu,
> 
> On 10/27/2022 8:30 AM, Moger, Babu wrote:
> > On 10/26/22 15:23, Reinette Chatre wrote:
> >> On 10/26/2022 12:07 PM, Moger, Babu wrote:
> >>> On 10/25/22 18:43, Reinette Chatre wrote:
> >>>> On 10/17/2022 3:26 PM, Babu Moger wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct
> >>>>> rdtgroup *rdtgrp)
> >>>>>
> >>>>>  	list_for_each_entry(s, &resctrl_schema_all, list) {
> >>>>>  		r = s->res;
> >>>>> -		if (r->rid == RDT_RESOURCE_MBA) {
> >>>>> +		if (r->rid == RDT_RESOURCE_MBA ||
> >>>>> +		    r->rid == RDT_RESOURCE_SMBA) {
> >>>>>  			rdtgroup_init_mba(r, rdtgrp->closid);
> >>>>>  			if (is_mba_sc(r))
> >>>>>  				continue;
> >>>> The above hunk and the ones that follow are unexpected.
> >>> I am thinking the above check is required, It is updating the
> >>> staged_config with default values. Right now, the default value for
> >>> SMBA is same as MBA default value. So, I used this code to initialize.
> >>>
> >>> Did I miss something?
> >> As I described in the following comments my concern is related to all
> >> the software controller code still executing for SMBA. Yes, in the
> >> above hunk SMBA would need (some of) rdtgroup_init_mba() ... but note
> >> that it contains software controller checks and in the above hunk its
> >> call is also followed by another software controller check.
> >>
> >> The software controller is just applicable to MBA and these checks
> >> have been isolated to the MBA resource. Using it for SMBA that does
> >> not support software controller at all is making the code harder to
> >> follow and sets this code up for future mistakes. I think it would
> >> make the code easier to understand if this is made very clear that
> >> software controller is not applicable to SMBA at all instead of repurposing
> these flows.
> >
> > Yes. Understood.  How about this? I feel this is much more cleaner.
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index e5a48f05e787..d91a6a513681 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -2845,16 +2845,18 @@ static int rdtgroup_init_alloc(struct rdtgroup
> > *rdtgrp)
> >
> >         list_for_each_entry(s, &resctrl_schema_all, list) {
> >                 r = s->res;
> > -               if (r->rid == RDT_RESOURCE_MBA) {
> > +               if (r->rid == RDT_RESOURCE_MBA ||
> > +                   r->rid == RDT_RESOURCE_SMBA) {
> >                         rdtgroup_init_mba(r, rdtgrp->closid);
> > -                       if (is_mba_sc(r))
> > -                               continue;
> >                 } else {
> >                         ret = rdtgroup_init_cat(s, rdtgrp->closid);
> >                         if (ret < 0)
> >                                 return ret;
> >                 }
> >
> > +               if (is_mba_sc(r))
> > +                       continue;
> > +
> >                 ret = resctrl_arch_update_domains(r, rdtgrp->closid);
> >                 if (ret < 0) {
> >                         rdt_last_cmd_puts("Failed to initialize
> > allocations\n");
> >
> 
> I do not see how that move changes what is run in the SMBA case and it ignores
> the
> is_mba_sc() call within rdtgroup_init_mba(). How about making is_mba_sc()
> more robust in support of your original snippet?
> 
> Something like:
> 
> bool is_mba_sc(struct rdt_resource *r)
> {
> 	if (!r)
> 		return
> rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
> 
> 	if (r->rid != RDT_RESOURCE_MBA)
> 		return false;
> 
> 	return r->membw.mba_sc;
> }

Yes. Sure. That should work.
Thanks
Babu

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-10-28 15:17 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-17 22:25 [PATCH v7 00/12] x86/resctrl: Support for AMD QoS new features Babu Moger
2022-10-17 22:26 ` [PATCH v7 01/12] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
2022-10-27 18:49   ` Reinette Chatre
2022-10-27 19:17     ` Moger, Babu
2022-10-17 22:26 ` [PATCH v7 02/12] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
2022-10-17 22:26 ` [PATCH v7 03/12] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
2022-10-17 22:26 ` [PATCH v7 04/12] x86/resctrl: Include new features in command line options Babu Moger
2022-10-17 22:26 ` [PATCH v7 05/12] x86/resctrl: Detect and configure Slow Memory Bandwidth allocation Babu Moger
2022-10-25 23:43   ` Reinette Chatre
2022-10-26 19:07     ` Moger, Babu
2022-10-26 20:23       ` Reinette Chatre
2022-10-27 15:30         ` Moger, Babu
2022-10-27 18:37           ` Reinette Chatre
2022-10-28 15:16             ` Moger, Babu
2022-10-17 22:26 ` [PATCH v7 06/12] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
2022-10-25 23:45   ` Reinette Chatre
2022-10-26 19:25     ` Moger, Babu
2022-10-17 22:26 ` [PATCH v7 07/12] x86/resctrl: Add sysfs interface to read mbm_total_bytes event configuration Babu Moger
2022-10-25 23:47   ` Reinette Chatre
2022-10-26 19:36     ` Moger, Babu
2022-10-17 22:27 ` [PATCH v7 08/12] x86/resctrl: Add sysfs interface to read mbm_local_bytes " Babu Moger
2022-10-17 22:27 ` [PATCH v7 09/12] x86/resctrl: Add sysfs interface to write mbm_total_bytes " Babu Moger
2022-10-25 23:48   ` Reinette Chatre
2022-10-26 19:52     ` Moger, Babu
2022-10-17 22:27 ` [PATCH v7 10/12] x86/resctrl: Add sysfs interface to write mbm_local_bytes " Babu Moger
2022-10-17 22:27 ` [PATCH v7 11/12] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
2022-10-17 22:27 ` [PATCH v7 12/12] Documentation/x86: Update resctrl.rst for new features Babu Moger
2022-10-18  3:24   ` Bagas Sanjaya
2022-10-25 23:50   ` Reinette Chatre
2022-10-26 20:00     ` Moger, Babu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).