linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 00/13] Support for AMD QoS new features
@ 2022-11-04 19:59 Babu Moger
  2022-11-04 19:59 ` [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
                   ` (13 more replies)
  0 siblings, 14 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 19:59 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

New AMD processors support the following QoS features.

1. Slow Memory Bandwidth Allocation (SMBA)
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

   Currently, CXL.memory is the only supported "slow" memory device. With
   the support of SMBA feature the hardware enables bandwidth allocation
   on the slow memory devices.

2. Bandwidth Monitoring Event Configuration (BMEC)
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not enough
   to count all the different types of memory events. With the BMEC
   feature, users have the option to configure mbm_total_bytes and
   mbm_local_bytes to count specific types of events.

   The following event bits are supported:
   Bits    Description
     6       Dirty Victims from the QOS domain to all types of memory
     5       Reads to slow memory in the non-local NUMA domain
     4       Reads to slow memory in the local NUMA domain
     3       Non-temporal writes to non-local NUMA domain
     2       Non-temporal writes to local NUMA domain
     1       Reads to memory in the non-local NUMA domain
     0       Reads to memory in the local NUMA domain

This series adds support for these features.

Feature description is available in the specification, "AMD64 Technology
Platform Quality of Service Extensions, Publication # 56375 Revision: 1.03
Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v8:
 Changes:
 1. Removed init attribute for rdt_cpu_has to make it available for all the files.
 2. Updated the change log for mon_features to correct the names of config files.
 3. Changed configuration file name from mbm_total_config to mbm_total_bytes_config.
    This is more consistent with other changes.
 4. Added lock protection while reading/writing the config file.
 5. A few other minor text changes. I have missed a few comments in the last couple of
    revisions. I hope I have addressed all of them this time.

v7:
 https://lore.kernel.org/lkml/166604543832.5345.9696970469830919982.stgit@bmoger-ubuntu/
 Changes:
 Not much of a change. Missed one comment from Reinette from v5. Corrected it now.
 Few format corrections from Sanjaya.

v6:
 https://lore.kernel.org/lkml/166543345606.23830.3120625408601531368.stgit@bmoger-ubuntu/
 Summary of changes:
 1. Rebased on top of latest tip tree. Fixed a few minor conflicts.
 2. Fixed format issue with scattered.c.
 3. Removed config_name from the structure mon_evt. It is not required.
 4. The read/write format for mbm_total_config and mbm_local_config will be same
    as schemata format "id0=val0;id1=val1;...". This is comment from Fenghua.
 5. Added more comments on MSR_IA32_EVT_CFG_BASE writing.
 6. A few text changes in resctrl.rst.

v5:
  https://lore.kernel.org/lkml/166431016617.373387.1968875281081252467.stgit@bmoger-ubuntu/
  Summary of changes.
  1. Split the series into two. The first two patches are bug fixes. So, sent them separate.
  2. The config files mbm_total_config and mbm_local_config are now under
     /sys/fs/resctrl/info/L3_MON/. Removed these config files from mon groups.
  3. Ran "checkpatch --strict --codespell" on all the patches. Looks good with few known exceptions.
  4. Few minor text changes in resctrl.rst file. 

v4:
  https://lore.kernel.org/lkml/166257348081.1043018.11227924488792315932.stgit@bmoger-ubuntu/
  Got numerous comments from Reinette Chatre. Addressed most of them.
  Summary of changes.
  1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.  
  2. Updated mon_features texts if the BMEC is supported.
  3. Added more explanation about the slow memory support.
  4. Replaced smp_call_function_many with on_each_cpu_mask call.
  5. Removed arch_has_empty_bitmaps
  6. Few other text changes.
  7. Removed Reviewed-by if the patch is modified.
  8. Rebased the patches to latest tip.

v3:
  https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest tip. Resolved some conflicts.
     https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  b. Taken care of feedback from Bagas Sanjaya.
  c. Added Reviewed-by from Mingo.
  Note: I am still looking for comments from Reinette or Fenghua.

v2:
  https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
  b. Added the patch to fix CBM issue on AMD. This was originally discussed
     https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/

v1:
  https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/

Babu Moger (13):
      x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
      x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
      x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
      x86/resctrl: Include new features in command line options
      x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
      x86/resctrl: Remove the init attribute for rdt_cpu_has()
      x86/resctrl: Introduce data structure to support monitor configuration
      x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
      x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
      x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
      x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
      x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
      Documentation/x86: Update resctrl.rst for new features


 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/x86/resctrl.rst                 | 139 +++++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/cpu/resctrl/core.c            |  56 +++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  33 ++
 arch/x86/kernel/cpu/resctrl/monitor.c         |   7 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 304 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   2 +
 10 files changed, 515 insertions(+), 33 deletions(-)

--



* [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
@ 2022-11-04 19:59 ` Babu Moger
  2022-11-23 18:21   ` Yu, Fenghua
  2022-11-04 20:00 ` [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 19:59 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
enforcement policies can be applied to external slow memory connected
to the host. QOS enforcement is accomplished by assigning a Class Of
Service (COS) to a processor and specifying allocations or limits for
that COS for each resource to be allocated.

This feature is identified by the CPUID Function 8000_0020_EBX_x0.

CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers
(ECX=0)

Bits    Field Name      Description
2       L3SBE           L3 external slow memory bandwidth enforcement
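
For illustration only, the same check can be sketched in userspace C,
assuming the <cpuid.h> helper shipped with GCC and Clang. The names
qos_ebx_has_bit() and cpu_reports_smba() are hypothetical and are not
part of this patch; the kernel picks the bit up through the
scattered.c table instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define QOS_FEATURE_LEAF	0x80000020u	/* CPUID Fn8000_0020 */
#define QOS_EBX_L3SBE_BIT	2		/* L3SBE: slow memory bandwidth enforcement */

/* Test one bit of an EBX value that has already been read. */
static bool qos_ebx_has_bit(uint32_t ebx, unsigned int bit)
{
	assert(bit < 32);
	return (ebx >> bit) & 1u;
}

/*
 * Hypothetical wrapper: query leaf 0x80000020, subleaf 0 (ECX=0) and
 * report whether the L3SBE bit is set. Returns false when the leaf is
 * not implemented or when built for a non-x86 target.
 */
static bool cpu_reports_smba(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(QOS_FEATURE_LEAF, 0, &eax, &ebx, &ecx, &edx))
		return false;

	return qos_ebx_has_bit(ebx, QOS_EBX_L3SBE_BIT);
#else
	return false;
#endif
}
```

Keeping the bit test separate from the CPUID read makes the decoding
easy to check against the table without needing the hardware.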

Currently, CXL.memory is the only supported "slow" memory device. With
the support of SMBA feature, the hardware enables bandwidth allocation
on the slow memory devices. If there are multiple slow memory devices
in the system, then the throttling logic groups all the slow sources
together and applies the limit on them as a whole.

The presence of the SMBA feature (with CXL.memory) is independent of
whether a slow memory device is actually present in the system. If there
is no slow memory in the system, then setting an SMBA limit will have no
impact on the performance of the system.

The presence of CXL memory can be identified with the numactl command.

$numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB
node 0 free: 59542 MB
node 1 cpus:
node 1 size: 16122 MB
node 1 free: 15627 MB
node distances:
node   0   1
   0:  10  50
   1:  50  10

The CPU list for the CXL memory node will be empty. The cpu-cxl node
distance is greater than cpu-to-cpu distances. Node 1 has the CXL memory
in this case. CXL memory can also be identified using the ACPI SRAT table
and memory maps.

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index aefd0816a333..d68b4c9c181d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -305,6 +305,7 @@
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
+#define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index fc01f81f6e2a..5a5f17ed69a2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }




* [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
  2022-11-04 19:59 ` [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:04   ` Reinette Chatre
  2022-11-04 20:00 ` [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
enforcement policies on the external slow memory.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 03cfbf0fe000..4b970e7192e8 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SMBA",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5f7128686cfd..43d9f6f5a931 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -419,6 +419,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,




* [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
  2022-11-04 19:59 ` [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
  2022-11-04 20:00 ` [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:09   ` Reinette Chatre
  2022-11-23 18:17   ` Yu, Fenghua
  2022-11-04 20:00 ` [PATCH v8 04/13] x86/resctrl: Include new features in command line options Babu Moger
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
Bits    Field Name    Description
3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)

Currently, the bandwidth monitoring events mbm_total_bytes and
mbm_local_bytes are set to count all the total and local reads/writes
respectively. With the introduction of slow memory, the two counters
are not enough to count all the different types of memory events. With
the BMEC feature, users have the option to configure mbm_total_bytes
and mbm_local_bytes to count specific types of events.

Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400 +
EventID), which contains one field for each bandwidth type. These fields
can be used to configure the bandwidth event to track any combination
of supported bandwidth types. The event will count requests from every
bandwidth type whose bit is set in the corresponding configuration
register.

Following are the types of events supported:

====    ========================================================
Bits    Description
====    ========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.
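
The default values follow directly from the table. A minimal sketch in
C, with macro and function names that are purely illustrative (they do
not appear in this series):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per bandwidth type, numbered as in the table above. */
#define BMEC_READS_LOCAL	(1u << 0)
#define BMEC_READS_REMOTE	(1u << 1)
#define BMEC_NT_WRITES_LOCAL	(1u << 2)
#define BMEC_NT_WRITES_REMOTE	(1u << 3)
#define BMEC_SLOW_READS_LOCAL	(1u << 4)
#define BMEC_SLOW_READS_REMOTE	(1u << 5)
#define BMEC_DIRTY_VICTIMS	(1u << 6)

/* All seven types: the default for mbm_total_bytes (0x7F). */
#define BMEC_ALL_EVENTS		(BMEC_READS_LOCAL | BMEC_READS_REMOTE | \
				 BMEC_NT_WRITES_LOCAL | BMEC_NT_WRITES_REMOTE | \
				 BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE | \
				 BMEC_DIRTY_VICTIMS)

/* Local-domain types only: the default for mbm_local_bytes (0x15). */
#define BMEC_LOCAL_EVENTS	(BMEC_READS_LOCAL | BMEC_NT_WRITES_LOCAL | \
				 BMEC_SLOW_READS_LOCAL)

/* True if @cfg uses only the bits defined in the table (bits 6:0). */
static bool bmec_cfg_in_range(uint32_t cfg)
{
	return (cfg & ~BMEC_ALL_EVENTS) == 0;
}

/* True if an event configured with @cfg counts the single type @type. */
static bool bmec_counts_type(uint32_t cfg, uint32_t type)
{
	assert(type && !(type & (type - 1)));	/* exactly one bit set */
	return cfg & type;
}
```

For example, a configuration of BMEC_SLOW_READS_LOCAL |
BMEC_SLOW_READS_REMOTE (0x30) would restrict an event to reads from
slow memory only.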

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d68b4c9c181d..6732ca0117be 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -306,6 +306,7 @@
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
 #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+20) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..4555f9596ccf 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 5a5f17ed69a2..67c4d24e06ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }




* [PATCH v8 04/13] x86/resctrl: Include new features in command line options
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (2 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23 18:26   ` Yu, Fenghua
  2022-11-04 20:00 ` [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add the command line options to disable the new features.
smba : Slow Memory Bandwidth Allocation
bmec : Bandwidth Monitoring Event Configuration

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a465d5242774..f3f0870144fb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5190,7 +5190,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4b970e7192e8..e31c98e2fafc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -659,6 +659,8 @@ enum {
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 




* [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (3 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 04/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:12   ` Reinette Chatre
  2022-11-04 20:00 ` [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has() Babu Moger
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.
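
As a rough standalone sketch of the two decisions made in
__rdt_get_mem_config_amd() below: which subleaf to query, and how the
number of CLOSIDs is derived from EDX. The enum and function names are
hypothetical. The 16-bit width of cos_max is an assumption taken from
the kernel's cpuid_0x10_x_edx union, not from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for RDT_RESOURCE_MBA and RDT_RESOURCE_SMBA. */
enum qos_bw_resource { QOS_BW_MBA, QOS_BW_SMBA };

/* CPUID Fn8000_0020 subleaf: 1 describes MBA, 2 describes SMBA. */
static unsigned int qos_bw_subleaf(enum qos_bw_resource res)
{
	assert(res == QOS_BW_MBA || res == QOS_BW_SMBA);
	return res == QOS_BW_SMBA ? 2 : 1;
}

/*
 * Number of CLOSIDs is cos_max + 1.
 * Assumption: cos_max occupies the low 16 bits of EDX.
 */
static unsigned int qos_num_closid(uint32_t edx)
{
	return (edx & 0xffffu) + 1;
}
```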

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e31c98e2fafc..6571d08e2b0d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
 	if (!r)
 		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
 
+	/*
+	 * The software controller support is only applicable to MBA resource.
+	 * Make sure to check for resource type again.
+	 */
+	if (r->rid != RDT_RESOURCE_MBA)
+		return false;
+
 	return r->membw.mba_sc;
 }
 
@@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;
 
-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
+
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;
 
@@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
 	return false;
 }
 
+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;
 
+	if (get_slow_mem_config())
+		ret = true;
+
 	return ret;
 }
 
@@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 1df0e3262bca..2dd4b8c47f23 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -209,7 +209,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;
 
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 43d9f6f5a931..16e3c6e03c79 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -14,6 +14,7 @@
 #define MSR_IA32_L2_CBM_BASE		0xd10
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e5a48f05e787..8a3dafc0dbf7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1213,7 +1213,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
@@ -1402,7 +1402,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 					ctrl = resctrl_arch_get_config(r, d,
 								       closid,
 								       type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
 			rdtgroup_init_mba(r, rdtgrp->closid);
 			if (is_mba_sc(r))
 				continue;




* [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has()
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (4 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:13   ` Reinette Chatre
  2022-11-04 20:00 ` [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The monitor code in resctrl/monitor.c needs to call rdt_cpu_has() to
detect the monitor related features. The function has the __init
attribute and cannot be called from non-init routines. Remove the
attribute and make the function available to all the resctrl files.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |    4 ++--
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6571d08e2b0d..b33a541f5c80 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -688,7 +688,7 @@ struct rdt_options {
 	bool	force_off, force_on;
 };
 
-static struct rdt_options rdt_options[]  __initdata = {
+static struct rdt_options rdt_options[] = {
 	RDT_OPT(RDT_FLAG_CMT,	    "cmt",	X86_FEATURE_CQM_OCCUP_LLC),
 	RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL),
 	RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL),
@@ -728,7 +728,7 @@ static int __init set_rdt_options(char *str)
 }
 __setup("rdt", set_rdt_options);
 
-static bool __init rdt_cpu_has(int flag)
+bool rdt_cpu_has(int flag)
 {
 	bool ret = boot_cpu_has(flag);
 	struct rdt_options *o;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 16e3c6e03c79..e30e8b23f6b5 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -499,6 +499,7 @@ int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
 			     umode_t mask);
 struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 				   struct list_head **pos);
+bool rdt_cpu_has(int flag);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,




* [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (5 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has() Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:14   ` Reinette Chatre
  2022-11-04 20:00 ` [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Add a new field to struct mon_evt to support Bandwidth Monitoring Event
Configuration (BMEC), and update the "mon_features" display accordingly.

The resctrl file "mon_features" additionally lists a "<event>_config"
entry for each configurable event when BMEC is supported.

Before the change:
	$ cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_local_bytes

After the change, when BMEC is supported:
	$ cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_total_bytes_config
	mbm_local_bytes
	mbm_local_bytes_config
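
A script can use this listing to decide whether an event is configurable
before touching its config file. The following is only a sketch: the
helper name event_is_configurable is hypothetical and not part of this
patch, and it assumes the mon_features content is supplied on standard
input.

```shell
# Hypothetical helper (not part of the patch): succeed if the line
# "<event>_config" appears in the mon_features listing read from stdin.
event_is_configurable() {
    # -x matches whole lines only, -q suppresses output.
    grep -qx "${1}_config"
}
```

On a system with resctrl mounted, it could be used as:
	$ event_is_configurable mbm_total_bytes < /sys/fs/resctrl/info/L3_MON/mon_features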

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
 arch/x86/kernel/cpu/resctrl/monitor.c  |    6 ++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e30e8b23f6b5..5459b5022760 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @configurable:	true if the event is configurable
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
 	char			*name;
+	bool			configurable;
 	struct list_head	list;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index efe0c30d3a12..06c2dc980855 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -750,6 +750,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
 	unsigned int threshold;
 	int ret;
 
@@ -783,6 +784,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (ret)
 		return ret;
 
+	if (mon_configurable) {
+		mbm_total_event.configurable = true;
+		mbm_local_event.configurable = true;
+	}
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8a3dafc0dbf7..8342feb54a7f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list)
+	list_for_each_entry(mevt, &r->evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
+		if (mevt->configurable)
+			seq_printf(seq, "%s_config\n", mevt->name);
+	}
 
 	return 0;
 }




* [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (6 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-11-04 20:00 ` Babu Moger
  2022-11-23  0:19   ` Reinette Chatre
  2022-11-04 20:01 ` [PATCH v8 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:00 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:
    $ cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x7f;1=0x7f;2=0x7f;3=0x7f

    In this case, the event mbm_total_bytes is currently configured
    with 0x7f on domains 0 to 3.
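
The output is a single line of "<domain id>=<value>" pairs separated by
";". As a rough sketch of how a script might split it, assuming the line
arrives on standard input (the helper name split_domains is hypothetical
and not part of this patch):

```shell
# Hypothetical helper (not part of the patch): print one line per domain
# from a "0=0x7f;1=0x7f;..." string read from stdin.
split_domains() {
    # Turn each ';' into a newline, then split each pair at '='.
    tr ';' '\n' | while IFS='=' read -r dom val; do
        printf 'domain %s: %s\n' "$dom" "$val"
    done
}
```

For example:
    $ split_domains < /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config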

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |   28 ++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |    1 
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   87 ++++++++++++++++++++++++++++++++
 3 files changed, 116 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5459b5022760..c74285fd0f6e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -15,6 +15,7 @@
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 #define MSR_IA32_QM_CTR			0x0c8e
 #define MSR_IA32_QM_EVTSEL		0x0c8d
@@ -41,6 +42,32 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM		BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM		BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM		BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM		BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
+
+/* Max event bits supported */
+#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
+
+/* Max configurable events */
+#define MAX_CONFIG_EVENTS		2
 
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
@@ -542,5 +569,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
+void mbm_config_rftype_init(void);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 06c2dc980855..a188dacab6c8 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -787,6 +787,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (mon_configurable) {
 		mbm_total_event.configurable = true;
 		mbm_local_event.configurable = true;
+		mbm_config_rftype_init();
 	}
 
 	l3_mon_evt_init(r);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8342feb54a7f..dea58b6b4aa4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1423,6 +1423,78 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+struct mon_config_info {
+	u32 evtid;
+	u32 mon_config;
+};
+
+/**
+ * mon_event_config_index_get - get the index for the configurable event
+ * @evtid: event id.
+ *
+ * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ *         > 1 otherwise
+ */
+static inline unsigned int mon_event_config_index_get(u32 evtid)
+{
+	return evtid - QOS_L3_MBM_TOTAL_EVENT_ID;
+}
+
+static void mon_event_config_read(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 h, index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index >= MAX_CONFIG_EVENTS) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
+}
+
+static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+}
+
+static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
+{
+	struct mon_config_info mon_info = {0};
+	struct rdt_domain *dom;
+	bool sep = false;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(s, ";");
+
+		mon_info.evtid = evtid;
+		mondata_config_read(dom, &mon_info);
+
+		seq_printf(s, "%d=0x%02lx", dom->id,
+			   mon_info.mon_config & MAX_EVT_CONFIG_BITS);
+		sep = true;
+	}
+	seq_puts(s, "\n");
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1521,6 +1593,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= max_threshold_occ_show,
 		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
 	},
+	{
+		.name		= "mbm_total_bytes_config",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_total_bytes_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
@@ -1627,6 +1705,15 @@ void __init thread_throttle_mode_init(void)
 	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
 }
 
+void mbm_config_rftype_init(void)
+{
+	struct rftype *rft;
+
+	rft = rdtgroup_get_rftype_by_name("mbm_total_bytes_config");
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
+}
+
 /**
  * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
  * @r: The resource group with which the file is associated.




* [PATCH v8 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (7 preceding siblings ...)
  2022-11-04 20:00 ` [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
@ 2022-11-04 20:01 ` Babu Moger
  2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
The event configuration settings are domain specific and will affect
all the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_local_bytes_config is set to 0x15 to count all the
local event types.

For example:
    $ cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
    0=0x15;1=0x15;2=0x15;3=0x15

    In this case, the event mbm_local_bytes is currently configured with
    0x15 on domains 0 to 3.
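
The default 0x15 is 0010101b, that is bits 0, 2 and 4 from the table
above: reads, non-temporal writes and slow memory reads in the local
NUMA domain. A small sketch for decoding any configuration value into
its set bits follows; the helper name decode_mask is hypothetical and
not part of this patch.

```shell
# Hypothetical helper (not part of the patch): print the bit numbers
# (0 to 6) that are set in an event configuration value.
decode_mask() {
    mask=$(($1))        # accepts decimal or 0x-prefixed hex
    bit=0
    out=""
    while [ "$bit" -le 6 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}
```

For example, "decode_mask 0x15" prints "0 2 4".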

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index dea58b6b4aa4..18f9588a41cf 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1495,6 +1495,16 @@ static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1599,6 +1609,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_bytes_config_show,
 	},
+	{
+		.name		= "mbm_local_bytes_config",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_local_bytes_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
@@ -1712,6 +1728,10 @@ void mbm_config_rftype_init(void)
 	rft = rdtgroup_get_rftype_by_name("mbm_total_bytes_config");
 	if (rft)
 		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
+
+	rft = rdtgroup_get_rftype_by_name("mbm_local_bytes_config");
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
 }
 
 /**




* [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (8 preceding siblings ...)
  2022-11-04 20:01 ` [PATCH v8 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
@ 2022-11-04 20:01 ` Babu Moger
  2022-11-07 10:21   ` Peter Newman
                     ` (2 more replies)
  2022-11-04 20:01 ` [PATCH v8 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
                   ` (3 subsequent siblings)
  13 siblings, 3 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration for mbm_total_bytes can be changed by
the user by writing to the file
/sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change mbm_total_bytes to count only reads on domain 0, bits 0, 1, 4
and 5 need to be set, which is 110011b (0x33 in hex). Run the command:
	$ echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

To change mbm_total_bytes to count all the slow memory reads on domain 1,
bits 4 and 5 need to be set, which is 110000b (0x30 in hex). Run the
command:
	$ echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
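
The hex values in these examples are simply the OR of (1 << bit) for
each selected bit. A minimal sketch of that arithmetic follows; the
helper name bits_to_mask is hypothetical and not part of this patch.

```shell
# Hypothetical helper (not part of the patch): OR together (1 << bit)
# for every bit number given as an argument and print the result in hex.
bits_to_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, "bits_to_mask 0 1 4 5" prints 0x33 and "bits_to_mask 4 5"
prints 0x30, matching the values used above.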

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  130 ++++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 18f9588a41cf..0cdccb69386e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void mon_event_config_write(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index >= MAX_CONFIG_EVENTS) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+}
+
+static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
+			    u32 evtid, u32 val)
+{
+	struct mon_config_info mon_info = {0};
+	int ret = 0;
+
+	rdt_last_cmd_clear();
+
+	/* mon_config cannot be more than the supported set of events */
+	if (val > MAX_EVT_CONFIG_BITS) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Read the current config value first. If both are same then
+	 * we don't need to write it again.
+	 */
+	mon_info.evtid = evtid;
+	mondata_config_read(d, &mon_info);
+	if (mon_info.mon_config == val)
+		goto write_exit;
+
+	mon_info.mon_config = val;
+
+	/*
+	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
+	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
+	 * are scoped at the domain level. Writing any of these MSRs
+	 * on one CPU is supposed to be observed by all CPUs in the
+	 * domain. However, the hardware team recommends to update
+	 * these MSRs on all the CPUs in the domain.
+	 */
+	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared by the hardware. The
+	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) on the
+	 * next read of any event for every RMID.
+	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
+	 * cleared while it is tracked by the hardware. Clear the
+	 * mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
+	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+
+write_exit:
+	return ret;
+}
+
+static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
+{
+	char *dom_str = NULL, *id_str;
+	unsigned long dom_id, val;
+	struct rdt_domain *d;
+	int ret = 0;
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+	id_str = strsep(&dom_str, "=");
+
+	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
+		return -EINVAL;
+	}
+
+	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
+		return -EINVAL;
+	}
+
+	list_for_each_entry(d, &r->domains, list) {
+		if (d->id == dom_id) {
+			ret = mbm_config_write(r, d, evtid, val);
+			if (ret)
+				return -EINVAL;
+			goto next;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1605,9 +1732,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_total_bytes_config",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_bytes_config_show,
+		.write		= mbm_total_bytes_config_write,
 	},
 	{
 		.name		= "mbm_local_bytes_config",




* [PATCH v8 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (9 preceding siblings ...)
  2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
@ 2022-11-04 20:01 ` Babu Moger
  2022-11-04 20:01 ` [PATCH v8 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

The current event configuration for mbm_local_bytes can be changed by
the user by writing to the configuration file
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change mbm_local_bytes to count all the non-temporal writes on domain 0,
bits 2 and 3 need to be set, which is 1100b (0xc in hex). Run the command:
    $ echo 0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

To change mbm_local_bytes to count only reads to the local NUMA domain on
domain 1, bit 0 needs to be set, which is 1b (0x1 in hex). Run the command:
    $ echo 1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
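
The kernel rejects any value above the seven supported event bits (0x7f)
and requires the written string to end with a newline. A script could
check the value up front before writing. The following is only a sketch:
the helper name format_config is hypothetical and not part of this patch.

```shell
# Hypothetical helper (not part of the patch): validate an event
# configuration value and print it as "<domain>=0x<value>".
format_config() {
    dom=$1
    val=$(($2))         # accepts decimal or 0x-prefixed hex
    # Only bits 0 to 6 are defined, so anything above 0x7f is invalid.
    if [ "$val" -gt $((0x7f)) ]; then
        echo "invalid event configuration: $2" >&2
        return 1
    fi
    # printf supplies the trailing newline the write handler expects.
    printf '%s=0x%x\n' "$dom" "$val"
}
```

For example:
    $ format_config 0 0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config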

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0cdccb69386e..f37ecc16b34b 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1632,6 +1632,32 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_parse(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1739,9 +1765,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_local_bytes_config",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_local_bytes_config_show,
+		.write		= mbm_local_bytes_config_write,
 	},
 	{
 		.name		= "cpus",




* [PATCH v8 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (10 preceding siblings ...)
  2022-11-04 20:01 ` [PATCH v8 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
@ 2022-11-04 20:01 ` Babu Moger
  2022-11-04 20:01 ` [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
  2022-11-15 20:50 ` [PATCH v8 00/13] Support for AMD QoS " Moger, Babu
  13 siblings, 0 replies; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

on_each_cpu_mask() runs the function on each CPU specified by cpumask,
which may include the local processor. Replace the open-coded sequence
of get_cpu(), a conditional local call and smp_call_function_many() with
on_each_cpu_mask() to simplify the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f37ecc16b34b..6b222f8e58ae 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -325,12 +325,7 @@ static void update_cpu_closid_rmid(void *info)
 static void
 update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 {
-	int cpu = get_cpu();
-
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
-	put_cpu();
+	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
 }
 
 static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
@@ -2130,13 +2125,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 			/* Pick one CPU from each domain instance to update MSR */
 			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 	}
-	cpu = get_cpu();
-	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update(&enable);
-	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
-	put_cpu();
+
+	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, update, &enable, 1);
 
 	free_cpumask_var(cpu_mask);
 
@@ -2613,7 +2604,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int i, cpu;
+	int i;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
@@ -2634,13 +2625,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
 		for (i = 0; i < hw_res->num_closid; i++)
 			hw_dom->ctrl_val[i] = r->default_ctrl;
 	}
-	cpu = get_cpu();
-	/* Update CBM on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update CBM on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 
 	free_cpumask_var(cpu_mask);
 




* [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (11 preceding siblings ...)
  2022-11-04 20:01 ` [PATCH v8 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
@ 2022-11-04 20:01 ` Babu Moger
  2022-11-23  0:26   ` Reinette Chatre
  2022-11-15 20:50 ` [PATCH v8 00/13] Support for AMD QoS " Moger, Babu
  13 siblings, 1 reply; 60+ messages in thread
From: Babu Moger @ 2022-11-04 20:01 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian

Update the documentation for the new features:
1. Slow Memory Bandwidth Allocation (SMBA).
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC).
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not
   enough to count all the different types of memory events. With the
   feature BMEC, the users have the option to configure mbm_total_bytes
   and mbm_local_bytes to count the specific type of events.

Also add configuration instructions with examples.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
---
 Documentation/x86/resctrl.rst |  139 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 71a531061e4e..12adba98afc5 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:
 
-=============================================	================================
+===============================================	================================
 RDT (Resource Director Technology) Allocation	"rdt_a"
 CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
-=============================================	================================
+SMBA (Slow Memory Bandwidth Allocation)         "smba"
+BMEC (Bandwidth Monitoring Event Configuration) "bmec"
+===============================================	================================
 
 To use the feature mount the file system::
 
@@ -161,6 +163,79 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+                Example::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_local_bytes
+
+                If the system supports Bandwidth Monitoring Event
+                Configuration (BMEC), then the bandwidth events will
+                be configurable. The output will be::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_total_bytes_config
+                   mbm_local_bytes
+                   mbm_local_bytes_config
+
+"mbm_total_bytes_config", "mbm_local_bytes_config":
+        These files contain the current event configuration for the events
+        mbm_total_bytes and mbm_local_bytes, respectively, when the
+        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
+        The event configuration settings are domain specific and will affect
+        all the CPUs in the domain.
+
+        Following are the types of events supported:
+
+        ====    ========================================================
+        Bits    Description
+        ====    ========================================================
+        6       Dirty Victims from the QOS domain to all types of memory
+        5       Reads to slow memory in the non-local NUMA domain
+        4       Reads to slow memory in the local NUMA domain
+        3       Non-temporal writes to non-local NUMA domain
+        2       Non-temporal writes to local NUMA domain
+        1       Reads to memory in the non-local NUMA domain
+        0       Reads to memory in the local NUMA domain
+        ====    ========================================================
+
+        By default, the mbm_total_bytes configuration is set to 0x7f to count
+        all the event types and the mbm_local_bytes configuration is set to
+        0x15 to count all the local memory events.
+
+        Examples:
+
+        * To view the current configuration:
+          ::
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+            0=0x7f;1=0x7f;2=0x7f;3=0x7f
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+            0=0x15;1=0x15;3=0x15;4=0x15
+
+        * To change the mbm_total_bytes to count only reads on domain 0,
+          the bits 0, 1, 4 and 5 need to be set, which is 110011b in binary
+          (in hexadecimal 0x33):
+          ::
+
+            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+            0=0x33;1=0x7f;2=0x7f;3=0x7f
+
+        * To change the mbm_local_bytes to count all the slow memory reads on
+          domain 0 and 1, the bits 4 and 5 need to be set, which is 110000b
+          in binary (in hexadecimal 0x30):
+          ::
+
+            # echo  "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+            0=0x30;1=0x30;3=0x15;4=0x15
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
@@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
 
 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
 
+Slow Memory Bandwidth Allocation (SMBA)
+---------------------------------------
+AMD hardware supports the Slow Memory Bandwidth Allocation (SMBA) feature.
+Currently, CXL.memory is the only supported "slow" memory device.
+With the support of SMBA, the hardware enables bandwidth allocation
+on the slow memory devices. If there are multiple such devices in the
+system, the throttling logic groups all the slow sources together
+and applies the limit on them as a whole.
+
+The presence of SMBA (with CXL.memory) is independent of the presence
+of slow memory devices. If there are no such devices on the system,
+then configuring SMBA will have no impact on the performance of the
+system.
+
+The bandwidth domain for slow memory is L3 cache. Its schemata file
+is formatted as:
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +574,46 @@ which you wish to change.  E.g.
   L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
   L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the current bandwidth limit on all
+domains. The allocated resources are in multiples of one eighth GB/s.
+When writing to the file, you need to specify the cache id for which
+you wish to configure the bandwidth limit.
+
+For example, to allocate a 2GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with SMBA feature
+--------------------------------------------------------------------
+Reading and writing the schemata file is the same as without SMBA in
+the above section.
+
+For example, to allocate an 8GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1=  64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an
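
Note on the BMEC masks used in the documentation examples above: the
values 0x7f, 0x15, 0x33 and 0x30 can be rebuilt from the bit table. The
sketch below is user-space C written only for illustration; the enum and
function names are hypothetical and are not part of the patch or of the
kernel.

```c
#include <stdint.h>

/*
 * Event-type bits, numbered as in the table in the documentation patch.
 * The identifiers are hypothetical; only the bit positions come from
 * the patch.
 */
enum bmec_bit {
	BMEC_READS_LOCAL       = 1u << 0, /* reads to local memory */
	BMEC_READS_REMOTE      = 1u << 1, /* reads to non-local memory */
	BMEC_NT_WRITES_LOCAL   = 1u << 2, /* non-temporal writes, local */
	BMEC_NT_WRITES_REMOTE  = 1u << 3, /* non-temporal writes, non-local */
	BMEC_SLOW_READS_LOCAL  = 1u << 4, /* reads to local slow memory */
	BMEC_SLOW_READS_REMOTE = 1u << 5, /* reads to non-local slow memory */
	BMEC_DIRTY_VICTIMS     = 1u << 6, /* dirty victims to all memory */
};

/* Every event type: the documented default for mbm_total_bytes. */
static inline uint32_t bmec_all_events(void)
{
	return BMEC_READS_LOCAL | BMEC_READS_REMOTE |
	       BMEC_NT_WRITES_LOCAL | BMEC_NT_WRITES_REMOTE |
	       BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE |
	       BMEC_DIRTY_VICTIMS;
}

/* Local events only: the documented default for mbm_local_bytes. */
static inline uint32_t bmec_local_events(void)
{
	return BMEC_READS_LOCAL | BMEC_NT_WRITES_LOCAL |
	       BMEC_SLOW_READS_LOCAL;
}

/* All reads (bits 0, 1, 4 and 5), as in the "0=0x33" example. */
static inline uint32_t bmec_all_reads(void)
{
	return BMEC_READS_LOCAL | BMEC_READS_REMOTE |
	       BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE;
}

/* Slow-memory reads (bits 4 and 5), as in the "0=0x30;1=0x30" example. */
static inline uint32_t bmec_slow_reads(void)
{
	return BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE;
}
```

Naming the bits makes it easier to see that the local default omits
bit 6 (dirty victims) as well as the non-local bits 1, 3 and 5.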



^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
@ 2022-11-07 10:21   ` Peter Newman
  2022-11-07 18:50     ` Moger, Babu
  2022-11-07 19:00     ` Moger, Babu
  2022-11-23  0:22   ` Reinette Chatre
  2022-12-07 17:20   ` James Morse
  2 siblings, 2 replies; 60+ messages in thread
From: Peter Newman @ 2022-11-07 10:21 UTC (permalink / raw)
  To: babu.moger
  Cc: akpm, bagasdotme, bp, chang.seok.bae, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, eranian, fenghua.yu, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, reinette.chatre, sandipan.das, songmuchun, tglx,
	tony.luck, x86

Hi Babu,

On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared by the hardware. The
> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> +	 * every RMID on the next read to any event for every RMID.
> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> +	 * cleared while it is tracked by the hardware. Clear the
> +	 * mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);

Looking around, I can't find a reader for mbm_total anymore. It looks
like the last place it was used went away in James's recent change:

https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com

Are we supposed to be clearing arch_mbm_total now?

Thanks!
-Peter
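
Note on the quoted comment: it says the hardware reports
MSR_IA32_QM_CTR.Unavailable (bit 62) on the first read after an event
configuration change. A reader of the counter could treat such a read as
"no sample yet". The following user-space sketch illustrates that check
only; the macro and function names are hypothetical, and treating bits
61:0 as the count is an assumption based on the architectural MSR layout
rather than on anything in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 62 of the counter MSR value, as named in the quoted comment. */
#define QM_CTR_UNAVAIL   (1ULL << 62)
/* Assumed data field: bits 61:0 (not stated in the thread). */
#define QM_CTR_DATA_MASK ((1ULL << 62) - 1)

/*
 * Decode one raw counter read. Returns false when the hardware flagged
 * the value as unavailable, in which case *count is left untouched.
 */
static inline bool qm_ctr_sample(uint64_t msr_val, uint64_t *count)
{
	if (msr_val & QM_CTR_UNAVAIL)
		return false;

	*count = msr_val & QM_CTR_DATA_MASK;
	return true;
}
```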

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-07 10:21   ` Peter Newman
@ 2022-11-07 18:50     ` Moger, Babu
  2022-11-07 19:00     ` Moger, Babu
  1 sibling, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-07 18:50 UTC (permalink / raw)
  To: Peter Newman
  Cc: akpm, bagasdotme, bp, chang.seok.bae, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, eranian, fenghua.yu, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, reinette.chatre, sandipan.das, songmuchun, tglx,
	tony.luck, x86

Hi Peter,

On 11/7/22 04:21, Peter Newman wrote:
> Hi Babu,
>
> On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>> +	 * every RMID on the next read to any event for every RMID.
>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>> +	 * cleared while it is tracked by the hardware. Clear the
>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> Looking around, I can't find a reader for mbm_total anymore. It looks
> like the last place it was used went away in James's recent change:
>
> https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com
>
> Are we supposed to be clearing arch_mbm_total now?

Yes. You are right. We should be using resctrl_arch_reset_rmid to reset
the rmids here.

This patch should work. Will fix it in next revision after the other comments.

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6b222f8e58ae..28d9d99a639e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1517,7 +1517,7 @@ static int mbm_config_write(struct rdt_resource *r,
struct rdt_domain *d,
                            u32 evtid, u32 val)
 {
        struct mon_config_info mon_info = {0};
-       int ret = 0;
+       int ret = 0, i;
 
        rdt_last_cmd_clear();
 
@@ -1557,8 +1557,10 @@ static int mbm_config_write(struct rdt_resource *r,
struct rdt_domain *d,
         * cleared while it is tracked by the hardware. Clear the
         * mbm_local and mbm_total counts for all the RMIDs.
         */
-       memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
-       memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+       for (i = 0; i < r->num_rmid; i++) {
+               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_TOTAL_EVENT_ID);
+               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_LOCAL_EVENT_ID);
+       }
 
 write_exit:
        return ret;


Thanks

Babu


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-07 10:21   ` Peter Newman
  2022-11-07 18:50     ` Moger, Babu
@ 2022-11-07 19:00     ` Moger, Babu
  2022-11-22 23:43       ` Reinette Chatre
  1 sibling, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-07 19:00 UTC (permalink / raw)
  To: Peter Newman
  Cc: akpm, bagasdotme, bp, chang.seok.bae, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, eranian, fenghua.yu, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, reinette.chatre, sandipan.das, songmuchun, tglx,
	tony.luck, x86


On 11/7/22 04:21, Peter Newman wrote:
> Hi Babu,
>
> On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>> +	 * every RMID on the next read to any event for every RMID.
>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>> +	 * cleared while it is tracked by the hardware. Clear the
>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> Looking around, I can't find a reader for mbm_total anymore. It looks
> like the last place it was used went away in James's recent change:
>
> https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com
>
> Are we supposed to be clearing arch_mbm_total now?
>
The patch got garbled in my previous response.

Here it is now.

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6b222f8e58ae..28d9d99a639e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1517,7 +1517,7 @@ static int mbm_config_write(struct rdt_resource *r,
struct rdt_domain *d,
                            u32 evtid, u32 val)
 {
        struct mon_config_info mon_info = {0};
-       int ret = 0;
+       int ret = 0, i;
 
        rdt_last_cmd_clear();
 
@@ -1557,8 +1557,10 @@ static int mbm_config_write(struct rdt_resource *r,
struct rdt_domain *d,
         * cleared while it is tracked by the hardware. Clear the
         * mbm_local and mbm_total counts for all the RMIDs.
         */
-       memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
-       memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
+       for (i = 0; i < r->num_rmid; i++) {
+               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_TOTAL_EVENT_ID);
+               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_LOCAL_EVENT_ID);
+       }
 
 write_exit:
        return ret;

Thanks

Babu


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 00/13] Support for AMD QoS new features
  2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
                   ` (12 preceding siblings ...)
  2022-11-04 20:01 ` [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
@ 2022-11-15 20:50 ` Moger, Babu
  2022-11-15 21:07   ` Reinette Chatre
  13 siblings, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-15 20:50 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette and Others,

I was planning to refresh the series later this week. I have one comment
from Peter Newman.  Let me know if you have any comments.

Thanks

Babu


On 11/4/22 14:59, Babu Moger wrote:
> New AMD processors can now support following QoS features.
>
> 1. Slow Memory Bandwidth Allocation (SMBA)
>    With this feature, the QOS enforcement policies can be applied
>    to the external slow memory connected to the host. QOS enforcement
>    is accomplished by assigning a Class Of Service (COS) to a processor
>    and specifying allocations or limits for that COS for each resource
>    to be allocated.
>
>    Currently, CXL.memory is the only supported "slow" memory device. With
>    the support of SMBA feature the hardware enables bandwidth allocation
>    on the slow memory devices.
>
> 2. Bandwidth Monitoring Event Configuration (BMEC)
>    The bandwidth monitoring events mbm_total_event and mbm_local_event 
>    are set to count all the total and local reads/writes respectively.
>    With the introduction of slow memory, the two counters are not enough
>    to count all the different types of memory events. With the feature
>    BMEC, the users have the option to configure mbm_total_event and
>    mbm_local_event to count the specific type of events.
>
>    Following are the bitmaps of events supported.
>    Bits    Description
>      6       Dirty Victims from the QOS domain to all types of memory
>      5       Reads to slow memory in the non-local NUMA domain
>      4       Reads to slow memory in the local NUMA domain
>      3       Non-temporal writes to non-local NUMA domain
>      2       Non-temporal writes to local NUMA domain
>      1       Reads to memory in the non-local NUMA domain
>      0       Reads to memory in the local NUMA domain
>
> This series adds support for these features.
>
> Feature description is available in the specification, "AMD64 Technology Platform Quality of Service Extensions",
> Publication # 56375, Revision: 1.03, Issue Date: February 2022.
>
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v8:
>  Changes:
>  1. Removed init attribute for rdt_cpu_has to make it available for all the files.
>  2. Updated the change log for mon_features to correct the names of config files.
>  3. Changed configuration file name from mbm_total_config to mbm_total_bytes_config.
>     This is more consistent with other changes.
>  4. Added lock protection while reading/writing the config file.
>  5. Other few minor text changes. I have been missing few comments in last couple of
>     revisions. Hope I have addressed all of them this time.
>
> v7:
>  https://lore.kernel.org/lkml/166604543832.5345.9696970469830919982.stgit@bmoger-ubuntu/
>  Changes:
>  Not much of a change. Missed one comment from Reinette from v5. Corrected it now.
>  Few format corrections from Sanjaya.
>
> v6:
>  https://lore.kernel.org/lkml/166543345606.23830.3120625408601531368.stgit@bmoger-ubuntu/
>  Summary of changes:
>  1. Rebased on top of latest tip tree. Fixed few minor conflicts.
>  2. Fixed format issue with scattered.c.
>  3. Removed config_name from the structure mon_evt. It is not required.
>  4. The read/write format for mbm_total_config and mbm_local_config will be same
>     as schemata format "id0=val0;id1=val1;...". This is comment from Fenghua.
>  5. Added more comments on MSR_IA32_EVT_CFG_BASE writing.
>  6. Few text changes in resctrl.rst
>  
> v5:
>   https://lore.kernel.org/lkml/166431016617.373387.1968875281081252467.stgit@bmoger-ubuntu/
>   Summary of changes.
>   1. Split the series into two. The first two patches are bug fixes. So, sent them separate.
>   2. The config files mbm_total_config and mbm_local_config are now under
>      /sys/fs/resctrl/info/L3_MON/. Removed these config files from mon groups.
>   3. Ran "checkpatch --strict --codespell" on all the patches. Looks good with few known exceptions.
>   4. Few minor text changes in resctrl.rst file. 
>
> v4:
>   https://lore.kernel.org/lkml/166257348081.1043018.11227924488792315932.stgit@bmoger-ubuntu/
>   Got numerous comments from Reinette Chatre. Addressed most of them.
>   Summary of changes.
>   1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.  
>   2. Updated mon_features texts if the BMEC is supported.
>   3. Added more explanation about the slow memory support.
>   4. Replaced smp_call_function_many with on_each_cpu_mask call.
>   5. Removed arch_has_empty_bitmaps
>   6. Few other text changes.
>   7. Removed Reviewed-by if the patch is modified.
>   8. Rebased the patches to latest tip.
>
> v3:
>   https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit@bmoger-ubuntu/
>   a. Rebased the patches to latest tip. Resolved some conflicts.
>      https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>   b. Taken care of feedback from Bagas Sanjaya.
>   c. Added Reviewed by from Mingo.
>   Note: I am still looking for comments from Reinette or Fenghua.
>
> v2:
>   https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/
>   a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
>   b. Added the patch to fix CBM issue on AMD. This was originally discussed
>      https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/
>
> v1:
>   https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/
>
> Babu Moger (13):
>       x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
>       x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
>       x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
>       x86/resctrl: Include new features in command line options
>       x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
>       x86/resctrl: Remove the init attribute for rdt_cpu_has()
>       x86/resctrl: Introduce data structure to support monitor configuration
>       x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
>       x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
>       x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
>       x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
>       x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
>       Documentation/x86: Update resctrl.rst for new features
>
>
>  .../admin-guide/kernel-parameters.txt         |   2 +-
>  Documentation/x86/resctrl.rst                 | 139 +++++++-
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
>  arch/x86/kernel/cpu/resctrl/core.c            |  56 +++-
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h        |  33 ++
>  arch/x86/kernel/cpu/resctrl/monitor.c         |   7 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 304 ++++++++++++++++--
>  arch/x86/kernel/cpu/scattered.c               |   2 +
>  10 files changed, 515 insertions(+), 33 deletions(-)
>
> --
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 00/13] Support for AMD QoS new features
  2022-11-15 20:50 ` [PATCH v8 00/13] Support for AMD QoS " Moger, Babu
@ 2022-11-15 21:07   ` Reinette Chatre
  2022-11-15 21:34     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-15 21:07 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/15/2022 12:50 PM, Moger, Babu wrote:
> Hi Reinette and Others,
> 
> I was planning to refresh the series later this week. I have one comment
> from Peter Newman.  Let me know if you have any comments.
> 

I am behind on resctrl work and have not had a chance to look
at this series yet.

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 00/13] Support for AMD QoS new features
  2022-11-15 21:07   ` Reinette Chatre
@ 2022-11-15 21:34     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-15 21:34 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 11/15/22 15:07, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/15/2022 12:50 PM, Moger, Babu wrote:
>> Hi Reinette and Others,
>>
>> I was planning to refresh the series later this week. I have one comment
>> from Peter Newman.  Let me know if you have any comments.
>>
> I am behind on resctrl work and have not had a chance to look
> at this series yet.

Sure. Thanks for the update. I will wait.

Thanks

Babu



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-07 19:00     ` Moger, Babu
@ 2022-11-22 23:43       ` Reinette Chatre
  2022-11-23 21:44         ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-22 23:43 UTC (permalink / raw)
  To: babu.moger, Peter Newman
  Cc: akpm, bagasdotme, bp, chang.seok.bae, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, eranian, fenghua.yu, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, sandipan.das, songmuchun, tglx, tony.luck, x86

Hi Babu,

On 11/7/2022 11:00 AM, Moger, Babu wrote:
> 
> On 11/7/22 04:21, Peter Newman wrote:
>> Hi Babu,
>>
>> On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
>>> +	/*
>>> +	 * When an Event Configuration is changed, the bandwidth counters
>>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>> +	 * every RMID on the next read to any event for every RMID.
>>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>> +	 * cleared while it is tracked by the hardware. Clear the
>>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>>> +	 */
>>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>> Looking around, I can't find a reader for mbm_total anymore. It looks
>> like the last place it was used went away in James's recent change:
>>
>> https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com
>>
>> Are we supposed to be clearing arch_mbm_total now?
>>
> The patch got garbled in my previous response.
> 
> Here it is now.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6b222f8e58ae..28d9d99a639e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1517,7 +1517,7 @@ static int mbm_config_write(struct rdt_resource *r,
> struct rdt_domain *d,
>                             u32 evtid, u32 val)
>  {
>         struct mon_config_info mon_info = {0};
> -       int ret = 0;
> +       int ret = 0, i;
>  
>         rdt_last_cmd_clear();
>  
> @@ -1557,8 +1557,10 @@ static int mbm_config_write(struct rdt_resource *r,
> struct rdt_domain *d,
>          * cleared while it is tracked by the hardware. Clear the
>          * mbm_local and mbm_total counts for all the RMIDs.
>          */
> -       memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> -       memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +       for (i = 0; i < r->num_rmid; i++) {
> +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_TOTAL_EVENT_ID);
> +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_LOCAL_EVENT_ID);
> +       }
>  
>  write_exit:
>         return ret;

Resetting each member of an array individually seems unnecessary when the
array could just be reset as a unit. How about instead a new
resctrl_arch_reset_rmid_all() that can do so?

Reinette
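
Note: the suggestion above, resetting the per-RMID state arrays as a
unit instead of one RMID at a time, could look roughly like the
user-space model below. This is a sketch under stated assumptions, not
the eventual kernel helper: the struct layouts and the name
reset_rmid_all_model() are hypothetical, and only the idea of one
memset() per array comes from the thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-RMID architectural counter state. */
struct arch_mbm_state_model {
	uint64_t chunks;
	uint64_t prev_msr;
};

/* Hypothetical stand-in for a monitoring domain and its RMID arrays. */
struct domain_model {
	uint32_t num_rmid;
	struct arch_mbm_state_model *arch_mbm_total;
	struct arch_mbm_state_model *arch_mbm_local;
};

/*
 * Clear the state of every RMID in the domain with one memset() per
 * array. An array that is not allocated (event not enabled) is skipped.
 */
static inline void reset_rmid_all_model(struct domain_model *d)
{
	if (d->arch_mbm_total)
		memset(d->arch_mbm_total, 0,
		       sizeof(*d->arch_mbm_total) * d->num_rmid);

	if (d->arch_mbm_local)
		memset(d->arch_mbm_local, 0,
		       sizeof(*d->arch_mbm_local) * d->num_rmid);
}
```

Compared with the loop in the quoted patch, this makes two calls per
domain rather than two per RMID.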


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-11-04 20:00 ` [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-11-23  0:04   ` Reinette Chatre
  2022-11-23 15:13     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:04 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
> enforcement policies on the external slow memory.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 03cfbf0fe000..4b970e7192e8 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.fflags			= RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_SMBA] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_SMBA,
> +			.name			= "SMBA",
> +			.cache_level		= 3,
> +			.domains		= domain_init(RDT_RESOURCE_SMBA),
> +			.parse_ctrlval		= parse_bw,
> +			.format_str		= "%d=%*u",
> +			.fflags			= RFTYPE_RES_MB,
> +		},
> +	},
>  };
>  

Looking ahead at patch #5, I think that the initialization of
msr_base and msr_update (in rdt_init_res_defs_amd()) can be moved
here also.

>  /*
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5f7128686cfd..43d9f6f5a931 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -419,6 +419,7 @@ enum resctrl_res_level {
>  	RDT_RESOURCE_L3,
>  	RDT_RESOURCE_L2,
>  	RDT_RESOURCE_MBA,
> +	RDT_RESOURCE_SMBA,
>  
>  	/* Must be the last */
>  	RDT_NUM_RESOURCES,
> 
> 

Reinette


* Re: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-11-04 20:00 ` [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-11-23  0:09   ` Reinette Chatre
  2022-11-23 15:16     ` Moger, Babu
  2022-11-23 18:17   ` Yu, Fenghua
  1 sibling, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:09 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> Newer AMD processors support the new feature Bandwidth Monitoring Event
> Configuration (BMEC).
> 
> The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> Bits    Field Name    Description
> 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> 
> Currently, the bandwidth monitoring events mbm_total_bytes and
> mbm_local_bytes are set to count all the total and local reads/writes
> respectively. With the introduction of slow memory, the two counters
> are not enough to count all the different types of memory events. With
> the feature BMEC, the users have the option to configure
> mbm_total_bytes and mbm_local_bytes to count the specific type of
> events.
> 
> Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400h +
> EventID) which contains one field for each bandwidth type that can be

Looking at later patches it seems that it is not really 0xc000_0400h +
EventID but instead "0xc000_0400h + index_based_on_EventID"? This may be
too much detail for this changelog so maybe these specifics can
be deferred and just refer to the "configuration MSR".

> used to configure the bandwidth event to track any combination of
> supported bandwidth types. The event will count requests from every
> bandwidth type bit that is set in the corresponding configuration
> register.
> 
> Following are the types of events supported:
> 
> ====    ========================================================
> Bits    Description
> ====    ========================================================
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> ====    ========================================================
> 
> By default, the mbm_total_bytes configuration is set to 0x7F to count
> all the event types and the mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> Feature description is available in the specification, "AMD64
> Technology Platform Quality of Service Extensions, Revision: 1.03
> Publication
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d68b4c9c181d..6732ca0117be 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -306,6 +306,7 @@
>  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
>  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
>  #define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> +#define X86_FEATURE_BMEC		(11*32+20) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */

Surely a nitpick, but it is strange that the two features introduced in this
series are described differently. Why does BMEC deserve the "AMD" prefix
but SMBA does not? I do not think the "(BMEC)" is necessary since
it is already in X86_FEATURE_BMEC.
  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index c881bcafba7d..4555f9596ccf 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
>  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
>  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 5a5f17ed69a2..67c4d24e06ef 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> +	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
> 
> 

Reinette


* Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-04 20:00 ` [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
@ 2022-11-23  0:12   ` Reinette Chatre
  2022-11-23 15:17     ` Moger, Babu
  2022-11-30 18:43     ` Moger, Babu
  0 siblings, 2 replies; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:12 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> The QoS slow memory configuration details are available via
> CPUID_Fn80000020_EDX_x02. Detect the available details and
> initialize the rest to defaults.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>  4 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index e31c98e2fafc..6571d08e2b0d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>  	if (!r)
>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>  
> +	/*
> +	 * The software controller support is only applicable to MBA resource.
> +	 * Make sure to check for resource type again.
> +	 */

/again/d

Not all callers of is_mba_sc() check if it is called for an MBA resource.

> +	if (r->rid != RDT_RESOURCE_MBA)
> +		return false;
> +
>  	return r->membw.mba_sc;
>  }
>  
> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>  	union cpuid_0x10_3_eax eax;
>  	union cpuid_0x10_x_edx edx;
> -	u32 ebx, ecx;
> +	u32 ebx, ecx, subleaf;
>  
> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
> +	/*
> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
> +	 */
> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
> +
> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>  	hw_res->num_closid = edx.split.cos_max + 1;
>  	r->default_ctrl = MAX_MBA_BW_AMD;
>  
> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>  	return false;
>  }
>  
> +static __init bool get_slow_mem_config(void)
> +{
> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
> +
> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
> +		return false;
> +
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
> +
> +	return false;
> +}
> +
>  static __init bool get_rdt_alloc_resources(void)
>  {
>  	struct rdt_resource *r;
> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>  	if (get_mem_config())
>  		ret = true;
>  
> +	if (get_slow_mem_config())
> +		ret = true;
> +
>  	return ret;
>  }
>  
> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>  			hw_res->msr_update = mba_wrmsr_amd;
> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
> +			hw_res->msr_update = mba_wrmsr_amd;
>  		}
>  	}
>  }

I mentioned earlier that this can be moved to init of
rdt_resources_all[]. No strong preference, leaving here works
also.

Reinette


* Re: [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has()
  2022-11-04 20:00 ` [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has() Babu Moger
@ 2022-11-23  0:13   ` Reinette Chatre
  2022-11-23 17:48     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:13 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> The monitor code in resctrl/monitor.c needs to call rdt_cpu_has() to
> detect the monitor related features. It has the init attribute and
> cannot be called in non-init routines. Remove the init attribute and
> make it available for all the resctrl files.

I think this is the wrong way to go. The rdt_cpu_has() callers are
init code and they should rather get the __init attribute instead of
rdt_cpu_has() losing it.

Reinette


* Re: [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-11-04 20:00 ` [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-11-23  0:14   ` Reinette Chatre
  2022-11-23 18:23     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:14 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> Add a new field in mon_evt to support Bandwidth Monitoring Event
> Configuration(BMEC) and also update the "mon_features" display.
> 
> The sysfs file "mon_features" will display the monitor configuration

sysfs -> resctrl ?

> if supported.

This is not clear. "mon_features" does not display the monitor
configuration, it displays the name of the file that can be used to 
see the monitor configuration.

> 
> Before the change.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_local_bytes
> 
> After the change when BMEC is supported.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_total_bytes_config
> 	mbm_local_bytes
> 	mbm_local_bytes_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    6 ++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e30e8b23f6b5..5459b5022760 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> + * @configurable:	true if the event is configurable
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	enum resctrl_event_id	evtid;
>  	char			*name;
> +	bool			configurable;
>  	struct list_head	list;
>  };
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index efe0c30d3a12..06c2dc980855 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -750,6 +750,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  {
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> +	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
>  	unsigned int threshold;
>  	int ret;
>  
> @@ -783,6 +784,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  	if (ret)
>  		return ret;
>  
> +	if (mon_configurable) {
> +		mbm_total_event.configurable = true;
> +		mbm_local_event.configurable = true;
> +	}
> +

Is the local variable needed? Why not just:
	if (rdt_cpu_has(X86_FEATURE_BMEC))


Reinette


* Re: [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-11-04 20:00 ` [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
@ 2022-11-23  0:19   ` Reinette Chatre
  2022-11-23 18:35     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:19 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:00 PM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> By default, the mbm_total_bytes_config is set to 0x7f to count all the
> event types.
> 
> For example:
>     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>     0=0x7f;1=0x7f;2=0x7f;3=0x7f
> 
>     In this case, the event mbm_total_bytes is currently configured
>     with 0x7f on domains 0 to 3.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |   28 ++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    1 
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   87 ++++++++++++++++++++++++++++++++
>  3 files changed, 116 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5459b5022760..c74285fd0f6e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -15,6 +15,7 @@
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>  
>  #define MSR_IA32_QM_CTR			0x0c8e
>  #define MSR_IA32_QM_EVTSEL		0x0c8d
> @@ -41,6 +42,32 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>  
> +/* Reads to Local DRAM Memory */
> +#define READS_TO_LOCAL_MEM		BIT(0)
> +
> +/* Reads to Remote DRAM Memory */
> +#define READS_TO_REMOTE_MEM		BIT(1)
> +
> +/* Non-Temporal Writes to Local Memory */
> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
> +
> +/* Non-Temporal Writes to Remote Memory */
> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
> +
> +/* Reads to Local Memory the system identifies as "Slow Memory" */
> +#define READS_TO_LOCAL_S_MEM		BIT(4)
> +
> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
> +#define READS_TO_REMOTE_S_MEM		BIT(5)
> +
> +/* Dirty Victims to All Types of Memory */
> +#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
> +
> +/* Max event bits supported */
> +#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
> +
> +/* Max configurable events */
> +#define MAX_CONFIG_EVENTS		2
>  

This max being disconnected from what it is a max of looks like
a source of future confusion.

>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> @@ -542,5 +569,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
>  void __check_limbo(struct rdt_domain *d, bool force_free);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  void __init thread_throttle_mode_init(void);
> +void mbm_config_rftype_init(void);
>  
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 06c2dc980855..a188dacab6c8 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -787,6 +787,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>  	if (mon_configurable) {
>  		mbm_total_event.configurable = true;
>  		mbm_local_event.configurable = true;
> +		mbm_config_rftype_init();
>  	}
>  
>  	l3_mon_evt_init(r);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8342feb54a7f..dea58b6b4aa4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1423,6 +1423,78 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +struct mon_config_info {
> +	u32 evtid;
> +	u32 mon_config;
> +};
> +
> +/**
> + * mon_event_config_index_get - get the index for the configurable event
> + * @evtid: event id.
> + *
> + * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + *         > 1 otherwise
> + */
> +static inline unsigned int mon_event_config_index_get(u32 evtid)
> +{
> +	return evtid - QOS_L3_MBM_TOTAL_EVENT_ID;
> +}

It seems strange that the validation of the index is split
from where the index is determined. I think it would be easier
to understand, and reduce code duplication, if it is done together.

How about:
#define INVALID_CONFIG_INDEX   UINT_MAX

static inline unsigned int mon_event_config_index_get(u32 evtid)
{
	switch (evtid) {
	case QOS_L3_MBM_TOTAL_EVENT_ID:
		return 0;
	case QOS_L3_MBM_LOCAL_EVENT_ID:
		return 1;
	default:
		/* WARN */
		return INVALID_CONFIG_INDEX;
	}
}
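
With the index lookup and its validation in one place, a caller only needs
a single comparison. A minimal userspace sketch of that usage follows; the
demo_* names are hypothetical, the event ID values 0x02 and 0x03 are
assumptions, and the real code would issue rdmsr()/wrmsr() instead of
returning the MSR address.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define DEMO_INVALID_CONFIG_INDEX	UINT_MAX
#define DEMO_EVT_CFG_BASE		0xc0000400u	/* MSR_IA32_EVT_CFG_BASE */

enum demo_event_id {
	DEMO_MBM_TOTAL_EVENT_ID = 0x02,	/* assumed value */
	DEMO_MBM_LOCAL_EVENT_ID = 0x03,	/* assumed value */
};

/* Map an event ID to its configuration MSR index, or flag it as invalid. */
static unsigned int demo_config_index_get(uint32_t evtid)
{
	switch (evtid) {
	case DEMO_MBM_TOTAL_EVENT_ID:
		return 0;
	case DEMO_MBM_LOCAL_EVENT_ID:
		return 1;
	default:
		return DEMO_INVALID_CONFIG_INDEX;
	}
}

/* Returns 0 and stores the MSR address, or -1 for an unknown event. */
static int demo_evt_cfg_msr_lookup(uint32_t evtid, uint32_t *msr)
{
	unsigned int index = demo_config_index_get(evtid);

	assert(msr);
	if (index == DEMO_INVALID_CONFIG_INDEX)
		return -1;

	*msr = DEMO_EVT_CFG_BASE + index;
	return 0;
}
```

This also removes the need for callers to know MAX_CONFIG_EVENTS at all.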

What do you think?

> +
> +static void mon_event_config_read(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 h, index;
> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index >= MAX_CONFIG_EVENTS) {
> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
> +}
> +
> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> +{
> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> +}
> +
> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> +{
> +	struct mon_config_info mon_info = {0};

> +	struct rdt_domain *dom;
> +	bool sep = false;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	list_for_each_entry(dom, &r->domains, list) {
> +		if (sep)
> +			seq_puts(s, ";");
> +
> +		mon_info.evtid = evtid;
> +		mondata_config_read(dom, &mon_info);
> +
> +		seq_printf(s, "%d=0x%02lx", dom->id,

This is a u32 ... is just "x" (instead of "lx") sufficient?

> +			   mon_info.mon_config & MAX_EVT_CONFIG_BITS);

Please do this masking within mondata_config_read(). It should
not be required for every mondata_config_read() caller to validate the
data because they may forget (re. patch 10).
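
A minimal userspace sketch of masking at the read site, under stated
assumptions: the demo_* names are hypothetical and demo_rdmsr_low() merely
stands in for the rdmsr() that would run on a CPU of the domain.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_EVT_CONFIG_BITS	0x7fu	/* GENMASK(6, 0) in the patch */

struct demo_mon_config_info {
	uint32_t evtid;
	uint32_t mon_config;
};

/* Hypothetical stand-in for the raw low 32 bits of the config MSR. */
static uint32_t demo_raw_msr_value;

static uint32_t demo_rdmsr_low(void)
{
	return demo_raw_msr_value;
}

/* Mask once here so that no caller ever sees reserved bits. */
static void demo_config_read(struct demo_mon_config_info *info)
{
	assert(info);
	info->mon_config = demo_rdmsr_low() & DEMO_MAX_EVT_CONFIG_BITS;
}
```

Both the show path and the "is the value unchanged" comparison in the
write path would then operate on the same masked value.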

> +		sep = true;
> +	}
> +	seq_puts(s, "\n");
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return 0;
> +}
> +
> +static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
> +				       struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	return 0;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1521,6 +1593,12 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= max_threshold_occ_show,
>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
>  	},
> +	{
> +		.name		= "mbm_total_bytes_config",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= mbm_total_bytes_config_show,
> +	},
>  	{
>  		.name		= "cpus",
>  		.mode		= 0644,
> @@ -1627,6 +1705,15 @@ void __init thread_throttle_mode_init(void)
>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
>  }
>  
> +void mbm_config_rftype_init(void)

Does this need __init?

> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_get_rftype_by_name("mbm_total_bytes_config");
> +	if (rft)
> +		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
> +}
> +
>  /**
>   * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
>   * @r: The resource group with which the file is associated.
> 
> 

Reinette


* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
  2022-11-07 10:21   ` Peter Newman
@ 2022-11-23  0:22   ` Reinette Chatre
  2022-11-23 22:44     ` Moger, Babu
  2022-12-07 17:20   ` James Morse
  2 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:22 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:01 PM, Babu Moger wrote:
> The current event configuration for mbm_total_bytes can be changed by
> the user by writing to the file
> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> 
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> 
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> For example:
> To change the mbm_total_bytes to count only reads on domain 0, the bits
> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
> command.
> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
> To change the mbm_total_bytes to count all the slow memory reads on
> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> Run the command.
> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  130 ++++++++++++++++++++++++++++++++
>  1 file changed, 129 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 18f9588a41cf..0cdccb69386e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void mon_event_config_write(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 index;
> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index >= MAX_CONFIG_EVENTS) {
> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +}
> +
> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 evtid, u32 val)
> +{
> +	struct mon_config_info mon_info = {0};
> +	int ret = 0;
> +
> +	rdt_last_cmd_clear();
> +

Why is this extra clear() needed?

> +	/* mon_config cannot be more than the supported set of events */
> +	if (val > MAX_EVT_CONFIG_BITS) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Read the current config value first. If both are same then
> +	 * we don't need to write it again.

Please no "we". Maybe just "If both are the same then no need to write it again."

> +	 */
> +	mon_info.evtid = evtid;
> +	mondata_config_read(d, &mon_info);

Here I see motivation for doing validity check in mondata_config_read() as
mentioned in feedback for patch #8. If hardware decides to use the other bits
in that MSR then the check below would have trouble.

> +	if (mon_info.mon_config == val)
> +		goto write_exit;
> +

Could you please follow the custom in this area? Please see goto usage in the rest
of the file that you are changing. The label should reflect the action being
jumped to. In that sense, "write_exit" is not clear. A simple "goto out"
would be clear and matches usage in rest of file.

> +	mon_info.mon_config = val;
> +
> +	/*
> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> +	 * are scoped at the domain level. Writing any of these MSRs
> +	 * on one CPU is supposed to be observed by all CPUs in the
> +	 * domain. However, the hardware team recommends to update
> +	 * these MSRs on all the CPUs in the domain.
> +	 */
> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared by the hardware. The
> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> +	 * every RMID on the next read to any event for every RMID.
> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> +	 * cleared while it is tracked by the hardware. Clear the
> +	 * mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +
> +write_exit:
> +	return ret;
> +}
> +
> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> +{
> +	char *dom_str = NULL, *id_str;
> +	unsigned long dom_id, val;
> +	struct rdt_domain *d;
> +	int ret = 0;
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +	id_str = strsep(&dom_str, "=");
> +
> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> +		return -EINVAL;
> +	}

There is some duplication above ... both if () statements
check for "!dom_str" - is one intended to be "!id_str"?
Could both checks really mean that a "=" may be missing?
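
To show the three failure cases kept apart, here is a minimal userspace
sketch. It uses strchr() and strtoul() in place of strsep() and kstrtoul(),
and the demo_* names and error codes are invented for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum demo_parse_err {
	DEMO_OK = 0,
	DEMO_MISSING_EQUALS,
	DEMO_BAD_DOMAIN_ID,
	DEMO_BAD_CONFIG_VALUE,
};

/* Convert a whole string to unsigned long; returns 0 on success. */
static int demo_strtoul_all(const char *s, int base, unsigned long *out)
{
	char *end;

	if (*s == '\0')
		return -1;

	errno = 0;
	*out = strtoul(s, &end, base);
	return (errno || *end != '\0') ? -1 : 0;
}

/* Parse one "<domain id>=<hex config>" token; @tok is modified in place. */
static enum demo_parse_err demo_parse_token(char *tok, unsigned long *dom_id,
					    unsigned long *val)
{
	char *eq;

	assert(tok && dom_id && val);

	/* The missing '=' case is detected exactly once, up front. */
	eq = strchr(tok, '=');
	if (!eq)
		return DEMO_MISSING_EQUALS;
	*eq = '\0';

	if (demo_strtoul_all(tok, 10, dom_id))
		return DEMO_BAD_DOMAIN_ID;
	if (demo_strtoul_all(eq + 1, 16, val))
		return DEMO_BAD_CONFIG_VALUE;

	return DEMO_OK;
}
```

Note that strtoul() is more lenient than kstrtoul() (it accepts leading
whitespace and a sign), so this is only meant to show the structure of the
checks.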

> +
> +	list_for_each_entry(d, &r->domains, list) {
> +		if (d->id == dom_id) {
> +			ret = mbm_config_write(r, d, evtid, val);
> +			if (ret)
> +				return -EINVAL;
> +			goto next;
> +		}
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> +					    char *buf, size_t nbytes,
> +					    loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> +

The naming here does not reflect what is done ... much more than
parsing is done here.

How about renaming mon_config_parse() to mon_config_write(), and
renaming mbm_config_write() to mbm_config_write_domain()?


Reinette


* Re: [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-11-04 20:01 ` [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
@ 2022-11-23  0:26   ` Reinette Chatre
  2022-11-23 23:02     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23  0:26 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/4/2022 1:01 PM, Babu Moger wrote:
...

> @@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
>  
>  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>  
> +Slow Memory Bandwidth Allocation (SMBA)
> +---------------------------------------
> +AMD hardware support Slow Memory Bandwidth Allocation (SMBA) feature.

How about
AMD hardware supports the Slow Memory Bandwidth Allocation (SMBA) feature.
or
AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).

> +Currently, CXL.memory is the only supported "slow" memory device.

What does "Currently" mean here? If there is a plan for changes, could
that be shared? Otherwise maybe just remove it: "CXL.memory is the only
supported "slow" memory device."

> +With the support of SMBA, the hardware enables bandwidth allocation
> +on the slow memory devices. If there are multiple such devices in the
> +system, the throttling logic groups all the slow sources together
> +and applies the limit on them as a whole.
> +
> +The presence of SMBA (with CXL.memory) is independent of slow memory
> +devices presence. If there is no such devices on the system, then

Maybe "is no such device" or "are no such devices"?

> +setting the configuring SMBA will have no impact on the performance

"setting the configuring SMBA" is hard to parse. How about just
"configuring SMBA"?


Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-11-23  0:04   ` Reinette Chatre
@ 2022-11-23 15:13     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 15:13 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 6:05 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 02/13] x86/resctrl: Add a new resource type
> RDT_RESOURCE_SMBA
> 
> Hi Babu,
> 
> On 11/4/2022 1:00 PM, Babu Moger wrote:
> > Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
> > enforcement policies on the external slow memory.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Reviewed-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
> >  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 03cfbf0fe000..4b970e7192e8 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
> >  			.fflags			= RFTYPE_RES_MB,
> >  		},
> >  	},
> > +	[RDT_RESOURCE_SMBA] =
> > +	{
> > +		.r_resctrl = {
> > +			.rid			= RDT_RESOURCE_SMBA,
> > +			.name			= "SMBA",
> > +			.cache_level		= 3,
> > +			.domains		= domain_init(RDT_RESOURCE_SMBA),
> > +			.parse_ctrlval		= parse_bw,
> > +			.format_str		= "%d=%*u",
> > +			.fflags			= RFTYPE_RES_MB,
> > +		},
> > +	},
> >  };
> >
> 
> Looking ahead at patch #5, I think that the initialization of msr_base and
> msr_update (in rdt_init_res_defs_amd()) can be moved here also.

Sure. Will do.
Thanks
Babu
> 
> >  /*
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 5f7128686cfd..43d9f6f5a931 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -419,6 +419,7 @@ enum resctrl_res_level {
> >  	RDT_RESOURCE_L3,
> >  	RDT_RESOURCE_L2,
> >  	RDT_RESOURCE_MBA,
> > +	RDT_RESOURCE_SMBA,
> >
> >  	/* Must be the last */
> >  	RDT_NUM_RESOURCES,
> >
> >
> 
> Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-11-23  0:09   ` Reinette Chatre
@ 2022-11-23 15:16     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 15:16 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 6:09 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring
> Event Configuration feature flag
> 
> Hi Babu,
> 
> On 11/4/2022 1:00 PM, Babu Moger wrote:
> > Newer AMD processors support the new feature Bandwidth Monitoring
> > Event Configuration (BMEC).
> >
> > The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> > Bits    Field Name    Description
> > 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> >
> > Currently, the bandwidth monitoring events mbm_total_bytes and
> > mbm_local_bytes are set to count all the total and local reads/writes
> > respectively. With the introduction of slow memory, the two counters
> > are not enough to count all the different types of memory events. With
> > the feature BMEC, the users have the option to configure
> > mbm_total_bytes and mbm_local_bytes to count the specific type of
> > events.
> >
> > Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400h +
> > EventID) which contains one field for each bandwidth type that can be
> 
> Looking at later patches it seems that it is not really 0xc000_0400h + EventID
> but instead "0xc000_0400h + index_based_on_EventID"? This may be too much
> detail for this changelog so maybe these specifics can be deferred and just
> refer to the "configuration MSR".
Sure.
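
To make the distinction concrete, here is a rough sketch of the
"base + index" addressing. The base value 0xc0000400 is from the changelog;
the event ID values and the index assignment (total -> 0, local -> 1) are
assumptions for this sketch and are not stated in this mail. All names are
hypothetical.

```c
#include <stdint.h>

/* Base of the event configuration MSRs, per the changelog (0xc000_0400). */
#define EVT_CFG_MSR_BASE	0xc0000400u

/* Event IDs; the numeric values are assumptions made for this sketch. */
enum evt_id {
	EVT_LLC_OCCUPANCY	= 1,
	EVT_MBM_TOTAL		= 2,
	EVT_MBM_LOCAL		= 3,
};

/* Map an event ID to its configuration index, or -1 if not configurable. */
static int evt_cfg_index(enum evt_id id)
{
	switch (id) {
	case EVT_MBM_TOTAL:
		return 0;
	case EVT_MBM_LOCAL:
		return 1;
	default:
		return -1;
	}
}

/* Store the configuration MSR address in *msr; returns 0, or -1 on error. */
static int evt_cfg_msr(enum evt_id id, uint32_t *msr)
{
	int idx = evt_cfg_index(id);

	if (idx < 0)
		return -1;
	*msr = EVT_CFG_MSR_BASE + (uint32_t)idx;
	return 0;
}
```

Under these assumptions, adding the raw event ID to the base would give a
different register than adding the index, which is why the changelog
wording matters.
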
> 
> > used to configure the bandwidth event to track any combination of
> > supported bandwidth types. The event will count requests from every
> > bandwidth type bit that is set in the corresponding configuration
> > register.
> >
> > Following are the types of events supported:
> >
> > ====    ========================================================
> > Bits    Description
> > ====    ========================================================
> > 6       Dirty Victims from the QOS domain to all types of memory
> > 5       Reads to slow memory in the non-local NUMA domain
> > 4       Reads to slow memory in the local NUMA domain
> > 3       Non-temporal writes to non-local NUMA domain
> > 2       Non-temporal writes to local NUMA domain
> > 1       Reads to memory in the non-local NUMA domain
> > 0       Reads to memory in the local NUMA domain
> > ====    ========================================================
> >
> > By default, the mbm_total_bytes configuration is set to 0x7F to count
> > all the event types and the mbm_local_bytes configuration is set to
> > 0x15 to count all the local memory events.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03
> > Publication
> >
> > Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  3 files changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index d68b4c9c181d..6732ca0117be 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -306,6 +306,7 @@
> >  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
> >  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
> >  #define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> > +#define X86_FEATURE_BMEC		(11*32+20) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
> 
> Surely a nitpick but it is strange that the two features introduced in this series
> are described differently. Why does BMEC deserve the "AMD" prefix but SMBA
> does not? I do not think the "(BMEC)" is necessary since it is in
> X86_FEATURE_BMEC.

Sure. Will remove the AMD prefix and "(BMEC)".
Thanks
Babu
> 
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> > index c881bcafba7d..4555f9596ccf 100644
> > --- a/arch/x86/kernel/cpu/cpuid-deps.c
> > +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> > @@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> >  	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
> > +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
> >  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
> >  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
> > diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> > index 5a5f17ed69a2..67c4d24e06ef 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> >  	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> > +	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >
> >
> 
> Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-23  0:12   ` Reinette Chatre
@ 2022-11-23 15:17     ` Moger, Babu
  2022-11-30 18:43     ` Moger, Babu
  1 sibling, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 15:17 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 6:13 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory
> Bandwidth Allocation
> 
> Hi Babu,
> 
> On 11/4/2022 1:00 PM, Babu Moger wrote:
> > The QoS slow memory configuration details are available via
> > CPUID_Fn80000020_EDX_x02. Detect the available details and initialize
> > the rest to defaults.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
> >  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
> >  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
> >  4 files changed, 41 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index e31c98e2fafc..6571d08e2b0d 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
> >  	if (!r)
> >  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
> >
> > +	/*
> > +	 * The software controller support is only applicable to MBA resource.
> > +	 * Make sure to check for resource type again.
> > +	 */
> 
> /again/d

Sure.

> 
> Not all callers of is_mba_sc() check if it is called for an MBA resource.
> 
> > +	if (r->rid != RDT_RESOURCE_MBA)
> > +		return false;
> > +
> >  	return r->membw.mba_sc;
> >  }
> >
> > @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
> >  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> >  	union cpuid_0x10_3_eax eax;
> >  	union cpuid_0x10_x_edx edx;
> > -	u32 ebx, ecx;
> > +	u32 ebx, ecx, subleaf;
> >
> > -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
> > +	/*
> > +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
> > +	 * CPUID_Fn80000020_EDX_x02 for SMBA
> > +	 */
> > +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
> > +
> > +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
> >  	hw_res->num_closid = edx.split.cos_max + 1;
> >  	r->default_ctrl = MAX_MBA_BW_AMD;
> >
> > @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
> >  	return false;
> >  }
> >
> > +static __init bool get_slow_mem_config(void)
> > +{
> > +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
> > +
> > +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
> > +		return false;
> > +
> > +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> > +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
> > +
> > +	return false;
> > +}
> > +
> >  static __init bool get_rdt_alloc_resources(void)
> >  {
> >  	struct rdt_resource *r;
> > @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
> >  	if (get_mem_config())
> >  		ret = true;
> >
> > +	if (get_slow_mem_config())
> > +		ret = true;
> > +
> >  	return ret;
> >  }
> >
> > @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
> >  		} else if (r->rid == RDT_RESOURCE_MBA) {
> >  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
> >  			hw_res->msr_update = mba_wrmsr_amd;
> > +		} else if (r->rid == RDT_RESOURCE_SMBA) {
> > +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
> > +			hw_res->msr_update = mba_wrmsr_amd;
> >  		}
> >  	}
> >  }
> 
> I mentioned earlier that this can be moved to init of rdt_resources_all[]. No
> strong preference, leaving here works also.

Sure. Will do.
Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has()
  2022-11-23  0:13   ` Reinette Chatre
@ 2022-11-23 17:48     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 17:48 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,


On 11/22/22 18:13, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/4/2022 1:00 PM, Babu Moger wrote:
>> The monitor code in resctrl/monitor.c needs to call rdt_cpu_has() to
>> detect the monitor related features. It has the init attribute and
>> cannot be called in non-init routines. Remove the init attribute and
>> make it available for all the resctrl files.
> I think this is the wrong way to go. The rdt_cpu_has() callers are
> init code and they should rather get the __init attribute instead of
> rdt_cpu_has() losing it.

OK. I will add the __init attribute to rdt_get_mon_l3_config(). That should work.

Thanks

Babu


^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-11-04 20:00 ` [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
  2022-11-23  0:09   ` Reinette Chatre
@ 2022-11-23 18:17   ` Yu, Fenghua
  2022-11-23 23:06     ` Moger, Babu
  1 sibling, 1 reply; 60+ messages in thread
From: Yu, Fenghua @ 2022-11-23 18:17 UTC (permalink / raw)
  To: Babu Moger, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

Hi, Babu,

> Newer AMD processors support the new feature Bandwidth Monitoring Event
> Configuration (BMEC).
> 
> The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> Bits    Field Name    Description
> 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> 
> Currently, the bandwidth monitoring events mbm_total_bytes and
> mbm_local_bytes are set to count all the total and local reads/writes
> respectively. With the introduction of slow memory, the two counters are not
> enough to count all the different types of memory events. With the feature
> BMEC, the users have the option to configure mbm_total_bytes and
> mbm_local_bytes to count the specific type of events.
> 
> Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400h +
> EventID) which contains one field for each bandwidth type that can be used to
> configure the bandwidth event to track any combination of supported
> bandwidth types. The event will count requests from every bandwidth type bit
> that is set in the corresponding configuration register.
> 
> Following are the types of events supported:
> 
> ====    ========================================================
> Bits    Description
> ====    ========================================================
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> ====    ========================================================
> 
> By default, the mbm_total_bytes configuration is set to 0x7F to count all the
> event types and the mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
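
As a quick arithmetic check of the two defaults quoted above, here is a small
sketch that builds both masks from the bit table. The macro and function
names are made up for this sketch and are not taken from the patch.

```c
#include <stdint.h>

/* One bit per bandwidth type, numbered as in the table (names hypothetical). */
#define BMEC_READS_LOCAL	(1u << 0)
#define BMEC_READS_REMOTE	(1u << 1)
#define BMEC_NT_WRITES_LOCAL	(1u << 2)
#define BMEC_NT_WRITES_REMOTE	(1u << 3)
#define BMEC_SLOW_READS_LOCAL	(1u << 4)
#define BMEC_SLOW_READS_REMOTE	(1u << 5)
#define BMEC_DIRTY_VICTIMS	(1u << 6)

/* Every event type in the table: bits 0 through 6. */
static uint32_t bmec_all_events(void)
{
	return BMEC_READS_LOCAL | BMEC_READS_REMOTE |
	       BMEC_NT_WRITES_LOCAL | BMEC_NT_WRITES_REMOTE |
	       BMEC_SLOW_READS_LOCAL | BMEC_SLOW_READS_REMOTE |
	       BMEC_DIRTY_VICTIMS;
}

/* Only the events that target the local NUMA domain: bits 0, 2 and 4. */
static uint32_t bmec_local_events(void)
{
	return BMEC_READS_LOCAL | BMEC_NT_WRITES_LOCAL | BMEC_SLOW_READS_LOCAL;
}
```

Bits 0 to 6 sum to 127 (0x7F), and bits 0, 2 and 4 sum to 1 + 4 + 16 = 21
(0x15), which matches the changelog.
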
> 
> Feature description is available in the specification, "AMD64 Technology
> Platform Quality of Service Extensions, Revision: 1.03 Publication
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index d68b4c9c181d..6732ca0117be 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -306,6 +306,7 @@
>  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
>  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
>  #define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> +#define X86_FEATURE_BMEC		(11*32+20) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
> 
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index c881bcafba7d..4555f9596ccf 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },

Shouldn't X86_FEATURE_BMEC really depend on X86_FEATURE_CQM_MBM_LOCAL and _TOTAL?

CQM_MBM_LOCAL and/or _TOTAL can be disabled but CQM_LLC can still be enabled. In this
case, BMEC shouldn't be enabled, right? But with this patch, BMEC will be enabled but it won't
work well as CQM_MBM_TOTAL/_LOCAL are not enabled.

You may remove the above line and add these two lines:

+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },  
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },  

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-11-04 19:59 ` [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-11-23 18:21   ` Yu, Fenghua
  2022-11-23 23:09     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Yu, Fenghua @ 2022-11-23 18:21 UTC (permalink / raw)
  To: Babu Moger, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

Hi, Babu,

> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index aefd0816a333..d68b4c9c181d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -305,6 +305,7 @@
>  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
>  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
>  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
> +#define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> 
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index fc01f81f6e2a..5a5f17ed69a2 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> +	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
> 

Shouldn't X86_FEATURE_SMBA depend on X86_FEATURE_MBA? If so, the dependency
needs to be added in cpuid-deps.c.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-11-23  0:14   ` Reinette Chatre
@ 2022-11-23 18:23     ` Moger, Babu
  2022-11-23 19:05       ` Reinette Chatre
  0 siblings, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 18:23 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 11/22/22 18:14, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/4/2022 1:00 PM, Babu Moger wrote:
>> Add a new field in mon_evt to support Bandwidth Monitoring Event
>> Configuration(BMEC) and also update the "mon_features" display.
>>
>> The sysfs file "mon_features" will display the monitor configuration
> sysfs -> resctrl ?
Sure.
>
>> if supported.
> This is not clear. "mon_features" does not display the monitor
> configuration, it displays the name of the file that can be used to 
> see the monitor configuration.

Will change it to:

The resctrl file "mon_features" will display the supported events and the
files that can be used to configure those events if monitor configuration
is supported.


>
>> Before the change.
>> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_local_bytes
>>
>> After the change when BMEC is supported.
>> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_total_bytes_config
>> 	mbm_local_bytes
>> 	mbm_local_bytes_config
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |    6 ++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
>>  3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index e30e8b23f6b5..5459b5022760 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -63,11 +63,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>>   * struct mon_evt - Entry in the event list of a resource
>>   * @evtid:		event id
>>   * @name:		name of the event
>> + * @configurable:	true if the event is configurable
>>   * @list:		entry in &rdt_resource->evt_list
>>   */
>>  struct mon_evt {
>>  	enum resctrl_event_id	evtid;
>>  	char			*name;
>> +	bool			configurable;
>>  	struct list_head	list;
>>  };
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index efe0c30d3a12..06c2dc980855 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -750,6 +750,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  {
>>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>> +	bool mon_configurable = rdt_cpu_has(X86_FEATURE_BMEC);
>>  	unsigned int threshold;
>>  	int ret;
>>  
>> @@ -783,6 +784,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	if (ret)
>>  		return ret;
>>  
>> +	if (mon_configurable) {
>> +		mbm_total_event.configurable = true;
>> +		mbm_local_event.configurable = true;
>> +	}
>> +
> Is the local variable needed? Why not just:
> 	if (rdt_cpu_has(X86_FEATURE_BMEC))


The local variable is not required. Will change it.

Thanks

Babu

>
>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 04/13] x86/resctrl: Include new features in command line options
  2022-11-04 20:00 ` [PATCH v8 04/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-11-23 18:26   ` Yu, Fenghua
  2022-11-23 23:10     ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Yu, Fenghua @ 2022-11-23 18:26 UTC (permalink / raw)
  To: Babu Moger, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

Hi, Babu,

> Add the command line options to disable the new features.
s/disable/disable or enable/
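
To show why the option works in both directions, here is a rough userspace
sketch of how an "rdt=" style list could be applied: a plain name forces a
feature on and a leading '!' forces it off. The struct, the table and
parse_rdt_opts() are simplified stand-ins written for this sketch, not the
kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an option table entry (hypothetical). */
struct rdt_opt {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct rdt_opt opts[] = {
	{ "mba",  false, false },
	{ "smba", false, false },
	{ "bmec", false, false },
};

/* Apply a comma-separated list such as "smba,!bmec"; modifies str in place. */
static void parse_rdt_opts(char *str)
{
	while (str && *str) {
		char *next = strchr(str, ',');
		bool off;
		size_t i;

		/* Terminate the current token and remember where the next starts. */
		if (next)
			*next++ = '\0';

		/* A leading '!' means "force this feature off". */
		off = (*str == '!');
		if (off)
			str++;

		/* Unknown names are silently ignored in this sketch. */
		for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
			if (strcmp(str, opts[i].name) == 0) {
				opts[i].force_on = !off;
				opts[i].force_off = off;
				break;
			}
		}
		str = next;
	}
}
```

Passing "smba,!bmec" marks smba as forced on and bmec as forced off, and
leaves mba untouched.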

> smba : Slow Memory Bandwidth Allocation
> bmec : Bandwidth Monitor Event Configuration.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |    2 +-
>  arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt
> b/Documentation/admin-guide/kernel-parameters.txt
> index a465d5242774..f3f0870144fb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5190,7 +5190,7 @@
>  	rdt=		[HW,X86,RDT]
>  			Turn on/off individual RDT features. List is:
>  			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> -			mba.
> +			mba, smba, bmec.
>  			E.g. to turn on cmt and turn off mba use:
>  				rdt=cmt,!mba
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> index 4b970e7192e8..e31c98e2fafc 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -659,6 +659,8 @@ enum {
>  	RDT_FLAG_L2_CAT,
>  	RDT_FLAG_L2_CDP,
>  	RDT_FLAG_MBA,
> +	RDT_FLAG_SMBA,
> +	RDT_FLAG_BMEC,
>  };
> 
>  #define RDT_OPT(idx, n, f)	\
> @@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
>  	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
>  	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
>  	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
> +	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
> +	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
>  };
>  #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
> 
> 
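
For readers following along, the rdt= syntax documented above (a bare name
turns a feature on, a leading '!' turns it off) can be modeled in user space.
This is only a sketch under stated assumptions: struct rdt_opt_model and
parse_rdt_list() are hypothetical names for illustration, not the kernel's
rdt_options code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of one rdt= option; not the kernel struct. */
struct rdt_opt_model {
	const char *name;
	bool force_on;
	bool force_off;
};

/*
 * Parse a comma-separated list such as "cmt,!mba,smba" in place.
 * A leading '!' marks the feature as forced off, otherwise forced on.
 * Unknown names are silently ignored. The input buffer is modified.
 */
static void parse_rdt_list(char *str, struct rdt_opt_model *opts, size_t n)
{
	char *tok = str;

	while (tok && *tok) {
		char *next = strchr(tok, ',');
		bool off;
		size_t i;

		/* Terminate the current token and remember where the next starts. */
		if (next)
			*next++ = '\0';

		off = (*tok == '!');
		if (off)
			tok++;

		for (i = 0; i < n; i++) {
			if (strcmp(tok, opts[i].name) == 0) {
				if (off)
					opts[i].force_off = true;
				else
					opts[i].force_on = true;
				break;
			}
		}
		tok = next;
	}
}
```

With the two new entries in the table, a string like "cmt,!mba,smba" would
mark cmt and smba as forced on and mba as forced off, leaving bmec untouched.
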
Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-11-23  0:19   ` Reinette Chatre
@ 2022-11-23 18:35     ` Moger, Babu
  2022-11-23 22:27       ` Reinette Chatre
  0 siblings, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 18:35 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,


On 11/22/22 18:19, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/4/2022 1:00 PM, Babu Moger wrote:
>> The current event configuration can be viewed by the user by reading
>> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
>> The event configuration settings are domain specific and will affect all
>> the CPUs in the domain.
>>
>> Following are the types of events supported:
>> ====  ===========================================================
>> Bits   Description
>> ====  ===========================================================
>> 6      Dirty Victims from the QOS domain to all types of memory
>> 5      Reads to slow memory in the non-local NUMA domain
>> 4      Reads to slow memory in the local NUMA domain
>> 3      Non-temporal writes to non-local NUMA domain
>> 2      Non-temporal writes to local NUMA domain
>> 1      Reads to memory in the non-local NUMA domain
>> 0      Reads to memory in the local NUMA domain
>> ====  ===========================================================
>>
>> By default, the mbm_total_bytes_config is set to 0x7f to count all the
>> event types.
>>
>> For example:
>>     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>     0=0x7f;1=0x7f;2=0x7f;3=0x7f
>>
>>     In this case, the event mbm_total_bytes is currently configured
>>     with 0x7f on domains 0 to 3.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/internal.h |   28 ++++++++++
>>  arch/x86/kernel/cpu/resctrl/monitor.c  |    1 
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   87 ++++++++++++++++++++++++++++++++
>>  3 files changed, 116 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index 5459b5022760..c74285fd0f6e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -15,6 +15,7 @@
>>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
>>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
>>  #define MSR_IA32_SMBA_BW_BASE		0xc0000280
>> +#define MSR_IA32_EVT_CFG_BASE		0xc0000400
>>  
>>  #define MSR_IA32_QM_CTR			0x0c8e
>>  #define MSR_IA32_QM_EVTSEL		0x0c8d
>> @@ -41,6 +42,32 @@
>>   */
>>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
>>  
>> +/* Reads to Local DRAM Memory */
>> +#define READS_TO_LOCAL_MEM		BIT(0)
>> +
>> +/* Reads to Remote DRAM Memory */
>> +#define READS_TO_REMOTE_MEM		BIT(1)
>> +
>> +/* Non-Temporal Writes to Local Memory */
>> +#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
>> +
>> +/* Non-Temporal Writes to Remote Memory */
>> +#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
>> +
>> +/* Reads to Local Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_LOCAL_S_MEM		BIT(4)
>> +
>> +/* Reads to Remote Memory the system identifies as "Slow Memory" */
>> +#define READS_TO_REMOTE_S_MEM		BIT(5)
>> +
>> +/* Dirty Victims to All Types of Memory */
>> +#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
>> +
>> +/* Max event bits supported */
>> +#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
>> +
>> +/* Max configurable events */
>> +#define MAX_CONFIG_EVENTS		2
>>  
> This max being disconnected from what it is a max of looks like
> a source of future confusion.

OK. Not required anymore with your suggested change below. Will remove it.

>
>>  struct rdt_fs_context {
>>  	struct kernfs_fs_context	kfc;
>> @@ -542,5 +569,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
>>  void __check_limbo(struct rdt_domain *d, bool force_free);
>>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>>  void __init thread_throttle_mode_init(void);
>> +void mbm_config_rftype_init(void);
>>  
>>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 06c2dc980855..a188dacab6c8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -787,6 +787,7 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>>  	if (mon_configurable) {
>>  		mbm_total_event.configurable = true;
>>  		mbm_local_event.configurable = true;
>> +		mbm_config_rftype_init();
>>  	}
>>  
>>  	l3_mon_evt_init(r);
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 8342feb54a7f..dea58b6b4aa4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1423,6 +1423,78 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>>  	return ret;
>>  }
>>  
>> +struct mon_config_info {
>> +	u32 evtid;
>> +	u32 mon_config;
>> +};
>> +
>> +/**
>> + * mon_event_config_index_get - get the index for the configurable event
>> + * @evtid: event id.
>> + *
>> + * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
>> + *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
>> + *         > 1 otherwise
>> + */
>> +static inline unsigned int mon_event_config_index_get(u32 evtid)
>> +{
>> +	return evtid - QOS_L3_MBM_TOTAL_EVENT_ID;
>> +}
> It seems strange that the validation of the index is split
> from where the index is determined. I think it would be easier
> to understand, and reduce code duplication, it if is done together.
>
> How about:
> #define INVALID_CONFIG_INDEX   UINT_MAX
>
> static inline unsigned int mon_event_config_index_get(u32 evtid)
> {
> 	switch (evtid) {
> 	case QOS_L3_MBM_TOTAL_EVENT_ID:
> 		return 0;
> 	case QOS_L3_MBM_LOCAL_EVENT_ID:
> 		return 1;
> 	default:
> 		/* WARN */
> 		return INVALID_CONFIG_INDEX;
> 	}
> }
>
> What do you think?
Yes. It should work
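
A minimal user-space sketch of how a caller could look once lookup and
validation live together. The event ID values below are placeholders chosen
for illustration, and evt_cfg_msr_offset() is a hypothetical helper, not part
of the patch.

```c
#include <limits.h>
#include <stdint.h>

/* Placeholder IDs for illustration; the real values come from the resctrl headers. */
#define QOS_L3_MBM_TOTAL_EVENT_ID	0x02
#define QOS_L3_MBM_LOCAL_EVENT_ID	0x03

#define INVALID_CONFIG_INDEX		UINT_MAX

/* Map a configurable event to its index, or INVALID_CONFIG_INDEX. */
static inline unsigned int mon_event_config_index_get(uint32_t evtid)
{
	switch (evtid) {
	case QOS_L3_MBM_TOTAL_EVENT_ID:
		return 0;
	case QOS_L3_MBM_LOCAL_EVENT_ID:
		return 1;
	default:
		return INVALID_CONFIG_INDEX;
	}
}

/*
 * Hypothetical caller: stores the offset to add to the config MSR base.
 * Returns 0 on success, -1 when the event is not configurable.
 */
static int evt_cfg_msr_offset(uint32_t evtid, unsigned int *offset)
{
	unsigned int index = mon_event_config_index_get(evtid);

	if (index == INVALID_CONFIG_INDEX)
		return -1;

	*offset = index;
	return 0;
}
```

The caller compares against a single sentinel, so a separate
MAX_CONFIG_EVENTS bound is no longer needed.
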
>
>> +
>> +static void mon_event_config_read(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 h, index;
>> +
>> +	index = mon_event_config_index_get(mon_info->evtid);
>> +	if (index >= MAX_CONFIG_EVENTS) {
>> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>> +		return;
>> +	}
>> +	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
>> +}
>> +
>> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
>> +{
>> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
>> +}
>> +
>> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
>> +{
>> +	struct mon_config_info mon_info = {0};
>> +	struct rdt_domain *dom;
>> +	bool sep = false;
>> +
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	list_for_each_entry(dom, &r->domains, list) {
>> +		if (sep)
>> +			seq_puts(s, ";");
>> +
>> +		mon_info.evtid = evtid;
>> +		mondata_config_read(dom, &mon_info);
>> +
>> +		seq_printf(s, "%d=0x%02lx", dom->id,
> This is a u32 ... is just x sufficient?

I have added 0x%02lx to silence the compiler. Not required anymore.
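
As a rough user-space illustration of the output format with a plain %x
conversion: once the value printed is a 32-bit quantity (rather than the
result of masking with an unsigned long constant), no "l" length modifier is
needed. format_mbm_config() is a hypothetical helper, and using the array
index as the domain id is an assumption made for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format per-domain configs as "0=0x7f;1=0x7f" into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_mbm_config(char *buf, size_t len, const uint32_t *cfg, int ndom)
{
	size_t used = 0;
	int i;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (i = 0; i < ndom; i++) {
		/* Emit the ';' separator before every entry except the first. */
		int n = snprintf(buf + used, len - used, "%s%d=0x%02x",
				 i ? ";" : "", i, (unsigned int)cfg[i]);

		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```
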


>
>> +			   mon_info.mon_config & MAX_EVT_CONFIG_BITS);
> Please do this masking within mondata_config_read(). It should
> not be required for every mon_config_read() caller to validate the
> data because they may forget (re. patch 10).
Sure. Will do.
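
A small user-space sketch of the idea, assuming the raw low 32 bits of the
MSR are simply passed in as an argument (there is no real rdmsr here).
mon_config_read_masked() and mon_config_needs_write() are hypothetical names
used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as GENMASK(6, 0) in the patch: the seven defined event bits. */
#define MAX_EVT_CONFIG_BITS	0x7fU

/* Stand-in for the read helper: mask once so callers never see reserved bits. */
static uint32_t mon_config_read_masked(uint32_t raw_msr_lo)
{
	return raw_msr_lo & MAX_EVT_CONFIG_BITS;
}

/* Caller side: does the requested value differ from what hardware holds? */
static bool mon_config_needs_write(uint32_t raw_msr_lo, uint32_t val)
{
	return mon_config_read_masked(raw_msr_lo) != val;
}
```

If the hardware ever sets bits above bit 6, the comparison still works
because the mask is applied at the single point where the value is read.
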
>
>> +		sep = true;
>> +	}
>> +	seq_puts(s, "\n");
>> +
>> +	mutex_unlock(&rdtgroup_mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
>> +				       struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
>> +
>> +	return 0;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1521,6 +1593,12 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= max_threshold_occ_show,
>>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
>>  	},
>> +	{
>> +		.name		= "mbm_total_bytes_config",
>> +		.mode		= 0444,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= mbm_total_bytes_config_show,
>> +	},
>>  	{
>>  		.name		= "cpus",
>>  		.mode		= 0644,
>> @@ -1627,6 +1705,15 @@ void __init thread_throttle_mode_init(void)
>>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
>>  }
>>  
>> +void mbm_config_rftype_init(void)
> Does this need __init?

Not required. Will remove it.

Thanks

Babu

>
>> +{
>> +	struct rftype *rft;
>> +
>> +	rft = rdtgroup_get_rftype_by_name("mbm_total_bytes_config");
>> +	if (rft)
>> +		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
>> +}
>> +
>>  /**
>>   * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
>>   * @r: The resource group with which the file is associated.
>>
>>
> Reinette

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-11-23 18:23     ` Moger, Babu
@ 2022-11-23 19:05       ` Reinette Chatre
  2022-11-23 21:46         ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23 19:05 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/23/2022 10:23 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 11/22/22 18:14, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 11/4/2022 1:00 PM, Babu Moger wrote:
>>> Add a new field in mon_evt to support Bandwidth Monitoring Event
>>> Configuration(BMEC) and also update the "mon_features" display.
>>>
>>> The sysfs file "mon_features" will display the monitor configuration
>> sysfs -> resctrl ?
> Sure.
>>
>>> if supported.
>> This is not clear. "mon_features" does not display the monitor
>> configuration, it displays the name of the file that can be used to 
>> see the monitor configuration.
> 
> Will change it to:
> 
> The sysfs -> resctrl file "mon_features" will display the supported events
> and files that can be used to configure those events if monitor
> configuration is supported.
> 

I meant that "sysfs" should be replaced by "resctrl".

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-22 23:43       ` Reinette Chatre
@ 2022-11-23 21:44         ` Moger, Babu
  2022-11-23 22:22           ` Reinette Chatre
  0 siblings, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 21:44 UTC (permalink / raw)
  To: Reinette Chatre, Peter Newman
  Cc: akpm, bagasdotme, bp, chang.seok.bae, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, eranian, fenghua.yu, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, Das1, Sandipan, songmuchun, tglx, tony.luck, x86

[-- Attachment #1: Type: text/plain, Size: 5077 bytes --]

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 5:43 PM
> To: Moger, Babu <Babu.Moger@amd.com>; Peter Newman
> <peternewman@google.com>
> Cc: akpm@linux-foundation.org; bagasdotme@gmail.com; bp@alien8.de;
> chang.seok.bae@intel.com; corbet@lwn.net;
> damien.lemoal@opensource.wdc.com; daniel.sneddon@linux.intel.com;
> dave.hansen@linux.intel.com; eranian@google.com; fenghua.yu@intel.com;
> hpa@zytor.com; james.morse@arm.com; jmattson@google.com;
> jpoimboe@kernel.org; linux-doc@vger.kernel.org; linux-
> kernel@vger.kernel.org; mingo@redhat.com; paulmck@kernel.org;
> pawan.kumar.gupta@linux.intel.com; pbonzini@redhat.com;
> peterz@infradead.org; quic_neeraju@quicinc.com; rdunlap@infradead.org;
> Das1, Sandipan <Sandipan.Das@amd.com>; songmuchun@bytedance.com;
> tglx@linutronix.de; tony.luck@intel.com; x86@kernel.org
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 11/7/2022 11:00 AM, Moger, Babu wrote:
> >
> > On 11/7/22 04:21, Peter Newman wrote:
> >> Hi Babu,
> >>
> >> On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
> >>> +	/*
> >>> +	 * When an Event Configuration is changed, the bandwidth counters
> >>> +	 * for all RMIDs and Events will be cleared by the hardware. The
> >>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> >>> +	 * every RMID on the next read to any event for every RMID.
> >>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> >>> +	 * cleared while it is tracked by the hardware. Clear the
> >>> +	 * mbm_local and mbm_total counts for all the RMIDs.
> >>> +	 */
> >>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> >>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> >> Looking around, I can't find a reader for mbm_total anymore. It looks
> >> like the last place it was used went away in James's recent change:
> >>
> > >> https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com
> >>
> >> Are we supposed to be clearing arch_mbm_total now?
> >>
> > Patch got garbled in previous response.
> >
> > Here it is now.
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 6b222f8e58ae..28d9d99a639e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1517,7 +1517,7 @@ static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> >                             u32 evtid, u32 val)
> >  {
> >         struct mon_config_info mon_info = {0};
> > -       int ret = 0;
> > +       int ret = 0, i;
> >
> >         rdt_last_cmd_clear();
> >
> > @@ -1557,8 +1557,10 @@ static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> >          * cleared while it is tracked by the hardware. Clear the
> >          * mbm_local and mbm_total counts for all the RMIDs.
> >          */
> > -       memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> > -       memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +       for (i = 0; i < r->num_rmid; i++) {
> > +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_TOTAL_EVENT_ID);
> > +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_LOCAL_EVENT_ID);
> > +       }
> >
> >  write_exit:
> >         return ret;
> 
> Resetting each member of an array individually seems unnecessary when the
> array could just be reset as a unit. How about instead a new
> resctrl_arch_reset_rmid_all() that can do so?

Yes. We can do something like this below. 

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index a188dacab6c8..2e67de911222 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -176,6 +176,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
                memset(am, 0, sizeof(*am));
 }

+void resctrl_arch_reset_rmid_all(struct rdt_domain *d)
+{
+       struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+       memset(hw_dom->arch_mbm_total, 0, sizeof(*hw_dom->arch_mbm_total));
+       memset(hw_dom->arch_mbm_local, 0, sizeof(*hw_dom->arch_mbm_local));
+}
+
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 {
        u64 shift = 64 - width, chunks;

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-11-23 19:05       ` Reinette Chatre
@ 2022-11-23 21:46         ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 21:46 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[-- Attachment #1: Type: text/plain, Size: 2000 bytes --]

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Wednesday, November 23, 2022 1:06 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 07/13] x86/resctrl: Introduce data structure to support
> monitor configuration
> 
> Hi Babu,
> 
> On 11/23/2022 10:23 AM, Moger, Babu wrote:
> > Hi Reinette,
> >
> > On 11/22/22 18:14, Reinette Chatre wrote:
> >> Hi Babu,
> >>
> >> On 11/4/2022 1:00 PM, Babu Moger wrote:
> >>> Add a new field in mon_evt to support Bandwidth Monitoring Event
> >>> Configuration(BMEC) and also update the "mon_features" display.
> >>>
> >>> The sysfs file "mon_features" will display the monitor configuration
> >> sysfs -> resctrl ?
> > Sure.
> >>
> >>> if supported.
> >> This is not clear. "mon_features" does not display the monitor
> >> configuration, it displays the name of the file that can be used to
> >> see the monitor configuration.
> >
> > Will change it to:
> >
> > The sysfs -> resctrl file "mon_features" will display the supported
> > events and files that can be used to configure those events if monitor
> > configuration is supported.
> >
> 
> I meant that "sysfs" should be replaced by "resctrl".

Ok. Got it.
Thanks
Babu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-23 21:44         ` Moger, Babu
@ 2022-11-23 22:22           ` Reinette Chatre
  2022-11-28 16:01             ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23 22:22 UTC (permalink / raw)
  To: Moger, Babu, Peter Newman
  Cc: akpm, bagasdotme, bp, Bae, Chang Seok, corbet, damien.lemoal,
	daniel.sneddon, dave.hansen, Eranian, Stephane, Yu, Fenghua, hpa,
	james.morse, jmattson, jpoimboe, linux-doc, linux-kernel, mingo,
	paulmck, pawan.kumar.gupta, pbonzini, peterz, quic_neeraju,
	rdunlap, Das1, Sandipan, songmuchun, tglx, Luck, Tony, x86

Hi Babu,

On 11/23/2022 1:44 PM, Moger, Babu wrote:
> [AMD Official Use Only - General]
> 
> Hi Reinette,
> 
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@intel.com>
>> Sent: Tuesday, November 22, 2022 5:43 PM
>> To: Moger, Babu <Babu.Moger@amd.com>; Peter Newman
>> <peternewman@google.com>
>> Cc: akpm@linux-foundation.org; bagasdotme@gmail.com; bp@alien8.de;
>> chang.seok.bae@intel.com; corbet@lwn.net;
>> damien.lemoal@opensource.wdc.com; daniel.sneddon@linux.intel.com;
>> dave.hansen@linux.intel.com; eranian@google.com; fenghua.yu@intel.com;
>> hpa@zytor.com; james.morse@arm.com; jmattson@google.com;
>> jpoimboe@kernel.org; linux-doc@vger.kernel.org; linux-
>> kernel@vger.kernel.org; mingo@redhat.com; paulmck@kernel.org;
>> pawan.kumar.gupta@linux.intel.com; pbonzini@redhat.com;
>> peterz@infradead.org; quic_neeraju@quicinc.com; rdunlap@infradead.org;
>> Das1, Sandipan <Sandipan.Das@amd.com>; songmuchun@bytedance.com;
>> tglx@linutronix.de; tony.luck@intel.com; x86@kernel.org
>> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
>> mbm_total_bytes_config
>>
>> Hi Babu,
>>
>> On 11/7/2022 11:00 AM, Moger, Babu wrote:
>>>
>>> On 11/7/22 04:21, Peter Newman wrote:
>>>> Hi Babu,
>>>>
>>>> On Fri, Nov 04, 2022 at 03:01:09PM -0500, Babu Moger wrote:
>>>>> + /*
>>>>> +  * When an Event Configuration is changed, the bandwidth counters
>>>>> +  * for all RMIDs and Events will be cleared by the hardware. The
>>>>> +  * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>>>> +  * every RMID on the next read to any event for every RMID.
>>>>> +  * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>>>> +  * cleared while it is tracked by the hardware. Clear the
>>>>> +  * mbm_local and mbm_total counts for all the RMIDs.
>>>>> +  */
>>>>> + memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>>>>> + memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>>>> Looking around, I can't find a reader for mbm_total anymore. It looks
>>>> like the last place it was used went away in James's recent change:
>>>>
>>>> https://lore.kernel.org/all/20220902154829.30399-19-james.morse@arm.com
>>>>
>>>> Are we supposed to be clearing arch_mbm_total now?
>>>>
>>> Patch got garbled in previous response.
>>>
>>> Here it is now.
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 6b222f8e58ae..28d9d99a639e 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -1517,7 +1517,7 @@ static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
>>>                             u32 evtid, u32 val)
>>>  {
>>>         struct mon_config_info mon_info = {0};
>>> -       int ret = 0;
>>> +       int ret = 0, i;
>>>
>>>         rdt_last_cmd_clear();
>>>
>>> @@ -1557,8 +1557,10 @@ static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
>>>          * cleared while it is tracked by the hardware. Clear the
>>>          * mbm_local and mbm_total counts for all the RMIDs.
>>>          */
>>> -       memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>>> -       memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>>> +       for (i = 0; i < r->num_rmid; i++) {
>>> +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_TOTAL_EVENT_ID);
>>> +               resctrl_arch_reset_rmid(r, d, i, QOS_L3_MBM_LOCAL_EVENT_ID);
>>> +       }
>>>
>>>  write_exit:
>>>         return ret;
>>
>> Resetting each member of an array individually seems unnecessary when the
>> array could just be reset as a unit. How about instead a new
>> resctrl_arch_reset_rmid_all() that can do so?
> 
> Yes. We can do something like this below.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index a188dacab6c8..2e67de911222 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -176,6 +176,14 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
>                 memset(am, 0, sizeof(*am));
>  }
> 
> +void resctrl_arch_reset_rmid_all(struct rdt_domain *d)
> +{
> +       struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +
> +       memset(hw_dom->arch_mbm_total, 0, sizeof(*hw_dom->arch_mbm_total));
> +       memset(hw_dom->arch_mbm_local, 0, sizeof(*hw_dom->arch_mbm_local));
> +}
> +
>  static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>  {
>         u64 shift = 64 - width, chunks;

It looks like the above would only reset the first entry of the array.
I expect that the resource should also be provided as parameter so that
the num_rmid can be obtained to be able to clear the entire array.
Also, what is the likelihood of this being called when the array does not
exist? It may be safer to wrap each memset() with an is_mbm_total_enabled()
or is_mbm_local_enabled().
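
To make the point concrete, here is a user-space sketch of a whole-array
reset. It rests on assumptions: the *_model types are simplified stand-ins
for the kernel structures, and a NULL pointer check stands in for the
is_mbm_total_enabled() / is_mbm_local_enabled() guards.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the per-RMID architectural state. */
struct arch_mbm_state_model {
	uint64_t chunks;
	uint64_t prev_msr;
};

/* Simplified stand-in for the hardware domain; a NULL array means "not enabled". */
struct hw_domain_model {
	struct arch_mbm_state_model *arch_mbm_total;
	struct arch_mbm_state_model *arch_mbm_local;
};

/*
 * Clear every RMID entry, not just the first one: the size passed to
 * memset() is the element size multiplied by the number of RMIDs.
 */
static void reset_rmid_all_model(struct hw_domain_model *d, uint32_t num_rmid)
{
	if (d->arch_mbm_total)
		memset(d->arch_mbm_total, 0,
		       sizeof(*d->arch_mbm_total) * num_rmid);

	if (d->arch_mbm_local)
		memset(d->arch_mbm_local, 0,
		       sizeof(*d->arch_mbm_local) * num_rmid);
}
```

Without the multiplication by num_rmid, sizeof(*ptr) covers a single element
and the remaining entries keep their stale counts.
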

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-11-23 18:35     ` Moger, Babu
@ 2022-11-23 22:27       ` Reinette Chatre
  2022-11-23 22:55         ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-23 22:27 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/23/2022 10:35 AM, Moger, Babu wrote:
> On 11/22/22 18:19, Reinette Chatre wrote:
>> On 11/4/2022 1:00 PM, Babu Moger wrote:

...

>>> @@ -1521,6 +1593,12 @@ static struct rftype res_common_files[] = {
>>>  		.seq_show	= max_threshold_occ_show,
>>>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
>>>  	},
>>> +	{
>>> +		.name		= "mbm_total_bytes_config",
>>> +		.mode		= 0444,
>>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>>> +		.seq_show	= mbm_total_bytes_config_show,
>>> +	},
>>>  	{
>>>  		.name		= "cpus",
>>>  		.mode		= 0644,
>>> @@ -1627,6 +1705,15 @@ void __init thread_throttle_mode_init(void)
>>>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
>>>  }
>>>  
>>> +void mbm_config_rftype_init(void)
>> Does this need __init?
> 
> Not required. Will remove it.
> 

Your response is not clear to me. I am not asking for any removal. My
question is whether the function needs the __init attribute. That is,
should this be:

void __init mbm_config_rftype_init(void)

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-23  0:22   ` Reinette Chatre
@ 2022-11-23 22:44     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 22:44 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[-- Attachment #1: Type: text/plain, Size: 8549 bytes --]

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 6:22 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 11/4/2022 1:01 PM, Babu Moger wrote:
> > The current event configuration for mbm_total_bytes can be changed by
> > the user by writing to the file
> > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> >
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> >
> > ====  ===========================================================
> > Bits   Description
> > ====  ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====  ===========================================================
> >
> > For example:
> > To change the mbm_total_bytes to count only reads on domain 0, the
> > bits 0, 1, 4 and 5 need to be set, which is 110011b (in hex 0x33).
> > Run the command.
> > 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > To change the mbm_total_bytes to count all the slow memory reads on
> > domain 1, the bits 4 and 5 need to be set, which is 110000b (in hex 0x30).
> > Run the command.
> > 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  130 ++++++++++++++++++++++++++++++++
> >  1 file changed, 129 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 18f9588a41cf..0cdccb69386e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static void mon_event_config_write(void *info)
> > +{
> > +	struct mon_config_info *mon_info = info;
> > +	u32 index;
> > +
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index >= MAX_CONFIG_EVENTS) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> > +}
> > +
> > +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> > +			    u32 evtid, u32 val)
> > +{
> > +	struct mon_config_info mon_info = {0};
> > +	int ret = 0;
> > +
> > +	rdt_last_cmd_clear();
> > +
> 
> Why is this extra clear() needed?

I am not sure why I added that. It does not seem required. I can remove it.
> 
> > +	/* mon_config cannot be more than the supported set of events */
> > +	if (val > MAX_EVT_CONFIG_BITS) {
> > +		rdt_last_cmd_puts("Invalid event configuration\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Read the current config value first. If both are same then
> > +	 * we don't need to write it again.
> 
> Please no "we". Maybe just "If both are the same then no need to write it
> again."

Ok.

> 
> > +	 */
> > +	mon_info.evtid = evtid;
> > +	mondata_config_read(d, &mon_info);
> 
> Here I see motivation for doing validity check in mondata_config_read() as
> mentioned in feedback for patch #8. If hardware decides to use the other bits in
> that MSR then the check below would have trouble.
> 
> > +	if (mon_info.mon_config == val)
> > +		goto write_exit;
> > +
> 
> Could you please follow the custom in this area? Please see goto usage in the
> rest of the file that you are changing. The label should reflect the action being
> jumped to. In that sense, "write_exit" is not clear. A simple "goto out"
> would be clear and matches usage in rest of file.

Ok. Sure
> 
> > +	mon_info.mon_config = val;
> > +
> > +	/*
> > +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> > +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> > +	 * are scoped at the domain level. Writing any of these MSRs
> > +	 * on one CPU is supposed to be observed by all CPUs in the
> > +	 * domain. However, the hardware team recommends to update
> > +	 * these MSRs on all the CPUs in the domain.
> > +	 */
> > +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> > +
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +
> > +write_exit:
> > +	return ret;
> > +}
> > +
> > +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> > +{
> > +	char *dom_str = NULL, *id_str;
> > +	unsigned long dom_id, val;
> > +	struct rdt_domain *d;
> > +	int ret = 0;
> > +
> > +next:
> > +	if (!tok || tok[0] == '\0')
> > +		return 0;
> > +
> > +	/* Start processing the strings for each domain */
> > +	dom_str = strim(strsep(&tok, ";"));
> > +	id_str = strsep(&dom_str, "=");
> > +
> > +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> > +		return -EINVAL;
> > +	}
> 
> There is some duplication above ... both if () statements check for "!dom_str" -
> is one intended to be "!id_str"?

The first check should be !id_str. Will correct it.

> Could both checks really mean that a "=" may be missing?

A failure of the second check means that the event configuration value is missing. I will remove "Missing '='" from that message.

> 
> > +
> > +	list_for_each_entry(d, &r->domains, list) {
> > +		if (d->id == dom_id) {
> > +			ret = mbm_config_write(r, d, evtid, val);
> > +			if (ret)
> > +				return -EINVAL;
> > +			goto next;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> > +					    char *buf, size_t nbytes,
> > +					    loff_t off)
> > +{
> > +	struct rdt_resource *r = of->kn->parent->priv;
> > +	int ret;
> > +
> > +	/* Valid input requires a trailing newline */
> > +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> > +		return -EINVAL;
> > +
> > +	cpus_read_lock();
> > +	mutex_lock(&rdtgroup_mutex);
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	buf[nbytes - 1] = '\0';
> > +
> > +	ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> > +
> 
> The naming here does not reflect what is done ... much more than parsing is
> done here.
> 
> How about renaming mon_config_parse() to mon_config_write(), and
> renaming mbm_config_write() to mbm_config_write_domain() ?

Sure. Thanks
Babu
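
The two parsing checks discussed above can be sketched in userspace C. This is only an illustration of the agreed direction (one distinct error per failure), not the kernel code: parse_domain_config() and its error strings are hypothetical, and the standard strtoul() stands in for the kernel's kstrtoul(). The 0x7F limit is the all-events default quoted elsewhere in this thread.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* All seven event bits (0..6) set, as in the table quoted in this thread. */
#define MAX_EVT_CONFIG_BITS 0x7FUL

/*
 * Hypothetical helper: parse one "<domain id>=<hex event mask>" token,
 * for example "0=0x33". Returns 0 on success or -EINVAL, and points
 * *err at a message that names the specific failure.
 */
int parse_domain_config(const char *tok, unsigned long *dom_id,
			unsigned long *val, const char **err)
{
	const char *eq = strchr(tok, '=');
	char *end;

	if (!eq) {
		*err = "Missing '='";
		return -EINVAL;
	}

	/* Domain id: decimal digits that end exactly at the '='. */
	if (!isdigit((unsigned char)tok[0])) {
		*err = "Non-numeric domain id";
		return -EINVAL;
	}
	errno = 0;
	*dom_id = strtoul(tok, &end, 10);
	if (errno || end != eq) {
		*err = "Non-numeric domain id";
		return -EINVAL;
	}

	/* Event configuration: hex digits, with an optional 0x prefix. */
	if (eq[1] == '\0') {
		*err = "Missing event configuration value";
		return -EINVAL;
	}
	if (!isxdigit((unsigned char)eq[1])) {
		*err = "Non-numeric event configuration value";
		return -EINVAL;
	}
	errno = 0;
	*val = strtoul(eq + 1, &end, 16);
	if (errno || *end != '\0') {
		*err = "Non-numeric event configuration value";
		return -EINVAL;
	}

	/* The value cannot exceed the supported set of events. */
	if (*val > MAX_EVT_CONFIG_BITS) {
		*err = "Invalid event configuration";
		return -EINVAL;
	}

	*err = NULL;
	return 0;
}
```

With this sketch, "0=0x33" yields domain 0 with mask 0x33 (bits 0, 1, 4 and 5), while "0=" fails with the missing-value message rather than a message about '='.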

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 21054 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-11-23 22:27       ` Reinette Chatre
@ 2022-11-23 22:55         ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 22:55 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[-- Attachment #1: Type: text/plain, Size: 2258 bytes --]

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Wednesday, November 23, 2022 4:28 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 11/23/2022 10:35 AM, Moger, Babu wrote:
> > On 11/22/22 18:19, Reinette Chatre wrote:
> >> On 11/4/2022 1:00 PM, Babu Moger wrote:
> 
> ...
> 
> >>> @@ -1521,6 +1593,12 @@ static struct rftype res_common_files[] = {
> >>>  		.seq_show	= max_threshold_occ_show,
> >>>  		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
> >>>  	},
> >>> +	{
> >>> +		.name		= "mbm_total_bytes_config",
> >>> +		.mode		= 0444,
> >>> +		.kf_ops		= &rdtgroup_kf_single_ops,
> >>> +		.seq_show	= mbm_total_bytes_config_show,
> >>> +	},
> >>>  	{
> >>>  		.name		= "cpus",
> >>>  		.mode		= 0644,
> >>> @@ -1627,6 +1705,15 @@ void __init thread_throttle_mode_init(void)
> >>>  	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;  }
> >>>
> >>> +void mbm_config_rftype_init(void)
> >> Does this need __init?
> >
> > Not. Required. Will remove it.
> >
> 
> Your response is not clear to me. I am not asking for any removal. My question
> is whether the function needs the __init attribute. That is, should this be:
> 
> void __init mbm_config_rftype_init(void)

Oh, I misunderstood.
Yes. It is called from rdt_get_mon_l3_config(), which will be an __init routine, so the __init attribute seems appropriate here.
Thanks
Babu

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 18461 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-11-23  0:26   ` Reinette Chatre
@ 2022-11-23 23:02     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 23:02 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[-- Attachment #1: Type: text/plain, Size: 2636 bytes --]

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Tuesday, November 22, 2022 6:26 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new
> features
> 
> Hi Babu,
> 
> On 11/4/2022 1:01 PM, Babu Moger wrote:
> ...
> 
> > @@ -464,6 +539,26 @@ Memory bandwidth domain is L3 cache.
> >
> >  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
> >
> > +Slow Memory Bandwidth Allocation (SMBA)
> > +---------------------------------------
> > +AMD hardware support Slow Memory Bandwidth Allocation (SMBA) feature.
> 
> How about
> AMD hardware supports the Slow Memory Bandwidth Allocation (SMBA)
> feature.
> or
> AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).

Sure.
> 
> > +Currently, CXL.memory is the only supported "slow" memory device.
> 
> What does "Currently" mean here? If there is a plan for changes, could that be
> shared? Otherwise maybe just remove it: "CXL.memory is the only supported
> "slow" memory device."

There are no planned changes. I will remove "Currently".

> 
> > +With the support of SMBA, the hardware enables bandwidth allocation
> > +on the slow memory devices. If there are multiple such devices in the
> > +system, the throttling logic groups all the slow sources together and
> > +applies the limit on them as a whole.
> > +
> > +The presence of SMBA (with CXL.memory) is independent of slow memory
> > +devices presence. If there is no such devices on the system, then
> 
> Maybe "is no such device" or "are no such devices"?

It should be "If there are no such devices".   Will correct it.

> 
> > +setting the configuring SMBA will have no impact on the performance
> 
> "setting the configuring SMBA" is hard to parse. How about just "configuring
> SMBA"?

Sure.
Thanks
Babu

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 18320 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-11-23 18:17   ` Yu, Fenghua
@ 2022-11-23 23:06     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 23:06 UTC (permalink / raw)
  To: Yu, Fenghua, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon, Das1,
	Sandipan, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

[-- Attachment #1: Type: text/plain, Size: 6640 bytes --]

[AMD Official Use Only - General]

Hi Fenghua,

> -----Original Message-----
> From: Yu, Fenghua <fenghua.yu@intel.com>
> Sent: Wednesday, November 23, 2022 12:17 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; Chatre, Reinette
> <reinette.chatre@intel.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de
> Cc: dave.hansen@linux.intel.com; x86@kernel.org; hpa@zytor.com;
> paulmck@kernel.org; akpm@linux-foundation.org; quic_neeraju@quicinc.com;
> rdunlap@infradead.org; damien.lemoal@opensource.wdc.com;
> songmuchun@bytedance.com; peterz@infradead.org; jpoimboe@kernel.org;
> pbonzini@redhat.com; Bae, Chang Seok <chang.seok.bae@intel.com>;
> pawan.kumar.gupta@linux.intel.com; jmattson@google.com;
> daniel.sneddon@linux.intel.com; Das1, Sandipan <Sandipan.Das@amd.com>;
> Luck, Tony <tony.luck@intel.com>; james.morse@arm.com; linux-
> doc@vger.kernel.org; linux-kernel@vger.kernel.org; bagasdotme@gmail.com;
> Eranian, Stephane <eranian@google.com>
> Subject: RE: [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring
> Event Configuration feature flag
> 
> Hi, Babu,
> 
> > Newer AMD processors support the new feature Bandwidth Monitoring
> > Event Configuration (BMEC).
> >
> > The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> > Bits    Field Name    Description
> > 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> >
> > Currently, the bandwidth monitoring events mbm_total_bytes and
> > mbm_local_bytes are set to count all the total and local reads/writes
> > respectively. With the introduction of slow memory, the two counters
> > are not enough to count all the different types of memory events. With
> > the feature BMEC, the users have the option to configure
> > mbm_total_bytes and mbm_local_bytes to count the specific type of events.
> >
> > Each BMEC event has a configuration MSR, QOS_EVT_CFG (0xc000_0400h +
> > EventID) which contains one field for each bandwidth type that can be
> > used to configure the bandwidth event to track any combination of
> > supported bandwidth types. The event will count requests from every
> > bandwidth type bit that is set in the corresponding configuration register.
> >
> > Following are the types of events supported:
> >
> > ====    ========================================================
> > Bits    Description
> > ====    ========================================================
> > 6       Dirty Victims from the QOS domain to all types of memory
> > 5       Reads to slow memory in the non-local NUMA domain
> > 4       Reads to slow memory in the local NUMA domain
> > 3       Non-temporal writes to non-local NUMA domain
> > 2       Non-temporal writes to local NUMA domain
> > 1       Reads to memory in the non-local NUMA domain
> > 0       Reads to memory in the local NUMA domain
> > ====    ========================================================
> >
> > By default, the mbm_total_bytes configuration is set to 0x7F to count
> > all the event types and the mbm_local_bytes configuration is set to
> > 0x15 to count all the local memory events.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03
> > Publication
> >
> > Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/cpuid-deps.c   |    1 +
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  3 files changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index d68b4c9c181d..6732ca0117be 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -306,6 +306,7 @@
> >  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
> >  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
> >  #define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> > +#define X86_FEATURE_BMEC		(11*32+20) /* AMD Bandwidth Monitoring Event Configuration (BMEC) */
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> > index c881bcafba7d..4555f9596ccf 100644
> > --- a/arch/x86/kernel/cpu/cpuid-deps.c
> > +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> > @@ -68,6 +68,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> >  	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
> > +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_LLC   },
> 
> Shouldn't X86_FEATURE_BMEC really depend on
> X86_FEATURE_CQM_MBM_LOCAL and _TOTAL?
> 
> CQM_MBM_LOCAL and/or _TOTAL can be disabled but CQM_LLC can still be
> enabled. In this case, BMEC shouldn't be enabled, right? But with this patch,
> BMEC will be enabled but it won't work well as CQM_MBM_TOTAL/_LOCAL
> are not enabled.

Yes. You are right.
> 
> You may remove the above line and add these two lines:
> 
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
> 

Sure. Will add these lines.
Thanks
Babu
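
To illustrate why the two suggested entries matter, here is a minimal userspace sketch of a dependency table in which clearing a feature also clears everything that depends on it. The enum values, the table and clear_feature() are hypothetical stand-ins written for this example; they only mirror the shape of the cpuid_deps[] entries quoted above and are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature ids for this sketch, not the X86_FEATURE_* values. */
enum feature { CQM_LLC, CQM_MBM_TOTAL, CQM_MBM_LOCAL, BMEC, NUM_FEATURES };

struct dep {
	enum feature feature;		/* the dependent feature */
	enum feature depends_on;	/* what it requires */
};

/* BMEC depends on both MBM events, as suggested in the review. */
static const struct dep deps[] = {
	{ CQM_MBM_TOTAL, CQM_LLC },
	{ CQM_MBM_LOCAL, CQM_LLC },
	{ BMEC,          CQM_MBM_TOTAL },
	{ BMEC,          CQM_MBM_LOCAL },
};

/* Clear 'f', then keep clearing features whose requirement is now off. */
void clear_feature(bool enabled[NUM_FEATURES], enum feature f)
{
	bool changed;

	assert(f < NUM_FEATURES);
	enabled[f] = false;

	do {
		changed = false;
		for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
			if (!enabled[deps[i].depends_on] &&
			    enabled[deps[i].feature]) {
				enabled[deps[i].feature] = false;
				changed = true;
			}
		}
	} while (changed);
}
```

In this sketch, clearing CQM_MBM_LOCAL alone also clears BMEC while CQM_LLC and CQM_MBM_TOTAL stay enabled. With a single entry tying BMEC to CQM_LLC, BMEC would have remained set in that case.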

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 21324 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-11-23 18:21   ` Yu, Fenghua
@ 2022-11-23 23:09     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 23:09 UTC (permalink / raw)
  To: Yu, Fenghua, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon, Das1,
	Sandipan, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

[-- Attachment #1: Type: text/plain, Size: 2845 bytes --]

[AMD Official Use Only - General]

Hi Fenghua,

> -----Original Message-----
> From: Yu, Fenghua <fenghua.yu@intel.com>
> Sent: Wednesday, November 23, 2022 12:22 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; Chatre, Reinette
> <reinette.chatre@intel.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de
> Cc: dave.hansen@linux.intel.com; x86@kernel.org; hpa@zytor.com;
> paulmck@kernel.org; akpm@linux-foundation.org; quic_neeraju@quicinc.com;
> rdunlap@infradead.org; damien.lemoal@opensource.wdc.com;
> songmuchun@bytedance.com; peterz@infradead.org; jpoimboe@kernel.org;
> pbonzini@redhat.com; Bae, Chang Seok <chang.seok.bae@intel.com>;
> pawan.kumar.gupta@linux.intel.com; jmattson@google.com;
> daniel.sneddon@linux.intel.com; Das1, Sandipan <Sandipan.Das@amd.com>;
> Luck, Tony <tony.luck@intel.com>; james.morse@arm.com; linux-
> doc@vger.kernel.org; linux-kernel@vger.kernel.org; bagasdotme@gmail.com;
> Eranian, Stephane <eranian@google.com>
> Subject: RE: [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth
> Allocation feature flag
> 
> Hi, Babu,
> 
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index aefd0816a333..d68b4c9c181d 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -305,6 +305,7 @@
> >  #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
> >  #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
> >  #define X86_FEATURE_CALL_DEPTH		(11*32+18) /* "" Call depth tracking for RSB stuffing */
> > +#define X86_FEATURE_SMBA		(11*32+19) /* Slow Memory Bandwidth Allocation */
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> > diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> > index fc01f81f6e2a..5a5f17ed69a2 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> > +	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >
> 
> Shouldn't X86_FEATURE_SMBA depend on X86_FEATURE_MBA? Need to add
> the dependency in cpuid-deps.c

I would not treat them as dependent on each other. They have separate CPUID bits and MSRs.
Thanks
Babu
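
The point about separate CPUID bits can be sketched as a pure decode step. The bit positions come from the patches quoted in this thread (MBA at leaf 0x80000008 EBX bit 6, SMBA at leaf 0x80000020 EBX bit 2, BMEC at leaf 0x80000020 EBX bit 3). The struct and function names are hypothetical, and the caller is assumed to have already read the two EBX values.

```c
#include <stdbool.h>
#include <stdint.h>

#define MBA_EBX_BIT	6	/* leaf 0x80000008, EBX */
#define SMBA_EBX_BIT	2	/* leaf 0x80000020, subleaf 0, EBX */
#define BMEC_EBX_BIT	3	/* leaf 0x80000020, subleaf 0, EBX */

/* Hypothetical result type for this sketch. */
struct qos_features {
	bool mba;
	bool smba;
	bool bmec;
};

/* Each flag is read from its own bit; none is derived from another. */
struct qos_features decode_qos_features(uint32_t ebx_80000008,
					uint32_t ebx_80000020)
{
	struct qos_features f = {
		.mba  = (ebx_80000008 >> MBA_EBX_BIT) & 1u,
		.smba = (ebx_80000020 >> SMBA_EBX_BIT) & 1u,
		.bmec = (ebx_80000020 >> BMEC_EBX_BIT) & 1u,
	};

	return f;
}
```

Because the flags are decoded independently, this sketch can report SMBA without MBA, which is the situation a cpuid-deps.c entry would rule out.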

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 18664 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 04/13] x86/resctrl: Include new features in command line options
  2022-11-23 18:26   ` Yu, Fenghua
@ 2022-11-23 23:10     ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-23 23:10 UTC (permalink / raw)
  To: Yu, Fenghua, corbet, Chatre, Reinette, tglx, mingo, bp
  Cc: dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju, rdunlap,
	damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini, Bae,
	Chang Seok, pawan.kumar.gupta, jmattson, daniel.sneddon, Das1,
	Sandipan, Luck, Tony, james.morse, linux-doc, linux-kernel,
	bagasdotme, Eranian, Stephane

[-- Attachment #1: Type: text/plain, Size: 3074 bytes --]

[AMD Official Use Only - General]

Hi Fenghua,

> -----Original Message-----
> From: Yu, Fenghua <fenghua.yu@intel.com>
> Sent: Wednesday, November 23, 2022 12:26 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; Chatre, Reinette
> <reinette.chatre@intel.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de
> Cc: dave.hansen@linux.intel.com; x86@kernel.org; hpa@zytor.com;
> paulmck@kernel.org; akpm@linux-foundation.org; quic_neeraju@quicinc.com;
> rdunlap@infradead.org; damien.lemoal@opensource.wdc.com;
> songmuchun@bytedance.com; peterz@infradead.org; jpoimboe@kernel.org;
> pbonzini@redhat.com; Bae, Chang Seok <chang.seok.bae@intel.com>;
> pawan.kumar.gupta@linux.intel.com; jmattson@google.com;
> daniel.sneddon@linux.intel.com; Das1, Sandipan <Sandipan.Das@amd.com>;
> Luck, Tony <tony.luck@intel.com>; james.morse@arm.com; linux-
> doc@vger.kernel.org; linux-kernel@vger.kernel.org; bagasdotme@gmail.com;
> Eranian, Stephane <eranian@google.com>
> Subject: RE: [PATCH v8 04/13] x86/resctrl: Include new features in command
> line options
> 
> Hi, Babu,
> 
> > Add the command line options to disable the new features.
> s/disable/disable or enable/

Sure.
Thanks
Babu
> 
> > smba : Slow Memory Bandwidth Allocation
> > bmec : Bandwidth Monitor Event Configuration.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |    2 +-
> >  arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index a465d5242774..f3f0870144fb 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5190,7 +5190,7 @@
> >  	rdt=		[HW,X86,RDT]
> >  			Turn on/off individual RDT features. List is:
> >  			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> > -			mba.
> > +			mba, smba, bmec.
> >  			E.g. to turn on cmt and turn off mba use:
> >  				rdt=cmt,!mba
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 4b970e7192e8..e31c98e2fafc 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -659,6 +659,8 @@ enum {
> >  	RDT_FLAG_L2_CAT,
> >  	RDT_FLAG_L2_CDP,
> >  	RDT_FLAG_MBA,
> > +	RDT_FLAG_SMBA,
> > +	RDT_FLAG_BMEC,
> >  };
> >
> >  #define RDT_OPT(idx, n, f)	\
> > @@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
> >  	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
> >  	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
> >  	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
> > +	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
> > +	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
> >  };
> >  #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
> >
> >
> Thanks.
> 
> -Fenghua

[-- Attachment #2: winmail.dat --]
[-- Type: application/ms-tnef, Size: 18643 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-23 22:22           ` Reinette Chatre
@ 2022-11-28 16:01             ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-11-28 16:01 UTC (permalink / raw)
  To: reinette.chatre
  Cc: Babu.Moger, Sandipan.Das, akpm, bagasdotme, bp, chang.seok.bae,
	corbet, damien.lemoal, daniel.sneddon, dave.hansen, eranian,
	fenghua.yu, hpa, james.morse, jmattson, jpoimboe, linux-doc,
	linux-kernel, mingo, paulmck, pawan.kumar.gupta, pbonzini,
	peternewman, peterz, quic_neeraju, rdunlap, songmuchun, tglx,
	tony.luck, x86

This message did not land in my mailbox, so I am replying using the git send-email link.

Ok. Sure. I am fine with these changes. Thanks

-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-23  0:12   ` Reinette Chatre
  2022-11-23 15:17     ` Moger, Babu
@ 2022-11-30 18:43     ` Moger, Babu
  2022-11-30 20:07       ` Reinette Chatre
  1 sibling, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-30 18:43 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Reinette,

On 11/22/22 18:12, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/4/2022 1:00 PM, Babu Moger wrote:
>> The QoS slow memory configuration details are available via
>> CPUID_Fn80000020_EDX_x02. Detect the available details and
>> initialize the rest to defaults.
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>>  4 files changed, 41 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index e31c98e2fafc..6571d08e2b0d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>>  	if (!r)
>>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>>  
>> +	/*
>> +	 * The software controller support is only applicable to MBA resource.
>> +	 * Make sure to check for resource type again.
>> +	 */
> /again/d
>
> Not all callers of is_mba_sc() check if it is called for an MBA resource.
>
>> +	if (r->rid != RDT_RESOURCE_MBA)
>> +		return false;
>> +
>>  	return r->membw.mba_sc;
>>  }
>>  
>> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>  	union cpuid_0x10_3_eax eax;
>>  	union cpuid_0x10_x_edx edx;
>> -	u32 ebx, ecx;
>> +	u32 ebx, ecx, subleaf;
>>  
>> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
>> +	/*
>> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
>> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
>> +	 */
>> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
>> +
>> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>>  	hw_res->num_closid = edx.split.cos_max + 1;
>>  	r->default_ctrl = MAX_MBA_BW_AMD;
>>  
>> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>>  	return false;
>>  }
>>  
>> +static __init bool get_slow_mem_config(void)
>> +{
>> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
>> +
>> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
>> +		return false;
>> +
>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
>> +
>> +	return false;
>> +}
>> +
>>  static __init bool get_rdt_alloc_resources(void)
>>  {
>>  	struct rdt_resource *r;
>> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>>  	if (get_mem_config())
>>  		ret = true;
>>  
>> +	if (get_slow_mem_config())
>> +		ret = true;
>> +
>>  	return ret;
>>  }
>>  
>> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>  			hw_res->msr_update = mba_wrmsr_amd;
>> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
>> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
>> +			hw_res->msr_update = mba_wrmsr_amd;
>>  		}
>>  	}
>>  }
> I mentioned earlier that this can be moved to init of
> rdt_resources_all[]. No strong preference, leaving here works
> also.

I am a little confused about this comment. The initialization of
rdt_resources_all in core.c is mostly generic. The msr_base and msr_update
routines here are vendor specific, so I would prefer to keep this in
rdt_init_res_defs_amd(). Is that ok?

Thanks
Babu



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-30 18:43     ` Moger, Babu
@ 2022-11-30 20:07       ` Reinette Chatre
  2022-11-30 20:40         ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-11-30 20:07 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/30/2022 10:43 AM, Moger, Babu wrote:
> On 11/22/22 18:12, Reinette Chatre wrote:
>> On 11/4/2022 1:00 PM, Babu Moger wrote:
>>> The QoS slow memory configuration details are available via
>>> CPUID_Fn80000020_EDX_x02. Detect the available details and
>>> initialize the rest to defaults.
>>>
>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>> ---
>>>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>>>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>>>  4 files changed, 41 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index e31c98e2fafc..6571d08e2b0d 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>>>  	if (!r)
>>>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>>>  
>>> +	/*
>>> +	 * The software controller support is only applicable to MBA resource.
>>> +	 * Make sure to check for resource type again.
>>> +	 */
>> /again/d
>>
>> Not all callers of is_mba_sc() check if it is called for an MBA resource.
>>
>>> +	if (r->rid != RDT_RESOURCE_MBA)
>>> +		return false;
>>> +
>>>  	return r->membw.mba_sc;
>>>  }
>>>  
>>> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>  	union cpuid_0x10_3_eax eax;
>>>  	union cpuid_0x10_x_edx edx;
>>> -	u32 ebx, ecx;
>>> +	u32 ebx, ecx, subleaf;
>>>  
>>> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
>>> +	/*
>>> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
>>> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
>>> +	 */
>>> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
>>> +
>>> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>>>  	hw_res->num_closid = edx.split.cos_max + 1;
>>>  	r->default_ctrl = MAX_MBA_BW_AMD;
>>>  
>>> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>>>  	return false;
>>>  }
>>>  
>>> +static __init bool get_slow_mem_config(void)
>>> +{
>>> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
>>> +
>>> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
>>> +		return false;
>>> +
>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>>> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
>>> +
>>> +	return false;
>>> +}
>>> +
>>>  static __init bool get_rdt_alloc_resources(void)
>>>  {
>>>  	struct rdt_resource *r;
>>> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>>>  	if (get_mem_config())
>>>  		ret = true;
>>>  
>>> +	if (get_slow_mem_config())
>>> +		ret = true;
>>> +
>>>  	return ret;
>>>  }
>>>  
>>> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>>  			hw_res->msr_update = mba_wrmsr_amd;
>>> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
>>> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
>>> +			hw_res->msr_update = mba_wrmsr_amd;
>>>  		}
>>>  	}
>>>  }
>> I mentioned earlier that this can be moved to init of
>> rdt_resources_all[]. No strong preference, leaving here works
>> also.
> 
> I am little confused about this comment. Initialization of
> rdt_resources_all in core.c is mostly generic initialization. The msr_base
> and msr_update routines here are vendor specific. I would prefer to keep
> this in

This is a contradiction. Yes, rdt_resources_all[] initialization in core.c
is indeed generic initialization, so why is SMBA there? If this was really
generic initialization then the entire initialization of SMBA resource
should rather move to AMD specific code.

SMBA is an AMD only feature yet its resource initialization is fragmented
with one portion treated as generic and another portion treated as vendor
specific while it all is vendor specific.

The current fragmentation is not clear to me. Keeping the initialization
as you have in patch #2 is the simplest and that is what prompted me
to suggest the move to keep initialization together at that location.
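
To make the two options concrete, here is a rough userspace sketch. It is
not the resctrl code: struct hw_res, the FAKE_* constants, the resource
names and the stub are all hypothetical stand-ins. It only contrasts setting
every SMBA field in the static table with splitting the work between the
table and a vendor hook.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the resctrl structures. */
enum res_id { RES_MBA, RES_SMBA, RES_COUNT };

struct hw_res {
	enum res_id rid;
	const char *name;
	uint32_t msr_base;
	void (*msr_update)(struct hw_res *r);
};

/* Stub standing in for an AMD-specific MSR update routine. */
static void mba_wrmsr_amd_stub(struct hw_res *r)
{
	(void)r;
}

/* Placeholder values, NOT the real MSR addresses. */
#define FAKE_MBA_BW_BASE	0x100u
#define FAKE_SMBA_BW_BASE	0x200u

/* Option 1: all SMBA fields are set in the static table initializer. */
static struct hw_res res_static[RES_COUNT] = {
	[RES_MBA]  = { .rid = RES_MBA,  .name = "mba" },
	[RES_SMBA] = { .rid = RES_SMBA, .name = "smba",
		       .msr_base = FAKE_SMBA_BW_BASE,
		       .msr_update = mba_wrmsr_amd_stub },
};

/* Option 2: the table holds only the generic fields ... */
static struct hw_res res_split[RES_COUNT] = {
	[RES_MBA]  = { .rid = RES_MBA,  .name = "mba" },
	[RES_SMBA] = { .rid = RES_SMBA, .name = "smba" },
};

/* ... and a vendor hook fills in the MSR details at init time. */
static void init_res_defs_amd(struct hw_res *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].rid == RES_MBA) {
			tbl[i].msr_base = FAKE_MBA_BW_BASE;
			tbl[i].msr_update = mba_wrmsr_amd_stub;
		} else if (tbl[i].rid == RES_SMBA) {
			tbl[i].msr_base = FAKE_SMBA_BW_BASE;
			tbl[i].msr_update = mba_wrmsr_amd_stub;
		}
	}
}
```

After init_res_defs_amd(res_split, RES_COUNT) runs, the SMBA entry ends up
identical in both tables. The difference is only where a reader has to look
to see the complete SMBA setup.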

> 
> rdt_init_res_defs_amd.Is that ok?

The generic vs non-generic initialization argument is not convincing to me. 
Could you please elaborate why you prefer it this way? I already mentioned
that I do not have a strong preference but I would like to understand what
the motivation for this split initialization is.

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-30 20:07       ` Reinette Chatre
@ 2022-11-30 20:40         ` Moger, Babu
  2022-12-01  0:35           ` Reinette Chatre
  0 siblings, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-11-30 20:40 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian


On 11/30/22 14:07, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/30/2022 10:43 AM, Moger, Babu wrote:
>> On 11/22/22 18:12, Reinette Chatre wrote:
>>> On 11/4/2022 1:00 PM, Babu Moger wrote:
>>>> The QoS slow memory configuration details are available via
>>>> CPUID_Fn80000020_EDX_x02. Detect the available details and
>>>> initialize the rest to defaults.
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>> ---
>>>>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>>>>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
>>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>>>>  4 files changed, 41 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>>> index e31c98e2fafc..6571d08e2b0d 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>>> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>>>>  	if (!r)
>>>>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>>>>  
>>>> +	/*
>>>> +	 * The software controller support is only applicable to MBA resource.
>>>> +	 * Make sure to check for resource type again.
>>>> +	 */
>>> /again/d
>>>
>>> Not all callers of is_mba_sc() check if it is called for an MBA resource.
>>>
>>>> +	if (r->rid != RDT_RESOURCE_MBA)
>>>> +		return false;
>>>> +
>>>>  	return r->membw.mba_sc;
>>>>  }
>>>>  
>>>> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>>>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>>  	union cpuid_0x10_3_eax eax;
>>>>  	union cpuid_0x10_x_edx edx;
>>>> -	u32 ebx, ecx;
>>>> +	u32 ebx, ecx, subleaf;
>>>>  
>>>> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
>>>> +	/*
>>>> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
>>>> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
>>>> +	 */
>>>> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
>>>> +
>>>> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>>>>  	hw_res->num_closid = edx.split.cos_max + 1;
>>>>  	r->default_ctrl = MAX_MBA_BW_AMD;
>>>>  
>>>> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>>>>  	return false;
>>>>  }
>>>>  
>>>> +static __init bool get_slow_mem_config(void)
>>>> +{
>>>> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
>>>> +
>>>> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
>>>> +		return false;
>>>> +
>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>>>> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
>>>> +
>>>> +	return false;
>>>> +}
>>>> +
>>>>  static __init bool get_rdt_alloc_resources(void)
>>>>  {
>>>>  	struct rdt_resource *r;
>>>> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>>>>  	if (get_mem_config())
>>>>  		ret = true;
>>>>  
>>>> +	if (get_slow_mem_config())
>>>> +		ret = true;
>>>> +
>>>>  	return ret;
>>>>  }
>>>>  
>>>> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>>>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>>>  			hw_res->msr_update = mba_wrmsr_amd;
>>>> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
>>>> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
>>>> +			hw_res->msr_update = mba_wrmsr_amd;
>>>>  		}
>>>>  	}
>>>>  }
>>> I mentioned earlier that this can be moved to init of
>>> rdt_resources_all[]. No strong preference, leaving here works
>>> also.
>> I am little confused about this comment. Initialization of
>> rdt_resources_all in core.c is mostly generic initialization. The msr_base
>> and msr_update routines here are vendor specific. I would prefer to keep
>> this in
> This is a contradiction. Yes, rdt_resources_all[] initialization in core.c
> is indeed generic initialization, so why is SMBA there? If this was really
> generic initialization then the entire initialization of SMBA resource
> should rather move to AMD specific code.
>
> SMBA is an AMD only feature yet its resource initialization is fragmented
> with one portion treated as generic and another portion treated as vendor
> specific while it all is vendor specific.
>
> The current fragmentation is not clear to me. Keeping the initialization
> as you have in patch #2 is the simplest and that is what prompted me
> to suggest the move to keep initialization together at that location.
>
>> rdt_init_res_defs_amd.Is that ok?
> The generic vs non-generic initialization argument is not convincing to me. 
> Could you please elaborate why you prefer it this way? I already mentioned
> that I do not have a strong preference but I would like to understand what
> the motivation for this split initialization is.
>
I don't have any strong argument. I was thinking that, in case Intel
supports this resource in the future, they would only have to change
rdt_init_res_defs_intel().

Thanks

Babu


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-11-30 20:40         ` Moger, Babu
@ 2022-12-01  0:35           ` Reinette Chatre
  2022-12-01 13:56             ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: Reinette Chatre @ 2022-12-01  0:35 UTC (permalink / raw)
  To: babu.moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

Hi Babu,

On 11/30/2022 12:40 PM, Moger, Babu wrote:
> On 11/30/22 14:07, Reinette Chatre wrote:
>> On 11/30/2022 10:43 AM, Moger, Babu wrote:
>>> On 11/22/22 18:12, Reinette Chatre wrote:
>>>> On 11/4/2022 1:00 PM, Babu Moger wrote:
>>>>> The QoS slow memory configuration details are available via
>>>>> CPUID_Fn80000020_EDX_x02. Detect the available details and
>>>>> initialize the rest to defaults.
>>>>>
>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>>> ---
>>>>>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>>>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>>>>>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
>>>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>>>>>  4 files changed, 41 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>>>> index e31c98e2fafc..6571d08e2b0d 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>>>> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>>>>>  	if (!r)
>>>>>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>>>>>  
>>>>> +	/*
>>>>> +	 * The software controller support is only applicable to MBA resource.
>>>>> +	 * Make sure to check for resource type again.
>>>>> +	 */
>>>> /again/d
>>>>
>>>> Not all callers of is_mba_sc() check if it is called for an MBA resource.
>>>>
>>>>> +	if (r->rid != RDT_RESOURCE_MBA)
>>>>> +		return false;
>>>>> +
>>>>>  	return r->membw.mba_sc;
>>>>>  }
>>>>>  
>>>>> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>>>>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>>>>>  	union cpuid_0x10_3_eax eax;
>>>>>  	union cpuid_0x10_x_edx edx;
>>>>> -	u32 ebx, ecx;
>>>>> +	u32 ebx, ecx, subleaf;
>>>>>  
>>>>> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
>>>>> +	/*
>>>>> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
>>>>> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
>>>>> +	 */
>>>>> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
>>>>> +
>>>>> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>>>>>  	hw_res->num_closid = edx.split.cos_max + 1;
>>>>>  	r->default_ctrl = MAX_MBA_BW_AMD;
>>>>>  
>>>>> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>>>>>  	return false;
>>>>>  }
>>>>>  
>>>>> +static __init bool get_slow_mem_config(void)
>>>>> +{
>>>>> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
>>>>> +
>>>>> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
>>>>> +		return false;
>>>>> +
>>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
>>>>> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
>>>>> +
>>>>> +	return false;
>>>>> +}
>>>>> +
>>>>>  static __init bool get_rdt_alloc_resources(void)
>>>>>  {
>>>>>  	struct rdt_resource *r;
>>>>> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>>>>>  	if (get_mem_config())
>>>>>  		ret = true;
>>>>>  
>>>>> +	if (get_slow_mem_config())
>>>>> +		ret = true;
>>>>> +
>>>>>  	return ret;
>>>>>  }
>>>>>  
>>>>> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>>>>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>>>>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>>>>>  			hw_res->msr_update = mba_wrmsr_amd;
>>>>> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
>>>>> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
>>>>> +			hw_res->msr_update = mba_wrmsr_amd;
>>>>>  		}
>>>>>  	}
>>>>>  }
>>>> I mentioned earlier that this can be moved to init of
>>>> rdt_resources_all[]. No strong preference, leaving here works
>>>> also.
>>> I am little confused about this comment. Initialization of
>>> rdt_resources_all in core.c is mostly generic initialization. The msr_base
>>> and msr_update routines here are vendor specific. I would prefer to keep
>>> this in
>> This is a contradiction. Yes, rdt_resources_all[] initialization in core.c
>> is indeed generic initialization, so why is SMBA there? If this was really
>> generic initialization then the entire initialization of SMBA resource
>> should rather move to AMD specific code.
>>
>> SMBA is an AMD only feature yet its resource initialization is fragmented
>> with one portion treated as generic and another portion treated as vendor
>> specific while it all is vendor specific.
>>
>> The current fragmentation is not clear to me. Keeping the initialization
>> as you have in patch #2 is the simplest and that is what prompted me
>> to suggest the move to keep initialization together at that location.
>>
>>> rdt_init_res_defs_amd.Is that ok?
>> The generic vs non-generic initialization argument is not convincing to me. 
>> Could you please elaborate why you prefer it this way? I already mentioned
>> that I do not have a strong preference but I would like to understand what
>> the motivation for this split initialization is.
>>
> I dont have any strong argument. I was thinking, in case Intel supports
> this resource in the future then they only have to change
> rdt_init_res_defs_intel.

I agree that this is not a strong argument. If this happens then Intel can split
the initialization also. These are also not the only bits that would need
changing since only __rdt_get_mem_config_amd() can initialize an SMBA
resource.

It does not sound like there is a clear winner. To answer your earlier question
more succinctly, yes, from my perspective you can keep the change to
rdt_init_res_defs_amd(). At least with this change things would be more
familiar between MBA and SMBA and it will be obvious that SMBA is not
supported by Intel.

Reinette

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-12-01  0:35           ` Reinette Chatre
@ 2022-12-01 13:56             ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-12-01 13:56 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Wednesday, November 30, 2022 6:36 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com
> Subject: Re: [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory
> Bandwidth Allocation
> 
> Hi Babu,
> 
> On 11/30/2022 12:40 PM, Moger, Babu wrote:
> > On 11/30/22 14:07, Reinette Chatre wrote:
> >> On 11/30/2022 10:43 AM, Moger, Babu wrote:
> >>> On 11/22/22 18:12, Reinette Chatre wrote:
> >>>> On 11/4/2022 1:00 PM, Babu Moger wrote:
> >>>>> The QoS slow memory configuration details are available via
> >>>>> CPUID_Fn80000020_EDX_x02. Detect the available details and
> >>>>> initialize the rest to defaults.
> >>>>>
> >>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >>>>> ---
> >>>>>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
> >>>>>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
> >>>>>  arch/x86/kernel/cpu/resctrl/internal.h    |    1 +
> >>>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
> >>>>>  4 files changed, 41 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> >>>>> index e31c98e2fafc..6571d08e2b0d 100644
> >>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
> >>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> >>>>> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
> >>>>>  	if (!r)
> >>>>>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
> >>>>>
> >>>>> +	/*
> >>>>> +	 * The software controller support is only applicable to MBA resource.
> >>>>> +	 * Make sure to check for resource type again.
> >>>>> +	 */
> >>>> /again/d
> >>>>
> >>>> Not all callers of is_mba_sc() check if it is called for an MBA resource.
> >>>>
> >>>>> +	if (r->rid != RDT_RESOURCE_MBA)
> >>>>> +		return false;
> >>>>> +
> >>>>>  	return r->membw.mba_sc;
> >>>>>  }
> >>>>>
> >>>>> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
> >>>>>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> >>>>>  	union cpuid_0x10_3_eax eax;
> >>>>>  	union cpuid_0x10_x_edx edx;
> >>>>> -	u32 ebx, ecx;
> >>>>> +	u32 ebx, ecx, subleaf;
> >>>>>
> >>>>> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
> >>>>> +	/*
> >>>>> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
> >>>>> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
> >>>>> +	 */
> >>>>> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
> >>>>> +
> >>>>> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
> >>>>>  	hw_res->num_closid = edx.split.cos_max + 1;
> >>>>>  	r->default_ctrl = MAX_MBA_BW_AMD;
> >>>>>
> >>>>> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
> >>>>>  	return false;
> >>>>>  }
> >>>>>
> >>>>> +static __init bool get_slow_mem_config(void)
> >>>>> +{
> >>>>> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
> >>>>> +
> >>>>> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
> >>>>> +		return false;
> >>>>> +
> >>>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> >>>>> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
> >>>>> +
> >>>>> +	return false;
> >>>>> +}
> >>>>> +
> >>>>>  static __init bool get_rdt_alloc_resources(void)
> >>>>>  {
> >>>>>  	struct rdt_resource *r;
> >>>>> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
> >>>>>  	if (get_mem_config())
> >>>>>  		ret = true;
> >>>>>
> >>>>> +	if (get_slow_mem_config())
> >>>>> +		ret = true;
> >>>>> +
> >>>>>  	return ret;
> >>>>>  }
> >>>>>
> >>>>> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
> >>>>>  		} else if (r->rid == RDT_RESOURCE_MBA) {
> >>>>>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
> >>>>>  			hw_res->msr_update = mba_wrmsr_amd;
> >>>>> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
> >>>>> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
> >>>>> +			hw_res->msr_update = mba_wrmsr_amd;
> >>>>>  		}
> >>>>>  	}
> >>>>>  }
> >>>> I mentioned earlier that this can be moved to init of
> >>>> rdt_resources_all[]. No strong preference, leaving here works also.
> >>> I am little confused about this comment. Initialization of
> >>> rdt_resources_all in core.c is mostly generic initialization. The
> >>> msr_base and msr_update routines here are vendor specific. I would
> >>> prefer to keep this in
> >> This is a contradiction. Yes, rdt_resources_all[] initialization in
> >> core.c is indeed generic initialization, so why is SMBA there? If
> >> this was really generic initialization then the entire initialization
> >> of SMBA resource should rather move to AMD specific code.
> >>
> >> SMBA is an AMD only feature yet its resource initialization is
> >> fragmented with one portion treated as generic and another portion
> >> treated as vendor specific while it all is vendor specific.
> >>
> >> The current fragmentation is not clear to me. Keeping the
> >> initialization as you have in patch #2 is the simplest and that is
> >> what prompted me to suggest the move to keep initialization together at that location.
> >>
> >>> rdt_init_res_defs_amd.Is that ok?
> >> The generic vs non-generic initialization argument is not convincing to me.
> >> Could you please elaborate why you prefer it this way? I already
> >> mentioned that I do not have a strong preference but I would like to
> >> understand what the motivation for this split initialization is.
> >>
> > I dont have any strong argument. I was thinking, in case Intel
> > supports this resource in the future then they only have to change
> > rdt_init_res_defs_intel.
> 
> I agree that this is not a strong argument. If this happens then Intel can split the
> initialization also. This is also not the only bits that would need changing since
> only __rdt_get_mem_config_amd() can initialize an SMBA resource.
> 
> It does not sound like there is a clear winner. To answer your earlier question
> more succinctly, yes, from my perspective you can keep the change to
> rdt_init_res_defs_amd(). At least with this change things would be more
> familiar between MBA and SMBA and it will be obvious that SMBA is not
> supported by Intel.

Will do. Thanks
Babu

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
  2022-11-07 10:21   ` Peter Newman
  2022-11-23  0:22   ` Reinette Chatre
@ 2022-12-07 17:20   ` James Morse
  2022-12-07 17:24     ` James Morse
  2022-12-08  0:02     ` Moger, Babu
  2 siblings, 2 replies; 60+ messages in thread
From: James Morse @ 2022-12-07 17:20 UTC (permalink / raw)
  To: Babu Moger
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian, corbet, tglx, mingo, bp, reinette.chatre

Hi Babu,

(Nit: all the 'sysfs' in the subjects should really be 'resctrl', but as they already have
'x86/resctrl', could you just remove the sysfs?
This patch would be "x86/resctrl: Add interface to write mbm_total_bytes_config")

On 04/11/2022 20:01, Babu Moger wrote:
> The current event configuration for mbm_total_bytes can be changed by
> the user by writing to the file
> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> 
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> 
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> For example:
> To change the mbm_total_bytes to count only reads on domain 0, the bits
> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
> command.
> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
> To change the mbm_total_bytes to count all the slow memory reads on
> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> Run the command.
> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
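
The two values in the examples above follow directly from the bit table. A
small C sketch of the arithmetic, for reference (the EVT_* names and the two
helper functions are made up for illustration and are not identifiers from
the patch):

```c
#include <stdint.h>

/* One flag per row of the event table quoted above (names hypothetical). */
enum {
	EVT_READS_LOCAL		= 1 << 0, /* reads, local NUMA domain */
	EVT_READS_REMOTE	= 1 << 1, /* reads, non-local NUMA domain */
	EVT_NT_WRITES_LOCAL	= 1 << 2, /* non-temporal writes, local */
	EVT_NT_WRITES_REMOTE	= 1 << 3, /* non-temporal writes, non-local */
	EVT_SLOW_READS_LOCAL	= 1 << 4, /* slow memory reads, local */
	EVT_SLOW_READS_REMOTE	= 1 << 5, /* slow memory reads, non-local */
	EVT_DIRTY_VICTIMS	= 1 << 6, /* dirty victims, all memory types */
};

/* "Count only reads": bits 0, 1, 4 and 5. */
static uint32_t evt_cfg_all_reads(void)
{
	return EVT_READS_LOCAL | EVT_READS_REMOTE |
	       EVT_SLOW_READS_LOCAL | EVT_SLOW_READS_REMOTE;
}

/* "Count all slow memory reads": bits 4 and 5. */
static uint32_t evt_cfg_slow_reads(void)
{
	return EVT_SLOW_READS_LOCAL | EVT_SLOW_READS_REMOTE;
}
```

evt_cfg_all_reads() evaluates to 0x33 and evt_cfg_slow_reads() to 0x30,
matching the values written in the two echo commands.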

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 18f9588a41cf..0cdccb69386e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void mon_event_config_write(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 index;
> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index >= MAX_CONFIG_EVENTS) {
> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +}
> +
> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> +			    u32 evtid, u32 val)
> +{
> +	struct mon_config_info mon_info = {0};
> +	int ret = 0;
> +
> +	rdt_last_cmd_clear();
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (val > MAX_EVT_CONFIG_BITS) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Read the current config value first. If both are same then
> +	 * we don't need to write it again.
> +	 */
> +	mon_info.evtid = evtid;

> +	mondata_config_read(d, &mon_info);

This reads the MSR on this CPU, which gets the result for this domain...


> +	if (mon_info.mon_config == val)
> +		goto write_exit;
> +
> +	mon_info.mon_config = val;
> +
> +	/*
> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> +	 * are scoped at the domain level. Writing any of these MSRs
> +	 * on one CPU is supposed to be observed by all CPUs in the
> +	 * domain. However, the hardware team recommends to update
> +	 * these MSRs on all the CPUs in the domain.
> +	 */

> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);

... but here you IPI all the CPUs in the target domain to update them.

This means you unnecessarily IPI the CPUs in the target domain if they already had this
value, but the write syscall occurred on a domain that differs. This isn't what you
intended, but it's benign.
More of a problem is: Won't this get skipped if the write syscall occurs on a domain that
happens to have the target configuration already?

Because you need the same value to be written on every CPU ... what happens to CPUs that
are offline when the configuration is changed? Do they keep their previous value, or does
it get reset?


I think this is best solved with a percpu variable for the current value of the MSR. You
can then read it for CPUs in a remote domain, and only issue IPIs to 'sync' the value if
needed. You can then re-use the sync call in resctrl_online_cpu() to set the MSR to
whatever value it should currently be.
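
Roughly what I have in mind, as a userspace simulation rather than kernel
code. Everything here is hypothetical: the cpus[] array stands in for a
percpu variable, sim_write_cfg() for the IPI that does the wrmsr, and
write_count only exists so the number of "IPIs" can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_CPUS	8	/* hypothetical sizes for the simulation */
#define SIM_NR_DOMAINS	2

/* Simulated per-CPU state: a cached copy of the event-config MSR. */
struct cpu_sim {
	bool online;
	int domain;
	uint32_t cached_cfg;
};

static struct cpu_sim cpus[SIM_NR_CPUS];
static uint32_t domain_cfg[SIM_NR_DOMAINS];	/* desired value per domain */
static unsigned int write_count;		/* number of simulated IPIs */

/* Stand-in for the IPI handler that would write the MSR on 'cpu'. */
static void sim_write_cfg(int cpu, uint32_t val)
{
	cpus[cpu].cached_cfg = val;
	write_count++;
}

/*
 * Change the configuration of one domain. The decision to write is made
 * from the cached value of each CPU in the *target* domain, so it does
 * not depend on which CPU the caller happens to run on.
 */
static void sim_set_domain_cfg(int domain, uint32_t val)
{
	domain_cfg[domain] = val;

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!cpus[cpu].online || cpus[cpu].domain != domain)
			continue;
		if (cpus[cpu].cached_cfg != val)
			sim_write_cfg(cpu, val);
	}
}

/*
 * CPU online hook: always push the domain's current value, because the
 * MSR contents of a CPU that was offline cannot be trusted.
 */
static void sim_online_cpu(int cpu, int domain)
{
	cpus[cpu].online = true;
	cpus[cpu].domain = domain;
	sim_write_cfg(cpu, domain_cfg[domain]);
}
```

With this shape a repeated write of the same value to a domain issues no
further IPIs, a write to a different domain is never skipped because of the
caller's own domain, and a CPU that comes online later picks up the value
its domain should have.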


> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared by the hardware. The
> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> +	 * every RMID on the next read to any event for every RMID.
> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> +	 * cleared while it is tracked by the hardware. Clear the
> +	 * mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +
> +write_exit:
> +	return ret;
> +}


> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> +{
> +	char *dom_str = NULL, *id_str;
> +	unsigned long dom_id, val;
> +	struct rdt_domain *d;
> +	int ret = 0;
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +	id_str = strsep(&dom_str, "=");
> +
> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> +		return -EINVAL;
> +	}

This is parsing the same format strings as parse_line(). Is there any chance that code
could be re-used instead of duplicated? That way anything that is added to the format (or
bugs found!) only needs handling in one place.
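
One possible shape for a shared helper, sketched in userspace C rather than against the real parse_line(): the "id=value;id=value" tokenising lives in one function and the caller supplies a callback for what to do with each pair. Everything here (dom_val_fn, parse_ul, parse_dom_vals, record_pair) is hypothetical and uses libc strtoul() where the kernel would use kstrtoul().

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: called once per "id=value" pair. */
typedef int (*dom_val_fn)(unsigned long dom_id, unsigned long val, void *ctx);

/* Parse one number, allowing trailing whitespace but nothing else. */
static int parse_ul(const char *s, int base, unsigned long *out)
{
	char *end;

	if (strchr(s, '-'))	/* strtoul() would silently negate */
		return -EINVAL;
	errno = 0;
	*out = strtoul(s, &end, base);
	if (errno || end == s)
		return -EINVAL;
	while (isspace((unsigned char)*end))
		end++;
	return *end ? -EINVAL : 0;
}

/*
 * Parse "id=val;id=val..." in place. Domain ids are decimal, values are
 * hex (strtoul() with base 16 accepts an optional 0x prefix).
 * Returns 0 on success, -EINVAL on a malformed pair, or the callback's
 * non-zero return value.
 */
static int parse_dom_vals(char *buf, dom_val_fn fn, void *ctx)
{
	assert(fn != NULL);

	while (buf && *buf) {
		char *next = strchr(buf, ';');
		unsigned long id, val;
		char *eq;
		int ret;

		if (next)
			*next++ = '\0';

		eq = strchr(buf, '=');
		if (!eq)
			return -EINVAL;
		*eq++ = '\0';

		if (parse_ul(buf, 10, &id) || parse_ul(eq, 16, &val))
			return -EINVAL;

		ret = fn(id, val, ctx);
		if (ret)
			return ret;
		buf = next;
	}
	return 0;
}

/* Example callback: remember the last pair seen and count the calls. */
struct last_pair {
	unsigned long id, val;
	int calls;
};

static int record_pair(unsigned long dom_id, unsigned long val, void *ctx)
{
	struct last_pair *p = ctx;

	p->id = dom_id;
	p->val = val;
	p->calls++;
	return 0;
}
```

A schemata-style caller and an event-config caller would then differ only in the callback they pass.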



> +	list_for_each_entry(d, &r->domains, list) {
> +		if (d->id == dom_id) {
> +			ret = mbm_config_write(r, d, evtid, val);
> +			if (ret)
> +				return -EINVAL;
> +			goto next;
> +		}
> +	}
> +
> +	return -EINVAL;
> +}


Thanks,

James


* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-07 17:20   ` James Morse
@ 2022-12-07 17:24     ` James Morse
  2022-12-08  0:02     ` Moger, Babu
  1 sibling, 0 replies; 60+ messages in thread
From: James Morse @ 2022-12-07 17:24 UTC (permalink / raw)
  To: Babu Moger
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian, corbet, tglx, mingo, bp, reinette.chatre

Bah! Sorry, I thought I was looking at v9!
(but the same comments apply)


Thanks,

James


* RE: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-07 17:20   ` James Morse
  2022-12-07 17:24     ` James Morse
@ 2022-12-08  0:02     ` Moger, Babu
  2022-12-13 17:55       ` James Morse
  1 sibling, 1 reply; 60+ messages in thread
From: Moger, Babu @ 2022-12-08  0:02 UTC (permalink / raw)
  To: James Morse
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian, corbet, tglx, mingo, bp, reinette.chatre

[AMD Official Use Only - General]

Hi James,

> -----Original Message-----
> From: James Morse <james.morse@arm.com>
> Sent: Wednesday, December 7, 2022 11:21 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; bagasdotme@gmail.com; eranian@google.com;
> corbet@lwn.net; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> reinette.chatre@intel.com
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> (Nit: all the 'sysfs' in the subjects should really be 'resctrl', but as they already
> have 'x86/resctrl', could you just remove the sysfs?
> This patch would be "x86/resctrl: Add interface to write
> mbm_total_bytes_config")

Sure. Will change it.
> 
> On 04/11/2022 20:01, Babu Moger wrote:
> > The current event configuration for mbm_total_bytes can be changed by
> > the user by writing to the file
> > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> >
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> >
> > ====   ===========================================================
> > Bits   Description
> > ====   ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====   ===========================================================
> >
> > For example:
> > To change the mbm_total_bytes to count only reads on domain 0, the
> > bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
> > Run the command.
> > 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > To change the mbm_total_bytes to count all the slow memory reads on
> > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> > Run the command.
> > 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
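
As a quick check of the arithmetic in the two examples above, here is a small userspace C sketch (not part of the patch) that builds the masks from bit numbers. The enum names and helpers are made up for illustration; only the bit positions come from the table.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the table above; the names are illustrative only. */
enum {
	EVT_LOCAL_READS       = 0,
	EVT_REMOTE_READS      = 1,
	EVT_LOCAL_NT_WRITES   = 2,
	EVT_REMOTE_NT_WRITES  = 3,
	EVT_LOCAL_SLOW_READS  = 4,
	EVT_REMOTE_SLOW_READS = 5,
	EVT_DIRTY_VICTIMS     = 6,
	EVT_NUM_BITS          = 7,
};

/* Build a config value from a list of bit numbers. */
static uint32_t evt_mask(const int *bits, int n)
{
	uint32_t mask = 0;

	for (int i = 0; i < n; i++) {
		assert(bits[i] >= 0 && bits[i] < EVT_NUM_BITS);
		mask |= UINT32_C(1) << bits[i];
	}
	return mask;
}

/* All reads: local, non-local, and both slow-memory variants. */
static uint32_t all_reads_mask(void)
{
	const int bits[] = { EVT_LOCAL_READS, EVT_REMOTE_READS,
			     EVT_LOCAL_SLOW_READS, EVT_REMOTE_SLOW_READS };

	return evt_mask(bits, 4);
}

/* Slow-memory reads only. */
static uint32_t slow_reads_mask(void)
{
	const int bits[] = { EVT_LOCAL_SLOW_READS, EVT_REMOTE_SLOW_READS };

	return evt_mask(bits, 2);
}
```

Bits 0, 1, 4 and 5 give 1 + 2 + 16 + 32 = 51 = 0x33, and bits 4 and 5 give 16 + 32 = 48 = 0x30, matching the values written in the two echo commands.
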
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 18f9588a41cf..0cdccb69386e 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static void mon_event_config_write(void *info)
> > +{
> > +	struct mon_config_info *mon_info = info;
> > +	u32 index;
> > +
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index >= MAX_CONFIG_EVENTS) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> > +}
> > +
> > +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> > +			    u32 evtid, u32 val)
> > +{
> > +	struct mon_config_info mon_info = {0};
> > +	int ret = 0;
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	/* mon_config cannot be more than the supported set of events */
> > +	if (val > MAX_EVT_CONFIG_BITS) {
> > +		rdt_last_cmd_puts("Invalid event configuration\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Read the current config value first. If both are same then
> > +	 * we don't need to write it again.
> > +	 */
> > +	mon_info.evtid = evtid;
> 
> > +	mondata_config_read(d, &mon_info);
> 
> This reads the MSR on this CPU, which gets the result for this domain...

[1] No. This read happens at the target domain. 

static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
{
        smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
}

> 
> 
> > +	if (mon_info.mon_config == val)
> > +		goto write_exit;
> > +
> > +	mon_info.mon_config = val;
> > +
> > +	/*
> > +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> > +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> > +	 * are scoped at the domain level. Writing any of these MSRs
> > +	 * on one CPU is supposed to be observed by all CPUs in the
> > +	 * domain. However, the hardware team recommends to update
> > +	 * these MSRs on all the CPUs in the domain.
> > +	 */
> 
> > +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> 
> ... but here you IPI all the CPUs in the target domain to update them.

[2] There have been some changes in this area recently. Writing the value on all the CPUs in the domain is no longer required. I am working on verifying this right now. If everything works, then I can do
smp_call_function_any(&d->cpu_mask, mon_event_config_write,  &mon_info, 1);

I will confirm this soon.

> 
> This means you unnecessarily IPI the CPUs in the target domain if they already
> had this value, but the write syscall occurred on a domain that differs. This isn't
> what you intended, but its benign.
> More of a problem is: Won't this get skipped if the write syscall occurs on a
> domain that happens to have the target configuration already?

Do you still think this is a problem after my comment [1] above? Or am I missing something?

> 
> Because you need the same value to be written on every CPU ... what happens
> to CPUs that are offline when the configuration is changed? Do they keep their
> previous value, or does it get reset?

The contents of this MSR are held outside the cores. If the value changes while a CPU is offline, the CPU will read the new value once it comes online.
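
A tiny userspace model of that behaviour, under the assumption that there is exactly one stored value per domain: a write issued from any one online CPU updates the shared value, and a CPU that comes online later reads the same value. The struct and function names are made up for illustration and do not exist in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One config value per domain, shared by every CPU in it (model only). */
struct domain_model {
	uint32_t cpu_mask;	/* CPUs that belong to the domain */
	uint32_t online_mask;	/* subset that is currently online */
	uint32_t evt_cfg;	/* the single domain-wide value */
};

/* Pick any online CPU in the domain, or -1 if there is none. */
static int any_online_cpu(const struct domain_model *d)
{
	uint32_t m = d->cpu_mask & d->online_mask;

	for (int cpu = 0; cpu < 32; cpu++) {
		if (m & (UINT32_C(1) << cpu))
			return cpu;
	}
	return -1;
}

/* Write from one CPU; returns the CPU used, or -1 if none is online. */
static int domain_write_cfg(struct domain_model *d, uint32_t val)
{
	int cpu = any_online_cpu(d);

	if (cpu >= 0)
		d->evt_cfg = val;
	return cpu;
}

/* Every CPU in the domain reads the same shared value. */
static uint32_t cpu_read_cfg(const struct domain_model *d, int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	assert(d->cpu_mask & (UINT32_C(1) << cpu));
	return d->evt_cfg;
}
```

With this model no per-CPU bookkeeping is needed in the online path, which is why the percpu cache may not be required.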
> 
> 
> I think this is best solved with a percpu variable for the current value of the
> MSR. You can then read it for CPUs in a remote domain, and only issue IPIs to
> 'sync' the value if needed. You can then re-use the sync call in
> resctrl_online_cpu() to set the MSR to whatever value it should currently be.

This may not be required given my comments [1] and [2] above.

> 
> 
> > +
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> > +
> > +write_exit:
> > +	return ret;
> > +}
> 
> 
> > +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> > +{
> > +	char *dom_str = NULL, *id_str;
> > +	unsigned long dom_id, val;
> > +	struct rdt_domain *d;
> > +	int ret = 0;
> > +
> > +next:
> > +	if (!tok || tok[0] == '\0')
> > +		return 0;
> > +
> > +	/* Start processing the strings for each domain */
> > +	dom_str = strim(strsep(&tok, ";"));
> > +	id_str = strsep(&dom_str, "=");
> > +
> > +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> > +		return -EINVAL;
> > +	}
> 
> This is parsing the same format strings as parse_line(). Is there any chance that
> code could be re-used instead of duplicated? This way anything that is added to
> the format (or bugs found!) only need supporting in once place.

I looked into reusing parse_line(). It is written specifically for the schemata file, so we can't reuse it without changing it completely.

Thanks
Babu
> 
> 
> 
> > +	list_for_each_entry(d, &r->domains, list) {
> > +		if (d->id == dom_id) {
> > +			ret = mbm_config_write(r, d, evtid, val);
> > +			if (ret)
> > +				return -EINVAL;
> > +			goto next;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> 
> 
> Thanks,
> 
> James


* Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-08  0:02     ` Moger, Babu
@ 2022-12-13 17:55       ` James Morse
  2022-12-13 23:46         ` Moger, Babu
  0 siblings, 1 reply; 60+ messages in thread
From: James Morse @ 2022-12-13 17:55 UTC (permalink / raw)
  To: Moger, Babu
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian, corbet, tglx, mingo, bp, reinette.chatre

Hi Babu,

On 08/12/2022 00:02, Moger, Babu wrote:
> [AMD Official Use Only - General]
>> -----Original Message-----
>> From: James Morse <james.morse@arm.com>
>> Sent: Wednesday, December 7, 2022 11:21 AM
>> To: Moger, Babu <Babu.Moger@amd.com>
>> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
>> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
>> quic_neeraju@quicinc.com; rdunlap@infradead.org;
>> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
>> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
>> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
>> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
>> <Sandipan.Das@amd.com>; tony.luck@intel.com; linux-doc@vger.kernel.org;
>> linux-kernel@vger.kernel.org; bagasdotme@gmail.com; eranian@google.com;
>> corbet@lwn.net; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
>> reinette.chatre@intel.com
>> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
>> mbm_total_bytes_config

>> On 04/11/2022 20:01, Babu Moger wrote:
>>> The current event configuration for mbm_total_bytes can be changed by
>>> the user by writing to the file
>>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
>>>
>>> The event configuration settings are domain specific and will affect
>>> all the CPUs in the domain.
>>>
>>> Following are the types of events supported:
>>>
>>> ====   ===========================================================
>>> Bits   Description
>>> ====   ===========================================================
>>> 6      Dirty Victims from the QOS domain to all types of memory
>>> 5      Reads to slow memory in the non-local NUMA domain
>>> 4      Reads to slow memory in the local NUMA domain
>>> 3      Non-temporal writes to non-local NUMA domain
>>> 2      Non-temporal writes to local NUMA domain
>>> 1      Reads to memory in the non-local NUMA domain
>>> 0      Reads to memory in the local NUMA domain
>>> ====   ===========================================================
>>>
>>> For example:
>>> To change the mbm_total_bytes to count only reads on domain 0, the
>>> bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
>>> Run the command.
>>> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>>
>>> To change the mbm_total_bytes to count all the slow memory reads on
>>> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
>>> Run the command.
>>> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 18f9588a41cf..0cdccb69386e 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>>>  	return 0;
>>>  }
>>>
>>> +static void mon_event_config_write(void *info)
>>> +{
>>> +	struct mon_config_info *mon_info = info;
>>> +	u32 index;
>>> +
>>> +	index = mon_event_config_index_get(mon_info->evtid);
>>> +	if (index >= MAX_CONFIG_EVENTS) {
>>> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
>>> +		return;
>>> +	}
>>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
>>> +}
>>> +
>>> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
>>> +			    u32 evtid, u32 val)
>>> +{
>>> +	struct mon_config_info mon_info = {0};
>>> +	int ret = 0;
>>> +
>>> +	rdt_last_cmd_clear();
>>> +
>>> +	/* mon_config cannot be more than the supported set of events */
>>> +	if (val > MAX_EVT_CONFIG_BITS) {
>>> +		rdt_last_cmd_puts("Invalid event configuration\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Read the current config value first. If both are same then
>>> +	 * we don't need to write it again.
>>> +	 */
>>> +	mon_info.evtid = evtid;
>>
>>> +	mondata_config_read(d, &mon_info);
>>
>> This reads the MSR on this CPU, which gets the result for this domain...
> 
> [1] No. This read happens at the target domain. 

Oops ... looks like I muddled that with mon_event_config_read().


> static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> {
>         smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> }

>>> +	if (mon_info.mon_config == val)
>>> +		goto write_exit;
>>> +
>>> +	mon_info.mon_config = val;
>>> +
>>> +	/*
>>> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
>>> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
>>> +	 * are scoped at the domain level. Writing any of these MSRs
>>> +	 * on one CPU is supposed to be observed by all CPUs in the
>>> +	 * domain. However, the hardware team recommends to update
>>> +	 * these MSRs on all the CPUs in the domain.
>>> +	 */
>>
>>> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
>>
>> ... but here you IPI all the CPUs in the target domain to update them.

> [2] There have been some changes in this area recently. The requirement of writing the
> value on all the CPUs in the domain is not required anymore. I am working on verifying
> this right now.  If everything works, then I can do 
> smp_call_function_any(&d->cpu_mask, mon_event_config_write,  &mon_info, 1);
> 
> I will confirm this soon.

Okay, that makes my next question more confusing then ....


>> This means you unnecessarily IPI the CPUs in the target domain if they already
>> had this value, but the write syscall occurred on a domain that differs. This isn't
>> what you intended, but its benign.
>> More of a problem is: Won't this get skipped if the write syscall occurs on a
>> domain that happens to have the target configuration already?

> Do you still think this is a problem after my comment [1] above?  Or Am I missing something?

I'd muddled two similarly named functions. Sorry for the noise!

I think what you're left with is the question "What is the monitor config for CPUs that
were offline when it was last changed?". If it's preserved by the CPU, then it's some
unknown value, and needs to be made the same as the value user-space/the-domain currently
expects.

If there is only one config value for the domain (as your comment above suggests), then
nothing needs doing here.


>> Because you need the same value to be written on every CPU ... what happens
>> to CPUs that are offline when the configuration is changed? Do they keep their
>> previous value, or does it get reset?
> 
> The contents of this MSR register are held outside of all the cores.  If the value changes
> while a cpu is offline, and it reads it once it comes online, it will get the new value.

This fits with your new description of the value only needing to be written from one CPU
in the domain.


>> I think this is best solved with a percpu variable for the current value of the
>> MSR. You can then read it for CPUs in a remote domain, and only issue IPIs to
>> 'sync' the value if needed. You can then re-use the sync call in
>> resctrl_online_cpu() to set the MSR to whatever value it should currently be.
> 
> This may not be required with my comment 1 and 2 above.
> 
>>
>>
>>> +
>>> +	/*
>>> +	 * When an Event Configuration is changed, the bandwidth counters
>>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>> +	 * every RMID on the next read to any event for every RMID.
>>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>> +	 * cleared while it is tracked by the hardware. Clear the
>>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>>> +	 */
>>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>>> +
>>> +write_exit:
>>> +	return ret;
>>> +}
>>
>>
>>> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
>>> +{
>>> +	char *dom_str = NULL, *id_str;
>>> +	unsigned long dom_id, val;
>>> +	struct rdt_domain *d;
>>> +	int ret = 0;
>>> +
>>> +next:
>>> +	if (!tok || tok[0] == '\0')
>>> +		return 0;
>>> +
>>> +	/* Start processing the strings for each domain */
>>> +	dom_str = strim(strsep(&tok, ";"));
>>> +	id_str = strsep(&dom_str, "=");
>>> +
>>> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
>>> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
>>> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
>>> +		return -EINVAL;
>>> +	}
>>
>> This is parsing the same format strings as parse_line(). Is there any chance that
>> code could be re-used instead of duplicated? This way anything that is added to
>> the format (or bugs found!) only need supporting in once place.
> 
> I have checked on reusing the parse_line. The parse_line is specifically written for
> schemata.  We can't reuse parse_line without changing it completely.

I agree it's a little more complicated than it looked at first. I might have a go at it
later...


Thanks,

James


* RE: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-13 17:55       ` James Morse
@ 2022-12-13 23:46         ` Moger, Babu
  0 siblings, 0 replies; 60+ messages in thread
From: Moger, Babu @ 2022-12-13 23:46 UTC (permalink / raw)
  To: James Morse
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, linux-doc, linux-kernel, bagasdotme,
	eranian, corbet, tglx, mingo, bp, reinette.chatre

Hi James,

> -----Original Message-----
> From: James Morse <james.morse@arm.com>
> Sent: Tuesday, December 13, 2022 11:55 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; bagasdotme@gmail.com; eranian@google.com;
> corbet@lwn.net; tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> reinette.chatre@intel.com
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 08/12/2022 00:02, Moger, Babu wrote:
> > [AMD Official Use Only - General]
> >> -----Original Message-----
> >> From: James Morse <james.morse@arm.com>
> >> Sent: Wednesday, December 7, 2022 11:21 AM
> >> To: Moger, Babu <Babu.Moger@amd.com>
> >> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com;
> >> x86@kernel.org; hpa@zytor.com; paulmck@kernel.org;
> >> akpm@linux-foundation.org; quic_neeraju@quicinc.com;
> >> rdunlap@infradead.org; damien.lemoal@opensource.wdc.com;
> >> songmuchun@bytedance.com; peterz@infradead.org;
> jpoimboe@kernel.org;
> >> pbonzini@redhat.com; chang.seok.bae@intel.com;
> >> pawan.kumar.gupta@linux.intel.com;
> >> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> >> <Sandipan.Das@amd.com>; tony.luck@intel.com;
> >> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> >> bagasdotme@gmail.com; eranian@google.com; corbet@lwn.net;
> >> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> >> reinette.chatre@intel.com
> >> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to
> >> write mbm_total_bytes_config
> 
> >> On 04/11/2022 20:01, Babu Moger wrote:
> >>> The current event configuration for mbm_total_bytes can be changed
> >>> by the user by writing to the file
> >>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> >>>
> >>> The event configuration settings are domain specific and will affect
> >>> all the CPUs in the domain.
> >>>
> >>> Following are the types of events supported:
> >>>
> >>> ====   ===========================================================
> >>> Bits   Description
> >>> ====   ===========================================================
> >>> 6      Dirty Victims from the QOS domain to all types of memory
> >>> 5      Reads to slow memory in the non-local NUMA domain
> >>> 4      Reads to slow memory in the local NUMA domain
> >>> 3      Non-temporal writes to non-local NUMA domain
> >>> 2      Non-temporal writes to local NUMA domain
> >>> 1      Reads to memory in the non-local NUMA domain
> >>> 0      Reads to memory in the local NUMA domain
> >>> ====   ===========================================================
> >>>
> >>> For example:
> >>> To change the mbm_total_bytes to count only reads on domain 0, the
> >>> bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
> >>> Run the command.
> >>> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >>>
> >>> To change the mbm_total_bytes to count all the slow memory reads on
> >>> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> >>> Run the command.
> >>> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> index 18f9588a41cf..0cdccb69386e 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> >>>  	return 0;
> >>>  }
> >>>
> >>> +static void mon_event_config_write(void *info)
> >>> +{
> >>> +	struct mon_config_info *mon_info = info;
> >>> +	u32 index;
> >>> +
> >>> +	index = mon_event_config_index_get(mon_info->evtid);
> >>> +	if (index >= MAX_CONFIG_EVENTS) {
> >>> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> >>> +		return;
> >>> +	}
> >>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> >>> +}
> >>> +
> >>> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> >>> +			    u32 evtid, u32 val)
> >>> +{
> >>> +	struct mon_config_info mon_info = {0};
> >>> +	int ret = 0;
> >>> +
> >>> +	rdt_last_cmd_clear();
> >>> +
> >>> +	/* mon_config cannot be more than the supported set of events */
> >>> +	if (val > MAX_EVT_CONFIG_BITS) {
> >>> +		rdt_last_cmd_puts("Invalid event configuration\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * Read the current config value first. If both are same then
> >>> +	 * we don't need to write it again.
> >>> +	 */
> >>> +	mon_info.evtid = evtid;
> >>
> >>> +	mondata_config_read(d, &mon_info);
> >>
> >> This reads the MSR on this CPU, which gets the result for this domain...
> >
> > [1] No. This read happens at the target domain.
> 
> Oops ... looks like I muddled that with mon_event_config_read().
> 
> 
> > static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> > {
> >         smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> > }
> 
> >>> +	if (mon_info.mon_config == val)
> >>> +		goto write_exit;
> >>> +
> >>> +	mon_info.mon_config = val;
> >>> +
> >>> +	/*
> >>> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> >>> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> >>> +	 * are scoped at the domain level. Writing any of these MSRs
> >>> +	 * on one CPU is supposed to be observed by all CPUs in the
> >>> +	 * domain. However, the hardware team recommends to update
> >>> +	 * these MSRs on all the CPUs in the domain.
> >>> +	 */
> >>
> >>> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> >>
> >> ... but here you IPI all the CPUs in the target domain to update them.
> 
> > [2] There have been some changes in this area recently. The
> > requirement of writing the value on all the CPUs in the domain is not
> > required anymore. I am working on verifying this right now.  If
> > everything works, then I can do smp_call_function_any(&d->cpu_mask,
> > mon_event_config_write,  &mon_info, 1);
> >
> > I will confirm this soon.
> 
> Okay, that makes my next question more confusing then ....
> 
> 
> >> This means you unnecessarily IPI the CPUs in the target domain if
> >> they already had this value, but the write syscall occurred on a
> >> domain that differs. This isn't what you intended, but its benign.
> >> More of a problem is: Won't this get skipped if the write syscall
> >> occurs on a domain that happens to have the target configuration already?
> 
> > Do you still think this is a problem after my comment [1] above?  Or Am I
> > missing something?
> 
> I'd muddled two similarly named functions. Sorry for the noise!

No worries. It made me look closely. 
> 
> I think what you're left with is the question "What is the monitor config for
> CPUs that were offline when it was last changed?". If its preserved by the CPU,
> then its some unknown value, and needs to be made the same as the value
> user-space/the-domain currently expects.
> 
> If there is only one config value for the domain (as your comment above
> suggests), then nothing needs doing here.

Ok. Thanks

> 
> 
> >> Because you need the same value to be written on every CPU ... what
> >> happens to CPUs that are offline when the configuration is changed?
> >> Do they keep their previous value, or does it get reset?
> >
> > The contents of this MSR register are held outside of all the cores.
> > If the value changes while a cpu is offline, and it reads it once it comes
> > online, it will get the new value.
> 
> This fits with your new description of the value only needing to be written from
> one CPU in the domain.

Yes. I will update the code comment to say that writing from one CPU is sufficient.
I am still waiting for Reinette's comments on the whole series before I re-spin the next version.
> 
> 
> >> I think this is best solved with a percpu variable for the current
> >> value of the MSR. You can then read it for CPUs in a remote domain,
> >> and only issue IPIs to 'sync' the value if needed. You can then
> >> re-use the sync call in
> >> resctrl_online_cpu() to set the MSR to whatever value it should currently be.
> >
> > This may not be required with my comment 1 and 2 above.
> >
> >>
> >>
> >>> +
> >>> +	/*
> >>> +	 * When an Event Configuration is changed, the bandwidth counters
> >>> +	 * for all RMIDs and Events will be cleared by the hardware. The
> >>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> >>> +	 * every RMID on the next read to any event for every RMID.
> >>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> >>> +	 * cleared while it is tracked by the hardware. Clear the
> >>> +	 * mbm_local and mbm_total counts for all the RMIDs.
> >>> +	 */
> >>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> >>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> >>> +
> >>> +write_exit:
> >>> +	return ret;
> >>> +}
> >>
> >>
> >>> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> >>> +{
> >>> +	char *dom_str = NULL, *id_str;
> >>> +	unsigned long dom_id, val;
> >>> +	struct rdt_domain *d;
> >>> +	int ret = 0;
> >>> +
> >>> +next:
> >>> +	if (!tok || tok[0] == '\0')
> >>> +		return 0;
> >>> +
> >>> +	/* Start processing the strings for each domain */
> >>> +	dom_str = strim(strsep(&tok, ";"));
> >>> +	id_str = strsep(&dom_str, "=");
> >>> +
> >>> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> >>> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> >>> +		rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> This is parsing the same format strings as parse_line(). Is there any
> >> chance that code could be re-used instead of duplicated? This way
> >> anything that is added to the format (or bugs found!) only need supporting in
> >> once place.
> >
> > I have checked on reusing the parse_line. The parse_line is
> > specifically written for schemata.  We can't reuse parse_line without
> > changing it completely.
> 
> I agree it's a little more complicated than it looked at first. I might
> have a go at it later...

OK, thanks.
Babu
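
For readers following the parse_line() discussion above: the "id=value;id=value"
format that both parsers walk can be handled by one loop that hands each pair to
a callback. The following is only a userspace sketch of that idea, not code from
this series or from the kernel. The names parse_domain_list(), parse_ulong(),
dom_cb, struct dom_cfg and record_pair() are all hypothetical, strtoul() stands
in for kstrtoul(), and the strim() whitespace trimming done in the patch is not
reproduced.

```c
#define _DEFAULT_SOURCE		/* for strsep() on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: invoked once per "id=value" pair. */
typedef int (*dom_cb)(unsigned long dom_id, unsigned long val, void *arg);

/* Userspace stand-in for kstrtoul(): reject empty or partly numeric input. */
static int parse_ulong(const char *s, int base, unsigned long *out)
{
	char *end;

	if (!s || *s == '\0')
		return -EINVAL;
	errno = 0;
	*out = strtoul(s, &end, base);
	if (errno || *end != '\0')
		return -EINVAL;
	return 0;
}

/*
 * Walk "id=val;id=val;..." and call cb() for every pair. The domain id is
 * parsed as decimal and the value as hex, matching the quoted patch. The
 * input buffer is modified in place by strsep().
 */
static int parse_domain_list(char *tok, dom_cb cb, void *arg)
{
	unsigned long dom_id, val;
	char *dom_str, *id_str;
	int ret;

	while (tok && tok[0] != '\0') {
		dom_str = strsep(&tok, ";");
		id_str = strsep(&dom_str, "=");

		/* dom_str is NULL when the entry has no '=' */
		if (!dom_str || parse_ulong(id_str, 10, &dom_id))
			return -EINVAL;
		if (parse_ulong(dom_str, 16, &val))
			return -EINVAL;

		ret = cb(dom_id, val, arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Hypothetical consumer: record the parsed pairs in a small table. */
struct dom_cfg {
	unsigned long id[8];
	unsigned long val[8];
	int n;
};

static int record_pair(unsigned long dom_id, unsigned long val, void *arg)
{
	struct dom_cfg *cfg = arg;

	assert(cfg != NULL);
	if (cfg->n >= 8)
		return -ENOSPC;
	cfg->id[cfg->n] = dom_id;
	cfg->val[cfg->n] = val;
	cfg->n++;
	return 0;
}
```

With this shape, a schemata-style caller and an event-configuration caller would
differ only in the callback they pass, which is roughly the kind of reuse asked
about in the review. Whether that is practical for the real parse_line() is
exactly the open question in the exchange above.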



end of thread, other threads:[~2022-12-13 23:47 UTC | newest]

Thread overview: 60+ messages
2022-11-04 19:59 [PATCH v8 00/13] Support for AMD QoS new features Babu Moger
2022-11-04 19:59 ` [PATCH v8 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
2022-11-23 18:21   ` Yu, Fenghua
2022-11-23 23:09     ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
2022-11-23  0:04   ` Reinette Chatre
2022-11-23 15:13     ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
2022-11-23  0:09   ` Reinette Chatre
2022-11-23 15:16     ` Moger, Babu
2022-11-23 18:17   ` Yu, Fenghua
2022-11-23 23:06     ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 04/13] x86/resctrl: Include new features in command line options Babu Moger
2022-11-23 18:26   ` Yu, Fenghua
2022-11-23 23:10     ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
2022-11-23  0:12   ` Reinette Chatre
2022-11-23 15:17     ` Moger, Babu
2022-11-30 18:43     ` Moger, Babu
2022-11-30 20:07       ` Reinette Chatre
2022-11-30 20:40         ` Moger, Babu
2022-12-01  0:35           ` Reinette Chatre
2022-12-01 13:56             ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 06/13] x86/resctrl: Remove the init attribute for rdt_cpu_has() Babu Moger
2022-11-23  0:13   ` Reinette Chatre
2022-11-23 17:48     ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
2022-11-23  0:14   ` Reinette Chatre
2022-11-23 18:23     ` Moger, Babu
2022-11-23 19:05       ` Reinette Chatre
2022-11-23 21:46         ` Moger, Babu
2022-11-04 20:00 ` [PATCH v8 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
2022-11-23  0:19   ` Reinette Chatre
2022-11-23 18:35     ` Moger, Babu
2022-11-23 22:27       ` Reinette Chatre
2022-11-23 22:55         ` Moger, Babu
2022-11-04 20:01 ` [PATCH v8 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
2022-11-04 20:01 ` [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
2022-11-07 10:21   ` Peter Newman
2022-11-07 18:50     ` Moger, Babu
2022-11-07 19:00     ` Moger, Babu
2022-11-22 23:43       ` Reinette Chatre
2022-11-23 21:44         ` Moger, Babu
2022-11-23 22:22           ` Reinette Chatre
2022-11-28 16:01             ` Moger, Babu
2022-11-23  0:22   ` Reinette Chatre
2022-11-23 22:44     ` Moger, Babu
2022-12-07 17:20   ` James Morse
2022-12-07 17:24     ` James Morse
2022-12-08  0:02     ` Moger, Babu
2022-12-13 17:55       ` James Morse
2022-12-13 23:46         ` Moger, Babu
2022-11-04 20:01 ` [PATCH v8 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
2022-11-04 20:01 ` [PATCH v8 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
2022-11-04 20:01 ` [PATCH v8 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
2022-11-23  0:26   ` Reinette Chatre
2022-11-23 23:02     ` Moger, Babu
2022-11-15 20:50 ` [PATCH v8 00/13] Support for AMD QoS " Moger, Babu
2022-11-15 21:07   ` Reinette Chatre
2022-11-15 21:34     ` Moger, Babu
