linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v9 00/13] Support for AMD QoS new features
@ 2022-12-01 15:35 Babu Moger
  2022-12-01 15:36 ` [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
                   ` (14 more replies)
  0 siblings, 15 replies; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:35 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

New AMD processors can now support following QoS features.

1. Slow Memory Bandwidth Allocation (SMBA)
   With this feature, the QOS enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

   Currently, CXL.memory is the only supported "slow" memory device. With
   the support of SMBA feature the hardware enables bandwidth allocation
   on the slow memory devices.

2. Bandwidth Monitoring Event Configuration (BMEC)
   The bandwidth monitoring events mbm_total_event and mbm_local_event 
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not enough
   to count all the different types are memory events. With the feature
   BMEC, the users have the option to configure mbm_total_event and
   mbm_local_event to count the specific type of events.

   Following are the bitmaps of events supported.
   Bits    Description
     6       Dirty Victims from the QOS domain to all types of memory
     5       Reads to slow memory in the non-local NUMA domain
     4       Reads to slow memory in the local NUMA domain
     3       Non-temporal writes to non-local NUMA domain
     2       Non-temporal writes to local NUMA domain
     1       Reads to memory in the non-local NUMA domain
     0       Reads to memory in the local NUMA domain

This series adds support for these features.

Feature description is available in the specification, "AMD64 Technology Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
v9:
 Summary of changes:
 1. Rebased on top of lastest tip/master as of 11/30.
 2. Most of the changes are result of the comments from Fenghua, Reinette and Peter Newman.
 3. Fixed the cpuid dependancy.
 4. Added the __init attribute to rdt_get_mon_l3_config and mbm_config_rftype_init.
 5. Added new function resctrl_arch_reset_rmid_all to clear all rmid statues.
 6. Changed mon_event_config_index_get based on Reinette's comments.
 7. Changed mbm_config_rftype_init to take care of few extra error handling.
 8. Few other minor changes and text changes.

v8:
 https://lore.kernel.org/lkml/166759188265.3281208.11769277079826754455.stgit@bmoger-ubuntu/
 Changes:
 1. Removed init attribute for rdt_cpu_has to make it available for all the files.
 2. Updated the change log for mon_features to correct the names of config files.
 3. Changed configuration file name from mbm_total_config to mbm_total_bytes_config.
    This is more consistant with other changes.
 4. Added lock protection while reading/writing the config file.
 5. Other few minor text changes. I have been missing few comments in last couple of
    revisions. Hope I have addressed all of them this time.

v7:
 https://lore.kernel.org/lkml/166604543832.5345.9696970469830919982.stgit@bmoger-ubuntu/
 Changes:
 Not much of a change. Missed one comment from Reinette from v5. Corrected it now.
 Few format corrections from Sanjaya.

v6:
 https://lore.kernel.org/lkml/166543345606.23830.3120625408601531368.stgit@bmoger-ubuntu/
 Summary of changes:
 1. Rebased on top of lastest tip tree. Fixed few minor conflicts.
 2. Fixed format issue with scattered.c.
 3. Removed config_name from the structure mon_evt. It is not required.
 4. The read/write format for mbm_total_config and mbm_local_config will be same
    as schemata format "id0=val0;id1=val1;...". This is comment from Fenghua.
 5. Added more comments MSR_IA32_EVT_CFG_BASE writng.
 5. Few text changes in resctrl.rst 
 
v5:
  https://lore.kernel.org/lkml/166431016617.373387.1968875281081252467.stgit@bmoger-ubuntu/
  Summary of changes.
  1. Split the series into two. The first two patches are bug fixes. So, sent them separate.
  2. The config files mbm_total_config and mbm_local_config are now under
     /sys/fs/resctrl/info/L3_MON/. Removed these config files from mon groups.
  3. Ran "checkpatch --strict --codespell" on all the patches. Looks good with few known exceptions.
  4. Few minor text changes in resctrl.rst file. 

v4:
  https://lore.kernel.org/lkml/166257348081.1043018.11227924488792315932.stgit@bmoger-ubuntu/
  Got numerios of comments from Reinette Chatre. Addressed most of them. 
  Summary of changes.
  1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.  
  2. Updated mon_features texts if the BMEC is supported.
  3. Added more explanation about the slow memory support.
  4. Replaced smp_call_function_many with on_each_cpu_mask call.
  5. Removed arch_has_empty_bitmaps
  6. Few other text changes.
  7. Removed Reviewed-by if the patch is modified.
  8. Rebased the patches to latest tip.

v3:
  https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest tip. Resolved some conflicts.
     https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
  b. Taken care of feedback from Bagas Sanjaya.
  c. Added Reviewed by from Mingo.
  Note: I am still looking for comments from Reinette or Fenghua.

v2:
  https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.stgit@bmoger-ubuntu/
  a. Rebased the patches to latest stable tree (v5.18.15). Resolved some conflicts.
  b. Added the patch to fix CBM issue on AMD. This was originally discussed
     https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com/

v1:
  https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.stgit@bmoger-ubuntu/

Babu Moger (13):
      x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
      x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
      x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
      x86/resctrl: Include new features in command line options
      x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
      x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
      x86/resctrl: Introduce data structure to support monitor configuration
      x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
      x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
      x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
      x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
      x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
      Documentation/x86: Update resctrl.rst for new features


 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/x86/resctrl.rst                 | 138 +++++++-
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  54 ++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  28 ++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  26 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 309 ++++++++++++++++--
 arch/x86/kernel/cpu/scattered.c               |   2 +
 include/linux/resctrl.h                       |  10 +
 12 files changed, 544 insertions(+), 33 deletions(-)

--


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:08   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Add the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
enforcement policies can be applied to external slow memory connected
to the host. QOS enforcement is accomplished by assigning a Class Of
Service (COS) to a processor and specifying allocations or limits for
that COS for each resource to be allocated.

This feature is identified by the CPUID Function 8000_0020_EBX_x0.

CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers
(ECX=0)

Bits    Field Name      Description
2       L3SBE           L3 external slow memory bandwidth enforcement

CXL.memory is the only supported "slow" memory device. With the support
of SMBA feature, the hardware enables bandwidth allocation on the slow
memory devices. If there are multiple slow memory devices in the system,
then the throttling logic groups all the slow sources together and
applies the limit on them as a whole.

The presence of the SMBA feature(with CXL.memory) is independent of
whether slow memory device is actually present in the system. If there
is no slow memory in the system, then setting a SMBA limit will have no
impact on the performance of the system.

Presence of CXL memory can be identified by numactl command.

$numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
node 0 size: 63678 MB node 0 free: 59542 MB
node 1 cpus:
node 1 size: 16122 MB
node 1 free: 15627 MB
node distances:
node   0   1
   0:  10  50
   1:  50  10

CPU list for CXL memory will be empty. The cpu-cxl node distance is
greater than cpu-to-cpu distances. Node 1 has the CXL memory in this
case. CXL memory can also be identified using ACPI SRAT table and
memory maps.

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Revision: 1.03
Publication # 56375 Revision: 1.03 Issue Date: February 2022".

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/scattered.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 11a0e06362e4..b6a45e56cd0c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -307,6 +307,7 @@
 #define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX EDECCSSA user leaf function */
 #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory Bandwidth Allocation */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index f53944fb8f7f..d925753084fb 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
+	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
  2022-12-01 15:36 ` [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:10   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
enforcement policies on the external slow memory.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c98e52ff5f20..f6af3ac1ef20 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_SMBA] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_SMBA,
+			.name			= "SMBA",
+			.cache_level		= 3,
+			.domains		= domain_init(RDT_RESOURCE_SMBA),
+			.parse_ctrlval		= parse_bw,
+			.format_str		= "%d=%*u",
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5ebd28e6aa0c..fdbbf66312ec 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -409,6 +409,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_SMBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
  2022-12-01 15:36 ` [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
  2022-12-01 15:36 ` [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:11   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 04/13] x86/resctrl: Include new features in command line options Babu Moger
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Newer AMD processors support the new feature Bandwidth Monitoring Event
Configuration (BMEC).

The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
Bits    Field Name    Description
3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)

Currently, the bandwidth monitoring events mbm_total_bytes and
mbm_local_bytes are set to count all the total and local reads/writes
respectively. With the introduction of slow memory, the two counters
are not enough to count all the different types of memory events. With
the feature BMEC, the users have the option to configure
mbm_total_bytes and mbm_local_bytes to count the specific type of
events.

Each BMEC event has a configuration MSR, which contains one field for
each bandwidth type that can be used to configure the bandwidth event
to track any combination of supported bandwidth types. The event will
count requests from every bandwidth type bit that is set in the
corresponding configuration register.

Following are the types of events supported:

====    ========================================================
Bits    Description
====    ========================================================
6       Dirty Victims from the QOS domain to all types of memory
5       Reads to slow memory in the non-local NUMA domain
4       Reads to slow memory in the local NUMA domain
3       Non-temporal writes to non-local NUMA domain
2       Non-temporal writes to local NUMA domain
1       Reads to memory in the non-local NUMA domain
0       Reads to memory in the local NUMA domain
====    ========================================================

By default, the mbm_total_bytes configuration is set to 0x7F to count
all the event types and the mbm_local_bytes configuration is set to
0x15 to count all the local memory events.

Feature description is available in the specification, "AMD64
Technology Platform Quality of Service Extensions, Revision: 1.03
Publication

Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/kernel/cpu/cpuid-deps.c   |    2 ++
 arch/x86/kernel/cpu/scattered.c    |    1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b6a45e56cd0c..415796d7b309 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
 #define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory Bandwidth Allocation */
+#define X86_FEATURE_BMEC		(11*32+22) /* Bandwidth Monitoring Event Configuration */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d95221117129..f6748c8bd647 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
+	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
 	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
 	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index d925753084fb..0dad49a09b7a 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -46,6 +46,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
+	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
 	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 04/13] x86/resctrl: Include new features in command line options
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (2 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:12   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Add the command line options to enable or disable the new resctrl features.
smba : Slow Memory Bandwidth Allocation
bmec : Bandwidth Monitor Event Configuration.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |    2 +-
 arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 42af9ca0127e..a7b6634f4426 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5190,7 +5190,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba.
+			mba, smba, bmec.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f6af3ac1ef20..10a8c9d96f32 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -659,6 +659,8 @@ enum {
 	RDT_FLAG_L2_CAT,
 	RDT_FLAG_L2_CDP,
 	RDT_FLAG_MBA,
+	RDT_FLAG_SMBA,
+	RDT_FLAG_BMEC,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
 	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
 	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
+	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
+	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (3 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 04/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:13   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config() Babu Moger
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The QoS slow memory configuration details are available via
CPUID_Fn80000020_EDX_x02. Detect the available details and
initialize the rest to defaults.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/msr-index.h          |    1 +
 arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff47552bcb..e0a40027aa62 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1061,6 +1061,7 @@
 
 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
+#define MSR_IA32_SMBA_BW_BASE		0xc0000280
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 10a8c9d96f32..b4fc851f6489 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
 	if (!r)
 		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
 
+	/*
+	 * The software controller support is only applicable to MBA resource.
+	 * Make sure to check for resource type.
+	 */
+	if (r->rid != RDT_RESOURCE_MBA)
+		return false;
+
 	return r->membw.mba_sc;
 }
 
@@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
-	u32 ebx, ecx;
+	u32 ebx, ecx, subleaf;
 
-	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
+	/*
+	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
+	 * CPUID_Fn80000020_EDX_x02 for SMBA
+	 */
+	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
+
+	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
 	hw_res->num_closid = edx.split.cos_max + 1;
 	r->default_ctrl = MAX_MBA_BW_AMD;
 
@@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
 	return false;
 }
 
+static __init bool get_slow_mem_config(void)
+{
+	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
+
+	if (!rdt_cpu_has(X86_FEATURE_SMBA))
+		return false;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
+
+	return false;
+}
+
 static __init bool get_rdt_alloc_resources(void)
 {
 	struct rdt_resource *r;
@@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
 	if (get_mem_config())
 		ret = true;
 
+	if (get_slow_mem_config())
+		ret = true;
+
 	return ret;
 }
 
@@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
 		} else if (r->rid == RDT_RESOURCE_MBA) {
 			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
 			hw_res->msr_update = mba_wrmsr_amd;
+		} else if (r->rid == RDT_RESOURCE_SMBA) {
+			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
+			hw_res->msr_update = mba_wrmsr_amd;
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 1df0e3262bca..2dd4b8c47f23 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -209,7 +209,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 	unsigned long dom_id;
 
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
-	    r->rid == RDT_RESOURCE_MBA) {
+	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
 		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
 		return -EINVAL;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e5a48f05e787..8a3dafc0dbf7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1213,7 +1213,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA)
+		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
 			continue;
 		has_cache = true;
 		list_for_each_entry(d, &r->domains, list) {
@@ -1402,7 +1402,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 					ctrl = resctrl_arch_get_config(r, d,
 								       closid,
 								       type);
-				if (r->rid == RDT_RESOURCE_MBA)
+				if (r->rid == RDT_RESOURCE_MBA ||
+				    r->rid == RDT_RESOURCE_SMBA)
 					size = ctrl;
 				else
 					size = rdtgroup_cbm_to_size(r, d, ctrl);
@@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 
 	list_for_each_entry(s, &resctrl_schema_all, list) {
 		r = s->res;
-		if (r->rid == RDT_RESOURCE_MBA) {
+		if (r->rid == RDT_RESOURCE_MBA ||
+		    r->rid == RDT_RESOURCE_SMBA) {
 			rdtgroup_init_mba(r, rdtgrp->closid);
 			if (is_mba_sc(r))
 				continue;



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (4 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:17   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The function rdt_get_mon_l3_config() needs to call rdt_cpu_has() to
query the monitor related features. It cannot be called right now
because rdt_cpu_has() has the __init attribute but rdt_get_mon_l3_config()
doesn't. So, add the __init attribute to rdt_get_mon_l3_config() to
resolve it.

Also, make the function rdt_cpu_has() available outside core.c file.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/core.c     |    2 +-
 arch/x86/kernel/cpu/resctrl/internal.h |    1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  |    2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b4fc851f6489..030d3b409768 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -728,7 +728,7 @@ static int __init set_rdt_options(char *str)
 }
 __setup("rdt", set_rdt_options);
 
-static bool __init rdt_cpu_has(int flag)
+bool __init rdt_cpu_has(int flag)
 {
 	bool ret = boot_cpu_has(flag);
 	struct rdt_options *o;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index fdbbf66312ec..7bbfc10094b6 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -512,6 +512,7 @@ void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
+bool rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index efe0c30d3a12..e33e8d8bd796 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
-int rdt_get_mon_l3_config(struct rdt_resource *r)
+int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (5 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config() Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:19   ` Reinette Chatre
  2022-12-01 15:36 ` [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Add a new field in mon_evt to support Bandwidth Monitoring Event
Configuration(BMEC) and also update the "mon_features" display.

The resctrl file "mon_features" will display the supported events
and files that can be used to configure those events if monitor
configuration is supported.

Before the change.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_local_bytes

After the change when BMEC is supported.
	$cat /sys/fs/resctrl/info/L3_MON/mon_features
	llc_occupancy
	mbm_total_bytes
	mbm_total_bytes_config
	mbm_local_bytes
	mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
 arch/x86/kernel/cpu/resctrl/monitor.c  |    7 +++++++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 7bbfc10094b6..b36750334deb 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -52,11 +52,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
  * struct mon_evt - Entry in the event list of a resource
  * @evtid:		event id
  * @name:		name of the event
+ * @configurable:	true if the event is configurable
  * @list:		entry in &rdt_resource->evt_list
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
 	char			*name;
+	bool			configurable;
 	struct list_head	list;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index e33e8d8bd796..b39e0eca1879 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -783,6 +783,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	if (ret)
 		return ret;
 
+	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
+			mbm_total_event.configurable = true;
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
+			mbm_local_event.configurable = true;
+	}
+
 	l3_mon_evt_init(r);
 
 	r->mon_capable = true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8a3dafc0dbf7..8342feb54a7f 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = of->kn->parent->priv;
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list)
+	list_for_each_entry(mevt, &r->evt_list, list) {
 		seq_printf(seq, "%s\n", mevt->name);
+		if (mevt->configurable)
+			seq_printf(seq, "%s_config\n", mevt->name);
+	}
 
 	return 0;
 }



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (6 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-12-01 15:36 ` Babu Moger
  2022-12-15 17:40   ` Reinette Chatre
  2022-12-01 15:37 ` [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:36 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_total_bytes_config is set to 0x7f to count all the
event types.

For example:
    $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x7f;1=0x7f;2=0x7f;3=0x7f

    In this case, the event mbm_total_bytes is currently configured
    with 0x7f on domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/include/asm/msr-index.h       |    1 
 arch/x86/kernel/cpu/resctrl/internal.h |   24 ++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   99 ++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e0a40027aa62..2e5f57c93605 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1062,6 +1062,7 @@
 /* - AMD: */
 #define MSR_IA32_MBA_BW_BASE		0xc0000200
 #define MSR_IA32_SMBA_BW_BASE		0xc0000280
+#define MSR_IA32_EVT_CFG_BASE		0xc0000400
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index b36750334deb..f7fc69e82e8b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -30,6 +30,29 @@
  */
 #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
 
+/* Reads to Local DRAM Memory */
+#define READS_TO_LOCAL_MEM		BIT(0)
+
+/* Reads to Remote DRAM Memory */
+#define READS_TO_REMOTE_MEM		BIT(1)
+
+/* Non-Temporal Writes to Local Memory */
+#define NON_TEMP_WRITE_TO_LOCAL_MEM	BIT(2)
+
+/* Non-Temporal Writes to Remote Memory */
+#define NON_TEMP_WRITE_TO_REMOTE_MEM	BIT(3)
+
+/* Reads to Local Memory the system identifies as "Slow Memory" */
+#define READS_TO_LOCAL_S_MEM		BIT(4)
+
+/* Reads to Remote Memory the system identifies as "Slow Memory" */
+#define READS_TO_REMOTE_S_MEM		BIT(5)
+
+/* Dirty Victims to All Types of Memory */
+#define  DIRTY_VICTIMS_TO_ALL_MEM	BIT(6)
+
+/* Max event bits supported */
+#define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
 struct rdt_fs_context {
 	struct kernfs_fs_context	kfc;
@@ -531,5 +554,6 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void __init thread_throttle_mode_init(void);
+void __init mbm_config_rftype_init(const char *config);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b39e0eca1879..2afddebc8636 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -784,8 +784,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 		return ret;
 
 	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
-		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 			mbm_total_event.configurable = true;
+			mbm_config_rftype_init("mbm_total_bytes_config");
+		}
 		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
 			mbm_local_event.configurable = true;
 	}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8342feb54a7f..e93b1c206116 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1423,6 +1423,90 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+struct mon_config_info {
+	u32 evtid;
+	u32 mon_config;
+};
+
+#define INVALID_CONFIG_INDEX   UINT_MAX
+
+/**
+ * mon_event_config_index_get - get the index for the configurable event
+ * @evtid: event id.
+ *
+ * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
+ *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
+ *         INVALID_CONFIG_INDEX for invalid evtid
+ */
+static inline unsigned int mon_event_config_index_get(u32 evtid)
+{
+	switch (evtid) {
+	case QOS_L3_MBM_TOTAL_EVENT_ID:
+		return 0;
+	case QOS_L3_MBM_LOCAL_EVENT_ID:
+		return 1;
+	default:
+		/* WARN */
+		return INVALID_CONFIG_INDEX;
+	}
+}
+
+static void mon_event_config_read(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 h, index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
+
+	/* Report only the valid event configuration bits */
+	mon_info->mon_config &= MAX_EVT_CONFIG_BITS;
+}
+
+static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
+{
+	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
+}
+
+static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
+{
+	struct mon_config_info mon_info = {0};
+	struct rdt_domain *dom;
+	bool sep = false;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(s, ";");
+
+		mon_info.evtid = evtid;
+		mondata_config_read(dom, &mon_info);
+
+		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
+		sep = true;
+	}
+	seq_puts(s, "\n");
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1521,6 +1605,12 @@ static struct rftype res_common_files[] = {
 		.seq_show	= max_threshold_occ_show,
 		.fflags		= RF_MON_INFO | RFTYPE_RES_CACHE,
 	},
+	{
+		.name		= "mbm_total_bytes_config",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_total_bytes_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,
@@ -1627,6 +1717,15 @@ void __init thread_throttle_mode_init(void)
 	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
 }
 
+void __init mbm_config_rftype_init(const char *config)
+{
+	struct rftype *rft;
+
+	rft = rdtgroup_get_rftype_by_name(config);
+	if (rft)
+		rft->fflags = RF_MON_INFO | RFTYPE_RES_CACHE;
+}
+
 /**
  * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
  * @r: The resource group with which the file is associated.



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (7 preceding siblings ...)
  2022-12-01 15:36 ` [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
@ 2022-12-01 15:37 ` Babu Moger
  2022-12-15 17:43   ` Reinette Chatre
  2022-12-01 15:37 ` [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:37 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The current event configuration can be viewed by the user by reading
the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
The event configuration settings are domain specific and will affect
all the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

By default, the mbm_local_bytes_config is set to 0x15 to count all the
local event types.

For example:
    $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
    0=0x15;1=0x15;2=0x15;3=0x15

    In this case, the event mbm_local_bytes is currently configured with
    0x15 on domains 0 to 3.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   16 ++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 2afddebc8636..7c8a3a745041 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -788,8 +788,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 			mbm_total_event.configurable = true;
 			mbm_config_rftype_init("mbm_total_bytes_config");
 		}
-		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
+		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 			mbm_local_event.configurable = true;
+			mbm_config_rftype_init("mbm_local_bytes_config");
+		}
 	}
 
 	l3_mon_evt_init(r);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index e93b1c206116..580f3cce19e2 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1507,6 +1507,16 @@ static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
+				       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1611,6 +1621,12 @@ static struct rftype res_common_files[] = {
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_bytes_config_show,
 	},
+	{
+		.name		= "mbm_local_bytes_config",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= mbm_local_bytes_config_show,
+	},
 	{
 		.name		= "cpus",
 		.mode		= 0644,



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (8 preceding siblings ...)
  2022-12-01 15:37 ` [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
@ 2022-12-01 15:37 ` Babu Moger
  2022-12-15 18:24   ` Reinette Chatre
  2022-12-01 15:37 ` [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:37 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The current event configuration for mbm_total_bytes can be changed by
the user by writing to the file
/sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:

====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
command.
	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

To change the mbm_total_bytes to count all the slow memory reads on
domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command.
	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c  |   13 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  127 ++++++++++++++++++++++++++++++++
 include/linux/resctrl.h                |   10 +++
 3 files changed, 149 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 7c8a3a745041..b265856835de 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 		memset(am, 0, sizeof(*am));
 }
 
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
+{
+	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+	if (is_mbm_total_enabled())
+		memset(hw_dom->arch_mbm_total, 0,
+		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
+
+	if (is_mbm_local_enabled())
+		memset(hw_dom->arch_mbm_local, 0,
+		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+}
+
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 {
 	u64 shift = 64 - width, chunks;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 580f3cce19e2..8a22a652a6e8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static void mon_event_config_write(void *info)
+{
+	struct mon_config_info *mon_info = info;
+	u32 index;
+
+	index = mon_event_config_index_get(mon_info->evtid);
+	if (index == INVALID_CONFIG_INDEX) {
+		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
+		return;
+	}
+	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
+}
+
+static int mbm_config_write_domain(struct rdt_resource *r,
+				   struct rdt_domain *d, u32 evtid, u32 val)
+{
+	struct mon_config_info mon_info = {0};
+	int ret = 0;
+
+	/* mon_config cannot be more than the supported set of events */
+	if (val > MAX_EVT_CONFIG_BITS) {
+		rdt_last_cmd_puts("Invalid event configuration\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Read the current config value first. If both are the same then
+	 * no need to write it again.
+	 */
+	mon_info.evtid = evtid;
+	mondata_config_read(d, &mon_info);
+	if (mon_info.mon_config == val)
+		goto out;
+
+	mon_info.mon_config = val;
+
+	/*
+	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
+	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
+	 * are scoped at the domain level. Writing any of these MSRs
+	 * on one CPU is supposed to be observed by all CPUs in the
+	 * domain. However, the hardware team recommends to update
+	 * these MSRs on all the CPUs in the domain.
+	 */
+	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
+
+	/*
+	 * When an Event Configuration is changed, the bandwidth counters
+	 * for all RMIDs and Events will be cleared by the hardware. The
+	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
+	 * every RMID on the next read to any event for every RMID.
+	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
+	 * cleared while it is tracked by the hardware. Clear the
+	 * mbm_local and mbm_total counts for all the RMIDs.
+	 */
+	resctrl_arch_reset_rmid_all(r, d);
+
+out:
+	return ret;
+}
+
+static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
+{
+	char *dom_str = NULL, *id_str;
+	unsigned long dom_id, val;
+	struct rdt_domain *d;
+	int ret = 0;
+
+next:
+	if (!tok || tok[0] == '\0')
+		return 0;
+
+	/* Start processing the strings for each domain */
+	dom_str = strim(strsep(&tok, ";"));
+	id_str = strsep(&dom_str, "=");
+
+	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
+		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
+		return -EINVAL;
+	}
+
+	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
+		rdt_last_cmd_puts("Non-numeric event configuration value\n");
+		return -EINVAL;
+	}
+
+	list_for_each_entry(d, &r->domains, list) {
+		if (d->id == dom_id) {
+			ret = mbm_config_write_domain(r, d, evtid, val);
+			if (ret)
+				return -EINVAL;
+			goto next;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_total_bytes_config",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_total_bytes_config_show,
+		.write		= mbm_total_bytes_config_write,
 	},
 	{
 		.name		= "mbm_local_bytes_config",
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 0cee154abc9f..e4dc65892446 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid);
 
+/**
+ * resctrl_arch_reset_rmid_all() - Reset any private state associated with
+ * 				   all the rmids.
+ * @r:		The domain's resource.
+ * @d:		The rmid's domain.
+ *
+ * This can be called from any CPU.
+ */
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
+
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (9 preceding siblings ...)
  2022-12-01 15:37 ` [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
@ 2022-12-01 15:37 ` Babu Moger
  2022-12-15 18:25   ` Reinette Chatre
  2022-12-01 15:37 ` [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:37 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The current event configuration for mbm_local_bytes can be changed by
the user by writing to the configuration file
/sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.

The event configuration settings are domain specific and will affect all
the CPUs in the domain.

Following are the types of events supported:
====  ===========================================================
Bits   Description
====  ===========================================================
6      Dirty Victims from the QOS domain to all types of memory
5      Reads to slow memory in the non-local NUMA domain
4      Reads to slow memory in the local NUMA domain
3      Non-temporal writes to non-local NUMA domain
2      Non-temporal writes to local NUMA domain
1      Reads to memory in the non-local NUMA domain
0      Reads to memory in the local NUMA domain
====  ===========================================================

For example:
To change the mbm_local_bytes_config to count all the non-temporal writes
on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex 0xc).
Run the command.
    $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

To change the mbm_local_bytes to count only reads to local NUMA domain 1,
the bit 0 needs to be set which 1b (in hex 0x1). Run the command.
    $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8a22a652a6e8..6897c480ae55 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1641,6 +1641,32 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
+					    char *buf, size_t nbytes,
+					    loff_t off)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	int ret;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_last_cmd_clear();
+
+	buf[nbytes - 1] = '\0';
+
+	ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
+
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1748,9 +1774,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "mbm_local_bytes_config",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= mbm_local_bytes_config_show,
+		.write		= mbm_local_bytes_config_write,
 	},
 	{
 		.name		= "cpus",



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (10 preceding siblings ...)
  2022-12-01 15:37 ` [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
@ 2022-12-01 15:37 ` Babu Moger
  2022-12-15 18:26   ` Reinette Chatre
  2022-12-01 15:37 ` [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:37 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

The call on_each_cpu_mask() runs the function on each CPU specified
by cpumask, which may include the local processor. So, replace the call
smp_call_function_many() with on_each_cpu_mask() to simplify the code.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
 1 file changed, 8 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6897c480ae55..68e14831a638 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -325,12 +325,7 @@ static void update_cpu_closid_rmid(void *info)
 static void
 update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
 {
-	int cpu = get_cpu();
-
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update_cpu_closid_rmid(r);
-	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
-	put_cpu();
+	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
 }
 
 static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
@@ -2135,13 +2130,9 @@ static int set_cache_qos_cfg(int level, bool enable)
 			/* Pick one CPU from each domain instance to update MSR */
 			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 	}
-	cpu = get_cpu();
-	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		update(&enable);
-	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, update, &enable, 1);
-	put_cpu();
+
+	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, update, &enable, 1);
 
 	free_cpumask_var(cpu_mask);
 
@@ -2618,7 +2609,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
 	struct rdt_domain *d;
-	int i, cpu;
+	int i;
 
 	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
 		return -ENOMEM;
@@ -2639,13 +2630,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
 		for (i = 0; i < hw_res->num_closid; i++)
 			hw_dom->ctrl_val[i] = r->default_ctrl;
 	}
-	cpu = get_cpu();
-	/* Update CBM on this cpu if it's in cpu_mask. */
-	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_ctrl_update(&msr_param);
-	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
-	put_cpu();
+
+	/* Update CBM on all the CPUs in cpu_mask */
+	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 
 	free_cpumask_var(cpu_mask);
 



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (11 preceding siblings ...)
  2022-12-01 15:37 ` [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
@ 2022-12-01 15:37 ` Babu Moger
  2022-12-15 18:30   ` Reinette Chatre
  2022-12-15 15:08 ` [PATCH v9 00/13] Support for AMD QoS " Moger, Babu
  2022-12-15 18:38 ` Reinette Chatre
  14 siblings, 1 reply; 50+ messages in thread
From: Babu Moger @ 2022-12-01 15:37 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	babu.moger, chang.seok.bae, pawan.kumar.gupta, jmattson,
	daniel.sneddon, sandipan.das, tony.luck, james.morse, linux-doc,
	linux-kernel, bagasdotme, eranian, christophe.leroy,
	pawan.kumar.gupta, jarkko, adrian.hunter, quic_jiles,
	peternewman

Update the documentation for the new features:
1. Slow Memory Bandwidth allocation (SMBA).
   With this feature, the QOS  enforcement policies can be applied
   to the external slow memory connected to the host. QOS enforcement
   is accomplished by assigning a Class Of Service (COS) to a processor
   and specifying allocations or limits for that COS for each resource
   to be allocated.

2. Bandwidth Monitoring Event Configuration (BMEC).
   The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
   are set to count all the total and local reads/writes respectively.
   With the introduction of slow memory, the two counters are not
   enough to count all the different types of memory events. With the
   feature BMEC, the users have the option to configure mbm_total_bytes
   and mbm_local_bytes to count the specific type of events.

Also add configuration instructions with examples.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
---
 Documentation/x86/resctrl.rst |  138 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 136 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
index 71a531061e4e..60761a6f9087 100644
--- a/Documentation/x86/resctrl.rst
+++ b/Documentation/x86/resctrl.rst
@@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
 This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
 flag bits:
 
-=============================================	================================
+===============================================	================================
 RDT (Resource Director Technology) Allocation	"rdt_a"
 CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
 CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
 CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
 MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
 MBA (Memory Bandwidth Allocation)		"mba"
-=============================================	================================
+SMBA (Slow Memory Bandwidth Allocation)         "smba"
+BMEC (Bandwidth Monitoring Event Configuration) "bmec"
+===============================================	================================
 
 To use the feature mount the file system::
 
@@ -161,6 +163,79 @@ with the following files:
 "mon_features":
 		Lists the monitoring events if
 		monitoring is enabled for the resource.
+                Example::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_local_bytes
+
+                If the system supports Bandwidth Monitoring Event
+                Configuration (BMEC), then the bandwidth events will
+                be configurable. The output will be::
+
+                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
+                   llc_occupancy
+                   mbm_total_bytes
+                   mbm_total_bytes_config
+                   mbm_local_bytes
+                   mbm_local_bytes_config
+
+"mbm_total_bytes_config", "mbm_local_bytes_config":
+        These files contain the current event configuration for the events
+        mbm_total_bytes and mbm_local_bytes, respectively, when the
+        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
+        The event configuration settings are domain specific and will affect
+        all the CPUs in the domain.
+
+        Following are the types of events supported:
+
+        ====    ========================================================
+        Bits    Description
+        ====    ========================================================
+        6       Dirty Victims from the QOS domain to all types of memory
+        5       Reads to slow memory in the non-local NUMA domain
+        4       Reads to slow memory in the local NUMA domain
+        3       Non-temporal writes to non-local NUMA domain
+        2       Non-temporal writes to local NUMA domain
+        1       Reads to memory in the non-local NUMA domain
+        0       Reads to memory in the local NUMA domain
+        ====    ========================================================
+
+        By default, the mbm_total_bytes configuration is set to 0x7f to count
+        all the event types and the mbm_local_bytes configuration is set to
+        0x15 to count all the local memory events.
+
+        Examples:
+
+        * To view the current configuration::
+          ::
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+            0=0x7f;1=0x7f;2=0x7f;3=0x7f
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+            0=0x15;1=0x15;3=0x15;4=0x15
+
+        * To change the mbm_total_bytes to count only reads on domain 0,
+          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
+          (in hexadecimal 0x33):
+          ::
+
+            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
+            0=0x33;1=0x7f;2=0x7f;3=0x7f
+
+        * To change the mbm_local_bytes to count all the slow memory reads on
+          domain 0 and 1, the bits 4 and 5 needs to be set, which is 110000b
+          in binary (in hexadecimal 0x30):
+          ::
+
+            # echo  "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+
+            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
+            0=0x30;1=0x30;3=0x15;4=0x15
 
 "max_threshold_occupancy":
 		Read/write file provides the largest value (in
@@ -464,6 +539,25 @@ Memory bandwidth domain is L3 cache.
 
 	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
 
+Slow Memory Bandwidth Allocation (SMBA)
+---------------------------------------
+AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
+CXL.memory is the only supported "slow" memory device. With the
+support of SMBA, the hardware enables bandwidth allocation on
+the slow memory devices. If there are multiple such devices in
+the system, the throttling logic groups all the slow sources
+together and applies the limit on them as a whole.
+
+The presence of SMBA (with CXL.memory) is independent of slow memory
+devices presence. If there are no such devices on the system, then
+configuring SMBA will have no impact on the performance of the system.
+
+The bandwidth domain for slow memory is L3 cache. Its schemata file
+is formatted as:
+::
+
+	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Reading/writing the schemata file
 ---------------------------------
 Reading the schemata file will show the state of all resources
@@ -479,6 +573,46 @@ which you wish to change.  E.g.
   L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
   L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Reading/writing the schemata file (on AMD systems)
+--------------------------------------------------
+Reading the schemata file will show the current bandwidth limit on all
+domains. The allocated resources are in multiples of one eighth GB/s.
+When writing to the file, you need to specify what cache id you wish to
+configure the bandwidth limit.
+
+For example, to allocate 2GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    MB:0=2048;1=2048;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "MB:1=16" > schemata
+  # cat schemata
+    MB:0=2048;1=  16;2=2048;3=2048
+    L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+Reading/writing the schemata file (on AMD systems) with SMBA feature
+--------------------------------------------------------------------
+Reading and writing the schemata file is the same as without SMBA in
+above section.
+
+For example, to allocate 8GB/s limit on the first cache id:
+
+::
+
+  # cat schemata
+    SMBA:0=2048;1=2048;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
+  # echo "SMBA:1=64" > schemata
+  # cat schemata
+    SMBA:0=2048;1=  64;2=2048;3=2048
+      MB:0=2048;1=2048;2=2048;3=2048
+      L3:0=ffff;1=ffff;2=ffff;3=ffff
+
 Cache Pseudo-Locking
 ====================
 CAT enables a user to specify the amount of cache space that an



^ permalink raw reply related	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 00/13] Support for AMD QoS new features
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (12 preceding siblings ...)
  2022-12-01 15:37 ` [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
@ 2022-12-15 15:08 ` Moger, Babu
  2022-12-15 15:35   ` Reinette Chatre
  2022-12-15 18:38 ` Reinette Chatre
  14 siblings, 1 reply; 50+ messages in thread
From: Moger, Babu @ 2022-12-15 15:08 UTC (permalink / raw)
  To: corbet, reinette.chatre, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, pawan.kumar.gupta, jarkko,
	adrian.hunter, quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,
I am planning refresh the series. I have couple of changes for patch 10.
https://lore.kernel.org/lkml/MW3PR12MB45538A17F57BF80C21BB46C4951D9@MW3PR12MB4553.namprd12.prod.outlook.com/

Let me know if you have any other comments.
Thanks
Babu

> -----Original Message-----
> From: Moger, Babu <Babu.Moger@amd.com>
> Sent: Thursday, December 1, 2022 9:36 AM
> To: corbet@lwn.net; reinette.chatre@intel.com; tglx@linutronix.de;
> mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com; Moger,
> Babu <Babu.Moger@amd.com>; chang.seok.bae@intel.com;
> pawan.kumar.gupta@linux.intel.com; jmattson@google.com;
> daniel.sneddon@linux.intel.com; Das1, Sandipan <Sandipan.Das@amd.com>;
> tony.luck@intel.com; james.morse@arm.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; bagasdotme@gmail.com; eranian@google.com;
> christophe.leroy@csgroup.eu; pawan.kumar.gupta@linux.intel.com;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: [PATCH v9 00/13] Support for AMD QoS new features
> 
> New AMD processors can now support following QoS features.
> 
> 1. Slow Memory Bandwidth Allocation (SMBA)
>    With this feature, the QOS enforcement policies can be applied
>    to the external slow memory connected to the host. QOS enforcement
>    is accomplished by assigning a Class Of Service (COS) to a processor
>    and specifying allocations or limits for that COS for each resource
>    to be allocated.
> 
>    Currently, CXL.memory is the only supported "slow" memory device. With
>    the support of SMBA feature the hardware enables bandwidth allocation
>    on the slow memory devices.
> 
> 2. Bandwidth Monitoring Event Configuration (BMEC)
>    The bandwidth monitoring events mbm_total_event and mbm_local_event
>    are set to count all the total and local reads/writes respectively.
>    With the introduction of slow memory, the two counters are not enough
>    to count all the different types are memory events. With the feature
>    BMEC, the users have the option to configure mbm_total_event and
>    mbm_local_event to count the specific type of events.
> 
>    Following are the bitmaps of events supported.
>    Bits    Description
>      6       Dirty Victims from the QOS domain to all types of memory
>      5       Reads to slow memory in the non-local NUMA domain
>      4       Reads to slow memory in the local NUMA domain
>      3       Non-temporal writes to non-local NUMA domain
>      2       Non-temporal writes to local NUMA domain
>      1       Reads to memory in the non-local NUMA domain
>      0       Reads to memory in the local NUMA domain
> 
> This series adds support for these features.
> 
> Feature description is available in the specification, "AMD64 Technology
> Platform Quality of Service Extensions, Revision: 1.03 Publication # 56375
> Revision: 1.03 Issue Date: February 2022".
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-
> quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> ---
> v9:
>  Summary of changes:
>  1. Rebased on top of lastest tip/master as of 11/30.
>  2. Most of the changes are result of the comments from Fenghua, Reinette and
> Peter Newman.
>  3. Fixed the cpuid dependancy.
>  4. Added the __init attribute to rdt_get_mon_l3_config and
> mbm_config_rftype_init.
>  5. Added new function resctrl_arch_reset_rmid_all to clear all rmid statues.
>  6. Changed mon_event_config_index_get based on Reinette's comments.
>  7. Changed mbm_config_rftype_init to take care of few extra error handling.
>  8. Few other minor changes and text changes.
> 
> v8:
> 
> https://lore.kernel.org/lkml/166759188265.3281208.11769277079826754455.s
> tgit@bmoger-ubuntu/
>  Changes:
>  1. Removed init attribute for rdt_cpu_has to make it available for all the files.
>  2. Updated the change log for mon_features to correct the names of config
> files.
>  3. Changed configuration file name from mbm_total_config to
> mbm_total_bytes_config.
>     This is more consistant with other changes.
>  4. Added lock protection while reading/writing the config file.
>  5. Other few minor text changes. I have been missing few comments in last
> couple of
>     revisions. Hope I have addressed all of them this time.
> 
> v7:
> 
> https://lore.kernel.org/lkml/166604543832.5345.9696970469830919982.stgit@
> bmoger-ubuntu/
>  Changes:
>  Not much of a change. Missed one comment from Reinette from v5. Corrected
> it now.
>  Few format corrections from Sanjaya.
> 
> v6:
> 
> https://lore.kernel.org/lkml/166543345606.23830.3120625408601531368.stgit
> @bmoger-ubuntu/
>  Summary of changes:
>  1. Rebased on top of lastest tip tree. Fixed few minor conflicts.
>  2. Fixed format issue with scattered.c.
>  3. Removed config_name from the structure mon_evt. It is not required.
>  4. The read/write format for mbm_total_config and mbm_local_config will be
> same
>     as schemata format "id0=val0;id1=val1;...". This is comment from Fenghua.
>  5. Added more comments MSR_IA32_EVT_CFG_BASE writng.
>  5. Few text changes in resctrl.rst
> 
> v5:
> 
> https://lore.kernel.org/lkml/166431016617.373387.1968875281081252467.stgi
> t@bmoger-ubuntu/
>   Summary of changes.
>   1. Split the series into two. The first two patches are bug fixes. So, sent them
> separate.
>   2. The config files mbm_total_config and mbm_local_config are now under
>      /sys/fs/resctrl/info/L3_MON/. Removed these config files from mon groups.
>   3. Ran "checkpatch --strict --codespell" on all the patches. Looks good with few
> known exceptions.
>   4. Few minor text changes in resctrl.rst file.
> 
> v4:
> 
> https://lore.kernel.org/lkml/166257348081.1043018.11227924488792315932.s
> tgit@bmoger-ubuntu/
>   Got numerios of comments from Reinette Chatre. Addressed most of them.
>   Summary of changes.
>   1. Removed mon_configurable under /sys/fs/resctrl/info/L3_MON/.
>   2. Updated mon_features texts if the BMEC is supported.
>   3. Added more explanation about the slow memory support.
>   4. Replaced smp_call_function_many with on_each_cpu_mask call.
>   5. Removed arch_has_empty_bitmaps
>   6. Few other text changes.
>   7. Removed Reviewed-by if the patch is modified.
>   8. Rebased the patches to latest tip.
> 
> v3:
> 
> https://lore.kernel.org/lkml/166117559756.6695.16047463526634290701.stgit
> @bmoger-ubuntu/
>   a. Rebased the patches to latest tip. Resolved some conflicts.
>      https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>   b. Taken care of feedback from Bagas Sanjaya.
>   c. Added Reviewed by from Mingo.
>   Note: I am still looking for comments from Reinette or Fenghua.
> 
> v2:
> 
> https://lore.kernel.org/lkml/165938717220.724959.10931629283087443782.st
> git@bmoger-ubuntu/
>   a. Rebased the patches to latest stable tree (v5.18.15). Resolved some
> conflicts.
>   b. Added the patch to fix CBM issue on AMD. This was originally discussed
>      https://lore.kernel.org/lkml/20220517001234.3137157-1-
> eranian@google.com/
> 
> v1:
> 
> https://lore.kernel.org/lkml/165757543252.416408.13547339307237713464.st
> git@bmoger-ubuntu/
> 
> Babu Moger (13):
>       x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
>       x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
>       x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature
> flag
>       x86/resctrl: Include new features in command line options
>       x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
>       x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
>       x86/resctrl: Introduce data structure to support monitor configuration
>       x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
>       x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
>       x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
>       x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
>       x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
>       Documentation/x86: Update resctrl.rst for new features
> 
> 
>  .../admin-guide/kernel-parameters.txt         |   2 +-
>  Documentation/x86/resctrl.rst                 | 138 +++++++-
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |   2 +
>  arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
>  arch/x86/kernel/cpu/resctrl/core.c            |  54 ++-
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h        |  28 ++
>  arch/x86/kernel/cpu/resctrl/monitor.c         |  26 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 309 ++++++++++++++++--
>  arch/x86/kernel/cpu/scattered.c               |   2 +
>  include/linux/resctrl.h                       |  10 +
>  12 files changed, 544 insertions(+), 33 deletions(-)
> 
> --

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 00/13] Support for AMD QoS new features
  2022-12-15 15:08 ` [PATCH v9 00/13] Support for AMD QoS " Moger, Babu
@ 2022-12-15 15:35   ` Reinette Chatre
  2022-12-15 16:12     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 15:35 UTC (permalink / raw)
  To: Moger, Babu, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/15/2022 7:08 AM, Moger, Babu wrote:
> [AMD Official Use Only - General]
> 
> Hi Reinette,
> I am planning refresh the series. I have couple of changes for patch 10.
> https://lore.kernel.org/lkml/MW3PR12MB45538A17F57BF80C21BB46C4951D9@MW3PR12MB4553.namprd12.prod.outlook.com/
> 
> Let me know if you have any other comments.

Apologies for the delay. I have a few comments that I will post today. 

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 00/13] Support for AMD QoS new features
  2022-12-15 15:35   ` Reinette Chatre
@ 2022-12-15 16:12     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-15 16:12 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 9:36 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 00/13] Support for AMD QoS new features
> 
> Hi Babu,
> 
> On 12/15/2022 7:08 AM, Moger, Babu wrote:
> > [AMD Official Use Only - General]
> >
> > Hi Reinette,
> > I am planning refresh the series. I have couple of changes for patch 10.
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2FMW3PR12MB45538A17F57BF80C21BB46C4951D9%40MW3P
> R12MB4553.namprd12.prod.outlook.com%2F&amp;data=05%7C01%7CBabu.M
> oger%40amd.com%7C6d4c8dce8863452f553008dadeb20cd9%7C3dd8961fe488
> 4e608e11a82d994e183d%7C0%7C0%7C638067153551495498%7CUnknown%7
> CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLC
> JXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=1IDojcQCCWrhxHldDhKsUm
> H41H3K4oU878bvSNhPh6Y%3D&amp;reserved=0
> >
> > Let me know if you have any other comments.
> 
> Apologies for the delay. I have a few comments that I will post today.

No problem.. Thank you.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-12-01 15:36 ` [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
@ 2022-12-15 17:08   ` Reinette Chatre
  2022-12-15 21:10     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:08 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> Add the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
> enforcement policies can be applied to external slow memory connected
> to the host. QOS enforcement is accomplished by assigning a Class Of
> Service (COS) to a processor and specifying allocations or limits for
> that COS for each resource to be allocated.
> 
> This feature is identified by the CPUID Function 8000_0020_EBX_x0.
> 
> CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature Identifiers
> (ECX=0)
> 
> Bits    Field Name      Description
> 2       L3SBE           L3 external slow memory bandwidth enforcement
> 
> CXL.memory is the only supported "slow" memory device. With the support
> of SMBA feature, the hardware enables bandwidth allocation on the slow
> memory devices. If there are multiple slow memory devices in the system,
> then the throttling logic groups all the slow sources together and
> applies the limit on them as a whole.
> 
> The presence of the SMBA feature(with CXL.memory) is independent of
> whether slow memory device is actually present in the system. If there
> is no slow memory in the system, then setting a SMBA limit will have no
> impact on the performance of the system.
> 
> Presence of CXL memory can be identified by numactl command.
> 
> $numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
> node 0 size: 63678 MB node 0 free: 59542 MB
> node 1 cpus:
> node 1 size: 16122 MB
> node 1 free: 15627 MB
> node distances:
> node   0   1
>    0:  10  50
>    1:  50  10
> 
> CPU list for CXL memory will be empty. The cpu-cxl node distance is
> greater than cpu-to-cpu distances. Node 1 has the CXL memory in this
> case. CXL memory can also be identified using ACPI SRAT table and
> memory maps.
> 
> Feature description is available in the specification, "AMD64
> Technology Platform Quality of Service Extensions, Revision: 1.03
> Publication # 56375 Revision: 1.03 Issue Date: February 2022".
> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>

According to "Ordering of commit tags" in Documentation/process/maintainer-tip.rst
the "Link:" tags should be after "Signed-off-by:". Could you please re-order
these to ensure this series is ready for the next stage?

> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 11a0e06362e4..b6a45e56cd0c 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -307,6 +307,7 @@
>  #define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX EDECCSSA user leaf function */
>  #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
>  #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
> +#define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory Bandwidth Allocation */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index f53944fb8f7f..d925753084fb 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> +	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
> 
> 

With the tag ordering addressed:

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thank you

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-12-01 15:36 ` [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
@ 2022-12-15 17:10   ` Reinette Chatre
  2022-12-15 21:30     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:10 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
> enforcement policies on the external slow memory.
> 

I think a snippet like below may help to set reviewer's mind at
ease about the consequences of values chosen:

"Mostly initialization of the essentials. Setting fflags to
RFTYPE_RES_MB configures the SMBA resource to have the same
resctrl files as the existing MBA resource. The SMBA resource 
has identical properties to the existing MBA resource. These
properties will be enumerated in an upcoming change and exposed
via resctrl because of this flag."

> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index c98e52ff5f20..f6af3ac1ef20 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.fflags			= RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_SMBA] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_SMBA,
> +			.name			= "SMBA",
> +			.cache_level		= 3,
> +			.domains		= domain_init(RDT_RESOURCE_SMBA),
> +			.parse_ctrlval		= parse_bw,
> +			.format_str		= "%d=%*u",
> +			.fflags			= RFTYPE_RES_MB,
> +		},
> +	},
>  };
>  
>  /*
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 5ebd28e6aa0c..fdbbf66312ec 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -409,6 +409,7 @@ enum resctrl_res_level {
>  	RDT_RESOURCE_L3,
>  	RDT_RESOURCE_L2,
>  	RDT_RESOURCE_MBA,
> +	RDT_RESOURCE_SMBA,
>  
>  	/* Must be the last */
>  	RDT_NUM_RESOURCES,
> 
> 

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-12-01 15:36 ` [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
@ 2022-12-15 17:11   ` Reinette Chatre
  2022-12-19 15:31     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:11 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> Newer AMD processors support the new feature Bandwidth Monitoring Event
> Configuration (BMEC).
> 
> The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> Bits    Field Name    Description
> 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> 
> Currently, the bandwidth monitoring events mbm_total_bytes and

Please drop "Currently,".

> mbm_local_bytes are set to count all the total and local reads/writes
> respectively. With the introduction of slow memory, the two counters
> are not enough to count all the different types of memory events. With
> the feature BMEC, the users have the option to configure
> mbm_total_bytes and mbm_local_bytes to count the specific type of
> events.
> 
> Each BMEC event has a configuration MSR, which contains one field for
> each bandwidth type that can be used to configure the bandwidth event
> to track any combination of supported bandwidth types. The event will
> count requests from every bandwidth type bit that is set in the
> corresponding configuration register.
> 
> Following are the types of events supported:
> 
> ====    ========================================================
> Bits    Description
> ====    ========================================================
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> ====    ========================================================
> 
> By default, the mbm_total_bytes configuration is set to 0x7F to count
> all the event types and the mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> Feature description is available in the specification, "AMD64
> Technology Platform Quality of Service Extensions, Revision: 1.03
> Publication

Missing end quote above.

> 
> Link: https://www.amd.com/en/support/tech-docs/amd64-technology-platform-quality-service-extensions
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> Signed-off-by: Babu Moger <babu.moger@amd.com>

Same comment about "Link:" ordering as for patch 1.

> ---
>  arch/x86/include/asm/cpufeatures.h |    1 +
>  arch/x86/kernel/cpu/cpuid-deps.c   |    2 ++
>  arch/x86/kernel/cpu/scattered.c    |    1 +
>  3 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index b6a45e56cd0c..415796d7b309 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -308,6 +308,7 @@
>  #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
>  #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
>  #define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory Bandwidth Allocation */
> +#define X86_FEATURE_BMEC		(11*32+22) /* Bandwidth Monitoring Event Configuration */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index d95221117129..f6748c8bd647 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_CQM_OCCUP_LLC,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
>  	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_TOTAL   },
> +	{ X86_FEATURE_BMEC,			X86_FEATURE_CQM_MBM_LOCAL   },
>  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
>  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
>  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index d925753084fb..0dad49a09b7a 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -46,6 +46,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> +	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
>  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
> 
> 

With changelog comments addressed:

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 04/13] x86/resctrl: Include new features in command line options
  2022-12-01 15:36 ` [PATCH v9 04/13] x86/resctrl: Include new features in command line options Babu Moger
@ 2022-12-15 17:12   ` Reinette Chatre
  2022-12-19 15:33     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:12 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> Add the command line options to enable or disable the new resctrl features.
> smba : Slow Memory Bandwidth Allocation
> bmec : Bandwidth Monitor Event Configuration.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |    2 +-
>  arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 42af9ca0127e..a7b6634f4426 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5190,7 +5190,7 @@
>  	rdt=		[HW,X86,RDT]
>  			Turn on/off individual RDT features. List is:
>  			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> -			mba.
> +			mba, smba, bmec.
>  			E.g. to turn on cmt and turn off mba use:
>  				rdt=cmt,!mba
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index f6af3ac1ef20..10a8c9d96f32 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -659,6 +659,8 @@ enum {
>  	RDT_FLAG_L2_CAT,
>  	RDT_FLAG_L2_CDP,
>  	RDT_FLAG_MBA,
> +	RDT_FLAG_SMBA,
> +	RDT_FLAG_BMEC,
>  };
>  
>  #define RDT_OPT(idx, n, f)	\
> @@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
>  	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
>  	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",	X86_FEATURE_CDP_L2),
>  	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
> +	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
> +	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
>  };
>  #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
>  
> 
> 

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-12-01 15:36 ` [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
@ 2022-12-15 17:13   ` Reinette Chatre
  2022-12-19 15:34     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:13 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> The QoS slow memory configuration details are available via
> CPUID_Fn80000020_EDX_x02. Detect the available details and
> initialize the rest to defaults.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h          |    1 +
>  arch/x86/kernel/cpu/resctrl/core.c        |   36 +++++++++++++++++++++++++++--
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
>  4 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 37ff47552bcb..e0a40027aa62 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1061,6 +1061,7 @@
>  
>  /* - AMD: */
>  #define MSR_IA32_MBA_BW_BASE		0xc0000200
> +#define MSR_IA32_SMBA_BW_BASE		0xc0000280
>  
>  /* MSR_IA32_VMX_MISC bits */
>  #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 10a8c9d96f32..b4fc851f6489 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
>  	if (!r)
>  		return rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
>  
> +	/*
> +	 * The software controller support is only applicable to MBA resource.
> +	 * Make sure to check for resource type.
> +	 */
> +	if (r->rid != RDT_RESOURCE_MBA)
> +		return false;
> +
>  	return r->membw.mba_sc;
>  }
>  
> @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
>  	union cpuid_0x10_3_eax eax;
>  	union cpuid_0x10_x_edx edx;
> -	u32 ebx, ecx;
> +	u32 ebx, ecx, subleaf;
>  
> -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
> +	/*
> +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
> +	 * CPUID_Fn80000020_EDX_x02 for SMBA
> +	 */
> +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
> +
> +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
>  	hw_res->num_closid = edx.split.cos_max + 1;
>  	r->default_ctrl = MAX_MBA_BW_AMD;
>  
> @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
>  	return false;
>  }
>  
> +static __init bool get_slow_mem_config(void)
> +{
> +	struct rdt_hw_resource *hw_res = &rdt_resources_all[RDT_RESOURCE_SMBA];
> +
> +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
> +		return false;
> +
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
> +
> +	return false;
> +}
> +
>  static __init bool get_rdt_alloc_resources(void)
>  {
>  	struct rdt_resource *r;
> @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
>  	if (get_mem_config())
>  		ret = true;
>  
> +	if (get_slow_mem_config())
> +		ret = true;
> +
>  	return ret;
>  }
>  
> @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
>  		} else if (r->rid == RDT_RESOURCE_MBA) {
>  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
>  			hw_res->msr_update = mba_wrmsr_amd;
> +		} else if (r->rid == RDT_RESOURCE_SMBA) {
> +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
> +			hw_res->msr_update = mba_wrmsr_amd;
>  		}
>  	}
>  }
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 1df0e3262bca..2dd4b8c47f23 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -209,7 +209,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
>  	unsigned long dom_id;
>  
>  	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
> -	    r->rid == RDT_RESOURCE_MBA) {
> +	    (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)) {
>  		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
>  		return -EINVAL;
>  	}
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e5a48f05e787..8a3dafc0dbf7 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1213,7 +1213,7 @@ static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
>  
>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>  		r = s->res;
> -		if (r->rid == RDT_RESOURCE_MBA)
> +		if (r->rid == RDT_RESOURCE_MBA || r->rid == RDT_RESOURCE_SMBA)
>  			continue;
>  		has_cache = true;
>  		list_for_each_entry(d, &r->domains, list) {
> @@ -1402,7 +1402,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>  					ctrl = resctrl_arch_get_config(r, d,
>  								       closid,
>  								       type);
> -				if (r->rid == RDT_RESOURCE_MBA)
> +				if (r->rid == RDT_RESOURCE_MBA ||
> +				    r->rid == RDT_RESOURCE_SMBA)
>  					size = ctrl;
>  				else
>  					size = rdtgroup_cbm_to_size(r, d, ctrl);
> @@ -2845,7 +2846,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
>  
>  	list_for_each_entry(s, &resctrl_schema_all, list) {
>  		r = s->res;
> -		if (r->rid == RDT_RESOURCE_MBA) {
> +		if (r->rid == RDT_RESOURCE_MBA ||
> +		    r->rid == RDT_RESOURCE_SMBA) {
>  			rdtgroup_init_mba(r, rdtgrp->closid);
>  			if (is_mba_sc(r))
>  				continue;
> 
> 

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  2022-12-01 15:36 ` [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config() Babu Moger
@ 2022-12-15 17:17   ` Reinette Chatre
  2022-12-19 15:51     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:17 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:36 AM, Babu Moger wrote:
> The function rdt_get_mon_l3_config() needs to call rdt_cpu_has() to

No need to say "The function" ... by using () after a name it is clear
that it is a function.

To support this change it could perhaps be:
"In an upcoming change rdt_get_mon_l3_config() needs to call
rdt_cpu_has() to ..."

> query the monitor related features. It cannot be called right now
> because rdt_cpu_has() has the __init attribute but rdt_get_mon_l3_config()
> doesn't. So, add the __init attribute to rdt_get_mon_l3_config() to
> resolve it.

Please place the solution description in a new paragraph and drop the "So,".
The description could also be expanded to support this change. For example:

"Add the __init attribute to rdt_get_mon_l3_config() that is only called
by get_rdt_mon_resources() that already has the __init attribute. Also
make rdt_cpu_has() available to by rdt_get_mon_l3_config() via
the internal header file."


> 
> Also, make the function rdt_cpu_has() available outside core.c file.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c     |    2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    2 +-
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index b4fc851f6489..030d3b409768 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -728,7 +728,7 @@ static int __init set_rdt_options(char *str)
>  }
>  __setup("rdt", set_rdt_options);
>  
> -static bool __init rdt_cpu_has(int flag)
> +bool __init rdt_cpu_has(int flag)
>  {
>  	bool ret = boot_cpu_has(flag);
>  	struct rdt_options *o;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index fdbbf66312ec..7bbfc10094b6 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -512,6 +512,7 @@ void closid_free(int closid);
>  int alloc_rmid(void);
>  void free_rmid(u32 rmid);
>  int rdt_get_mon_l3_config(struct rdt_resource *r);
> +bool rdt_cpu_has(int flag);

Please also add __init attribute here by using the same style as the other functions
in this file that need __init.

>  void mon_event_count(void *info);
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg);
>  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index efe0c30d3a12..e33e8d8bd796 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
>  		list_add_tail(&mbm_local_event.list, &r->evt_list);
>  }
>  
> -int rdt_get_mon_l3_config(struct rdt_resource *r)
> +int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  {
>  	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
>  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> 
> 

Thank you

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-12-01 15:36 ` [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
@ 2022-12-15 17:19   ` Reinette Chatre
  2022-12-19 17:56     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:19 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

I do not see a new data structure introduced. Perhaps subject
could just be:
x86/resctrl: Support monitor configuration

On 12/1/2022 7:36 AM, Babu Moger wrote:
> Add a new field in mon_evt to support Bandwidth Monitoring Event

"mon_evt" -> "struct mon_evt"

> Configuration(BMEC) and also update the "mon_features" display.
> 
> The resctrl file "mon_features" will display the supported events
> and files that can be used to configure those events if monitor
> configuration is supported.
> 
> Before the change.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_local_bytes
> 
> After the change when BMEC is supported.
> 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_total_bytes_config
> 	mbm_local_bytes
> 	mbm_local_bytes_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    7 +++++++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
>  3 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 7bbfc10094b6..b36750334deb 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -52,11 +52,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
>   * struct mon_evt - Entry in the event list of a resource
>   * @evtid:		event id
>   * @name:		name of the event
> + * @configurable:	true if the event is configurable
>   * @list:		entry in &rdt_resource->evt_list
>   */
>  struct mon_evt {
>  	enum resctrl_event_id	evtid;
>  	char			*name;
> +	bool			configurable;
>  	struct list_head	list;
>  };
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index e33e8d8bd796..b39e0eca1879 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -783,6 +783,13 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	if (ret)
>  		return ret;
>  
> +	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
> +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
> +			mbm_total_event.configurable = true;
> +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
> +			mbm_local_event.configurable = true;
> +	}
> +
>  	l3_mon_evt_init(r);
>  
>  	r->mon_capable = true;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8a3dafc0dbf7..8342feb54a7f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
>  	struct rdt_resource *r = of->kn->parent->priv;
>  	struct mon_evt *mevt;
>  
> -	list_for_each_entry(mevt, &r->evt_list, list)
> +	list_for_each_entry(mevt, &r->evt_list, list) {
>  		seq_printf(seq, "%s\n", mevt->name);
> +		if (mevt->configurable)
> +			seq_printf(seq, "%s_config\n", mevt->name);
> +	}
>  
>  	return 0;
>  }
> 
> 

With subject and changelog changes:

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-12-01 15:36 ` [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
@ 2022-12-15 17:40   ` Reinette Chatre
  2022-12-19 18:21     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:40 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

I would like to second James's suggestion to replace sysfs with resctrl
or just remove it. I am concerned that you mentioned in recent message
that you only plan changes to patch 10 while James highlighted that this
needs to be addressed in entire series. Could you please ensure that
you check all the patches?

On 12/1/2022 7:36 AM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading

What "current" means is not clear and the term could just be removed.

> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> By default, the mbm_total_bytes_config is set to 0x7f to count all the
> event types.
> 
> For example:
>     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>     0=0x7f;1=0x7f;2=0x7f;3=0x7f
> 
>     In this case, the event mbm_total_bytes is currently configured
>     with 0x7f on domains 0 to 3.

"currently" can be removed since it already starts with "In this case".


> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/include/asm/msr-index.h       |    1 
>  arch/x86/kernel/cpu/resctrl/internal.h |   24 ++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   99 ++++++++++++++++++++++++++++++++
>  4 files changed, 127 insertions(+), 1 deletion(-)
> 

...

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8342feb54a7f..e93b1c206116 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1423,6 +1423,90 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
>  	return ret;
>  }
>  
> +struct mon_config_info {
> +	u32 evtid;
> +	u32 mon_config;
> +};
> +
> +#define INVALID_CONFIG_INDEX   UINT_MAX
> +
> +/**
> + * mon_event_config_index_get - get the index for the configurable event

Could you say "get the hardware index" to help clarify what the index is
for?

> + * @evtid: event id.
> + *
> + * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> + *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> + *         INVALID_CONFIG_INDEX for invalid evtid
> + */
> +static inline unsigned int mon_event_config_index_get(u32 evtid)
> +{
> +	switch (evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		return 0;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		return 1;
> +	default:
> +		/* WARN */
> +		return INVALID_CONFIG_INDEX;
> +	}
> +}

I see that you copied my sample code here. My intention was that the
/* WARN */ comment be replaced with an actual warning. As a comment
it does not add value. Since the caller now prints a subtler warning it
could just be:

	/* Should never reach here */
	return INVALID_CONFIG_INDEX;

> +
> +static void mon_event_config_read(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 h, index;

index can be "unsigned int" as returned by mon_event_config_index_get()

> +
> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index == INVALID_CONFIG_INDEX) {
> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
> +
> +	/* Report only the valid event configuration bits */
> +	mon_info->mon_config &= MAX_EVT_CONFIG_BITS;
> +}
> +
> +static void mondata_config_read(struct rdt_domain *d, struct mon_config_info *mon_info)
> +{
> +	smp_call_function_any(&d->cpu_mask, mon_event_config_read, mon_info, 1);
> +}
> +
> +static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
> +{
> +	struct mon_config_info mon_info = {0};
> +	struct rdt_domain *dom;
> +	bool sep = false;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	list_for_each_entry(dom, &r->domains, list) {
> +		if (sep)
> +			seq_puts(s, ";");
> +
> +		mon_info.evtid = evtid;
> +		mondata_config_read(dom, &mon_info);
> +

For robustness, please reset mon_config before calling mondata_config_read(). Since
mon_event_config_read() may (yes this is very unlikely) exit early then mon_config
will contain the data from the previous domain.

> +		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
> +		sep = true;
> +	}
> +	seq_puts(s, "\n");
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return 0;
> +}
> +

...

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
  2022-12-01 15:37 ` [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
@ 2022-12-15 17:43   ` Reinette Chatre
  2022-12-19 18:27     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 17:43 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

Same as other commits the subject can be shortened:
x86/resctrl: Add interface to read mbm_local_bytes_config

On 12/1/2022 7:37 AM, Babu Moger wrote:
> The current event configuration can be viewed by the user by reading
> the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
> The event configuration settings are domain specific and will affect
> all the CPUs in the domain.
> 
> Following are the types of events supported:
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> By default, the mbm_local_bytes_config is set to 0x15 to count all the
> local event types.
> 
> For example:
>     $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>     0=0x15;1=0x15;2=0x15;3=0x15
> 
>     In this case, the event mbm_local_bytes is currently configured with
>     0x15 on domains 0 to 3.

"currently" can be removed

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +++-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   16 ++++++++++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 2afddebc8636..7c8a3a745041 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -788,8 +788,10 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  			mbm_total_event.configurable = true;
>  			mbm_config_rftype_init("mbm_total_bytes_config");
>  		}
> -		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
> +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
>  			mbm_local_event.configurable = true;
> +			mbm_config_rftype_init("mbm_local_bytes_config");
> +		}
>  	}
>  
>  	l3_mon_evt_init(r);
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e93b1c206116..580f3cce19e2 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1507,6 +1507,16 @@ static int mbm_total_bytes_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> +				       struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	return 0;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1611,6 +1621,12 @@ static struct rftype res_common_files[] = {
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= mbm_total_bytes_config_show,
>  	},
> +	{
> +		.name		= "mbm_local_bytes_config",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= mbm_local_bytes_config_show,
> +	},
>  	{
>  		.name		= "cpus",
>  		.mode		= 0644,
> 
> 

With the subject and changelog changes addressed:

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-01 15:37 ` [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
@ 2022-12-15 18:24   ` Reinette Chatre
  2022-12-19 19:28     ` Moger, Babu
  2022-12-19 19:50     ` Moger, Babu
  0 siblings, 2 replies; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 18:24 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:37 AM, Babu Moger wrote:
> The current event configuration for mbm_total_bytes can be changed by
> the user by writing to the file
> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.

Please drop "current" from above

> 
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.

please drop "will" 

> 
> Following are the types of events supported:
> 
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> For example:
> To change the mbm_total_bytes to count only reads on domain 0, the bits
> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
> command.
> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
> To change the mbm_total_bytes to count all the slow memory reads on
> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> Run the command.
> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/monitor.c  |   13 +++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  127 ++++++++++++++++++++++++++++++++
>  include/linux/resctrl.h                |   10 +++
>  3 files changed, 149 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 7c8a3a745041..b265856835de 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
>  		memset(am, 0, sizeof(*am));
>  }
>  
> +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> +
> +	if (is_mbm_total_enabled())
> +		memset(hw_dom->arch_mbm_total, 0,
> +		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
> +
> +	if (is_mbm_local_enabled())
> +		memset(hw_dom->arch_mbm_local, 0,
> +		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
> +}
> +

We learned a lot more about this area after Peter's discovery:
https://lore.kernel.org/lkml/20221207112924.3602960-1-peternewman@google.com/

Since this is a new generic function it should be clear in which scenarios it is valid.
Could you please add a function comment that warns future developers about consequences
if a new usage is considered? Something like:

/*
 * Assumes that hardware counters are also reset and thus that there is no need
 * to record initial non-zero counts.
 */

>  static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>  {
>  	u64 shift = 64 - width, chunks;
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 580f3cce19e2..8a22a652a6e8 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +static void mon_event_config_write(void *info)
> +{
> +	struct mon_config_info *mon_info = info;
> +	u32 index;
> +

index does not need to be u32 ... mon_event_config_index_get() returns "unsigned int"
and wrmsr expects "unsigned int", it can also just be "unsigned int".


> +	index = mon_event_config_index_get(mon_info->evtid);
> +	if (index == INVALID_CONFIG_INDEX) {
> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> +		return;
> +	}
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +}
> +
> +static int mbm_config_write_domain(struct rdt_resource *r,
> +				   struct rdt_domain *d, u32 evtid, u32 val)
> +{
> +	struct mon_config_info mon_info = {0};
> +	int ret = 0;
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (val > MAX_EVT_CONFIG_BITS) {
> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Read the current config value first. If both are the same then
> +	 * no need to write it again.
> +	 */
> +	mon_info.evtid = evtid;
> +	mondata_config_read(d, &mon_info);
> +	if (mon_info.mon_config == val)
> +		goto out;
> +
> +	mon_info.mon_config = val;
> +
> +	/*
> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> +	 * are scoped at the domain level. Writing any of these MSRs
> +	 * on one CPU is supposed to be observed by all CPUs in the
> +	 * domain. However, the hardware team recommends to update
> +	 * these MSRs on all the CPUs in the domain.
> +	 */
> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared by the hardware. The
> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> +	 * every RMID on the next read to any event for every RMID.
> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> +	 * cleared while it is tracked by the hardware. Clear the
> +	 * mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	resctrl_arch_reset_rmid_all(r, d);

If I understand correctly the expectation is that when user space read counters
(via mon_data files) right after the configuration was changed then this read
will return "Unavailable" and then the next read will return data.

If this is the case then I think a snippet about this user experience would be
helpful to add to the documentation.

Have you considered doing a preemptive read on the RMIDs that are in use to avoid
users encountering "Unavailable"? I assume doing so on a busy system could potentially
involve hundreds of register reads/writes.

> +
> +out:
> +	return ret;
> +}
> +
> +static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
> +{
> +	char *dom_str = NULL, *id_str;
> +	unsigned long dom_id, val;
> +	struct rdt_domain *d;
> +	int ret = 0;
> +
> +next:
> +	if (!tok || tok[0] == '\0')
> +		return 0;
> +
> +	/* Start processing the strings for each domain */
> +	dom_str = strim(strsep(&tok, ";"));
> +	id_str = strsep(&dom_str, "=");
> +
> +	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> +		rdt_last_cmd_puts("Non-numeric event configuration value\n");
> +		return -EINVAL;
> +	}
> +
> +	list_for_each_entry(d, &r->domains, list) {
> +		if (d->id == dom_id) {
> +			ret = mbm_config_write_domain(r, d, evtid, val);
> +			if (ret)
> +				return -EINVAL;
> +			goto next;
> +		}
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> +					    char *buf, size_t nbytes,
> +					    loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "mbm_total_bytes_config",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= mbm_total_bytes_config_show,
> +		.write		= mbm_total_bytes_config_write,
>  	},
>  	{
>  		.name		= "mbm_local_bytes_config",
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 0cee154abc9f..e4dc65892446 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d,
>  void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
>  			     u32 rmid, enum resctrl_event_id eventid);
>  
> +/**
> + * resctrl_arch_reset_rmid_all() - Reset any private state associated with
> + * 				   all the rmids.

It could be more explicit:
"Reset all private state associated with all rmids and eventids."

> + * @r:		The domain's resource.
> + * @d:		The rmid's domain.

This copy&paste needs some changes to match this new utility.
How about:
@r: The resctrl resource.
@d: The domain for which all architectural counter state will be cleared.

I think it can be improved more but the above could be a start (please do
not copy verbatim but ensure style is correct.)

Keep in mind that this utility does not clear the non-architectural
counter state. This does not apply to AMD since that state is used by
the software controller, but it needs to be kept in mind if another
usage for this utility arises.

> + *
> + * This can be called from any CPU.
> + */
> +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d);
> +
>  extern unsigned int resctrl_rmid_realloc_threshold;
>  extern unsigned int resctrl_rmid_realloc_limit;
>  
> 
> 

The above hunk fails the "no spaces before tabs" checkpatch check.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
  2022-12-01 15:37 ` [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
@ 2022-12-15 18:25   ` Reinette Chatre
  2022-12-19 19:51     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 18:25 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:37 AM, Babu Moger wrote:
> The current event configuration for mbm_local_bytes can be changed by
> the user by writing to the configuration file
> /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
> 

Same comments about subject line and usage of "current".

> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
> 
> Following are the types of events supported:
> ====  ===========================================================
> Bits   Description
> ====  ===========================================================
> 6      Dirty Victims from the QOS domain to all types of memory
> 5      Reads to slow memory in the non-local NUMA domain
> 4      Reads to slow memory in the local NUMA domain
> 3      Non-temporal writes to non-local NUMA domain
> 2      Non-temporal writes to local NUMA domain
> 1      Reads to memory in the non-local NUMA domain
> 0      Reads to memory in the local NUMA domain
> ====  ===========================================================
> 
> For example:
> To change the mbm_local_bytes_config to count all the non-temporal writes
> on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex 0xc).
> Run the command.
>     $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 
> To change the mbm_local_bytes to count only reads to local NUMA domain 1,
> the bit 0 needs to be set which 1b (in hex 0x1). Run the command.
>     $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 8a22a652a6e8..6897c480ae55 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1641,6 +1641,32 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
>  	return ret ?: nbytes;
>  }
>  
> +static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> +					    char *buf, size_t nbytes,
> +					    loff_t off)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	int ret;
> +
> +	/* Valid input requires a trailing newline */
> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> +		return -EINVAL;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	buf[nbytes - 1] = '\0';
> +
> +	ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
> +
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1748,9 +1774,10 @@ static struct rftype res_common_files[] = {
>  	},
>  	{
>  		.name		= "mbm_local_bytes_config",
> -		.mode		= 0444,
> +		.mode		= 0644,
>  		.kf_ops		= &rdtgroup_kf_single_ops,
>  		.seq_show	= mbm_local_bytes_config_show,
> +		.write		= mbm_local_bytes_config_write,
>  	},
>  	{
>  		.name		= "cpus",
> 
> 

With the subject and changelog comments addressed:

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
  2022-12-01 15:37 ` [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
@ 2022-12-15 18:26   ` Reinette Chatre
  2022-12-19 19:59     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 18:26 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:37 AM, Babu Moger wrote:
> The call on_each_cpu_mask() runs the function on each CPU specified
> by cpumask, which may include the local processor. So, replace the call
> smp_call_function_many() with on_each_cpu_mask() to simplify the code.

Please move the solution to a new paragraph and drop the "So,". The two
instances of "the call" can be dropped also.

> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---

Could you please move this patch to the beginning of this series?
Fixes and cleanups should go before new features.

>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
>  1 file changed, 8 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6897c480ae55..68e14831a638 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -325,12 +325,7 @@ static void update_cpu_closid_rmid(void *info)
>  static void
>  update_closid_rmid(const struct cpumask *cpu_mask, struct rdtgroup *r)
>  {
> -	int cpu = get_cpu();
> -
> -	if (cpumask_test_cpu(cpu, cpu_mask))
> -		update_cpu_closid_rmid(r);
> -	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
> -	put_cpu();
> +	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
>  }
>  
>  static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
> @@ -2135,13 +2130,9 @@ static int set_cache_qos_cfg(int level, bool enable)
>  			/* Pick one CPU from each domain instance to update MSR */
>  			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
>  	}
> -	cpu = get_cpu();
> -	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
> -	if (cpumask_test_cpu(cpu, cpu_mask))
> -		update(&enable);
> -	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
> -	smp_call_function_many(cpu_mask, update, &enable, 1);
> -	put_cpu();
> +
> +	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
> +	on_each_cpu_mask(cpu_mask, update, &enable, 1);
>  
>  	free_cpumask_var(cpu_mask);
>  
> @@ -2618,7 +2609,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
>  	struct msr_param msr_param;
>  	cpumask_var_t cpu_mask;
>  	struct rdt_domain *d;
> -	int i, cpu;
> +	int i;
>  
>  	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
>  		return -ENOMEM;
> @@ -2639,13 +2630,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
>  		for (i = 0; i < hw_res->num_closid; i++)
>  			hw_dom->ctrl_val[i] = r->default_ctrl;
>  	}
> -	cpu = get_cpu();
> -	/* Update CBM on this cpu if it's in cpu_mask. */
> -	if (cpumask_test_cpu(cpu, cpu_mask))
> -		rdt_ctrl_update(&msr_param);
> -	/* Update CBM on all other cpus in cpu_mask. */
> -	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
> -	put_cpu();
> +
> +	/* Update CBM on all the CPUs in cpu_mask */
> +	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
>  
>  	free_cpumask_var(cpu_mask);
>  
> 
> 

Should the snippet in resctrl_arch_update_domains() also be updated?

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-12-01 15:37 ` [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
@ 2022-12-15 18:30   ` Reinette Chatre
  2022-12-19 20:05     ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 18:30 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/1/2022 7:37 AM, Babu Moger wrote:
> Update the documentation for the new features:
> 1. Slow Memory Bandwidth allocation (SMBA).
>    With this feature, the QOS  enforcement policies can be applied
>    to the external slow memory connected to the host. QOS enforcement
>    is accomplished by assigning a Class Of Service (COS) to a processor
>    and specifying allocations or limits for that COS for each resource
>    to be allocated.
> 
> 2. Bandwidth Monitoring Event Configuration (BMEC).
>    The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
>    are set to count all the total and local reads/writes respectively.
>    With the introduction of slow memory, the two counters are not
>    enough to count all the different types of memory events. With the
>    feature BMEC, the users have the option to configure mbm_total_bytes
>    and mbm_local_bytes to count the specific type of events.
> 
> Also add configuration instructions with examples.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
> ---
>  Documentation/x86/resctrl.rst |  138 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 136 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 71a531061e4e..60761a6f9087 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
>  This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
>  flag bits:
>  
> -=============================================	================================
> +===============================================	================================
>  RDT (Resource Director Technology) Allocation	"rdt_a"
>  CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
>  CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
>  CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
>  MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
>  MBA (Memory Bandwidth Allocation)		"mba"
> -=============================================	================================
> +SMBA (Slow Memory Bandwidth Allocation)         "smba"
> +BMEC (Bandwidth Monitoring Event Configuration) "bmec"
> +===============================================	================================
>  
>  To use the feature mount the file system::
>  
> @@ -161,6 +163,79 @@ with the following files:
>  "mon_features":
>  		Lists the monitoring events if
>  		monitoring is enabled for the resource.
> +                Example::
> +
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_local_bytes
> +
> +                If the system supports Bandwidth Monitoring Event
> +                Configuration (BMEC), then the bandwidth events will
> +                be configurable. The output will be::
> +
> +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> +                   llc_occupancy
> +                   mbm_total_bytes
> +                   mbm_total_bytes_config
> +                   mbm_local_bytes
> +                   mbm_local_bytes_config
> +
> +"mbm_total_bytes_config", "mbm_local_bytes_config":
> +        These files contain the current event configuration for the events

"These files" is redundant. Note that this is already introduced with "the
following files:".
To match similar files it could read:
"Read/write files containing the configuration for the mbm_total_bytes and
mbm_local_bytes events, respectively, ..."

> +        mbm_total_bytes and mbm_local_bytes, respectively, when the
> +        Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
> +        The event configuration settings are domain specific and will affect

"will" can be dropped?

> +        all the CPUs in the domain.
> +
> +        Following are the types of events supported:
> +
> +        ====    ========================================================
> +        Bits    Description
> +        ====    ========================================================
> +        6       Dirty Victims from the QOS domain to all types of memory
> +        5       Reads to slow memory in the non-local NUMA domain
> +        4       Reads to slow memory in the local NUMA domain
> +        3       Non-temporal writes to non-local NUMA domain
> +        2       Non-temporal writes to local NUMA domain
> +        1       Reads to memory in the non-local NUMA domain
> +        0       Reads to memory in the local NUMA domain
> +        ====    ========================================================
> +
> +        By default, the mbm_total_bytes configuration is set to 0x7f to count
> +        all the event types and the mbm_local_bytes configuration is set to
> +        0x15 to count all the local memory events.
> +
> +        Examples:
> +
> +        * To view the current configuration::
> +          ::
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> +            0=0x7f;1=0x7f;2=0x7f;3=0x7f
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> +            0=0x15;1=0x15;3=0x15;4=0x15
> +
> +        * To change the mbm_total_bytes to count only reads on domain 0,
> +          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
> +          (in hexadecimal 0x33):
> +          ::
> +
> +            # echo  "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> +            0=0x33;1=0x7f;2=0x7f;3=0x7f
> +
> +        * To change the mbm_local_bytes to count all the slow memory reads on
> +          domain 0 and 1, the bits 4 and 5 needs to be set, which is 110000b
> +          in binary (in hexadecimal 0x30):
> +          ::
> +
> +            # echo  "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> +
> +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> +            0=0x30;1=0x30;3=0x15;4=0x15
>  
>  "max_threshold_occupancy":
>  		Read/write file provides the largest value (in
> @@ -464,6 +539,25 @@ Memory bandwidth domain is L3 cache.
>  
>  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>  
> +Slow Memory Bandwidth Allocation (SMBA)
> +---------------------------------------
> +AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
> +CXL.memory is the only supported "slow" memory device. With the
> +support of SMBA, the hardware enables bandwidth allocation on
> +the slow memory devices. If there are multiple such devices in
> +the system, the throttling logic groups all the slow sources
> +together and applies the limit on them as a whole.
> +
> +The presence of SMBA (with CXL.memory) is independent of slow memory
> +devices presence. If there are no such devices on the system, then
> +configuring SMBA will have no impact on the performance of the system.
> +
> +The bandwidth domain for slow memory is L3 cache. Its schemata file
> +is formatted as:
> +::
> +
> +	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
> +
>  Reading/writing the schemata file
>  ---------------------------------
>  Reading the schemata file will show the state of all resources
> @@ -479,6 +573,46 @@ which you wish to change.  E.g.
>    L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
>    L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
>  
> +Reading/writing the schemata file (on AMD systems)
> +--------------------------------------------------
> +Reading the schemata file will show the current bandwidth limit on all
> +domains. The allocated resources are in multiples of one eighth GB/s.
> +When writing to the file, you need to specify what cache id you wish to
> +configure the bandwidth limit.
> +
> +For example, to allocate 2GB/s limit on the first cache id:
> +
> +::
> +
> +  # cat schemata
> +    MB:0=2048;1=2048;2=2048;3=2048
> +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +  # echo "MB:1=16" > schemata
> +  # cat schemata
> +    MB:0=2048;1=  16;2=2048;3=2048
> +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +Reading/writing the schemata file (on AMD systems) with SMBA feature
> +--------------------------------------------------------------------
> +Reading and writing the schemata file is the same as without SMBA in
> +above section.
> +
> +For example, to allocate 8GB/s limit on the first cache id:
> +
> +::
> +
> +  # cat schemata
> +    SMBA:0=2048;1=2048;2=2048;3=2048
> +      MB:0=2048;1=2048;2=2048;3=2048
> +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> +  # echo "SMBA:1=64" > schemata
> +  # cat schemata
> +    SMBA:0=2048;1=  64;2=2048;3=2048
> +      MB:0=2048;1=2048;2=2048;3=2048
> +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
>  Cache Pseudo-Locking
>  ====================
>  CAT enables a user to specify the amount of cache space that an
> 
> 

Based on earlier comments I am awaiting information to understand if some
more detail/example is needed to describe to the user what can be expected
after a counter configuration is made.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 00/13] Support for AMD QoS new features
  2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
                   ` (13 preceding siblings ...)
  2022-12-15 15:08 ` [PATCH v9 00/13] Support for AMD QoS " Moger, Babu
@ 2022-12-15 18:38 ` Reinette Chatre
  2022-12-19 20:57   ` Moger, Babu
  14 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-15 18:38 UTC (permalink / raw)
  To: Babu Moger, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	sandipan.das, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

Please also use the x86/resctrl prefix in the cover letter's subject.

Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag
  2022-12-15 17:08   ` Reinette Chatre
@ 2022-12-15 21:10     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-15 21:10 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:09 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth
> Allocation feature flag
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > Add the new AMD feature X86_FEATURE_SMBA. With this feature, the QOS
> > enforcement policies can be applied to external slow memory connected
> > to the host. QOS enforcement is accomplished by assigning a Class Of
> > Service (COS) to a processor and specifying allocations or limits for
> > that COS for each resource to be allocated.
> >
> > This feature is identified by the CPUID Function 8000_0020_EBX_x0.
> >
> > CPUID Fn8000_0020_EBX_x0 AMD Bandwidth Enforcement Feature
> Identifiers
> > (ECX=0)
> >
> > Bits    Field Name      Description
> > 2       L3SBE           L3 external slow memory bandwidth enforcement
> >
> > CXL.memory is the only supported "slow" memory device. With the
> > support of SMBA feature, the hardware enables bandwidth allocation on
> > the slow memory devices. If there are multiple slow memory devices in
> > the system, then the throttling logic groups all the slow sources
> > together and applies the limit on them as a whole.
> >
> > The presence of the SMBA feature(with CXL.memory) is independent of
> > whether slow memory device is actually present in the system. If there
> > is no slow memory in the system, then setting a SMBA limit will have
> > no impact on the performance of the system.
> >
> > Presence of CXL memory can be identified by numactl command.
> >
> > $numactl -H
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 node 0 size:
> > 63678 MB node 0 free: 59542 MB node 1 cpus:
> > node 1 size: 16122 MB
> > node 1 free: 15627 MB
> > node distances:
> > node   0   1
> >    0:  10  50
> >    1:  50  10
> >
> > CPU list for CXL memory will be empty. The cpu-cxl node distance is
> > greater than cpu-to-cpu distances. Node 1 has the CXL memory in this
> > case. CXL memory can also be identified using ACPI SRAT table and
> > memory maps.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03
> > Publication # 56375 Revision: 1.03 Issue Date: February 2022".
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> > amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-
> quality
> > -service-
> extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7Cb03ef091
> >
> f14847baced108dadebfaa5a%7C3dd8961fe4884e608e11a82d994e183d%7C0%
> 7C0%7C
> >
> 638067212053430832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjo
> >
> iV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdat
> a=HF%2
> > BYN5rW5i8fw6588RXfgCKMSMF4EmZZ252Q0N7Mew0%3D&amp;reserved=0
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> >
> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbab
> u.m
> >
> oger%40amd.com%7Cb03ef091f14847baced108dadebfaa5a%7C3dd8961fe488
> 4e608e
> >
> 11a82d994e183d%7C0%7C0%7C638067212053430832%7CUnknown%7CTWFpb
> GZsb3d8ey
> >
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C300
> >
> 0%7C%7C%7C&amp;sdata=Im%2BiT7NDUpcgR0dAYFDJALge6Wf873MNLyntsyC
> MwlQ%3D&
> > amp;reserved=0
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> 
> According to "Ordering of commit tags" in Documentation/process/maintainer-
> tip.rst
> the "Link:" tags should be after "Signed-off-by:". Could you please re-order
> these to ensure this series is ready for the next stage?

Sure. Will change it.

> 
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h
> > b/arch/x86/include/asm/cpufeatures.h
> > index 11a0e06362e4..b6a45e56cd0c 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -307,6 +307,7 @@
> >  #define X86_FEATURE_SGX_EDECCSSA	(11*32+18) /* "" SGX
> EDECCSSA user leaf function */
> >  #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth
> tracking for RSB stuffing */
> >  #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR
> IA32_TSX_CTRL (Intel) implemented */
> > +#define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory
> Bandwidth Allocation */
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI
> instructions */
> > diff --git a/arch/x86/kernel/cpu/scattered.c
> > b/arch/x86/kernel/cpu/scattered.c index f53944fb8f7f..d925753084fb
> > 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> > +	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >
> >
> 
> With the tag ordering addressed:
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks
Babu
> 
> Thank you
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA
  2022-12-15 17:10   ` Reinette Chatre
@ 2022-12-15 21:30     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-15 21:30 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:11 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 02/13] x86/resctrl: Add a new resource type
> RDT_RESOURCE_SMBA
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > Add a new resource type RDT_RESOURCE_SMBA to handle the QoS
> > enforcement policies on the external slow memory.
> >
> 
> I think a snippet like below may help to set reviewer's mind at ease about the
> consequences of values chosen:
> 
> "Mostly initialization of the essentials. Setting fflags to RFTYPE_RES_MB
> configures the SMBA resource to have the same resctrl files as the existing
> MBA resource. The SMBA resource has identical properties to the existing MBA
> resource. These properties will be enumerated in an upcoming change and
> exposed via resctrl because of this flag."

Sure. Will add it.
> 
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/core.c     |   12 ++++++++++++
> >  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> > b/arch/x86/kernel/cpu/resctrl/core.c
> > index c98e52ff5f20..f6af3ac1ef20 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -100,6 +100,18 @@ struct rdt_hw_resource rdt_resources_all[] = {
> >  			.fflags			= RFTYPE_RES_MB,
> >  		},
> >  	},
> > +	[RDT_RESOURCE_SMBA] =
> > +	{
> > +		.r_resctrl = {
> > +			.rid			= RDT_RESOURCE_SMBA,
> > +			.name			= "SMBA",
> > +			.cache_level		= 3,
> > +			.domains		=
> domain_init(RDT_RESOURCE_SMBA),
> > +			.parse_ctrlval		= parse_bw,
> > +			.format_str		= "%d=%*u",
> > +			.fflags			= RFTYPE_RES_MB,
> > +		},
> > +	},
> >  };
> >
> >  /*
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> > b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 5ebd28e6aa0c..fdbbf66312ec 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -409,6 +409,7 @@ enum resctrl_res_level {
> >  	RDT_RESOURCE_L3,
> >  	RDT_RESOURCE_L2,
> >  	RDT_RESOURCE_MBA,
> > +	RDT_RESOURCE_SMBA,
> >
> >  	/* Must be the last */
> >  	RDT_NUM_RESOURCES,
> >
> >
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag
  2022-12-15 17:11   ` Reinette Chatre
@ 2022-12-19 15:31     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 15:31 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:11 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring
> Event Configuration feature flag
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > Newer AMD processors support the new feature Bandwidth Monitoring
> > Event Configuration (BMEC).
> >
> > The feature support is identified via CPUID Fn8000_0020_EBX_x0 (ECX=0).
> > Bits    Field Name    Description
> > 3       EVT_CFG       Bandwidth Monitoring Event Configuration (BMEC)
> >
> > Currently, the bandwidth monitoring events mbm_total_bytes and
> 
> Please drop "Currently,".
Sure.
> 
> > mbm_local_bytes are set to count all the total and local reads/writes
> > respectively. With the introduction of slow memory, the two counters
> > are not enough to count all the different types of memory events. With
> > the feature BMEC, the users have the option to configure
> > mbm_total_bytes and mbm_local_bytes to count the specific type of
> > events.
> >
> > Each BMEC event has a configuration MSR, which contains one field for
> > each bandwidth type that can be used to configure the bandwidth event
> > to track any combination of supported bandwidth types. The event will
> > count requests from every bandwidth type bit that is set in the
> > corresponding configuration register.
> >
> > Following are the types of events supported:
> >
> > ====    ========================================================
> > Bits    Description
> > ====    ========================================================
> > 6       Dirty Victims from the QOS domain to all types of memory
> > 5       Reads to slow memory in the non-local NUMA domain
> > 4       Reads to slow memory in the local NUMA domain
> > 3       Non-temporal writes to non-local NUMA domain
> > 2       Non-temporal writes to local NUMA domain
> > 1       Reads to memory in the non-local NUMA domain
> > 0       Reads to memory in the local NUMA domain
> > ====    ========================================================
> >
> > By default, the mbm_total_bytes configuration is set to 0x7F to count
> > all the event types and the mbm_local_bytes configuration is set to
> > 0x15 to count all the local memory events.
> >
> > Feature description is available in the specification, "AMD64
> > Technology Platform Quality of Service Extensions, Revision: 1.03
> > Publication
> 
> Missing end quote above.
Ok.
> 
> >
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> > amd.com%2Fen%2Fsupport%2Ftech-docs%2Famd64-technology-platform-
> quality
> > -service-
> extensions&amp;data=05%7C01%7Cbabu.moger%40amd.com%7Cbf8c3716
> >
> eed74809726508dadebfbee2%7C3dd8961fe4884e608e11a82d994e183d%7C0%
> 7C0%7C
> >
> 638067212372390994%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
> AiLCJQIjo
> >
> iV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdat
> a=nLkb
> >
> Wj%2FCed7gAf%2B8qfEec%2FMa1lKgi83EsPZs%2FU%2FOLWc%3D&amp;reserv
> ed=0
> > Link:
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
> >
> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=05%7C01%7Cbab
> u.m
> >
> oger%40amd.com%7Cbf8c3716eed74809726508dadebfbee2%7C3dd8961fe488
> 4e608e
> >
> 11a82d994e183d%7C0%7C0%7C638067212372390994%7CUnknown%7CTWFpb
> GZsb3d8ey
> >
> JWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C300
> >
> 0%7C%7C%7C&amp;sdata=DTiTxeQYtjCz50emuxvoHFJbOPJUO0qcUHjwnQAxP3
> o%3D&am
> > p;reserved=0
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> 
> Same comment about "Link:" ordering as for patch 1.

Sure.
> 
> > ---
> >  arch/x86/include/asm/cpufeatures.h |    1 +
> >  arch/x86/kernel/cpu/cpuid-deps.c   |    2 ++
> >  arch/x86/kernel/cpu/scattered.c    |    1 +
> >  3 files changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h
> > b/arch/x86/include/asm/cpufeatures.h
> > index b6a45e56cd0c..415796d7b309 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -308,6 +308,7 @@
> >  #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth
> tracking for RSB stuffing */
> >  #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR
> IA32_TSX_CTRL (Intel) implemented */
> >  #define X86_FEATURE_SMBA		(11*32+21) /* Slow Memory
> Bandwidth Allocation */
> > +#define X86_FEATURE_BMEC		(11*32+22) /* Bandwidth
> Monitoring Event Configuration */
> >
> >  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> >  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI
> instructions */
> > diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
> > b/arch/x86/kernel/cpu/cpuid-deps.c
> > index d95221117129..f6748c8bd647 100644
> > --- a/arch/x86/kernel/cpu/cpuid-deps.c
> > +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> > @@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
> >  	{ X86_FEATURE_CQM_OCCUP_LLC,
> 	X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_TOTAL,
> 	X86_FEATURE_CQM_LLC   },
> >  	{ X86_FEATURE_CQM_MBM_LOCAL,
> 	X86_FEATURE_CQM_LLC   },
> > +	{ X86_FEATURE_BMEC,
> 	X86_FEATURE_CQM_MBM_TOTAL   },
> > +	{ X86_FEATURE_BMEC,
> 	X86_FEATURE_CQM_MBM_LOCAL   },
> >  	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
> >  	{ X86_FEATURE_AVX512_FP16,		X86_FEATURE_AVX512BW  },
> >  	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES
> },
> > diff --git a/arch/x86/kernel/cpu/scattered.c
> > b/arch/x86/kernel/cpu/scattered.c index d925753084fb..0dad49a09b7a
> > 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -46,6 +46,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
> >  	{ X86_FEATURE_SMBA,		CPUID_EBX,  2, 0x80000020, 0 },
> > +	{ X86_FEATURE_BMEC,		CPUID_EBX,  3, 0x80000020, 0 },
> >  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> >  	{ X86_FEATURE_AMD_LBR_V2,	CPUID_EAX,  1, 0x80000022, 0 },
> >  	{ 0, 0, 0, 0, 0 }
> >
> >
> 
> With changelog comments addressed:
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 04/13] x86/resctrl: Include new features in command line options
  2022-12-15 17:12   ` Reinette Chatre
@ 2022-12-19 15:33     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 15:33 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:12 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 04/13] x86/resctrl: Include new features in command
> line options
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > Add the command line options to enable or disable the new resctrl features.
> > smba : Slow Memory Bandwidth Allocation bmec : Bandwidth Monitor Event
> > Configuration.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |    2 +-
> >  arch/x86/kernel/cpu/resctrl/core.c              |    4 ++++
> >  2 files changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index 42af9ca0127e..a7b6634f4426 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5190,7 +5190,7 @@
> >  	rdt=		[HW,X86,RDT]
> >  			Turn on/off individual RDT features. List is:
> >  			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> > -			mba.
> > +			mba, smba, bmec.
> >  			E.g. to turn on cmt and turn off mba use:
> >  				rdt=cmt,!mba
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> > b/arch/x86/kernel/cpu/resctrl/core.c
> > index f6af3ac1ef20..10a8c9d96f32 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -659,6 +659,8 @@ enum {
> >  	RDT_FLAG_L2_CAT,
> >  	RDT_FLAG_L2_CDP,
> >  	RDT_FLAG_MBA,
> > +	RDT_FLAG_SMBA,
> > +	RDT_FLAG_BMEC,
> >  };
> >
> >  #define RDT_OPT(idx, n, f)	\
> > @@ -682,6 +684,8 @@ static struct rdt_options rdt_options[]  __initdata = {
> >  	RDT_OPT(RDT_FLAG_L2_CAT,    "l2cat",	X86_FEATURE_CAT_L2),
> >  	RDT_OPT(RDT_FLAG_L2_CDP,    "l2cdp",
> 	X86_FEATURE_CDP_L2),
> >  	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
> > +	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
> > +	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
> >  };
> >  #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
> >
> >
> >
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks 
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation
  2022-12-15 17:13   ` Reinette Chatre
@ 2022-12-19 15:34     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 15:34 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:14 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory
> Bandwidth Allocation
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > The QoS slow memory configuration details are available via
> > CPUID_Fn80000020_EDX_x02. Detect the available details and initialize
> > the rest to defaults.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/msr-index.h          |    1 +
> >  arch/x86/kernel/cpu/resctrl/core.c        |   36
> +++++++++++++++++++++++++++--
> >  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |    2 +-
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |    8 ++++--
> >  4 files changed, 41 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 37ff47552bcb..e0a40027aa62 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -1061,6 +1061,7 @@
> >
> >  /* - AMD: */
> >  #define MSR_IA32_MBA_BW_BASE		0xc0000200
> > +#define MSR_IA32_SMBA_BW_BASE		0xc0000280
> >
> >  /* MSR_IA32_VMX_MISC bits */
> >  #define MSR_IA32_VMX_MISC_INTEL_PT                 (1ULL << 14)
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> > b/arch/x86/kernel/cpu/resctrl/core.c
> > index 10a8c9d96f32..b4fc851f6489 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -162,6 +162,13 @@ bool is_mba_sc(struct rdt_resource *r)
> >  	if (!r)
> >  		return
> rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl.membw.mba_sc;
> >
> > +	/*
> > +	 * The software controller support is only applicable to MBA resource.
> > +	 * Make sure to check for resource type.
> > +	 */
> > +	if (r->rid != RDT_RESOURCE_MBA)
> > +		return false;
> > +
> >  	return r->membw.mba_sc;
> >  }
> >
> > @@ -225,9 +232,15 @@ static bool __rdt_get_mem_config_amd(struct
> rdt_resource *r)
> >  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> >  	union cpuid_0x10_3_eax eax;
> >  	union cpuid_0x10_x_edx edx;
> > -	u32 ebx, ecx;
> > +	u32 ebx, ecx, subleaf;
> >
> > -	cpuid_count(0x80000020, 1, &eax.full, &ebx, &ecx, &edx.full);
> > +	/*
> > +	 * Query CPUID_Fn80000020_EDX_x01 for MBA and
> > +	 * CPUID_Fn80000020_EDX_x02 for SMBA
> > +	 */
> > +	subleaf = (r->rid == RDT_RESOURCE_SMBA) ? 2 :  1;
> > +
> > +	cpuid_count(0x80000020, subleaf, &eax.full, &ebx, &ecx, &edx.full);
> >  	hw_res->num_closid = edx.split.cos_max + 1;
> >  	r->default_ctrl = MAX_MBA_BW_AMD;
> >
> > @@ -750,6 +763,19 @@ static __init bool get_mem_config(void)
> >  	return false;
> >  }
> >
> > +static __init bool get_slow_mem_config(void) {
> > +	struct rdt_hw_resource *hw_res =
> > +&rdt_resources_all[RDT_RESOURCE_SMBA];
> > +
> > +	if (!rdt_cpu_has(X86_FEATURE_SMBA))
> > +		return false;
> > +
> > +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> > +		return __rdt_get_mem_config_amd(&hw_res->r_resctrl);
> > +
> > +	return false;
> > +}
> > +
> >  static __init bool get_rdt_alloc_resources(void)  {
> >  	struct rdt_resource *r;
> > @@ -780,6 +806,9 @@ static __init bool get_rdt_alloc_resources(void)
> >  	if (get_mem_config())
> >  		ret = true;
> >
> > +	if (get_slow_mem_config())
> > +		ret = true;
> > +
> >  	return ret;
> >  }
> >
> > @@ -869,6 +898,9 @@ static __init void rdt_init_res_defs_amd(void)
> >  		} else if (r->rid == RDT_RESOURCE_MBA) {
> >  			hw_res->msr_base = MSR_IA32_MBA_BW_BASE;
> >  			hw_res->msr_update = mba_wrmsr_amd;
> > +		} else if (r->rid == RDT_RESOURCE_SMBA) {
> > +			hw_res->msr_base = MSR_IA32_SMBA_BW_BASE;
> > +			hw_res->msr_update = mba_wrmsr_amd;
> >  		}
> >  	}
> >  }
> > diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > index 1df0e3262bca..2dd4b8c47f23 100644
> > --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> > @@ -209,7 +209,7 @@ static int parse_line(char *line, struct resctrl_schema
> *s,
> >  	unsigned long dom_id;
> >
> >  	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
> > -	    r->rid == RDT_RESOURCE_MBA) {
> > +	    (r->rid == RDT_RESOURCE_MBA || r->rid ==
> RDT_RESOURCE_SMBA)) {
> >  		rdt_last_cmd_puts("Cannot pseudo-lock MBA resource\n");
> >  		return -EINVAL;
> >  	}
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index e5a48f05e787..8a3dafc0dbf7 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1213,7 +1213,7 @@ static bool rdtgroup_mode_test_exclusive(struct
> > rdtgroup *rdtgrp)
> >
> >  	list_for_each_entry(s, &resctrl_schema_all, list) {
> >  		r = s->res;
> > -		if (r->rid == RDT_RESOURCE_MBA)
> > +		if (r->rid == RDT_RESOURCE_MBA || r->rid ==
> RDT_RESOURCE_SMBA)
> >  			continue;
> >  		has_cache = true;
> >  		list_for_each_entry(d, &r->domains, list) { @@ -1402,7
> +1402,8 @@
> > static int rdtgroup_size_show(struct kernfs_open_file *of,
> >  					ctrl = resctrl_arch_get_config(r, d,
> >  								       closid,
> >  								       type);
> > -				if (r->rid == RDT_RESOURCE_MBA)
> > +				if (r->rid == RDT_RESOURCE_MBA ||
> > +				    r->rid == RDT_RESOURCE_SMBA)
> >  					size = ctrl;
> >  				else
> >  					size = rdtgroup_cbm_to_size(r, d, ctrl);
> @@ -2845,7 +2846,8 @@
> > static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
> >
> >  	list_for_each_entry(s, &resctrl_schema_all, list) {
> >  		r = s->res;
> > -		if (r->rid == RDT_RESOURCE_MBA) {
> > +		if (r->rid == RDT_RESOURCE_MBA ||
> > +		    r->rid == RDT_RESOURCE_SMBA) {
> >  			rdtgroup_init_mba(r, rdtgrp->closid);
> >  			if (is_mba_sc(r))
> >  				continue;
> >
> >
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thank you
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config()
  2022-12-15 17:17   ` Reinette Chatre
@ 2022-12-19 15:51     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 15:51 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:17 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 06/13] x86/resctrl: Add __init attribute to
> rdt_get_mon_l3_config()
> 
> Hi Babu,
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > The function rdt_get_mon_l3_config() needs to call rdt_cpu_has() to
> 
> No need to say "The function" ... by using () after a name it is clear that it is a
> function.

Ok
> 
> To support this change it could perhaps be:
> "In an upcoming change rdt_get_mon_l3_config() needs to call
> rdt_cpu_has() to ..."

Sure.
> 
> > query the monitor related features. It cannot be called right now
> > because rdt_cpu_has() has the __init attribute but
> > rdt_get_mon_l3_config() doesn't. So, add the __init attribute to
> > rdt_get_mon_l3_config() to resolve it.
> 
> Please place the solution description in a new paragraph and drop the "So,".
> The description could also be expanded to support this change. For example:
> 
> "Add the __init attribute to rdt_get_mon_l3_config() that is only called by
> get_rdt_mon_resources() that already has the __init attribute. Also make
> rdt_cpu_has() available to by rdt_get_mon_l3_config() via the internal header
> file."
> 
Sure.
> 
> >
> > Also, make the function rdt_cpu_has() available outside core.c file.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/core.c     |    2 +-
> >  arch/x86/kernel/cpu/resctrl/internal.h |    1 +
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |    2 +-
> >  3 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> > b/arch/x86/kernel/cpu/resctrl/core.c
> > index b4fc851f6489..030d3b409768 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -728,7 +728,7 @@ static int __init set_rdt_options(char *str)  }
> > __setup("rdt", set_rdt_options);
> >
> > -static bool __init rdt_cpu_has(int flag)
> > +bool __init rdt_cpu_has(int flag)
> >  {
> >  	bool ret = boot_cpu_has(flag);
> >  	struct rdt_options *o;
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> > b/arch/x86/kernel/cpu/resctrl/internal.h
> > index fdbbf66312ec..7bbfc10094b6 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -512,6 +512,7 @@ void closid_free(int closid);  int
> > alloc_rmid(void);  void free_rmid(u32 rmid);  int
> > rdt_get_mon_l3_config(struct rdt_resource *r);
> > +bool rdt_cpu_has(int flag);
> 
> Please also add __init attribute here by using the same style as the other
> functions in this file that need __init.
Ok
Thanks
Babu
> 
> >  void mon_event_count(void *info);
> >  int rdtgroup_mondata_show(struct seq_file *m, void *arg);  void
> > mon_event_read(struct rmid_read *rr, struct rdt_resource *r, diff
> > --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> > b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index efe0c30d3a12..e33e8d8bd796 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -746,7 +746,7 @@ static void l3_mon_evt_init(struct rdt_resource *r)
> >  		list_add_tail(&mbm_local_event.list, &r->evt_list);  }
> >
> > -int rdt_get_mon_l3_config(struct rdt_resource *r)
> > +int __init rdt_get_mon_l3_config(struct rdt_resource *r)
> >  {
> >  	unsigned int mbm_offset =
> boot_cpu_data.x86_cache_mbm_width_offset;
> >  	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
> >
> >
> 
> Thank you
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration
  2022-12-15 17:19   ` Reinette Chatre
@ 2022-12-19 17:56     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 17:56 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:19 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 07/13] x86/resctrl: Introduce data structure to support
> monitor configuration
> 
> Hi Babu,
> 
> I do not see a new data structure introduced. Perhaps subject could just be:
> x86/resctrl: Support monitor configuration

Sure
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > Add a new field in mon_evt to support Bandwidth Monitoring Event
> 
> "mon_evt" -> "struct mon_evt"
Ok
> 
> > Configuration(BMEC) and also update the "mon_features" display.
> >
> > The resctrl file "mon_features" will display the supported events and
> > files that can be used to configure those events if monitor
> > configuration is supported.
> >
> > Before the change.
> > 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> > 	llc_occupancy
> > 	mbm_total_bytes
> > 	mbm_local_bytes
> >
> > After the change when BMEC is supported.
> > 	$cat /sys/fs/resctrl/info/L3_MON/mon_features
> > 	llc_occupancy
> > 	mbm_total_bytes
> > 	mbm_total_bytes_config
> > 	mbm_local_bytes
> > 	mbm_local_bytes_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/internal.h |    2 ++
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |    7 +++++++
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |    5 ++++-
> >  3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> > b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 7bbfc10094b6..b36750334deb 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -52,11 +52,13 @@ DECLARE_STATIC_KEY_FALSE(rdt_mon_enable_key);
> >   * struct mon_evt - Entry in the event list of a resource
> >   * @evtid:		event id
> >   * @name:		name of the event
> > + * @configurable:	true if the event is configurable
> >   * @list:		entry in &rdt_resource->evt_list
> >   */
> >  struct mon_evt {
> >  	enum resctrl_event_id	evtid;
> >  	char			*name;
> > +	bool			configurable;
> >  	struct list_head	list;
> >  };
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> > b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index e33e8d8bd796..b39e0eca1879 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -783,6 +783,13 @@ int __init rdt_get_mon_l3_config(struct
> rdt_resource *r)
> >  	if (ret)
> >  		return ret;
> >
> > +	if (rdt_cpu_has(X86_FEATURE_BMEC)) {
> > +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
> > +			mbm_total_event.configurable = true;
> > +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
> > +			mbm_local_event.configurable = true;
> > +	}
> > +
> >  	l3_mon_evt_init(r);
> >
> >  	r->mon_capable = true;
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 8a3dafc0dbf7..8342feb54a7f 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1001,8 +1001,11 @@ static int rdt_mon_features_show(struct
> kernfs_open_file *of,
> >  	struct rdt_resource *r = of->kn->parent->priv;
> >  	struct mon_evt *mevt;
> >
> > -	list_for_each_entry(mevt, &r->evt_list, list)
> > +	list_for_each_entry(mevt, &r->evt_list, list) {
> >  		seq_printf(seq, "%s\n", mevt->name);
> > +		if (mevt->configurable)
> > +			seq_printf(seq, "%s_config\n", mevt->name);
> > +	}
> >
> >  	return 0;
> >  }
> >
> >
> 
> With subject and changelog changes:
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config
  2022-12-15 17:40   ` Reinette Chatre
@ 2022-12-19 18:21     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 18:21 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:41 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> I would like to second James's suggestion to replace sysfs with resctrl or just
> remove it. I am concerned that you mentioned in recent message that you only
> plan changes to patch 10 while James highlighted that this needs to be
> addressed in entire series. Could you please ensure that you check all the
> patches?

Ok. Sure. Will remove it.
x86/resctrl: Add interface to write mbm_total_bytes_config

Will check other patches for similar changes.
 
> 
> On 12/1/2022 7:36 AM, Babu Moger wrote:
> > The current event configuration can be viewed by the user by reading
> 
> What "current" means is not clear and the term could just be removed.

Will remove it.
> 
> > the configuration file /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> > ====
> ===========================================================
> > Bits   Description
> > ====
> ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====
> ===========================================================
> >
> > By default, the mbm_total_bytes_config is set to 0x7f to count all the
> > event types.
> >
> > For example:
> >     $cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >     0=0x7f;1=0x7f;2=0x7f;3=0x7f
> >
> >     In this case, the event mbm_total_bytes is currently configured
> >     with 0x7f on domains 0 to 3.
> 
> "currently" can be removed since it already starts with "In this case".

Sure.
> 
> 
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/include/asm/msr-index.h       |    1
> >  arch/x86/kernel/cpu/resctrl/internal.h |   24 ++++++++
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   99
> ++++++++++++++++++++++++++++++++
> >  4 files changed, 127 insertions(+), 1 deletion(-)
> >
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 8342feb54a7f..e93b1c206116 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1423,6 +1423,90 @@ static int rdtgroup_size_show(struct
> kernfs_open_file *of,
> >  	return ret;
> >  }
> >
> > +struct mon_config_info {
> > +	u32 evtid;
> > +	u32 mon_config;
> > +};
> > +
> > +#define INVALID_CONFIG_INDEX   UINT_MAX
> > +
> > +/**
> > + * mon_event_config_index_get - get the index for the configurable
> > +event
> 
> Could you say "get the hardware index" to help clarify what the index is for?

Sure.

> 
> > + * @evtid: event id.
> > + *
> > + * Return: 0 for evtid == QOS_L3_MBM_TOTAL_EVENT_ID
> > + *         1 for evtid == QOS_L3_MBM_LOCAL_EVENT_ID
> > + *         INVALID_CONFIG_INDEX for invalid evtid
> > + */
> > +static inline unsigned int mon_event_config_index_get(u32 evtid) {
> > +	switch (evtid) {
> > +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> > +		return 0;
> > +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> > +		return 1;
> > +	default:
> > +		/* WARN */
> > +		return INVALID_CONFIG_INDEX;
> > +	}
> > +}
> 
> I see that you copied my sample code here. My intention was that the
> /* WARN */ comment be replaced with an actual warning. As a comment it
> does not add value. Since the caller now prints a subtler warning it could just
> be:
> 
> 	/* Should never reach here */
> 	return INVALID_CONFIG_INDEX;

Sure.
> 
> > +
> > +static void mon_event_config_read(void *info) {
> > +	struct mon_config_info *mon_info = info;
> > +	u32 h, index;
> 
> index can be "unsigned int" as returned by mon_event_config_index_get()

Ok. Sure

> 
> > +
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index == INVALID_CONFIG_INDEX) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	rdmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, h);
> > +
> > +	/* Report only the valid event configuration bits */
> > +	mon_info->mon_config &= MAX_EVT_CONFIG_BITS; }
> > +
> > +static void mondata_config_read(struct rdt_domain *d, struct
> > +mon_config_info *mon_info) {
> > +	smp_call_function_any(&d->cpu_mask, mon_event_config_read,
> mon_info,
> > +1); }
> > +
> > +static int mbm_config_show(struct seq_file *s, struct rdt_resource
> > +*r, u32 evtid) {
> > +	struct mon_config_info mon_info = {0};
> > +	struct rdt_domain *dom;
> > +	bool sep = false;
> > +
> > +	mutex_lock(&rdtgroup_mutex);
> > +
> > +	list_for_each_entry(dom, &r->domains, list) {
> > +		if (sep)
> > +			seq_puts(s, ";");
> > +
> > +		mon_info.evtid = evtid;
> > +		mondata_config_read(dom, &mon_info);
> > +
> 
> For robustness, please reset mon_config before calling mondata_config_read().
> Since
> mon_event_config_read() may (yes this is very unlikely) exit early then
> mon_config will contain the data from the previous domain.

Sure. I can call memset to reset.
Thanks
Babu

> 
> > +		seq_printf(s, "%d=0x%02x", dom->id, mon_info.mon_config);
> > +		sep = true;
> > +	}
> > +	seq_puts(s, "\n");
> > +
> > +	mutex_unlock(&rdtgroup_mutex);
> > +
> > +	return 0;
> > +}
> > +
> 
> ...
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config
  2022-12-15 17:43   ` Reinette Chatre
@ 2022-12-19 18:27     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 18:27 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 11:43 AM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read
> mbm_local_bytes_config
> 
> Hi Babu,
> 
> Same as other commits the subject can be shortened:
> x86/resctrl: Add interface to read mbm_local_bytes_config

Sure.
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > The current event configuration can be viewed by the user by reading
> > the configuration file /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> > ====
> ===========================================================
> > Bits   Description
> > ====
> ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====
> ===========================================================
> >
> > By default, the mbm_local_bytes_config is set to 0x15 to count all the
> > local event types.
> >
> > For example:
> >     $cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> >     0=0x15;1=0x15;2=0x15;3=0x15
> >
> >     In this case, the event mbm_local_bytes is currently configured with
> >     0x15 on domains 0 to 3.
> 
> "currently" can be removed

Sure.
> 
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |    4 +++-
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   16 ++++++++++++++++
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> > b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 2afddebc8636..7c8a3a745041 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -788,8 +788,10 @@ int __init rdt_get_mon_l3_config(struct
> rdt_resource *r)
> >  			mbm_total_event.configurable = true;
> >  			mbm_config_rftype_init("mbm_total_bytes_config");
> >  		}
> > -		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
> > +		if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> >  			mbm_local_event.configurable = true;
> > +			mbm_config_rftype_init("mbm_local_bytes_config");
> > +		}
> >  	}
> >
> >  	l3_mon_evt_init(r);
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index e93b1c206116..580f3cce19e2 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1507,6 +1507,16 @@ static int mbm_total_bytes_config_show(struct
> kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> > +				       struct seq_file *seq, void *v) {
> > +	struct rdt_resource *r = of->kn->parent->priv;
> > +
> > +	mbm_config_show(seq, r, QOS_L3_MBM_LOCAL_EVENT_ID);
> > +
> > +	return 0;
> > +}
> > +
> >  /* rdtgroup information files for one cache resource. */  static
> > struct rftype res_common_files[] = {
> >  	{
> > @@ -1611,6 +1621,12 @@ static struct rftype res_common_files[] = {
> >  		.kf_ops		= &rdtgroup_kf_single_ops,
> >  		.seq_show	= mbm_total_bytes_config_show,
> >  	},
> > +	{
> > +		.name		= "mbm_local_bytes_config",
> > +		.mode		= 0444,
> > +		.kf_ops		= &rdtgroup_kf_single_ops,
> > +		.seq_show	= mbm_local_bytes_config_show,
> > +	},
> >  	{
> >  		.name		= "cpus",
> >  		.mode		= 0644,
> >
> >
> 
> With the subject and changelog changes addressed:
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-15 18:24   ` Reinette Chatre
@ 2022-12-19 19:28     ` Moger, Babu
  2022-12-20 17:32       ` Reinette Chatre
  2022-12-19 19:50     ` Moger, Babu
  1 sibling, 1 reply; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 19:28 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:25 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > The current event configuration for mbm_total_bytes can be changed by
> > the user by writing to the file
> > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> 
> Please drop "current" from above
Sure.
> 
> >
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> 
> please drop "will"

Ok

> 
> >
> > Following are the types of events supported:
> >
> > ====
> ===========================================================
> > Bits   Description
> > ====
> ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====
> ===========================================================
> >
> > For example:
> > To change the mbm_total_bytes to count only reads on domain 0, the
> > bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
> > Run the command.
> > 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > To change the mbm_total_bytes to count all the slow memory reads on
> > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> > Run the command.
> > 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |   13 +++
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  127
> ++++++++++++++++++++++++++++++++
> >  include/linux/resctrl.h                |   10 +++
> >  3 files changed, 149 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> > b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 7c8a3a745041..b265856835de 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r,
> struct rdt_domain *d,
> >  		memset(am, 0, sizeof(*am));
> >  }
> >
> > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct
> > +rdt_domain *d) {
> > +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> > +
> > +	if (is_mbm_total_enabled())
> > +		memset(hw_dom->arch_mbm_total, 0,
> > +		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
> > +
> > +	if (is_mbm_local_enabled())
> > +		memset(hw_dom->arch_mbm_local, 0,
> > +		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); }
> > +
> 
> We learned a lot more about this area after Peter's discovery:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2F20221207112924.3602960-1-
> peternewman%40google.com%2F&amp;data=05%7C01%7Cbabu.moger%40am
> d.com%7Cc9ca23a7b3c643d2385508dadec9f83d%7C3dd8961fe4884e608e11a8
> 2d994e183d%7C0%7C0%7C638067256273943013%7CUnknown%7CTWFpbGZsb
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%
> 3D%7C3000%7C%7C%7C&amp;sdata=Pmvi2L4L727GqALGaeEO0MbpcnuRcqKc
> opNNXuu%2BbFM%3D&amp;reserved=0
> 
> Since this is a new generic function it should be clear in which scenarios it is
> valid.
> Could you please add a function comment that warns future developers about
> consequences if a new usage is considered? Something like:
> 
> /*
>  * Assumes that hardware counters are also reset and thus that there is no
> need
>  * to record initial non-zero counts.
>  */

Ok. Sure.
> 
> >  static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int
> > width)  {
> >  	u64 shift = 64 - width, chunks;
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 580f3cce19e2..8a22a652a6e8 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct
> kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static void mon_event_config_write(void *info) {
> > +	struct mon_config_info *mon_info = info;
> > +	u32 index;
> > +
> 
> index does not need to be u32 ... mon_event_config_index_get() returns
> "unsigned int"
> and wrmsr expects "unsigned int", it can also just be "unsigned int".

Ok.

> 
> 
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index == INVALID_CONFIG_INDEX) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> }
> > +
> > +static int mbm_config_write_domain(struct rdt_resource *r,
> > +				   struct rdt_domain *d, u32 evtid, u32 val) {
> > +	struct mon_config_info mon_info = {0};
> > +	int ret = 0;
> > +
> > +	/* mon_config cannot be more than the supported set of events */
> > +	if (val > MAX_EVT_CONFIG_BITS) {
> > +		rdt_last_cmd_puts("Invalid event configuration\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Read the current config value first. If both are the same then
> > +	 * no need to write it again.
> > +	 */
> > +	mon_info.evtid = evtid;
> > +	mondata_config_read(d, &mon_info);
> > +	if (mon_info.mon_config == val)
> > +		goto out;
> > +
> > +	mon_info.mon_config = val;
> > +
> > +	/*
> > +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> > +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> > +	 * are scoped at the domain level. Writing any of these MSRs
> > +	 * on one CPU is supposed to be observed by all CPUs in the
> > +	 * domain. However, the hardware team recommends to update
> > +	 * these MSRs on all the CPUs in the domain.
> > +	 */
> > +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write,
> &mon_info,
> > +1);
> > +
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	resctrl_arch_reset_rmid_all(r, d);
> 
> If I understand correctly the expectation is that when user space read counters
> (via mon_data files) right after the configuration was changed then this read
> will return "Unavailable" and then the next read will return data.
> 
> If this is the case then I think a snippet about this user experience would be
> helpful to add to the documentation.

Ok.  How about this in the documentation?

"When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain.
The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value."


> 
> Have you considered doing a preemptive read on the RMIDs that are in use to
> avoid users encountering "Unavailable"? I assume doing so on a busy system
> could potentially involve hundreds of register reads/writes.

No. I have not tried that.

> 
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int mon_config_write(struct rdt_resource *r, char *tok, u32
> > +evtid) {
> > +	char *dom_str = NULL, *id_str;
> > +	unsigned long dom_id, val;
> > +	struct rdt_domain *d;
> > +	int ret = 0;
> > +
> > +next:
> > +	if (!tok || tok[0] == '\0')
> > +		return 0;
> > +
> > +	/* Start processing the strings for each domain */
> > +	dom_str = strim(strsep(&tok, ";"));
> > +	id_str = strsep(&dom_str, "=");
> > +
> > +	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> > +		rdt_last_cmd_puts("Non-numeric event configuration
> value\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	list_for_each_entry(d, &r->domains, list) {
> > +		if (d->id == dom_id) {
> > +			ret = mbm_config_write_domain(r, d, evtid, val);
> > +			if (ret)
> > +				return -EINVAL;
> > +			goto next;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> > +					    char *buf, size_t nbytes,
> > +					    loff_t off)
> > +{
> > +	struct rdt_resource *r = of->kn->parent->priv;
> > +	int ret;
> > +
> > +	/* Valid input requires a trailing newline */
> > +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> > +		return -EINVAL;
> > +
> > +	cpus_read_lock();
> > +	mutex_lock(&rdtgroup_mutex);
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	buf[nbytes - 1] = '\0';
> > +
> > +	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> > +
> > +	mutex_unlock(&rdtgroup_mutex);
> > +	cpus_read_unlock();
> > +
> > +	return ret ?: nbytes;
> > +}
> > +
> >  /* rdtgroup information files for one cache resource. */  static
> > struct rftype res_common_files[] = {
> >  	{
> > @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = {
> >  	},
> >  	{
> >  		.name		= "mbm_total_bytes_config",
> > -		.mode		= 0444,
> > +		.mode		= 0644,
> >  		.kf_ops		= &rdtgroup_kf_single_ops,
> >  		.seq_show	= mbm_total_bytes_config_show,
> > +		.write		= mbm_total_bytes_config_write,
> >  	},
> >  	{
> >  		.name		= "mbm_local_bytes_config",
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index
> > 0cee154abc9f..e4dc65892446 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource
> > *r, struct rdt_domain *d,  void resctrl_arch_reset_rmid(struct rdt_resource
> *r, struct rdt_domain *d,
> >  			     u32 rmid, enum resctrl_event_id eventid);
> >
> > +/**
> > + * resctrl_arch_reset_rmid_all() - Reset any private state associated with
> > + * 				   all the rmids.
> 
> It could be more explicit:
> "Reset all private state associated with all rmids and eventids."

Sure.

> 
> > + * @r:		The domain's resource.
> > + * @d:		The rmid's domain.
> 
> This copy&paste needs some changes to match this new utility.
> How about:
> @r: The resctrl resource.
> @d: The domain for which all architectural counter state will be cleared.

Sure. 
> 
> I think it can be improved more but the above could be a start (please do not
> copy verbatim but ensure style is correct.)
> 
> Keep in mind that this utility does not clear the non-architectural counter state.
> This does not apply to AMD since that state is used by the software controller,
> but it needs to be kept in mind if another usage for this utility arises.

Sure. Will add this text in the description.

> 
> > + *
> > + * This can be called from any CPU.
> > + */
> > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct
> > +rdt_domain *d);
> > +
> >  extern unsigned int resctrl_rmid_realloc_threshold;  extern unsigned
> > int resctrl_rmid_realloc_limit;
> >
> >
> >
> 
> The above hunk fails the "no spaces before tabs" checkpatch check.

Sure. Will check.
Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-15 18:24   ` Reinette Chatre
  2022-12-19 19:28     ` Moger, Babu
@ 2022-12-19 19:50     ` Moger, Babu
  2022-12-20 17:32       ` Reinette Chatre
  1 sibling, 1 reply; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 19:50 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:25 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > The current event configuration for mbm_total_bytes can be changed by
> > the user by writing to the file
> > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> 
> Please drop "current" from above
> 
> >
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> 
> please drop "will"
> 
> >
> > Following are the types of events supported:
> >
> > ====
> ===========================================================
> > Bits   Description
> > ====
> ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====
> ===========================================================
> >
> > For example:
> > To change the mbm_total_bytes to count only reads on domain 0, the
> > bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
> > Run the command.
> > 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > To change the mbm_total_bytes to count all the slow memory reads on
> > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> > Run the command.
> > 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/monitor.c  |   13 +++
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  127
> ++++++++++++++++++++++++++++++++
> >  include/linux/resctrl.h                |   10 +++
> >  3 files changed, 149 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> > b/arch/x86/kernel/cpu/resctrl/monitor.c
> > index 7c8a3a745041..b265856835de 100644
> > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r,
> struct rdt_domain *d,
> >  		memset(am, 0, sizeof(*am));
> >  }
> >
> > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct
> > +rdt_domain *d) {
> > +	struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
> > +
> > +	if (is_mbm_total_enabled())
> > +		memset(hw_dom->arch_mbm_total, 0,
> > +		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
> > +
> > +	if (is_mbm_local_enabled())
> > +		memset(hw_dom->arch_mbm_local, 0,
> > +		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); }
> > +
> 
> We learned a lot more about this area after Peter's discovery:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern
> el.org%2Flkml%2F20221207112924.3602960-1-
> peternewman%40google.com%2F&amp;data=05%7C01%7Cbabu.moger%40am
> d.com%7Cc9ca23a7b3c643d2385508dadec9f83d%7C3dd8961fe4884e608e11a8
> 2d994e183d%7C0%7C0%7C638067256273943013%7CUnknown%7CTWFpbGZsb
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%
> 3D%7C3000%7C%7C%7C&amp;sdata=Pmvi2L4L727GqALGaeEO0MbpcnuRcqKc
> opNNXuu%2BbFM%3D&amp;reserved=0
> 
> Since this is a new generic function it should be clear in which scenarios it is
> valid.
> Could you please add a function comment that warns future developers about
> consequences if a new usage is considered? Something like:
> 
> /*
>  * Assumes that hardware counters are also reset and thus that there is no
> need
>  * to record initial non-zero counts.
>  */
> 
> >  static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int
> > width)  {
> >  	u64 shift = 64 - width, chunks;
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 580f3cce19e2..8a22a652a6e8 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct
> kernfs_open_file *of,
> >  	return 0;
> >  }
> >
> > +static void mon_event_config_write(void *info) {
> > +	struct mon_config_info *mon_info = info;
> > +	u32 index;
> > +
> 
> index does not need to be u32 ... mon_event_config_index_get() returns
> "unsigned int"
> and wrmsr expects "unsigned int", it can also just be "unsigned int".
> 
> 
> > +	index = mon_event_config_index_get(mon_info->evtid);
> > +	if (index == INVALID_CONFIG_INDEX) {
> > +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> > +		return;
> > +	}
> > +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> }
> > +
> > +static int mbm_config_write_domain(struct rdt_resource *r,
> > +				   struct rdt_domain *d, u32 evtid, u32 val) {
> > +	struct mon_config_info mon_info = {0};
> > +	int ret = 0;
> > +
> > +	/* mon_config cannot be more than the supported set of events */
> > +	if (val > MAX_EVT_CONFIG_BITS) {
> > +		rdt_last_cmd_puts("Invalid event configuration\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Read the current config value first. If both are the same then
> > +	 * no need to write it again.
> > +	 */
> > +	mon_info.evtid = evtid;
> > +	mondata_config_read(d, &mon_info);
> > +	if (mon_info.mon_config == val)
> > +		goto out;
> > +
> > +	mon_info.mon_config = val;
> > +
> > +	/*
> > +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> > +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> > +	 * are scoped at the domain level. Writing any of these MSRs
> > +	 * on one CPU is supposed to be observed by all CPUs in the
> > +	 * domain. However, the hardware team recommends to update
> > +	 * these MSRs on all the CPUs in the domain.
> > +	 */
> > +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write,
> &mon_info,
> > +1);

Forgot about this.  This snippet is going to change. I have tested and works fine.
How about this?

        /*
         * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
         * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
         * are scoped at the domain level. Writing any of these MSRs
         * on one CPU is supposed to be observed by all CPUs in the domain.
         */
        smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1);


/*
> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> +	 * are scoped at the domain level. Writing any of these MSRs
> +	 * on one CPU is supposed to be observed by all CPUs in the
> +	 * domain. However, the hardware team recommends to update
> +	 * these MSRs on all the CPUs in the domain.
> +	 */
> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);


> > +
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	resctrl_arch_reset_rmid_all(r, d);
> 
> If I understand correctly the expectation is that when user space read counters
> (via mon_data files) right after the configuration was changed then this read
> will return "Unavailable" and then the next read will return data.
> 
> If this is the case then I think a snippet about this user experience would be
> helpful to add to the documentation.
> 
> Have you considered doing a preemptive read on the RMIDs that are in use to
> avoid users encountering "Unavailable"? I assume doing so on a busy system
> could potentially involve hundreds of register reads/writes.
> 
> > +
> > +out:
> > +	return ret;
> > +}
> > +
> > +static int mon_config_write(struct rdt_resource *r, char *tok, u32
> > +evtid) {
> > +	char *dom_str = NULL, *id_str;
> > +	unsigned long dom_id, val;
> > +	struct rdt_domain *d;
> > +	int ret = 0;
> > +
> > +next:
> > +	if (!tok || tok[0] == '\0')
> > +		return 0;
> > +
> > +	/* Start processing the strings for each domain */
> > +	dom_str = strim(strsep(&tok, ";"));
> > +	id_str = strsep(&dom_str, "=");
> > +
> > +	if (!id_str || kstrtoul(id_str, 10, &dom_id)) {
> > +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> > +		rdt_last_cmd_puts("Non-numeric event configuration
> value\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	list_for_each_entry(d, &r->domains, list) {
> > +		if (d->id == dom_id) {
> > +			ret = mbm_config_write_domain(r, d, evtid, val);
> > +			if (ret)
> > +				return -EINVAL;
> > +			goto next;
> > +		}
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> > +
> > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> > +					    char *buf, size_t nbytes,
> > +					    loff_t off)
> > +{
> > +	struct rdt_resource *r = of->kn->parent->priv;
> > +	int ret;
> > +
> > +	/* Valid input requires a trailing newline */
> > +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> > +		return -EINVAL;
> > +
> > +	cpus_read_lock();
> > +	mutex_lock(&rdtgroup_mutex);
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	buf[nbytes - 1] = '\0';
> > +
> > +	ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> > +
> > +	mutex_unlock(&rdtgroup_mutex);
> > +	cpus_read_unlock();
> > +
> > +	return ret ?: nbytes;
> > +}
> > +
> >  /* rdtgroup information files for one cache resource. */  static
> > struct rftype res_common_files[] = {
> >  	{
> > @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = {
> >  	},
> >  	{
> >  		.name		= "mbm_total_bytes_config",
> > -		.mode		= 0444,
> > +		.mode		= 0644,
> >  		.kf_ops		= &rdtgroup_kf_single_ops,
> >  		.seq_show	= mbm_total_bytes_config_show,
> > +		.write		= mbm_total_bytes_config_write,
> >  	},
> >  	{
> >  		.name		= "mbm_local_bytes_config",
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index
> > 0cee154abc9f..e4dc65892446 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource
> > *r, struct rdt_domain *d,  void resctrl_arch_reset_rmid(struct rdt_resource
> *r, struct rdt_domain *d,
> >  			     u32 rmid, enum resctrl_event_id eventid);
> >
> > +/**
> > + * resctrl_arch_reset_rmid_all() - Reset any private state associated with
> > + * 				   all the rmids.
> 
> It could be more explicit:
> "Reset all private state associated with all rmids and eventids."
> 
> > + * @r:		The domain's resource.
> > + * @d:		The rmid's domain.
> 
> This copy&paste needs some changes to match this new utility.
> How about:
> @r: The resctrl resource.
> @d: The domain for which all architectural counter state will be cleared.
> 
> I think it can be improved more but the above could be a start (please do not
> copy verbatim but ensure style is correct.)
> 
> Keep in mind that this utility does not clear the non-architectural counter state.
> This does not apply to AMD since that state is used by the software controller,
> but it needs to be kept in mind if another usage for this utility arises.
> 
> > + *
> > + * This can be called from any CPU.
> > + */
> > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct
> > +rdt_domain *d);
> > +
> >  extern unsigned int resctrl_rmid_realloc_threshold;  extern unsigned
> > int resctrl_rmid_realloc_limit;
> >
> >
> >
> 
> The above hunk fails the "no spaces before tabs" checkpatch check.
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config
  2022-12-15 18:25   ` Reinette Chatre
@ 2022-12-19 19:51     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 19:51 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:26 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write
> mbm_local_bytes_config
> 
> Hi Babu,
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > The current event configuration for mbm_local_bytes can be changed by
> > the user by writing to the configuration file
> > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config.
> >
> 
> Same comments about subject line and usage of "current".

Sure,
> 
> > The event configuration settings are domain specific and will affect
> > all the CPUs in the domain.
> >
> > Following are the types of events supported:
> > ====
> ===========================================================
> > Bits   Description
> > ====
> ===========================================================
> > 6      Dirty Victims from the QOS domain to all types of memory
> > 5      Reads to slow memory in the non-local NUMA domain
> > 4      Reads to slow memory in the local NUMA domain
> > 3      Non-temporal writes to non-local NUMA domain
> > 2      Non-temporal writes to local NUMA domain
> > 1      Reads to memory in the non-local NUMA domain
> > 0      Reads to memory in the local NUMA domain
> > ====
> ===========================================================
> >
> > For example:
> > To change the mbm_local_bytes_config to count all the non-temporal
> > writes on domain 0, the bits 2 and 3 needs to be set which is 1100b (in hex
> 0xc).
> > Run the command.
> >     $echo  0=0xc > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> >
> > To change the mbm_local_bytes to count only reads to local NUMA domain
> > 1, the bit 0 needs to be set which 1b (in hex 0x1). Run the command.
> >     $echo  1=0x1 > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29
> ++++++++++++++++++++++++++++-
> >  1 file changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 8a22a652a6e8..6897c480ae55 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -1641,6 +1641,32 @@ static ssize_t
> mbm_total_bytes_config_write(struct kernfs_open_file *of,
> >  	return ret ?: nbytes;
> >  }
> >
> > +static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
> > +					    char *buf, size_t nbytes,
> > +					    loff_t off)
> > +{
> > +	struct rdt_resource *r = of->kn->parent->priv;
> > +	int ret;
> > +
> > +	/* Valid input requires a trailing newline */
> > +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
> > +		return -EINVAL;
> > +
> > +	cpus_read_lock();
> > +	mutex_lock(&rdtgroup_mutex);
> > +
> > +	rdt_last_cmd_clear();
> > +
> > +	buf[nbytes - 1] = '\0';
> > +
> > +	ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
> > +
> > +	mutex_unlock(&rdtgroup_mutex);
> > +	cpus_read_unlock();
> > +
> > +	return ret ?: nbytes;
> > +}
> > +
> >  /* rdtgroup information files for one cache resource. */  static
> > struct rftype res_common_files[] = {
> >  	{
> > @@ -1748,9 +1774,10 @@ static struct rftype res_common_files[] = {
> >  	},
> >  	{
> >  		.name		= "mbm_local_bytes_config",
> > -		.mode		= 0444,
> > +		.mode		= 0644,
> >  		.kf_ops		= &rdtgroup_kf_single_ops,
> >  		.seq_show	= mbm_local_bytes_config_show,
> > +		.write		= mbm_local_bytes_config_write,
> >  	},
> >  	{
> >  		.name		= "cpus",
> >
> >
> 
> With the subject and changelog comments addressed:
> 
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask()
  2022-12-15 18:26   ` Reinette Chatre
@ 2022-12-19 19:59     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 19:59 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:27 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many()
> with on_each_cpu_mask()
> 
> Hi Babu,
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > The call on_each_cpu_mask() runs the function on each CPU specified by
> > cpumask, which may include the local processor. So, replace the call
> > smp_call_function_many() with on_each_cpu_mask() to simplify the code.
> 
> Please move the solution to a new paragraph and drop the "So,". The two
> instances of "the call" can be dropped also.

Sure.
> 
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> 
> Could you please move this patch to the beginning of this series?
> Fixes and cleanups should go before new features.

Sure.

> 
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   29 ++++++++---------------------
> >  1 file changed, 8 insertions(+), 21 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > index 6897c480ae55..68e14831a638 100644
> > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > @@ -325,12 +325,7 @@ static void update_cpu_closid_rmid(void *info)
> > static void  update_closid_rmid(const struct cpumask *cpu_mask, struct
> > rdtgroup *r)  {
> > -	int cpu = get_cpu();
> > -
> > -	if (cpumask_test_cpu(cpu, cpu_mask))
> > -		update_cpu_closid_rmid(r);
> > -	smp_call_function_many(cpu_mask, update_cpu_closid_rmid, r, 1);
> > -	put_cpu();
> > +	on_each_cpu_mask(cpu_mask, update_cpu_closid_rmid, r, 1);
> >  }
> >
> >  static int cpus_mon_write(struct rdtgroup *rdtgrp, cpumask_var_t
> > newmask, @@ -2135,13 +2130,9 @@ static int set_cache_qos_cfg(int level,
> bool enable)
> >  			/* Pick one CPU from each domain instance to update
> MSR */
> >  			cpumask_set_cpu(cpumask_any(&d->cpu_mask),
> cpu_mask);
> >  	}
> > -	cpu = get_cpu();
> > -	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
> > -	if (cpumask_test_cpu(cpu, cpu_mask))
> > -		update(&enable);
> > -	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
> > -	smp_call_function_many(cpu_mask, update, &enable, 1);
> > -	put_cpu();
> > +
> > +	/* Update QOS_CFG MSR on all the CPUs in cpu_mask */
> > +	on_each_cpu_mask(cpu_mask, update, &enable, 1);
> >
> >  	free_cpumask_var(cpu_mask);
> >
> > @@ -2618,7 +2609,7 @@ static int reset_all_ctrls(struct rdt_resource *r)
> >  	struct msr_param msr_param;
> >  	cpumask_var_t cpu_mask;
> >  	struct rdt_domain *d;
> > -	int i, cpu;
> > +	int i;
> >
> >  	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
> >  		return -ENOMEM;
> > @@ -2639,13 +2630,9 @@ static int reset_all_ctrls(struct rdt_resource *r)
> >  		for (i = 0; i < hw_res->num_closid; i++)
> >  			hw_dom->ctrl_val[i] = r->default_ctrl;
> >  	}
> > -	cpu = get_cpu();
> > -	/* Update CBM on this cpu if it's in cpu_mask. */
> > -	if (cpumask_test_cpu(cpu, cpu_mask))
> > -		rdt_ctrl_update(&msr_param);
> > -	/* Update CBM on all other cpus in cpu_mask. */
> > -	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
> > -	put_cpu();
> > +
> > +	/* Update CBM on all the CPUs in cpu_mask */
> > +	on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1);
> >
> >  	free_cpumask_var(cpu_mask);
> >
> >
> >
> 
> Should the snippet in resctrl_arch_update_domains() also be updated?

Yes. It should be changed also.  Will fix it.
Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features
  2022-12-15 18:30   ` Reinette Chatre
@ 2022-12-19 20:05     ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 20:05 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]

Hi Reinette,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:31 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new
> features
> 
> Hi Babu,
> 
> On 12/1/2022 7:37 AM, Babu Moger wrote:
> > Update the documentation for the new features:
> > 1. Slow Memory Bandwidth allocation (SMBA).
> >    With this feature, the QOS  enforcement policies can be applied
> >    to the external slow memory connected to the host. QOS enforcement
> >    is accomplished by assigning a Class Of Service (COS) to a processor
> >    and specifying allocations or limits for that COS for each resource
> >    to be allocated.
> >
> > 2. Bandwidth Monitoring Event Configuration (BMEC).
> >    The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
> >    are set to count all the total and local reads/writes respectively.
> >    With the introduction of slow memory, the two counters are not
> >    enough to count all the different types of memory events. With the
> >    feature BMEC, the users have the option to configure mbm_total_bytes
> >    and mbm_local_bytes to count the specific type of events.
> >
> > Also add configuration instructions with examples.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
> > ---
> >  Documentation/x86/resctrl.rst |  138
> > ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 136 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/x86/resctrl.rst
> > b/Documentation/x86/resctrl.rst index 71a531061e4e..60761a6f9087
> > 100644
> > --- a/Documentation/x86/resctrl.rst
> > +++ b/Documentation/x86/resctrl.rst
> > @@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality
> of Service(AMD QoS).
> >  This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86
> > /proc/cpuinfo  flag bits:
> >
> > -=============================================
> 	================================
> > +===============================================
> 	================================
> >  RDT (Resource Director Technology) Allocation	"rdt_a"
> >  CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
> >  CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
> >  CQM (Cache QoS Monitoring)			"cqm_llc",
> "cqm_occup_llc"
> >  MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total",
> "cqm_mbm_local"
> >  MBA (Memory Bandwidth Allocation)		"mba"
> > -=============================================
> 	================================
> > +SMBA (Slow Memory Bandwidth Allocation)         "smba"
> > +BMEC (Bandwidth Monitoring Event Configuration) "bmec"
> > +===============================================
> 	================================
> >
> >  To use the feature mount the file system::
> >
> > @@ -161,6 +163,79 @@ with the following files:
> >  "mon_features":
> >  		Lists the monitoring events if
> >  		monitoring is enabled for the resource.
> > +                Example::
> > +
> > +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> > +                   llc_occupancy
> > +                   mbm_total_bytes
> > +                   mbm_local_bytes
> > +
> > +                If the system supports Bandwidth Monitoring Event
> > +                Configuration (BMEC), then the bandwidth events will
> > +                be configurable. The output will be::
> > +
> > +                   # cat /sys/fs/resctrl/info/L3_MON/mon_features
> > +                   llc_occupancy
> > +                   mbm_total_bytes
> > +                   mbm_total_bytes_config
> > +                   mbm_local_bytes
> > +                   mbm_local_bytes_config
> > +
> > +"mbm_total_bytes_config", "mbm_local_bytes_config":
> > +        These files contain the current event configuration for the
> > +events
> 
> "These files" is redundant. Note that this is already introduced with "the
> following files:".
> To match similar files it could read:
> "Read/write files containing the configuration for the mbm_total_bytes and
> mbm_local_bytes events, respectively, ..."

Sure.
> 
> > +        mbm_total_bytes and mbm_local_bytes, respectively, when the
> > +        Bandwidth Monitoring Event Configuration (BMEC) feature is
> supported.
> > +        The event configuration settings are domain specific and will
> > + affect
> 
> "will" can be dropped?

Sure.
> 
> > +        all the CPUs in the domain.
> > +
> > +        Following are the types of events supported:
> > +
> > +        ====
> ========================================================
> > +        Bits    Description
> > +        ====
> ========================================================
> > +        6       Dirty Victims from the QOS domain to all types of memory
> > +        5       Reads to slow memory in the non-local NUMA domain
> > +        4       Reads to slow memory in the local NUMA domain
> > +        3       Non-temporal writes to non-local NUMA domain
> > +        2       Non-temporal writes to local NUMA domain
> > +        1       Reads to memory in the non-local NUMA domain
> > +        0       Reads to memory in the local NUMA domain
> > +        ====
> ========================================================
> > +
> > +        By default, the mbm_total_bytes configuration is set to 0x7f to count
> > +        all the event types and the mbm_local_bytes configuration is set to
> > +        0x15 to count all the local memory events.
> > +
> > +        Examples:
> > +
> > +        * To view the current configuration::
> > +          ::
> > +
> > +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> > +            0=0x7f;1=0x7f;2=0x7f;3=0x7f
> > +
> > +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> > +            0=0x15;1=0x15;3=0x15;4=0x15
> > +
> > +        * To change the mbm_total_bytes to count only reads on domain 0,
> > +          the bits 0, 1, 4 and 5 needs to be set, which is 110011b in binary
> > +          (in hexadecimal 0x33):
> > +          ::
> > +
> > +            # echo  "0=0x33" >
> > + /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> > +
> > +            # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> > +            0=0x33;1=0x7f;2=0x7f;3=0x7f
> > +
> > +        * To change the mbm_local_bytes to count all the slow memory reads
> on
> > +          domain 0 and 1, the bits 4 and 5 needs to be set, which is 110000b
> > +          in binary (in hexadecimal 0x30):
> > +          ::
> > +
> > +            # echo  "0=0x30;1=0x30" >
> > + /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> > +
> > +            # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> > +            0=0x30;1=0x30;3=0x15;4=0x15

Planning to add the following text here.

"When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain.
The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value."

> >
> >  "max_threshold_occupancy":
> >  		Read/write file provides the largest value (in @@ -464,6
> +539,25 @@
> > Memory bandwidth domain is L3 cache.
> >
> >  	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
> >
> > +Slow Memory Bandwidth Allocation (SMBA)
> > +---------------------------------------
> > +AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
> > +CXL.memory is the only supported "slow" memory device. With the
> > +support of SMBA, the hardware enables bandwidth allocation on the
> > +slow memory devices. If there are multiple such devices in the
> > +system, the throttling logic groups all the slow sources together and
> > +applies the limit on them as a whole.
> > +
> > +The presence of SMBA (with CXL.memory) is independent of slow memory
> > +devices presence. If there are no such devices on the system, then
> > +configuring SMBA will have no impact on the performance of the system.
> > +
> > +The bandwidth domain for slow memory is L3 cache. Its schemata file
> > +is formatted as:
> > +::
> > +
> > +	SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
> > +
> >  Reading/writing the schemata file
> >  ---------------------------------
> >  Reading the schemata file will show the state of all resources @@
> > -479,6 +573,46 @@ which you wish to change.  E.g.
> >    L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
> >    L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
> >
> > +Reading/writing the schemata file (on AMD systems)
> > +--------------------------------------------------
> > +Reading the schemata file will show the current bandwidth limit on
> > +all domains. The allocated resources are in multiples of one eighth GB/s.
> > +When writing to the file, you need to specify what cache id you wish
> > +to configure the bandwidth limit.
> > +
> > +For example, to allocate 2GB/s limit on the first cache id:
> > +
> > +::
> > +
> > +  # cat schemata
> > +    MB:0=2048;1=2048;2=2048;3=2048
> > +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> > +
> > +  # echo "MB:1=16" > schemata
> > +  # cat schemata
> > +    MB:0=2048;1=  16;2=2048;3=2048
> > +    L3:0=ffff;1=ffff;2=ffff;3=ffff
> > +
> > +Reading/writing the schemata file (on AMD systems) with SMBA feature
> > +--------------------------------------------------------------------
> > +Reading and writing the schemata file is the same as without SMBA in
> > +above section.
> > +
> > +For example, to allocate 8GB/s limit on the first cache id:
> > +
> > +::
> > +
> > +  # cat schemata
> > +    SMBA:0=2048;1=2048;2=2048;3=2048
> > +      MB:0=2048;1=2048;2=2048;3=2048
> > +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> > +
> > +  # echo "SMBA:1=64" > schemata
> > +  # cat schemata
> > +    SMBA:0=2048;1=  64;2=2048;3=2048
> > +      MB:0=2048;1=2048;2=2048;3=2048
> > +      L3:0=ffff;1=ffff;2=ffff;3=ffff
> > +
> >  Cache Pseudo-Locking
> >  ====================
> >  CAT enables a user to specify the amount of cache space that an
> >
> >
> 
> Based on earlier comments I am awaiting information to understand if some
> more detail/example is needed to describe to the user what can be expected
> after a counter configuration is made.

Proposed the new text above. Please check.
Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* RE: [PATCH v9 00/13] Support for AMD QoS new features
  2022-12-15 18:38 ` Reinette Chatre
@ 2022-12-19 20:57   ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-19 20:57 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

[AMD Official Use Only - General]



> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 15, 2022 12:39 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
> quic_neeraju@quicinc.com; rdunlap@infradead.org;
> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
> peternewman@google.com
> Subject: Re: [PATCH v9 00/13] Support for AMD QoS new features
> 
> Hi Babu,
> 
> Please also use the x86/resctrl prefix in the cover letter's subject.

Sure.
Thanks
Babu
> 
> Reinette

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-19 19:28     ` Moger, Babu
@ 2022-12-20 17:32       ` Reinette Chatre
  2022-12-20 18:58         ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-20 17:32 UTC (permalink / raw)
  To: Moger, Babu, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/19/2022 11:28 AM, Moger, Babu wrote:
> [AMD Official Use Only - General]
> 
> Hi Reinette,
> 
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@intel.com>
>> Sent: Thursday, December 15, 2022 12:25 PM
>> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
>> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
>> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
>> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
>> quic_neeraju@quicinc.com; rdunlap@infradead.org;
>> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
>> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
>> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
>> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
>> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
>> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
>> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
>> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
>> peternewman@google.com
>> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write
>> mbm_total_bytes_config
>>
>> Hi Babu,
>>
>> On 12/1/2022 7:37 AM, Babu Moger wrote:

...

>>> +	/*
>>> +	 * When an Event Configuration is changed, the bandwidth counters
>>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>> +	 * every RMID on the next read to any event for every RMID.
>>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>> +	 * cleared while it is tracked by the hardware. Clear the
>>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>>> +	 */
>>> +	resctrl_arch_reset_rmid_all(r, d);
>>
>> If I understand correctly the expectation is that when user space read counters
>> (via mon_data files) right after the configuration was changed then this read
>> will return "Unavailable" and then the next read will return data.
>>
>> If this is the case then I think a snippet about this user experience would be
>> helpful to add to the documentation.
> 
> Ok.  How about this in the documentation?
> 
> "When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain.
> The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value."
> 
> 

Thinking about this more ... why are the counters for all eventids cleared
when only one eventid's configuration is changed?

Reinette



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-19 19:50     ` Moger, Babu
@ 2022-12-20 17:32       ` Reinette Chatre
  2022-12-20 18:00         ` Moger, Babu
  0 siblings, 1 reply; 50+ messages in thread
From: Reinette Chatre @ 2022-12-20 17:32 UTC (permalink / raw)
  To: Moger, Babu, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Babu,

On 12/19/2022 11:50 AM, Moger, Babu wrote:
> 
> Forgot about this.  This snippet is going to change. I have tested and works fine.
> How about this?
> 
>         /*
>          * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
>          * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
>          * are scoped at the domain level. Writing any of these MSRs
>          * on one CPU is supposed to be observed by all CPUs in the domain.
>          */
>         smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> 
> 

It looks good but please drop the "supposed to be". If there is any uncertainty
then the data should be written to all CPUs, if not then that uncertain text should
be dropped.

Reinette



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-20 17:32       ` Reinette Chatre
@ 2022-12-20 18:00         ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-20 18:00 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Reinette,

On 12/20/22 11:32, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/19/2022 11:50 AM, Moger, Babu wrote:
>> Forgot about this.  This snippet is going to change. I have tested and works fine.
>> How about this?
>>
>>         /*
>>          * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the
>>          * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
>>          * are scoped at the domain level. Writing any of these MSRs
>>          * on one CPU is supposed to be observed by all CPUs in the domain.
>>          */
>>         smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
>>
>>
> It looks good but please drop the "supposed to be". If there is any uncertainty
> then the data should be written to all CPUs, if not then that uncertain text should
> be dropped.

There is no uncertainty. I will drop "supposed to be".

Thanks

Babu


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
  2022-12-20 17:32       ` Reinette Chatre
@ 2022-12-20 18:58         ` Moger, Babu
  0 siblings, 0 replies; 50+ messages in thread
From: Moger, Babu @ 2022-12-20 18:58 UTC (permalink / raw)
  To: Reinette Chatre, corbet, tglx, mingo, bp
  Cc: fenghua.yu, dave.hansen, x86, hpa, paulmck, akpm, quic_neeraju,
	rdunlap, damien.lemoal, songmuchun, peterz, jpoimboe, pbonzini,
	chang.seok.bae, pawan.kumar.gupta, jmattson, daniel.sneddon,
	Das1, Sandipan, tony.luck, james.morse, linux-doc, linux-kernel,
	bagasdotme, eranian, christophe.leroy, jarkko, adrian.hunter,
	quic_jiles, peternewman

Hi Reinette,

On 12/20/22 11:32, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/19/2022 11:28 AM, Moger, Babu wrote:
>> [AMD Official Use Only - General]
>>
>> Hi Reinette,
>>
>>> -----Original Message-----
>>> From: Reinette Chatre <reinette.chatre@intel.com>
>>> Sent: Thursday, December 15, 2022 12:25 PM
>>> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
>>> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de
>>> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org;
>>> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org;
>>> quic_neeraju@quicinc.com; rdunlap@infradead.org;
>>> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com;
>>> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com;
>>> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com;
>>> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan
>>> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com;
>>> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
>>> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu;
>>> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com;
>>> peternewman@google.com
>>> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write
>>> mbm_total_bytes_config
>>>
>>> Hi Babu,
>>>
>>> On 12/1/2022 7:37 AM, Babu Moger wrote:
> ...
>
>>>> +	/*
>>>> +	 * When an Event Configuration is changed, the bandwidth counters
>>>> +	 * for all RMIDs and Events will be cleared by the hardware. The
>>>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>>> +	 * every RMID on the next read to any event for every RMID.
>>>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>>> +	 * cleared while it is tracked by the hardware. Clear the
>>>> +	 * mbm_local and mbm_total counts for all the RMIDs.
>>>> +	 */
>>>> +	resctrl_arch_reset_rmid_all(r, d);
>>> If I understand correctly the expectation is that when user space read counters
>>> (via mon_data files) right after the configuration was changed then this read
>>> will return "Unavailable" and then the next read will return data.
>>>
>>> If this is the case then I think a snippet about this user experience would be
>>> helpful to add to the documentation.
>> Ok.  How about this in the documentation?
>>
>> "When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain.
>> The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value."
>>
>>
> Thinking about this more ... why are the counters for all eventids cleared
> when only one eventid's configuration is changed?

Its because of the way U-Bit tracking is implemented. The U-bit is tracked
per RMID basis, not on a Per-Event + Per RMID basis. Therefore, resetting
the U-Bit for 1 will reset it for both.

This is what I got from h/w team.

Thanks

Babu

>
> Reinette
>
>
-- 
Thanks
Babu Moger


^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-12-20 18:58 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-01 15:35 [PATCH v9 00/13] Support for AMD QoS new features Babu Moger
2022-12-01 15:36 ` [PATCH v9 01/13] x86/cpufeatures: Add Slow Memory Bandwidth Allocation feature flag Babu Moger
2022-12-15 17:08   ` Reinette Chatre
2022-12-15 21:10     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 02/13] x86/resctrl: Add a new resource type RDT_RESOURCE_SMBA Babu Moger
2022-12-15 17:10   ` Reinette Chatre
2022-12-15 21:30     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 03/13] x86/cpufeatures: Add Bandwidth Monitoring Event Configuration feature flag Babu Moger
2022-12-15 17:11   ` Reinette Chatre
2022-12-19 15:31     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 04/13] x86/resctrl: Include new features in command line options Babu Moger
2022-12-15 17:12   ` Reinette Chatre
2022-12-19 15:33     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 05/13] x86/resctrl: Detect and configure Slow Memory Bandwidth Allocation Babu Moger
2022-12-15 17:13   ` Reinette Chatre
2022-12-19 15:34     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 06/13] x86/resctrl: Add __init attribute to rdt_get_mon_l3_config() Babu Moger
2022-12-15 17:17   ` Reinette Chatre
2022-12-19 15:51     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 07/13] x86/resctrl: Introduce data structure to support monitor configuration Babu Moger
2022-12-15 17:19   ` Reinette Chatre
2022-12-19 17:56     ` Moger, Babu
2022-12-01 15:36 ` [PATCH v9 08/13] x86/resctrl: Add sysfs interface to read mbm_total_bytes_config Babu Moger
2022-12-15 17:40   ` Reinette Chatre
2022-12-19 18:21     ` Moger, Babu
2022-12-01 15:37 ` [PATCH v9 09/13] x86/resctrl: Add sysfs interface to read mbm_local_bytes_config Babu Moger
2022-12-15 17:43   ` Reinette Chatre
2022-12-19 18:27     ` Moger, Babu
2022-12-01 15:37 ` [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config Babu Moger
2022-12-15 18:24   ` Reinette Chatre
2022-12-19 19:28     ` Moger, Babu
2022-12-20 17:32       ` Reinette Chatre
2022-12-20 18:58         ` Moger, Babu
2022-12-19 19:50     ` Moger, Babu
2022-12-20 17:32       ` Reinette Chatre
2022-12-20 18:00         ` Moger, Babu
2022-12-01 15:37 ` [PATCH v9 11/13] x86/resctrl: Add sysfs interface to write mbm_local_bytes_config Babu Moger
2022-12-15 18:25   ` Reinette Chatre
2022-12-19 19:51     ` Moger, Babu
2022-12-01 15:37 ` [PATCH v9 12/13] x86/resctrl: Replace smp_call_function_many() with on_each_cpu_mask() Babu Moger
2022-12-15 18:26   ` Reinette Chatre
2022-12-19 19:59     ` Moger, Babu
2022-12-01 15:37 ` [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features Babu Moger
2022-12-15 18:30   ` Reinette Chatre
2022-12-19 20:05     ` Moger, Babu
2022-12-15 15:08 ` [PATCH v9 00/13] Support for AMD QoS " Moger, Babu
2022-12-15 15:35   ` Reinette Chatre
2022-12-15 16:12     ` Moger, Babu
2022-12-15 18:38 ` Reinette Chatre
2022-12-19 20:57   ` Moger, Babu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).