[PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation
@ 2017-02-17 19:58 Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation Vikas Shivappa
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Memory b/w allocation(MBA) is part of the Intel Resource Director
Technology (RDT). RDT helps monitor and share processor shared
resources. MBA helps enforce a limit on the memory bandwidth(b/w)
threads can use when they are scheduled. OS does the enforcement using
MSR(model specific register) interface and mapping the threads to
architecture specific CLOSids(class of service IDs).
The user interface is via the common resctrl fs for RDT features.

This can be used along with MBM (memory b/w monitoring) and cache
allocation to control and monitor the applications cache and memory
resources as per the QoS or other performance requirements.  Use cases
could be large serverclusters, VM, clould and container based services
where the admin or orchestration tools can use this framework to
manage/allocate these processor shared resources like other memory/cpu
resources to provide QoS guarentees. Detailed use case examples are
documented in the 01/08 Documentation patch.

Patches are dependent on :
https://lkml.org/lkml/2017/2/17/595

V2 changes
----------
Thanks to Thomas for all the feed back and below are the changes as per
the feedback:
- Sent the RDT generic improvements patch series seperately
- Split the info file/validate apis as prep patches to MBA
  changes
- Fixed a lot of coding convention issues, variable initializations,
  and removing unnecessary allocations etc.
- Changed the hotcpu handling for CLOS/RMID to zero the CLOSid/RMID when
  cpu is offlined.
- Changed some naming conventions.
- Changed the interface to show the bandwidth percentage that can be
  requested rather than showing h/w specific delay values which do not map
  to a particular b/w percentage in non-linear scale

[PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w
[PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for
[PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
[PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
[PATCH 5/8] x86/intel_rdt: info file support for MBA prepare
[PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
[PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare
[PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Update the intel_rdt_ui documentation to have Memory bandwidth(b/w)
allocation interface usage.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 74 ++++++++++++++++++++++++++++++++++----
 1 file changed, 67 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index d918d26..2f679e2 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -4,6 +4,7 @@ Copyright (C) 2016 Intel Corporation
 
 Fenghua Yu <fenghua.yu@intel.com>
 Tony Luck <tony.luck@intel.com>
+Vikas Shivappa <vikas.shivappa@intel.com>
 
 This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
 X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
@@ -22,8 +23,8 @@ Info directory
 
 The 'info' directory contains information about the enabled
 resources. Each resource has its own subdirectory. The subdirectory
-names reflect the resource names. Each subdirectory contains the
-following files:
+names reflect the resource names.
+Cache resource(L3/L2)  subdirectory contains the following files:
 
 "num_closids":  The number of CLOSIDs which are valid for this
 	        resource. The kernel uses the smallest number of
@@ -35,6 +36,16 @@ following files:
 "min_cbm_bits": The minimum number of consecutive bits which must be
 		set when writing a mask.
 
+Memory bandwitdh(MB) subdirectory contains the following files:
+
+"min_bw":	The minimum memory bandwidth percentage which user can
+		request.
+
+"bw_gran":	The granularity in which the user can request the memory
+		bandwidth percentage.
+
+"scale_linear":Indicates if the bandwidth scale is linear or
+		non-linear.
 
 Resource groups
 ---------------
@@ -107,6 +118,28 @@ and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
 of the capacity of the cache. You could partition the cache into four
 equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
 
+Memory bandwidth(b/w) throttle
+------------------------------
+For Memory b/w resource, user controls the resource by indicating the
+percentage of total memory b/w.
+
+The minimum bandwidth percentage value for each cpu model is predefined
+and can be looked up through "info/MB/min_bw". The bandwidth granularity
+that can be requested is also dependent on the cpu model and can be
+looked up at "info/MB/bw_gran".
+
+The bandwidth percentage values are mapped to hardware delay values and
+programmed in the QOS_MSRs. The delay values may be in linear scale and
+non-linear scale. In a linear scale the programmed values directly
+correspond to a delay value(b/w percentage = 100 - delay). However in a
+non-linear scale, the percentage values correspond to a pre-caliberated
+delay values. The delay values in non-linear scale have the granularity
+of power of 2.
+
+The bandwidth throttling is a a core specific mechanism on some of Intel
+SKUs. Using a high bandwidth and a low bandwidth setting on two threads
+sharing a core will result in both threads being throttled to use the
+low bandwidth.
 
 L3 details (code and data prioritization disabled)
 --------------------------------------------------
@@ -129,16 +162,24 @@ schemata format is always:
 
 	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
+Memory b/w Allocation details
+-----------------------------
+
+Memory b/w domain is L3 cache.
+
+	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
+
 Example 1
 ---------
 On a two socket machine (one L3 cache per socket) with just four bits
-for cache bit masks
+for cache bit masks, minimum b/w of 10% with a memory bandwidth
+granularity of 10%
 
 # mount -t resctrl resctrl /sys/fs/resctrl
 # cd /sys/fs/resctrl
 # mkdir p0 p1
-# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
-# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
+# echo "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
+# echo "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata
 
 The default resource group is unmodified, so we have access to all parts
 of all caches (its schemata file reads "L3:0=f;1=f").
@@ -147,6 +188,14 @@ Tasks that are under the control of group "p0" may only allocate from the
 "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
 
+Similarly, tasks that are under the control of group "p0" may use a
+maximum memory b/w of 50% on socket0, and the 50% on socket 1.
+Tasks in group "p1" may use the rest of 50% memory b/w on both sockets.
+Note that unlike cache masks, memory b/w cannot specify whether these
+allocations can overlap or not. The allocations specifies the maximum
+b/w that the group may be able to use and the system admin can configure
+the b/w accordingly.
+
 Example 2
 ---------
 Again two sockets, but this time with a more realistic 20-bit mask.
@@ -185,6 +234,16 @@ Ditto for the second real time task (with the remaining 25% of cache):
 # echo 5678 > p1/tasks
 # taskset -cp 2 5678
 
+For the same 2 socket system with memory b/w resource and CAT L3 the
+schemata would look like:
+
+Assume min_bw 10 and bw_gran is 10.
+
+# echo -e "L3:0=f8000;1=fffff\nMB:0=10;1=30" > p0/schemata
+
+This would request 10% memory b/w on socket 0 and 30% memory b/w on
+socket1.
+
 Example 3
 ---------
 
@@ -203,10 +262,11 @@ First we reset the schemata for the default group so that the "upper"
 # echo "L3:0=3ff" > schemata
 
 Next we make a resource group for our real time cores and give
-it access to the "top" 50% of the cache on socket 0.
+it access to the "top" 50% of the cache on socket 0 and 50% of memory
+bandwidth on socket 0.
 
 # mkdir p0
-# echo "L3:0=ffc00;" > p0/schemata
+# echo "L3:0=ffc00;\nMB:0=50" > p0/schemata
 
 Finally we move core 4-7 over to the new group and make sure that the
 kernel and the tasks running there get 50% of the cache.
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Preparatory patch to generalize the naming for RDT so that we can get
ready to apply other resource controls like MBA.

RDT resource cbm values are named to ctrl_val representing generic
control values which will hold both cbm(cache bit mask) and memory b/w
throttle values. max_cbm is updated to default_ctrl which represents
default values which provide no control which is all bits set in case of
CAT and 100% bandwidth in case of MBA. The tmp_cbm is updated to
tmp_ctrls. Similarly domain structures are updated to ctrl_val instead
of cbm.

APIs are also generalized:
- get_cache_config is added to separate from memory specific apis.
- MSR update api names are changed from having cbm to ctrl.
- info file API names are set to reflect generic default_ctrl or control
values rather than cbm.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         | 22 +++++++++++-----------
 arch/x86/kernel/cpu/intel_rdt.c          | 28 ++++++++++++++--------------
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 12 ++++++------
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 26 +++++++++++++-------------
 4 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 95ce5c8..326df9e 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -71,14 +71,14 @@ struct rftype {
  * @capable:			Is this feature available on this machine
  * @name:			Name to use in "schemata" file
  * @num_closid:			Number of CLOSIDs available
- * @max_cbm:			Largest Cache Bit Mask allowed
+ * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
  * @domains:			All domains for this resource
  * @num_domains:		Number of domains active
  * @msr_base:			Base MSR address for CBMs
- * @tmp_cbms:			Scratch space when updating schemata
- * @num_tmp_cbms:		Number of CBMs in tmp_cbms
+ * @tmp_ctrl:			Scratch space when updating schemata
+ * @num_tmp_ctrl:		Number of control values in tmp_ctrl
  * @cache_level:		Which cache level defines scope of this domain
  * @cbm_idx_multi:		Multiplier of CBM index
  * @cbm_idx_offset:		Offset of CBM index. CBM index is computed by:
@@ -91,12 +91,12 @@ struct rdt_resource {
 	int			num_closid;
 	int			cbm_len;
 	int			min_cbm_bits;
-	u32			max_cbm;
+	u32			default_ctrl;
 	struct list_head	domains;
 	int			num_domains;
 	int			msr_base;
-	u32			*tmp_cbms;
-	int			num_tmp_cbms;
+	u32			*tmp_ctrl;
+	int			num_tmp_ctrl;
 	int			cache_level;
 	int			cbm_idx_multi;
 	int			cbm_idx_offset;
@@ -107,13 +107,13 @@ struct rdt_resource {
  * @list:	all instances of this resource
  * @id:		unique id for this instance
  * @cpu_mask:	which cpus share this resource
- * @cbm:	array of cache bit masks (indexed by CLOSID)
+ * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
  */
 struct rdt_domain {
 	struct list_head	list;
 	int			id;
 	struct cpumask		cpu_mask;
-	u32			*cbm;
+	u32			*ctrl_val;
 };
 
 /**
@@ -164,8 +164,8 @@ enum {
 	unsigned int full;
 };
 
-/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
-union cpuid_0x10_1_edx {
+/* CPUID.(EAX=10H, ECX=ResID).EDX */
+union cpuid_0x10_x_edx {
 	struct {
 		unsigned int cos_max:16;
 	} split;
@@ -174,7 +174,7 @@ enum {
 
 DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
 
-void rdt_cbm_update(void *arg);
+void rdt_ctrl_update(void *arg);
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index c8af5d9..5d8c1be 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -119,7 +119,7 @@ static inline bool cache_alloc_hsw_probe(void)
 
 		r->num_closid = 4;
 		r->cbm_len = 20;
-		r->max_cbm = max_cbm;
+		r->default_ctrl = max_cbm;
 		r->min_cbm_bits = 2;
 		r->capable = true;
 		r->enabled = true;
@@ -130,16 +130,16 @@ static inline bool cache_alloc_hsw_probe(void)
 	return false;
 }
 
-static void rdt_get_config(int idx, struct rdt_resource *r)
+static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 {
 	union cpuid_0x10_1_eax eax;
-	union cpuid_0x10_1_edx edx;
+	union cpuid_0x10_x_edx edx;
 	u32 ebx, ecx;
 
 	cpuid_count(0x00000010, idx, &eax.full, &ebx, &ecx, &edx.full);
 	r->num_closid = edx.split.cos_max + 1;
 	r->cbm_len = eax.split.cbm_len + 1;
-	r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
 	r->capable = true;
 	r->enabled = true;
 }
@@ -151,7 +151,7 @@ static void rdt_get_cdp_l3_config(int type)
 
 	r->num_closid = r_l3->num_closid / 2;
 	r->cbm_len = r_l3->cbm_len;
-	r->max_cbm = r_l3->max_cbm;
+	r->default_ctrl = r_l3->default_ctrl;
 	r->capable = true;
 	/*
 	 * By default, CDP is disabled. CDP can be enabled by mount parameter
@@ -171,7 +171,7 @@ static inline bool get_rdt_resources(void)
 		return false;
 
 	if (boot_cpu_has(X86_FEATURE_CAT_L3)) {
-		rdt_get_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
+		rdt_get_cache_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
 		if (boot_cpu_has(X86_FEATURE_CDP_L3)) {
 			rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA);
 			rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE);
@@ -180,7 +180,7 @@ static inline bool get_rdt_resources(void)
 	}
 	if (boot_cpu_has(X86_FEATURE_CAT_L2)) {
 		/* CPUID 0x10.2 fields are same format at 0x10.1 */
-		rdt_get_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
+		rdt_get_cache_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
 		ret = true;
 	}
 
@@ -200,7 +200,7 @@ static int get_cache_id(int cpu, int level)
 	return -1;
 }
 
-void rdt_cbm_update(void *arg)
+void rdt_ctrl_update(void *arg)
 {
 	struct msr_param *m = (struct msr_param *)arg;
 	struct rdt_resource *r = m->res;
@@ -221,7 +221,7 @@ void rdt_cbm_update(void *arg)
 	for (i = m->low; i < m->high; i++) {
 		int idx = cbm_idx(r, i);
 
-		wrmsrl(r->msr_base + idx, d->cbm[i]);
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
 	}
 }
 
@@ -294,8 +294,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
 	d->id = id;
 
-	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
-	if (!d->cbm) {
+	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
+	if (!d->ctrl_val) {
 		kfree(d);
 		return;
 	}
@@ -303,8 +303,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	for (i = 0; i < r->num_closid; i++) {
 		int idx = cbm_idx(r, i);
 
-		d->cbm[i] = r->max_cbm;
-		wrmsrl(r->msr_base + idx, d->cbm[i]);
+		d->ctrl_val[i] = r->default_ctrl;
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
 	}
 
 	cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -326,7 +326,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
 		r->num_domains--;
-		kfree(d->cbm);
+		kfree(d->ctrl_val);
 		list_del(&d->list);
 		kfree(d);
 	}
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 9b9565f..0d8fa61 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -498,12 +498,12 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
 	return 0;
 }
 
-static int rdt_cbm_mask_show(struct kernfs_open_file *of,
+static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%x\n", r->max_cbm);
+	seq_printf(seq, "%x\n", r->default_ctrl);
 
 	return 0;
 }
@@ -530,7 +530,7 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 		.name		= "cbm_mask",
 		.mode		= 0444,
 		.kf_ops		= &rdtgroup_kf_single_ops,
-		.seq_show	= rdt_cbm_mask_show,
+		.seq_show	= rdt_default_ctrl_show,
 	},
 	{
 		.name		= "min_cbm_bits",
@@ -803,14 +803,14 @@ static int reset_all_ctrls(struct rdt_resource *r, u32 sclosid, u32 eclosid)
 		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 
 		for (i = sclosid; i < eclosid; i++)
-			d->cbm[i] = r->max_cbm;
+			d->ctrl_val[i] = r->default_ctrl;
 	}
 	cpu = get_cpu();
 	/* Update CBM on this cpu if it's in cpu_mask. */
 	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_cbm_update(&msr_param);
+		rdt_ctrl_update(&msr_param);
 	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index 527d042..3cde1e8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -38,7 +38,7 @@ static bool cbm_validate(unsigned long var, struct rdt_resource *r)
 {
 	unsigned long first_bit, zero_bit;
 
-	if (var == 0 || var > r->max_cbm)
+	if (var == 0 || var > r->default_ctrl)
 		return false;
 
 	first_bit = find_first_bit(&var, r->cbm_len);
@@ -66,7 +66,7 @@ static int parse_cbm(char *buf, struct rdt_resource *r)
 		return ret;
 	if (!cbm_validate(data, r))
 		return -EINVAL;
-	r->tmp_cbms[r->num_tmp_cbms++] = data;
+	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
 
 	return 0;
 }
@@ -95,7 +95,7 @@ static int parse_line(char *line, struct rdt_resource *r)
 	}
 
 	/* Incorrect number of domains in the line */
-	if (r->num_tmp_cbms != r->num_domains)
+	if (r->num_tmp_ctrl != r->num_domains)
 		return -EINVAL;
 
 	/* Any garbage at the end of the line? */
@@ -123,18 +123,18 @@ static int update_domains(struct rdt_resource *r, int closid)
 	 * There by avoiding unnecessary IPIs.
 	 */
 	list_for_each_entry(d, &r->domains, list) {
-		if (d->cbm[msr_param.low] != r->tmp_cbms[idx]) {
+		if (d->ctrl_val[msr_param.low] != r->tmp_ctrl[idx]) {
 			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
-			d->cbm[msr_param.low] = r->tmp_cbms[idx];
+			d->ctrl_val[msr_param.low] = r->tmp_ctrl[idx];
 		}
 		idx++;
 	}
 	cpu = get_cpu();
 	/* Update CBM on this cpu if it's in cpu_mask. */
 	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_cbm_update(&msr_param);
+		rdt_ctrl_update(&msr_param);
 	/* Update CBM on other cpus. */
-	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
@@ -187,13 +187,13 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
 	/* get scratch space to save all the masks while we validate input */
 	for_each_enabled_rdt_resource(r) {
-		r->tmp_cbms = kcalloc(r->num_domains, sizeof(*l3_cbms),
+		r->tmp_ctrl = kcalloc(r->num_domains, sizeof(*l3_cbms),
 				      GFP_KERNEL);
-		if (!r->tmp_cbms) {
+		if (!r->tmp_ctrl) {
 			ret = -ENOMEM;
 			goto out;
 		}
-		r->num_tmp_cbms = 0;
+		r->num_tmp_ctrl = 0;
 	}
 
 	r = NULL;
@@ -225,8 +225,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 out:
 	rdtgroup_kn_unlock(of->kn);
 	for_each_enabled_rdt_resource(r) {
-		kfree(r->tmp_cbms);
-		r->tmp_cbms = NULL;
+		kfree(r->tmp_ctrl);
+		r->tmp_ctrl = NULL;
 	}
 	return ret ?: nbytes;
 }
@@ -240,7 +240,7 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 	list_for_each_entry(dom, &r->domains, list) {
 		if (sep)
 			seq_puts(s, ";");
-		seq_printf(s, "%d=%x", dom->id, dom->cbm[closid]);
+		seq_printf(s, "%d=%x", dom->id, dom->ctrl_val[closid]);
 		sep = true;
 	}
 	seq_puts(s, "\n");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
Add supporting data structures to detect feature details which is done
in later patch using CPUID with EAX=10H, ECX= 3.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 2 ++
 arch/x86/include/asm/intel_rdt.h   | 8 ++++++++
 arch/x86/kernel/cpu/intel_rdt.c    | 4 ++++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index eafee31..50cbdd0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -201,6 +201,8 @@
 #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
 
+#define X86_FEATURE_MBA         ( 7*32+18) /* Memory Bandwidth Allocation */
+
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
 #define X86_FEATURE_VNMI        ( 8*32+ 1) /* Intel Virtual NMI */
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 326df9e..d2eee45 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -164,6 +164,14 @@ enum {
 	unsigned int full;
 };
 
+/* CPUID.(EAX=10H, ECX=ResID=3).EAX */
+union cpuid_0x10_3_eax {
+	struct {
+		unsigned int max_delay:12;
+	} split;
+	unsigned int full;
+};
+
 /* CPUID.(EAX=10H, ECX=ResID).EDX */
 union cpuid_0x10_x_edx {
 	struct {
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 5d8c1be..b76a518 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -184,6 +184,10 @@ static inline bool get_rdt_resources(void)
 		ret = true;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_MBA)) {
+		ret = true;
+	}
+
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index d979406..23c2350 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ struct cpuid_bit {
 	{ X86_FEATURE_CAT_L3,		CPUID_EBX,  1, 0x00000010, 0 },
 	{ X86_FEATURE_CAT_L2,		CPUID_EBX,  2, 0x00000010, 0 },
 	{ X86_FEATURE_CDP_L3,		CPUID_ECX,  2, 0x00000010, 1 },
+	{ X86_FEATURE_MBA,		CPUID_EBX,  3, 0x00000010, 0 },
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
                   ` (2 preceding siblings ...)
  2017-02-17 19:58 ` [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 15:24   ` Thomas Gleixner
  2017-02-17 19:58 ` [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare Vikas Shivappa
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

The MBA feature details like minimum bandwidth supported, b/w
granularity etc are obtained via executing CPUID with EAX=10H
,ECX=3.

Setup and initialize the MBA specific extensions to data structures like
global list of RDT resources, RDT resource structure and RDT domain
structure.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h | 17 +++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 95 ++++++++++++++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index d2eee45..af65b2a 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -11,6 +11,9 @@
 #define IA32_L3_QOS_CFG		0xc81
 #define IA32_L3_CBM_BASE	0xc90
 #define IA32_L2_CBM_BASE	0xd10
+#define IA32_MBA_THRTL_BASE	0xd50
+#define MAX_MBA_THRTL		100u
+#define MBA_IS_LINEAR		0x4
 
 #define L3_QOS_CDP_ENABLE	0x01ULL
 
@@ -74,6 +77,14 @@ struct rftype {
  * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @max_delay:			Max throttle delay. Delay is the hardware
+ *				understandable value for memory b/w.
+ * @min_bw:			Minimum memory bandwidth in percentage
+ *				user can request
+ * @bw_gran:			Bandwidth granularity
+ * @delay_linear:		True if Mem b/w delay is in linear scale
+ * @mb_map:			Mapping of mem b/w delay to
+ *				b/w throttle percentage
  * @domains:			All domains for this resource
  * @num_domains:		Number of domains active
  * @msr_base:			Base MSR address for CBMs
@@ -92,6 +103,11 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			default_ctrl;
+	u32			max_delay;
+	u32			min_bw;
+	u32			bw_gran;
+	u32			delay_linear;
+	u32			*mb_map;
 	struct list_head	domains;
 	int			num_domains;
 	int			msr_base;
@@ -141,6 +157,7 @@ enum {
 	RDT_RESOURCE_L3DATA,
 	RDT_RESOURCE_L3CODE,
 	RDT_RESOURCE_L2,
+	RDT_RESOURCE_MBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index b76a518..130ce98 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -76,6 +76,14 @@ struct rdt_resource rdt_resources_all[] = {
 		.cbm_idx_multi	= 1,
 		.cbm_idx_offset	= 0
 	},
+	{
+		.name		= "MB",
+		.domains	= domain_init(RDT_RESOURCE_MBA),
+		.msr_base	= IA32_MBA_THRTL_BASE,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset = 0
+	},
 };
 
 static int cbm_idx(struct rdt_resource *r, int closid)
@@ -130,6 +138,51 @@ static inline bool cache_alloc_hsw_probe(void)
 	return false;
 }
 
+/*
+ * rdt_get_mb_table() - get a mapping of b/w percentage values
+ * exposed to user interface and the h/w understandable delay values.
+ *
+ * The non-linear delay values have the granularity of power of two
+ * and also the h/w does not guarantee a curve for configured delay
+ * values vs. actual b/w throttled.
+ * Hence we need a mapping that is pre caliberated for user to express
+ * the b/w in terms of any sensible number.
+ */
+static inline int rdt_get_mb_table(struct rdt_resource *r)
+{
+	/*
+	 * There are no Intel SKUs as of now to support non-linear delay.
+	 */
+	r->mb_map = NULL;
+
+	return -ENODEV;
+}
+
+static bool rdt_get_mem_config(struct rdt_resource *r)
+{
+	union cpuid_0x10_3_eax eax;
+	union cpuid_0x10_x_edx edx;
+	u32 ebx, ecx;
+
+	cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full);
+	r->num_closid = edx.split.cos_max + 1;
+	r->max_delay = eax.split.max_delay + 1;
+	r->default_ctrl = MAX_MBA_THRTL;
+	if (ecx & MBA_IS_LINEAR) {
+		r->delay_linear = true;
+		r->min_bw = MAX_MBA_THRTL - r->max_delay;
+		r->bw_gran = MAX_MBA_THRTL - r->max_delay;
+	} else {
+		if (rdt_get_mb_table(r))
+			return false;
+	}
+
+	r->capable = true;
+	r->enabled = true;
+
+	return true;
+}
+
 static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 {
 	union cpuid_0x10_1_eax eax;
@@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void)
 		ret = true;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_MBA)) {
-		ret = true;
-	}
+	if (boot_cpu_has(X86_FEATURE_MBA))
+		ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
 
 	return ret;
 }
@@ -262,6 +314,30 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	return NULL;
 }
 
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int i;
+
+	d->ctrl_val = kmalloc_array(r->num_closid,
+				     sizeof(*d->ctrl_val), GFP_KERNEL);
+	if (!d->ctrl_val)
+		return -ENOMEM;
+
+	/*
+	 * Initialize the Control MSRs to having no control.
+	 * For Cache Allocation: Set all bits in cbm
+	 * For Memory Allocation: Set b/w requested to 100
+	 */
+	for (i = 0; i < r->num_closid; i++) {
+		int idx = cbm_idx(r, i);
+
+		d->ctrl_val[i] = r->default_ctrl;
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
+	}
+
+	return 0;
+}
+
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -277,7 +353,7 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
  */
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
-	int i, id = get_cache_id(cpu, r->cache_level);
+	int id = get_cache_id(cpu, r->cache_level), ret;
 	struct list_head *add_pos = NULL;
 	struct rdt_domain *d;
 
@@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
 	d->id = id;
 
-	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
-	if (!d->ctrl_val) {
+	ret = domain_setup_ctrlval(r, d);
+	if (ret) {
 		kfree(d);
 		return;
 	}
 
-	for (i = 0; i < r->num_closid; i++) {
-		int idx = cbm_idx(r, i);
-
-		d->ctrl_val[i] = r->default_ctrl;
-		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
-	}
-
 	cpumask_set_cpu(cpu, &d->cpu_mask);
 	list_add_tail(&d->list, add_pos);
 	r->num_domains++;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
                   ` (3 preceding siblings ...)
  2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 15:34   ` Thomas Gleixner
  2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

As a preparatory patch to MBA info file setup, generalize the info file
setup to have the option to choose between different set of files.
Although multiple cache resources have same info files, Memory resources
have different set of info files. That way we have the option to choose
between memory resource and cache resource info files.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  5 +++++
 arch/x86/kernel/cpu/intel_rdt.c          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 18 ++++++++++++++----
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index af65b2a..36abed2 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -77,6 +77,8 @@ struct rftype {
  * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @info_files:		resctrl info files for the resource
+ * @infofiles_len:		Number of info files
  * @max_delay:			Max throttle delay. Delay is the hardware
  *				understandable value for memory b/w.
  * @min_bw:			Minimum memory bandwidth in percentage
@@ -103,6 +105,8 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			default_ctrl;
+	struct rftype		*info_files;
+	int			infofiles_len;
 	u32			max_delay;
 	u32			min_bw;
 	u32			bw_gran;
@@ -144,6 +148,7 @@ struct msr_param {
 	int			high;
 };
 
+void rdt_get_cache_infofile(struct rdt_resource *r);
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 130ce98..481ff32 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -193,6 +193,7 @@ static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 	r->num_closid = edx.split.cos_max + 1;
 	r->cbm_len = eax.split.cbm_len + 1;
 	r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	rdt_get_cache_infofile(r);
 	r->capable = true;
 	r->enabled = true;
 }
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 0d8fa61..0a70e87 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -519,7 +519,7 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 }
 
 /* rdtgroup information files for one cache resource. */
-static struct rftype res_info_files[] = {
+static struct rftype res_cache_info_files[] = {
 	{
 		.name		= "num_closids",
 		.mode		= 0444,
@@ -540,11 +540,18 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+void rdt_get_cache_infofile(struct rdt_resource *r)
+{
+	r->info_files = &res_cache_info_files[0];
+	r->infofiles_len = ARRAY_SIZE(res_cache_info_files);
+}
+
 static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 {
 	struct kernfs_node *kn_subdir;
+	struct rftype *res_info_files;
 	struct rdt_resource *r;
-	int ret;
+	int ret, len;
 
 	/* create the directory */
 	kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL);
@@ -563,8 +570,11 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 		ret = rdtgroup_kn_set_ugid(kn_subdir);
 		if (ret)
 			goto out_destroy;
-		ret = rdtgroup_add_files(kn_subdir, res_info_files,
-					 ARRAY_SIZE(res_info_files));
+
+		res_info_files = r->info_files;
+		len = r->infofiles_len;
+
+		ret = rdtgroup_add_files(kn_subdir, res_info_files, len);
 		if (ret)
 			goto out_destroy;
 		kernfs_activate(kn_subdir);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
                   ` (4 preceding siblings ...)
  2017-02-17 19:58 ` [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 16:05   ` Thomas Gleixner
  2017-02-17 19:58 ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare Vikas Shivappa
  2017-02-17 19:58 ` [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA Vikas Shivappa
  7 siblings, 1 reply; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Add files in info directory for MBA.
The files in the info directory are as follows :
- num_closids: max number of closids for MBA which represents the max
class of service user can configure.
- min_bw: the minimum memory bandwidth(b/w) values in percentage.

OS maps the b/w percentage values to memory b/w throttle delay values
and configures them via MSR interface by writing to the QOS_MSRs. These
delay values can have a linear or nonlinear scale.

- bw_gran: The memory b/w granularity that can be configured.
For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
values are 10,20,30...

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  2 +
 arch/x86/kernel/cpu/intel_rdt.c          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 64 ++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 36abed2..24de64c 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -149,6 +149,8 @@ struct msr_param {
 };
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
+void rdt_get_mba_infofile(struct rdt_resource *r);
+
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 481ff32..353c476b4 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -176,6 +176,7 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
 		if (rdt_get_mb_table(r))
 			return false;
 	}
+	rdt_get_mba_infofile(r);
 
 	r->capable = true;
 	r->enabled = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 0a70e87..b445ee8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -518,6 +518,36 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_min_bw_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->min_bw);
+
+	return 0;
+}
+
+static int rdt_bw_gran_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->bw_gran);
+
+	return 0;
+}
+
+static int rdt_delay_linear_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->delay_linear);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_cache_info_files[] = {
 	{
@@ -540,6 +570,40 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+/* rdtgroup information files for MBE. */
+static struct rftype res_mbe_info_files[] = {
+	{
+		.name		= "num_closids",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closids_show,
+	},
+	{
+		.name		= "min_bandwidth",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_min_bw_show,
+	},
+	{
+		.name		= "bandwidth_gran",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_bw_gran_show,
+	},
+	{
+		.name		= "delay_linear",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_linear_show,
+	},
+};
+
+void rdt_get_mba_infofile(struct rdt_resource *r)
+{
+	r->info_files = &res_mbe_info_files[0];
+	r->infofiles_len = ARRAY_SIZE(res_mbe_info_files);
+}
+
 void rdt_get_cache_infofile(struct rdt_resource *r)
 {
 	r->info_files = &res_cache_info_files[0];
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
                   ` (5 preceding siblings ...)
  2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 16:24   ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare\ Thomas Gleixner
  2017-02-17 19:58 ` [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA Vikas Shivappa
  7 siblings, 1 reply; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Add support to introduce generic APIs for control validation and writing
QOS_MSRs for RDT resources. The control validation api is meant to
validate the control values like cache bit mask for CAT and memory b/w
percentage for MBA. A resource generic display format is also added and
used for the resources depending on whether its displayed in
hex/decimal.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  8 +++++++
 arch/x86/kernel/cpu/intel_rdt.c          | 33 +++++++++++++++++++++-----
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 40 ++++++++++++++++++--------------
 3 files changed, 58 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 24de64c..8748b0d 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -77,6 +77,9 @@ struct rftype {
  * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @format_str:		Per resource format string to show domain val
+ * @parse_ctrlval:		Per resource API to parse the ctrl values
+ * @msr_update:		API to update QOS MSRs
  * @info_files:		resctrl info files for the resource
  * @infofiles_len:		Number of info files
  * @max_delay:			Max throttle delay. Delay is the hardware
@@ -105,6 +108,9 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			default_ctrl;
+	const char		*format_str;
+	int (*parse_ctrlval)	(char *buf, struct rdt_resource *r);
+	void (*msr_update)	(void *a1, void *a2, struct rdt_resource *r);
 	struct rftype		*info_files;
 	int			infofiles_len;
 	u32			max_delay;
@@ -150,6 +156,8 @@ struct msr_param {
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
 void rdt_get_mba_infofile(struct rdt_resource *r);
+int parse_cbm(char *buf, struct rdt_resource *r);
+void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 353c476b4..7ce4453 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -44,6 +44,9 @@ struct rdt_resource rdt_resources_all[] = {
 		.name		= "L3",
 		.domains	= domain_init(RDT_RESOURCE_L3),
 		.msr_base	= IA32_L3_CBM_BASE,
+		.parse_ctrlval	= parse_cbm,
+		.msr_update	= cqm_wrmsr,
+		.format_str	= "%d=%x",
 		.min_cbm_bits	= 1,
 		.cache_level	= 3,
 		.cbm_idx_multi	= 1,
@@ -53,6 +56,9 @@ struct rdt_resource rdt_resources_all[] = {
 		.name		= "L3DATA",
 		.domains	= domain_init(RDT_RESOURCE_L3DATA),
 		.msr_base	= IA32_L3_CBM_BASE,
+		.parse_ctrlval	= parse_cbm,
+		.msr_update	= cqm_wrmsr,
+		.format_str	= "%d=%x",
 		.min_cbm_bits	= 1,
 		.cache_level	= 3,
 		.cbm_idx_multi	= 2,
@@ -62,6 +68,9 @@ struct rdt_resource rdt_resources_all[] = {
 		.name		= "L3CODE",
 		.domains	= domain_init(RDT_RESOURCE_L3CODE),
 		.msr_base	= IA32_L3_CBM_BASE,
+		.parse_ctrlval	= parse_cbm,
+		.msr_update	= cqm_wrmsr,
+		.format_str	= "%d=%x",
 		.min_cbm_bits	= 1,
 		.cache_level	= 3,
 		.cbm_idx_multi	= 2,
@@ -71,6 +80,9 @@ struct rdt_resource rdt_resources_all[] = {
 		.name		= "L2",
 		.domains	= domain_init(RDT_RESOURCE_L2),
 		.msr_base	= IA32_L2_CBM_BASE,
+		.parse_ctrlval	= parse_cbm,
+		.msr_update	= cqm_wrmsr,
+		.format_str	= "%d=%x",
 		.min_cbm_bits	= 1,
 		.cache_level	= 2,
 		.cbm_idx_multi	= 1,
@@ -258,11 +270,24 @@ static int get_cache_id(int cpu, int level)
 	return -1;
 }
 
+void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r)
+{
+	struct rdt_domain *d = (struct rdt_domain *)a2;
+	struct msr_param *m = (struct msr_param *)a1;
+	int i;
+
+	for (i = m->low; i < m->high; i++) {
+		int idx = cbm_idx(r, i);
+
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
+	}
+}
+
 void rdt_ctrl_update(void *arg)
 {
 	struct msr_param *m = (struct msr_param *)arg;
 	struct rdt_resource *r = m->res;
-	int i, cpu = smp_processor_id();
+	int cpu = smp_processor_id();
 	struct rdt_domain *d;
 
 	list_for_each_entry(d, &r->domains, list) {
@@ -276,11 +301,7 @@ void rdt_ctrl_update(void *arg)
 	return;
 
 found:
-	for (i = m->low; i < m->high; i++) {
-		int idx = cbm_idx(r, i);
-
-		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
-	}
+	r->msr_update(m, d, r);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index 3cde1e8..4141662 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -34,41 +34,46 @@
  *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
  * Additionally Haswell requires at least two bits set.
  */
-static bool cbm_validate(unsigned long var, struct rdt_resource *r)
+static int cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 {
-	unsigned long first_bit, zero_bit;
+	unsigned long first_bit, zero_bit, var;
+	int ret;
+
+	ret = kstrtoul(buf, 16, &var);
+	if (ret)
+		return ret;
 
 	if (var == 0 || var > r->default_ctrl)
-		return false;
+		return -EINVAL;
 
 	first_bit = find_first_bit(&var, r->cbm_len);
 	zero_bit = find_next_zero_bit(&var, r->cbm_len, first_bit);
 
 	if (find_next_bit(&var, r->cbm_len, zero_bit) < r->cbm_len)
-		return false;
+		return -EINVAL;
 
 	if ((zero_bit - first_bit) < r->min_cbm_bits)
-		return false;
-	return true;
+		return -EINVAL;
+
+	*data = var;
+
+	return 0;
 }
 
 /*
- * Read one cache bit mask (hex). Check that it is valid for the current
- * resource type.
+ * Read the user RDT control value into tempory buffer:
+ * Cache bit mask (hex) or Memory b/w throttle (decimal).
+ * Check that it is valid for the current resource type.
  */
-static int parse_cbm(char *buf, struct rdt_resource *r)
+int parse_cbm(char *buf, struct rdt_resource *r)
 {
 	unsigned long data;
 	int ret;
 
-	ret = kstrtoul(buf, 16, &data);
-	if (ret)
-		return ret;
-	if (!cbm_validate(data, r))
-		return -EINVAL;
+	ret = cbm_validate(buf, &data, r);
 	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -90,7 +95,7 @@ static int parse_line(char *line, struct rdt_resource *r)
 		id = strsep(&dom, "=");
 		if (kstrtoul(id, 10, &dom_id) || dom_id != d->id)
 			return -EINVAL;
-		if (parse_cbm(dom, r))
+		if (r->parse_ctrlval(dom, r))
 			return -EINVAL;
 	}
 
@@ -240,7 +245,8 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 	list_for_each_entry(dom, &r->domains, list) {
 		if (sep)
 			seq_puts(s, ";");
-		seq_printf(s, "%d=%x", dom->id, dom->ctrl_val[closid]);
+
+		seq_printf(s, r->format_str, dom->id, dom->ctrl_val[closid]);
 		sep = true;
 	}
 	seq_puts(s, "\n");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
                   ` (6 preceding siblings ...)
  2017-02-17 19:58 ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 16:59   ` Thomas Gleixner
  7 siblings, 1 reply; 18+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Add support to update the MBA bandwidth values for the domains. The MBA
bandwidth values are specified for each domain by updating the schemata.
The schemata string is parsed and validated for the correct bandwidth
values for meeting the minimum b/w and b/w granularity requirements. OS
updates the corresponding domain PQOS_MSRs which are indexed from 0xD50
for MBA.

For linear scale the granularity is just the 100-max_delay. For
non-linear values, it is obtained by the mapping between the delay
values and percentage b/w values.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c          | 35 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 37 ++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 8748b0d..869fee1 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -13,6 +13,7 @@
 #define IA32_L2_CBM_BASE	0xd10
 #define IA32_MBA_THRTL_BASE	0xd50
 #define MAX_MBA_THRTL		100u
+#define INVALID_DELAY		100u
 #define MBA_IS_LINEAR		0x4
 
 #define L3_QOS_CDP_ENABLE	0x01ULL
@@ -157,7 +158,9 @@ struct msr_param {
 void rdt_get_cache_infofile(struct rdt_resource *r);
 void rdt_get_mba_infofile(struct rdt_resource *r);
 int parse_cbm(char *buf, struct rdt_resource *r);
+int parse_thrtl(char *buf, struct rdt_resource *r);
 void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r);
+void mba_wrmsr(void *a1, void *a2, struct rdt_resource *r);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7ce4453..c0e56a7 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -92,6 +92,9 @@ struct rdt_resource rdt_resources_all[] = {
 		.name		= "MB",
 		.domains	= domain_init(RDT_RESOURCE_MBA),
 		.msr_base	= IA32_MBA_THRTL_BASE,
+		.parse_ctrlval	= parse_thrtl,
+		.msr_update	= mba_wrmsr,
+		.format_str	= "%d=%d",
 		.cache_level	= 3,
 		.cbm_idx_multi	= 1,
 		.cbm_idx_offset = 0
@@ -151,6 +154,19 @@ static inline bool cache_alloc_hsw_probe(void)
 }
 
 /*
+ * Map the memory b/w percentage value to delay values
+ * that can be written to QOS_MSRs.
+ * There are currently no SKUs which support non linear delay values.
+ */
+static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
+{
+	if (r->delay_linear)
+		return MAX_MBA_THRTL - bw;
+
+	return INVALID_DELAY;
+}
+
+/*
  * rdt_get_mb_table() - get a mapping of b/w percentage values
  * exposed to user interface and the h/w understandable delay values.
  *
@@ -270,6 +286,25 @@ static int get_cache_id(int cpu, int level)
 	return -1;
 }
 
+void mba_wrmsr(void *a1, void *a2, struct rdt_resource *r)
+{
+	struct rdt_domain *d = (struct rdt_domain *)a2;
+	struct msr_param *m = (struct msr_param *)a1;
+	int i;
+
+	for (i = m->low; i < m->high; i++) {
+		int idx = cbm_idx(r, i);
+
+		/*
+		 * Write the delay value for mba.
+		 * delay_bw_map will return a correct value
+		 * for this call as a nonexistant map throws error
+		 * on init.
+		 */
+		wrmsrl(r->msr_base + idx, delay_bw_map(d->ctrl_val[i], r));
+	}
+}
+
 void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r)
 {
 	struct rdt_domain *d = (struct rdt_domain *)a2;
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index 4141662..1c016c2 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -29,6 +29,43 @@
 #include <asm/intel_rdt.h>
 
 /*
+ * Check whether MBA bandwidth percentage value is correct.
+ * The value is checked against the minimum bandwidth
+ * values and the b/w granularity specified by the h/w.
+ */
+static int thrtl_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+{
+	unsigned long bw;
+	int ret;
+
+	ret = kstrtoul(buf, 10, &bw);
+	if (ret)
+		return ret;
+
+	if (bw < r->min_bw || bw > MAX_MBA_THRTL || (bw % r->bw_gran))
+		return -EINVAL;
+	*data = bw;
+
+	return 0;
+}
+
+/*
+ * Read the user RDT control value into tempory buffer:
+ * Cache bit mask (hex) or Memory b/w throttle (decimal).
+ * Check that it is valid for the current resource type.
+ */
+int parse_thrtl(char *buf, struct rdt_resource *r)
+{
+	unsigned long data;
+	int ret;
+
+	ret = thrtl_validate(buf, &data, r);
+	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
+
+	return ret;
+}
+
+/*
  * Check whether a cache bit mask is valid. The SDM says:
  *	Please note that all (and only) contiguous '1' combinations
  *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
@ 2017-03-01 15:24   ` Thomas Gleixner
  2017-03-10 21:51     ` Shivappa Vikas
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2017-03-01 15:24 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -11,6 +11,9 @@
>  #define IA32_L3_QOS_CFG		0xc81
>  #define IA32_L3_CBM_BASE	0xc90
>  #define IA32_L2_CBM_BASE	0xd10
> +#define IA32_MBA_THRTL_BASE	0xd50
> +#define MAX_MBA_THRTL		100u
> +#define MBA_IS_LINEAR		0x4

I have a hard time to figure out how the latter two constants are related
to this list of registers. MBA_IS_LINEAR is used to check the CPUID bit and
MAX_MBA_THRTL is obviously a pure software constant because with a
non-linear scale the maximum value is not 100.

Just slapping defines to random places is equally bad as using hard coded
constants.

> +/*
> + * rdt_get_mb_table() - get a mapping of b/w percentage values
> + * exposed to user interface and the h/w understandable delay values.
> + *
> + * The non-linear delay values have the granularity of power of two
> + * and also the h/w does not guarantee a curve for configured delay
> + * values vs. actual b/w throttled.
> + * Hence we need a mapping that is pre caliberated for user to express
> + * the b/w in terms of any sensible number.

... calibrated so the user can express the bandwidth as a percentage value.

> +static inline int rdt_get_mb_table(struct rdt_resource *r)
> +{
> +	/*
> +	 * There are no Intel SKUs as of now to support non-linear delay.
> +	 */
> +	r->mb_map = NULL;

What's the point of setting this to NULL?

Also it would be helpful to emit log info here so people don't have to
start digging around.

	pr_info("Bandwidth map not implemented for ....", ... model);

> +
> +	return -ENODEV;

Returning -ENODEV to a function which just returns a boolean value is
pointless.

>  static void rdt_get_cache_config(int idx, struct rdt_resource *r)
>  {
>  	union cpuid_0x10_1_eax eax;
> @@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void)
>  		ret = true;
>  	}
>  
> -	if (boot_cpu_has(X86_FEATURE_MBA)) {
> -		ret = true;
> -	}
> +	if (boot_cpu_has(X86_FEATURE_MBA))
> +		ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);

Groan. When rdt_get_mem_config() returns false (because the map is not
implemented), then the whole function returns false and CAT is disabled.

> +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	int i;
> +
> +	d->ctrl_val = kmalloc_array(r->num_closid,
> +				     sizeof(*d->ctrl_val), GFP_KERNEL);
> +	if (!d->ctrl_val)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Initialize the Control MSRs to having no control.
> +	 * For Cache Allocation: Set all bits in cbm
> +	 * For Memory Allocation: Set b/w requested to 100
> +	 */
> +	for (i = 0; i < r->num_closid; i++) {
> +		int idx = cbm_idx(r, i);
> +
> +		d->ctrl_val[i] = r->default_ctrl;
> +		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
> +	}

So if you use a local pointer for that, this whole mess becomes readable.

static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
{
	u32 *p;
	int i;

	p = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
	if (!p)
		return -ENOMEM;

	d->ctrl_val = p;

	/* Initialize the Control MSRs to the default value */
	for (i = 0; i < r->num_closid; i++, p++) {
		int idx = cbm_idx(r, i);

		*p = r->default_ctrl;
		wrmsrl(r->msr_base + idx, *p);
	}
> +
> +	return 0;
> +}

>  static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  {
> -	int i, id = get_cache_id(cpu, r->cache_level);
> +	int id = get_cache_id(cpu, r->cache_level), ret;

Bah. If you have the same type in one line, then please move the
uninitialized variables to the front.

	int ret, id = get_cache_id(cpu, r->cache_level);

But a s/i/ret/ would have been to simple and kept the code readable.

> @@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  
>  	d->id = id;
>  
> -	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
> -	if (!d->ctrl_val) {
> +	ret = domain_setup_ctrlval(r, d);
> +	if (ret) {
>  		kfree(d);
>  		return;
>  	}

What's the point of this 'ret' variable if the function is void?

	if (domain_setup_ctrlval(r, d)) {
		kfree(d);
		return;
	}

would have been to easy to read, right?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare
  2017-02-17 19:58 ` [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare Vikas Shivappa
@ 2017-03-01 15:34   ` Thomas Gleixner
  2017-03-10 22:04     ` Shivappa Vikas
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2017-03-01 15:34 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:

> As a preparatory patch to MBA info file setup, generalize the info file
> setup to have the option to choose between different set of files.
> Although multiple cache resources have same info files, Memory resources
> have different set of info files. That way we have the option to choose
> between memory resource and cache resource info files.

Sigh.

> @@ -77,6 +77,8 @@ struct rftype {
>   * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>   *				in a cache bit mask
> + * @info_files:		resctrl info files for the resource
> + * @infofiles_len:		Number of info files

len == length, nr == number. No? Too intuitive, right?

And no, not infofiles_nr. It wants to be nr_infofiles. And while at it
please either use infofiles or info_files, but not a mixture of
both. Random underscores are not enhancing readability at all.

> +void rdt_get_cache_infofile(struct rdt_resource *r)
> +{
> +	r->info_files = &res_cache_info_files[0];

What's wrong with

        r->info_files = res_cache_info_files;

Nothing, but it would be too easy to read. This is not the obfuscated
c-code contest.

> +	r->infofiles_len = ARRAY_SIZE(res_cache_info_files);

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
@ 2017-03-01 16:05   ` Thomas Gleixner
  2017-03-10 23:26     ` Shivappa Vikas
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2017-03-01 16:05 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:

> Add files in info directory for MBA.
> The files in the info directory are as follows :
> - num_closids: max number of closids for MBA which represents the max
> class of service user can configure.
> - min_bw: the minimum memory bandwidth(b/w) values in percentage.
> 
> OS maps the b/w percentage values to memory b/w throttle delay values
> and configures them via MSR interface by writing to the QOS_MSRs. These
> delay values can have a linear or nonlinear scale.
> 
> - bw_gran: The memory b/w granularity that can be configured.
> For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
> values are 10,20,30...

This is unreadable. It's possible to structure ASCII text cleanly.

x86/rdt: Add info directory files for MBA

 The info directory for MBA contains the following files:

 num_closids

	The maximum number of class of service slots available for MBA

 min_bandwidth
 
	The minimum memory bandwidth percentage value

 bandwidth_gran

	The granularity of the bandwidth control for the particular
 	hardware in percent. The available bandwidth control steps are:
 	min_bw + N * bw_gran Intermediate values are rounded to the next
 	control step available on the hardware.

 delay_linear

	If set, the control registers take a linear percentage based value
	between min_bandwidth and 100 percent.

	If not set, the control registers take a power of 2 based value
	which is mapped by the kernel to percentage based values.

	This file is of pure informational nature and has no influence on
	the values which are written to the schemata files. These are
	always percentage based.

Note, that this uses the actual file names and not some random
abbreviations thereof. It also documents delay_linear and gets rid of the
implementation details of QOS_MSRs. They are irrelevant here. 

And exactly this information wants to go into Documentation/... preferably
in exactly this patch and not in a disconnected one which describes stuff
differently for whatever reasons.
  
> +static int rdt_min_bw_show(struct kernfs_open_file *of,
> +			     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	seq_printf(seq, "%d\n", r->min_bw);
> +

Can you please get rid of these pointless extra new lines before the
'return 0;' ? They are just eating screen estate and do not make the code
more readable.

> +/* rdtgroup information files for MBE. */

What is MBE?

> +static struct rftype res_mbe_info_files[] = {

Randomizing names make the code more secure or what are you trying to
achieve?

> +void rdt_get_mba_infofile(struct rdt_resource *r)
> +{
> +	r->info_files = &res_mbe_info_files[0];

See other mail.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare\
  2017-02-17 19:58 ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare Vikas Shivappa
@ 2017-03-01 16:24   ` Thomas Gleixner
  2017-03-30 19:05     ` Shivappa Vikas
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2017-03-01 16:24 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:

Subject: x86/intel_rdt: schemata file support for MBA prepare

I have no idea what MBA prepare is. Is that yet another variant aside of
MBE?

> Add support to introduce generic APIs for control validation and writing
> QOS_MSRs for RDT resources. The control validation api is meant to
> validate the control values like cache bit mask for CAT and memory b/w
> percentage for MBA. A resource generic display format is also added and
> used for the resources depending on whether its displayed in
> hex/decimal.

The usual unpenetratable mess of random sentences lumped together.

> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 24de64c..8748b0d 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -77,6 +77,9 @@ struct rftype {
>   * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>   *				in a cache bit mask
> + * @format_str:		Per resource format string to show domain val

Can you please spell out words in comments instead of using random
abbreviations just as you see fit?

> + * @parse_ctrlval:		Per resource API to parse the ctrl values

That's not an API. That's a function pointer.

> + * @msr_update:		API to update QOS MSRs

Ditto.

>   * @info_files:		resctrl info files for the resource
>   * @infofiles_len:		Number of info files
>   * @max_delay:			Max throttle delay. Delay is the hardware
> @@ -105,6 +108,9 @@ struct rdt_resource {
>  	int			cbm_len;
>  	int			min_cbm_bits;
>  	u32			default_ctrl;
> +	const char		*format_str;
> +	int (*parse_ctrlval)	(char *buf, struct rdt_resource *r);
> +	void (*msr_update)	(void *a1, void *a2, struct rdt_resource *r);

void *a1, *a2? Dammit, both implementations (CAT and MBA) use the same
types. This just avoids type safety which does not magically come back by
your completely nonsensical typecasts inside the callback implementations.

> +void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r)

And this is global because it's only used in this file, right?

> +{
> +	struct rdt_domain *d = (struct rdt_domain *)a2;
> +	struct msr_param *m = (struct msr_param *)a1;

Oh well......

> +	int i;
> +
> +	for (i = m->low; i < m->high; i++) {
> +		int idx = cbm_idx(r, i);
> +
> +		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
> +	}
> +}

> -static bool cbm_validate(unsigned long var, struct rdt_resource *r)
> +static int cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
>  {
> -	unsigned long first_bit, zero_bit;
> +	unsigned long first_bit, zero_bit, var;
> +	int ret;
> +
> +	ret = kstrtoul(buf, 16, &var);
> +	if (ret)
> +		return ret;
>  
>  	if (var == 0 || var > r->default_ctrl)
> -		return false;
> +		return -EINVAL;

So you change this function and the whole call chain to return either
-EINVAL or 0 instead of false/true.

And then you treat the integer return value as boolean on the call site
again:

> @@ -90,7 +95,7 @@ static int parse_line(char *line, struct rdt_resource *r)
>  		id = strsep(&dom, "=");
>  		if (kstrtoul(id, 10, &dom_id) || dom_id != d->id)
>  			return -EINVAL;
> -		if (parse_cbm(dom, r))
> +		if (r->parse_ctrlval(dom, r))
>  			return -EINVAL;

What's the purpose of this exercise? Annoying reviewers or what?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA
  2017-02-17 19:58 ` [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA Vikas Shivappa
@ 2017-03-01 16:59   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2017-03-01 16:59 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:
> @@ -13,6 +13,7 @@
>  #define IA32_L2_CBM_BASE	0xd10
>  #define IA32_MBA_THRTL_BASE	0xd50
>  #define MAX_MBA_THRTL		100u
> +#define INVALID_DELAY		100u

100% is an interesting definition of invalid.

>  #define MBA_IS_LINEAR		0x4
>  
>  #define L3_QOS_CDP_ENABLE	0x01ULL
> @@ -157,7 +158,9 @@ struct msr_param {
>  void rdt_get_cache_infofile(struct rdt_resource *r);
>  void rdt_get_mba_infofile(struct rdt_resource *r);
>  int parse_cbm(char *buf, struct rdt_resource *r);
> +int parse_thrtl(char *buf, struct rdt_resource *r);

Bah. The resource is MBA not 'thrtl'. Consistent naming is NOT optional.

>  void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r);
> +void mba_wrmsr(void *a1, void *a2, struct rdt_resource *r);

Again this is global, because it's only used in intel_rdt.c, right?

>  /*
> + * Map the memory b/w percentage value to delay values
> + * that can be written to QOS_MSRs.
> + * There are currently no SKUs which support non linear delay values.
> + */
> +static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
> +{
> +	if (r->delay_linear)
> +		return MAX_MBA_THRTL - bw;

Bah. Why do you have r->default_ctrl? Just to hard code stuff in places?

> +
> +	return INVALID_DELAY;

This is crap. If the code reaches this point, then the map has been
initialized. If the map has not been initialized, but delay_linear is 0
then that's a BUG and we should handle it gracefully.

It's fine to prepare for the eventual arrival of this non linear nonsense,
but then we do it proper.

	if (r->delay_linear)
		return r->default_ctrl - bw;

	if (!r->bw_map) {
		WARN_ONCE("Sensible error message");
		return r->default_ctrl;
	}
	WARN_ONCE("Sensible error message");
	return r->default_ctrl;

Whether this needs to be a WARN_ONCE() or a pr_err_once() is debatable as
the call chain is known and the only value of the backtrace is that it is
more prominent in dmesg.

> +void mba_wrmsr(void *a1, void *a2, struct rdt_resource *r)
> +{
> +	struct rdt_domain *d = (struct rdt_domain *)a2;
> +	struct msr_param *m = (struct msr_param *)a1;
> +	int i;
> +
> +	for (i = m->low; i < m->high; i++) {
> +		int idx = cbm_idx(r, i);
> +
> +		/*
> +		 * Write the delay value for mba.
> +		 * delay_bw_map will return a correct value

If you use function names in a comment then please do it proper:
delay_bw_map()

> +		 * for this call as a nonexistant map throws error
> +		 * on init.

Oh well. Adding nonsensical comments is way simpler than thinking about
proper implementations, right?

The function as you wrote it does not return a correct value under all
circumstances and the nonexistant map does not throw an error at all. We
simply should not reach that code when the map does not exist.

>  /*
> + * Check whether MBA bandwidth percentage value is correct.
> + * The value is checked against the minimum bandwidth
> + * values and the b/w granularity specified by the h/w.
> + */
> +static int thrtl_validate(char *buf, unsigned long *data, struct rdt_resource *r)

Darn. The input value is not a throttle value. Read the comment you wrote
yourself. It's a bandwidth percentage. So why the heck can't you make the
function name reflect what it does?

> +{
> +	unsigned long bw;
> +	int ret;
> +
> +	ret = kstrtoul(buf, 10, &bw);
> +	if (ret)
> +		return ret;
> +
> +	if (bw < r->min_bw || bw > MAX_MBA_THRTL || (bw % r->bw_gran))

I told you last time, that requiring exact matches is bad because you can't
use the same settings across different machines. It does not matter whether
it's accurate simply because the actual outcome in the hardware is not
accurate either. So the only interesting thing is that:

	 r->min_bw <= val <= MAX_MBA_BW

Yes, it's not MAX_MBA_THRTL, it's MAX_MBA_BW, which is always 100%. Please
fix that all over the place.

In principle, we could even relax the requirement to:

        0 <= val <= MAX_MBA_BW

and set the value to r->min_bw if it's smaller.

And of course this granularity check is completely bogus when non-linear
mappings come into play. So while you have plastered the rest of the code
with bogus handling of the non-linear stuff, you ignore it here.

A simple

	if (!r->delay_linear)
		return -ENOTSUPP;

would be the proper thing to do.

> +/*
> + * Read the user RDT control value into tempory buffer:
> + * Cache bit mask (hex) or Memory b/w throttle (decimal).

That's a really valuable comment for this function. NOT!

I really wonder how you manage to not confuse yourself with the
inconsistent mess you create all over the place. Maybe you do....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-03-01 15:24   ` Thomas Gleixner
@ 2017-03-10 21:51     ` Shivappa Vikas
  0 siblings, 0 replies; 18+ messages in thread
From: Shivappa Vikas @ 2017-03-10 21:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen



On Wed, 1 Mar 2017, Thomas Gleixner wrote:

> On Fri, 17 Feb 2017, Vikas Shivappa wrote:
>> --- a/arch/x86/include/asm/intel_rdt.h
>> +++ b/arch/x86/include/asm/intel_rdt.h
>> @@ -11,6 +11,9 @@
>>  #define IA32_L3_QOS_CFG		0xc81
>>  #define IA32_L3_CBM_BASE	0xc90
>>  #define IA32_L2_CBM_BASE	0xd10
>> +#define IA32_MBA_THRTL_BASE	0xd50
>> +#define MAX_MBA_THRTL		100u
>> +#define MBA_IS_LINEAR		0x4
>
> I have a hard time to figure out how the latter two constants are related
> to this list of registers. MBA_IS_LINEAR is used to check the CPUID bit and
> MAX_MBA_THRTL is obviously a pure software constant because with a
> non-linear scale the maximum value is not 100.
>
> Just slapping defines to random places is equally bad as using hard coded
> constants.
>
>> +/*
>> + * rdt_get_mb_table() - get a mapping of b/w percentage values
>> + * exposed to user interface and the h/w understandable delay values.
>> + *
>> + * The non-linear delay values have the granularity of power of two
>> + * and also the h/w does not guarantee a curve for configured delay
>> + * values vs. actual b/w throttled.
>> + * Hence we need a mapping that is pre caliberated for user to express
>> + * the b/w in terms of any sensible number.
>
> ... calibrated so the user can express the bandwidth as a percentage value.
>
>> +static inline int rdt_get_mb_table(struct rdt_resource *r)
>> +{
>> +	/*
>> +	 * There are no Intel SKUs as of now to support non-linear delay.
>> +	 */
>> +	r->mb_map = NULL;
>
> What's the point of setting this to NULL?
>
> Also it would be helpful to emit log info here so people don't have to
> start digging around.
>
> 	pr_info("Bandwidth map not implemented for ....", ... model);
>
>> +
>> +	return -ENODEV;
>
> Returning -ENODEV to a function which just returns a boolean value is
> pointless.
>
>>  static void rdt_get_cache_config(int idx, struct rdt_resource *r)
>>  {
>>  	union cpuid_0x10_1_eax eax;
>> @@ -184,9 +237,8 @@ static inline bool get_rdt_resources(void)
>>  		ret = true;
>>  	}
>>
>> -	if (boot_cpu_has(X86_FEATURE_MBA)) {
>> -		ret = true;
>> -	}
>> +	if (boot_cpu_has(X86_FEATURE_MBA))
>> +		ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
>
> Groan. When rdt_get_mem_config() returns false (because the map is not
> implemented), then the whole function returns false and CAT is disabled.
>
>> +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	int i;
>> +
>> +	d->ctrl_val = kmalloc_array(r->num_closid,
>> +				     sizeof(*d->ctrl_val), GFP_KERNEL);
>> +	if (!d->ctrl_val)
>> +		return -ENOMEM;
>> +
>> +	/*
>> +	 * Initialize the Control MSRs to having no control.
>> +	 * For Cache Allocation: Set all bits in cbm
>> +	 * For Memory Allocation: Set b/w requested to 100
>> +	 */
>> +	for (i = 0; i < r->num_closid; i++) {
>> +		int idx = cbm_idx(r, i);
>> +
>> +		d->ctrl_val[i] = r->default_ctrl;
>> +		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
>> +	}
>
> So if you use a local pointer for that, this whole mess becomes readable.
>
> static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
> {
> 	u32 *p;
> 	int i;
>
> 	p = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
> 	if (!p)
> 		return -ENOMEM;
>
> 	d->ctrl_val = p;
>
> 	/* Initialize the Control MSRs to the default value */
> 	for (i = 0; i < r->num_closid; i++, p++) {
> 		int idx = cbm_idx(r, i);
>
> 		*p = r->default_ctrl;
> 		wrmsrl(r->msr_base + idx, *p);
> 	}
>> +
>> +	return 0;
>> +}
>
>>  static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>  {
>> -	int i, id = get_cache_id(cpu, r->cache_level);
>> +	int id = get_cache_id(cpu, r->cache_level), ret;
>
> Bah. If you have the same type in one line, then please move the
> uninitialized variables to the front.
>
> 	int ret, id = get_cache_id(cpu, r->cache_level);
>
> But a s/i/ret/ would have been to simple and kept the code readable.
>
>> @@ -298,19 +374,12 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
>>
>>  	d->id = id;
>>
>> -	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
>> -	if (!d->ctrl_val) {
>> +	ret = domain_setup_ctrlval(r, d);
>> +	if (ret) {
>>  		kfree(d);
>>  		return;
>>  	}
>
> What's the point of this 'ret' variable if the function is void?
>
> 	if (domain_setup_ctrlval(r, d)) {
> 		kfree(d);
> 		return;
> 	}
>
> would have been to easy to read, right?

Will fix all the issues pointed. Thanks for pointing out.

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare
  2017-03-01 15:34   ` Thomas Gleixner
@ 2017-03-10 22:04     ` Shivappa Vikas
  0 siblings, 0 replies; 18+ messages in thread
From: Shivappa Vikas @ 2017-03-10 22:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen



On Wed, 1 Mar 2017, Thomas Gleixner wrote:

> On Fri, 17 Feb 2017, Vikas Shivappa wrote:
>
>> As a preparatory patch to MBA info file setup, generalize the info file
>> setup to have the option to choose between different set of files.
>> Although multiple cache resources have same info files, Memory resources
>> have different set of info files. That way we have the option to choose
>> between memory resource and cache resource info files.
>
> Sigh.
>
>> @@ -77,6 +77,8 @@ struct rftype {
>>   * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
>>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>>   *				in a cache bit mask
>> + * @info_files:		resctrl info files for the resource
>> + * @infofiles_len:		Number of info files
>
> len == length, nr == number. No? Too intuitive, right?

Took it as len as the rdtgroup_add_files calls its parameter as len.. Will fix 
to nr_infofiles as that seems better.

>
> And no, not infofiles_nr. It wants to be nr_infofiles. And while at it
> please either use infofiles or info_files, but not a mixture of
> both. Random underscores are not enhancing readability at all.
>
>> +void rdt_get_cache_infofile(struct rdt_resource *r)
>> +{
>> +	r->info_files = &res_cache_info_files[0];
>
> What's wrong with
>
>        r->info_files = res_cache_info_files;
>
> Nothing, but it would be too easy to read. This is not the obfuscated
> c-code contest.

Will fix..

Thanks,
Vikas

>
>> +	r->infofiles_len = ARRAY_SIZE(res_cache_info_files);
>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-03-01 16:05   ` Thomas Gleixner
@ 2017-03-10 23:26     ` Shivappa Vikas
  0 siblings, 0 replies; 18+ messages in thread
From: Shivappa Vikas @ 2017-03-10 23:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen



On Wed, 1 Mar 2017, Thomas Gleixner wrote:

> On Fri, 17 Feb 2017, Vikas Shivappa wrote:
>
>> Add files in info directory for MBA.
>> The files in the info directory are as follows :
>> - num_closids: max number of closids for MBA which represents the max
>> class of service user can configure.
>> - min_bw: the minimum memory bandwidth(b/w) values in percentage.
>>
>> OS maps the b/w percentage values to memory b/w throttle delay values
>> and configures them via MSR interface by writing to the QOS_MSRs. These
>> delay values can have a linear or nonlinear scale.
>>
>> - bw_gran: The memory b/w granularity that can be configured.
>> For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
>> values are 10,20,30...
>
> This is unreadable. It's possible to structure ASCII text cleanly.
>
> x86/rdt: Add info directory files for MBA
>
> The info directory for MBA contains the following files:
>
> num_closids
>
> 	The maximum number of class of service slots available for MBA
>
> min_bandwidth
>
> 	The minimum memory bandwidth percentage value
>
> bandwidth_gran
>
> 	The granularity of the bandwidth control for the particular
> 	hardware in percent. The available bandwidth control steps are:
> 	min_bw + N * bw_gran Intermediate values are rounded to the next

Is this next or pervious? Meaning when 12 is requested on a 10 granularity , we 
give 10 ?

> 	control step available on the hardware.
>
> delay_linear
>
> 	If set, the control registers take a linear percentage based value
> 	between min_bandwidth and 100 percent.
>
> 	If not set, the control registers take a power of 2 based value
> 	which is mapped by the kernel to percentage based values.
>
> 	This file is of pure informational nature and has no influence on
> 	the values which are written to the schemata files. These are
> 	always percentage based.

Will update the changelogs

>
> Note, that this uses the actual file names and not some random
> abbreviations thereof. It also documents delay_linear and gets rid of the
> implementation details of QOS_MSRs. They are irrelevant here.
>
> And exactly this information wants to go into Documentation/... preferably
> in exactly this patch and not in a disconnected one which describes stuff
> differently for whatever reasons.
>
>> +static int rdt_min_bw_show(struct kernfs_open_file *of,
>> +			     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	seq_printf(seq, "%d\n", r->min_bw);
>> +
>
> Can you please get rid of these pointless extra new lines before the
> 'return 0;' ? They are just eating screen estate and do not make the code
> more readable.

All the rest of the show functions have that line before return like the 
rdt_min_cbm_bits_show etc.  I have tried to 
always keep a line before return like the old cqm/rapl - if thats ok

>
>> +/* rdtgroup information files for MBE. */
>
> What is MBE?
>
>> +static struct rftype res_mbe_info_files[] = {
>
> Randomizing names make the code more secure or what are you trying to
> achieve?
>
>> +void rdt_get_mba_infofile(struct rdt_resource *r)
>> +{
>> +	r->info_files = &res_mbe_info_files[0];
>
> See other mail.
>

Will fix the MBE to MBA and the pointer init.

Thanks,
Vikas

> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare\
  2017-03-01 16:24   ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare\ Thomas Gleixner
@ 2017-03-30 19:05     ` Shivappa Vikas
  0 siblings, 0 replies; 18+ messages in thread
From: Shivappa Vikas @ 2017-03-30 19:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen



On Wed, 1 Mar 2017, Thomas Gleixner wrote:

> On Fri, 17 Feb 2017, Vikas Shivappa wrote:
>
> Subject: x86/intel_rdt: schemata file support for MBA prepare
>
> I have no idea what MBA prepare is. Is that yet another variant aside of
> MBE?
>
>> Add support to introduce generic APIs for control validation and writing
>> QOS_MSRs for RDT resources. The control validation api is meant to
>> validate the control values like cache bit mask for CAT and memory b/w
>> percentage for MBA. A resource generic display format is also added and
>> used for the resources depending on whether its displayed in
>> hex/decimal.
>
> The usual unpenetratable mess of random sentences lumped together.
>
>> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
>> index 24de64c..8748b0d 100644
>> --- a/arch/x86/include/asm/intel_rdt.h
>> +++ b/arch/x86/include/asm/intel_rdt.h
>> @@ -77,6 +77,9 @@ struct rftype {
>>   * @default_ctrl:		Specifies default cache cbm or mem b/w percent.
>>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>>   *				in a cache bit mask
>> + * @format_str:		Per resource format string to show domain val
>
> Can you please spell out words in comments instead of using random
> abbreviations just as you see fit?
>
>> + * @parse_ctrlval:		Per resource API to parse the ctrl values
>
> That's not an API. That's a function pointer.
>
>> + * @msr_update:		API to update QOS MSRs
>
> Ditto.
>
>>   * @info_files:		resctrl info files for the resource
>>   * @infofiles_len:		Number of info files
>>   * @max_delay:			Max throttle delay. Delay is the hardware
>> @@ -105,6 +108,9 @@ struct rdt_resource {
>>  	int			cbm_len;
>>  	int			min_cbm_bits;
>>  	u32			default_ctrl;
>> +	const char		*format_str;
>> +	int (*parse_ctrlval)	(char *buf, struct rdt_resource *r);
>> +	void (*msr_update)	(void *a1, void *a2, struct rdt_resource *r);
>
> void *a1, *a2? Dammit, both implementations (CAT and MBA) use the same
> types. This just avoids type safety which does not magically come back by
> your completely nonsensical typecasts inside the callback implementations.
>
>> +void cqm_wrmsr(void *a1, void *a2, struct rdt_resource *r)
>
> And this is global because it's only used in this file, right?
>
>> +{
>> +	struct rdt_domain *d = (struct rdt_domain *)a2;
>> +	struct msr_param *m = (struct msr_param *)a1;
>
> Oh well......
>
>> +	int i;
>> +
>> +	for (i = m->low; i < m->high; i++) {
>> +		int idx = cbm_idx(r, i);
>> +
>> +		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
>> +	}
>> +}
>
>> -static bool cbm_validate(unsigned long var, struct rdt_resource *r)
>> +static int cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
>>  {
>> -	unsigned long first_bit, zero_bit;
>> +	unsigned long first_bit, zero_bit, var;
>> +	int ret;
>> +
>> +	ret = kstrtoul(buf, 16, &var);
>> +	if (ret)
>> +		return ret;
>>
>>  	if (var == 0 || var > r->default_ctrl)
>> -		return false;
>> +		return -EINVAL;
>
> So you change this function and the whole call chain to return either
> -EINVAL or 0 instead of false/true.
>
> And then you treat the integer return value as boolean on the call site
> again:

Will fix wrt all the comments.

Thanks,
Vikas

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-30 19:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
2017-02-17 19:58 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation Vikas Shivappa
2017-02-17 19:58 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
2017-02-17 19:58 ` [PATCH 3/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
2017-02-17 19:58 ` [PATCH 4/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
2017-03-01 15:24   ` Thomas Gleixner
2017-03-10 21:51     ` Shivappa Vikas
2017-02-17 19:58 ` [PATCH 5/8] x86/intel_rdt: info file support for MBA prepare Vikas Shivappa
2017-03-01 15:34   ` Thomas Gleixner
2017-03-10 22:04     ` Shivappa Vikas
2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
2017-03-01 16:05   ` Thomas Gleixner
2017-03-10 23:26     ` Shivappa Vikas
2017-02-17 19:58 ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare Vikas Shivappa
2017-03-01 16:24   ` [PATCH 7/8] x86/intel_rdt: schemata file support for MBA prepare\ Thomas Gleixner
2017-03-30 19:05     ` Shivappa Vikas
2017-02-17 19:58 ` [PATCH 8/8] x86/intel_rdt/mba: Add schemata file support for MBA Vikas Shivappa
2017-03-01 16:59   ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.