linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support
@ 2017-01-10 19:33 Vikas Shivappa
  2017-01-10 19:33 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface Vikas Shivappa
                   ` (7 more replies)
  0 siblings, 8 replies; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

Memory b/w allocation(MBA) is part of the Intel Resource Director
Technology (RDT). RDT helps monitor and share processor shared
resources.  MBA helps enforce a limit on the memory b/w, threads can use
when they are scheduled. OS does the enforcement using MSR(model
specific register) interface and mapping the threads to architecture
specific CLOSids(class of service IDs) just like the cache allocation
technology(CAT). The interface is the resctrl just like CAT.

This can be used along with MBM (memory b/w monitoring) and cache
allocation to control/restrict the applications cache and memory
resources as per the QoS or other performance requirements.  Use cases
could be large serverclusters, VM, clould and container based services
where the admin or orchestration tools can use this framework to
manage/allocate these processor shared resources like other memory/cpu
resources to provide QoS guarentees.

Sending the first patch series to support the feature. Patches are based
on on 4.10-rc2.

[PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w
[PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for
[PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT
[PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
[PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
[PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
[PATCH 7/8] x86/intel_rdt/mba: Add schemata file support for MBA
[PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates for MBA

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 13:43   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

Memory b/w allocation is part of Intel RDT(resource director technology)
which lets user control the amount of memory b/w (L2 external b/w) per
thread. This is done programming MSR interfaces like cache allocation
technology and other RDT features.
This patch adds documentation for Memory b/w allocation interface usage.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index d918d26..23959ba 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -4,6 +4,7 @@ Copyright (C) 2016 Intel Corporation
 
 Fenghua Yu <fenghua.yu@intel.com>
 Tony Luck <tony.luck@intel.com>
+Vikas Shivappa <vikas.shivappa@intel.com>
 
 This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
 X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
@@ -107,6 +108,19 @@ and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
 of the capacity of the cache. You could partition the cache into four
 equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
 
+Memory b/w throttle
+-------------------
+For Memory b/w resource, the portion of total memory b/w the user can
+restrict or 'throttle by' is indicated by the thrtl_by values.
+
+Throttle by values could be linear scale or non-linear scale.  In linear
+scale a thrtl_by value of say 20 would throttle the memory b/w by 20%
+allowing only 80% max b/w. In nonlinear scale currently SDM specifies
+throttle values in 2^n values. However the h/w does not guarantee a
+specific curve for the amount of memory b/w that is actually throttled.
+But for any thrtl_by value x > y, its guaranteed that x would throttle
+more b/w than y.  The info directory specifies the max thrtl_by value
+and thrtl_by granularity.
 
 L3 details (code and data prioritization disabled)
 --------------------------------------------------
@@ -129,6 +143,13 @@ schemata format is always:
 
 	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
+Memory b/w Allocation details
+-----------------------------
+
+Memory b/w domain is L3 cache.
+
+	MB:<cache_id0>=thrtl_by;<cache_id1>=thrtl_by;...
+
 Example 1
 ---------
 On a two socket machine (one L3 cache per socket) with just four bits
@@ -185,6 +206,16 @@ Ditto for the second real time task (with the remaining 25% of cache):
 # echo 5678 > p1/tasks
 # taskset -cp 2 5678
 
+For the same 2 socket system with memory b/w resource and CAT L3 the
+schemata would look like:
+
+Assume max_thrtl_by is 90 and thrtl_gran is 10.
+
+# echo -e "L3:0=f8000;1=fffff\nMB:0=10;1=30" > p0/schemata
+
+This would throttle the socket 1 memory b/w by 10% and socket2 memory
+b/w by 30%
+
 Example 3
 ---------
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
  2017-01-10 19:33 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 13:45   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA Vikas Shivappa
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

This patch just generalizes the naming for RDT so that we can get ready
to apply other resource controls like MBA.

RDT resource cbm values are named to ctrl_val representing generic
control values which will hold both cbm(cache bit mask) and memory b/w
throttle values. max_cbm is updated to no_ctrl which represents default
values which provide no control which is all bits set in case of CAT and
zero throttle_by in case of MBA. The tmp_cbm is updated to tmp_ctrls.
Similarly domain structures are updated to ctrl_val instead of cbm.

APIs are also generalized:
- get_cache_config is added to separate from memory specific apis.
- MSR update api names are changed from having cbm to ctrl.
- info file API names are set to reflect generic no_ctrl or control values
rather than cbm.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         | 20 ++++++++++----------
 arch/x86/kernel/cpu/intel_rdt.c          | 28 ++++++++++++++--------------
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 12 ++++++------
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 24 ++++++++++++------------
 4 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 95ce5c8..9f66bf5 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -71,14 +71,14 @@ struct rftype {
  * @capable:			Is this feature available on this machine
  * @name:			Name to use in "schemata" file
  * @num_closid:			Number of CLOSIDs available
- * @max_cbm:			Largest Cache Bit Mask allowed
+ * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
  * @domains:			All domains for this resource
  * @num_domains:		Number of domains active
  * @msr_base:			Base MSR address for CBMs
- * @tmp_cbms:			Scratch space when updating schemata
- * @num_tmp_cbms:		Number of CBMs in tmp_cbms
+ * @tmp_ctrl:			Scratch space when updating schemata
+ * @num_tmp_ctrl:		Number of control values in tmp_ctrl
  * @cache_level:		Which cache level defines scope of this domain
  * @cbm_idx_multi:		Multiplier of CBM index
  * @cbm_idx_offset:		Offset of CBM index. CBM index is computed by:
@@ -91,12 +91,12 @@ struct rdt_resource {
 	int			num_closid;
 	int			cbm_len;
 	int			min_cbm_bits;
-	u32			max_cbm;
+	u32			no_ctrl;
 	struct list_head	domains;
 	int			num_domains;
 	int			msr_base;
-	u32			*tmp_cbms;
-	int			num_tmp_cbms;
+	u32			*tmp_ctrl;
+	int			num_tmp_ctrl;
 	int			cache_level;
 	int			cbm_idx_multi;
 	int			cbm_idx_offset;
@@ -107,13 +107,13 @@ struct rdt_resource {
  * @list:	all instances of this resource
  * @id:		unique id for this instance
  * @cpu_mask:	which cpus share this resource
- * @cbm:	array of cache bit masks (indexed by CLOSID)
+ * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
  */
 struct rdt_domain {
 	struct list_head	list;
 	int			id;
 	struct cpumask		cpu_mask;
-	u32			*cbm;
+	u32			*ctrl_val;
 };
 
 /**
@@ -165,7 +165,7 @@ enum {
 };
 
 /* CPUID.(EAX=10H, ECX=ResID=1).EDX */
-union cpuid_0x10_1_edx {
+union cpuid_0x10_x_edx {
 	struct {
 		unsigned int cos_max:16;
 	} split;
@@ -174,7 +174,7 @@ enum {
 
 DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
 
-void rdt_cbm_update(void *arg);
+void rdt_ctrl_update(void *arg);
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 5a533fe..b2c037a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -119,7 +119,7 @@ static inline bool cache_alloc_hsw_probe(void)
 
 		r->num_closid = 4;
 		r->cbm_len = 20;
-		r->max_cbm = max_cbm;
+		r->no_ctrl = max_cbm;
 		r->min_cbm_bits = 2;
 		r->capable = true;
 		r->enabled = true;
@@ -130,16 +130,16 @@ static inline bool cache_alloc_hsw_probe(void)
 	return false;
 }
 
-static void rdt_get_config(int idx, struct rdt_resource *r)
+static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 {
 	union cpuid_0x10_1_eax eax;
-	union cpuid_0x10_1_edx edx;
+	union cpuid_0x10_x_edx edx;
 	u32 ebx, ecx;
 
 	cpuid_count(0x00000010, idx, &eax.full, &ebx, &ecx, &edx.full);
 	r->num_closid = edx.split.cos_max + 1;
 	r->cbm_len = eax.split.cbm_len + 1;
-	r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	r->no_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
 	r->capable = true;
 	r->enabled = true;
 }
@@ -151,7 +151,7 @@ static void rdt_get_cdp_l3_config(int type)
 
 	r->num_closid = r_l3->num_closid / 2;
 	r->cbm_len = r_l3->cbm_len;
-	r->max_cbm = r_l3->max_cbm;
+	r->no_ctrl = r_l3->no_ctrl;
 	r->capable = true;
 	/*
 	 * By default, CDP is disabled. CDP can be enabled by mount parameter
@@ -171,7 +171,7 @@ static inline bool get_rdt_resources(void)
 		return false;
 
 	if (boot_cpu_has(X86_FEATURE_CAT_L3)) {
-		rdt_get_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
+		rdt_get_cache_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
 		if (boot_cpu_has(X86_FEATURE_CDP_L3)) {
 			rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA);
 			rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE);
@@ -180,7 +180,7 @@ static inline bool get_rdt_resources(void)
 	}
 	if (boot_cpu_has(X86_FEATURE_CAT_L2)) {
 		/* CPUID 0x10.2 fields are same format at 0x10.1 */
-		rdt_get_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
+		rdt_get_cache_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
 		ret = true;
 	}
 
@@ -200,7 +200,7 @@ static int get_cache_id(int cpu, int level)
 	return -1;
 }
 
-void rdt_cbm_update(void *arg)
+void rdt_ctrl_update(void *arg)
 {
 	struct msr_param *m = (struct msr_param *)arg;
 	struct rdt_resource *r = m->res;
@@ -221,7 +221,7 @@ void rdt_cbm_update(void *arg)
 	for (i = m->low; i < m->high; i++) {
 		int idx = cbm_idx(r, i);
 
-		wrmsrl(r->msr_base + idx, d->cbm[i]);
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
 	}
 }
 
@@ -294,8 +294,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
 	d->id = id;
 
-	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
-	if (!d->cbm) {
+	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
+	if (!d->ctrl_val) {
 		kfree(d);
 		return;
 	}
@@ -303,8 +303,8 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	for (i = 0; i < r->num_closid; i++) {
 		int idx = cbm_idx(r, i);
 
-		d->cbm[i] = r->max_cbm;
-		wrmsrl(r->msr_base + idx, d->cbm[i]);
+		d->ctrl_val[i] = r->no_ctrl;
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
 	}
 
 	cpumask_set_cpu(cpu, &d->cpu_mask);
@@ -326,7 +326,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
 		r->num_domains--;
-		kfree(d->cbm);
+		kfree(d->ctrl_val);
 		list_del(&d->list);
 		kfree(d);
 	}
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 8af04af..edc6195 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -498,12 +498,12 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
 	return 0;
 }
 
-static int rdt_cbm_mask_show(struct kernfs_open_file *of,
+static int rdt_no_ctrl_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%x\n", r->max_cbm);
+	seq_printf(seq, "%x\n", r->no_ctrl);
 
 	return 0;
 }
@@ -530,7 +530,7 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 		.name		= "cbm_mask",
 		.mode		= 0444,
 		.kf_ops		= &rdtgroup_kf_single_ops,
-		.seq_show	= rdt_cbm_mask_show,
+		.seq_show	= rdt_no_ctrl_show,
 	},
 	{
 		.name		= "min_cbm_bits",
@@ -803,14 +803,14 @@ static int reset_all_cbms(struct rdt_resource *r)
 		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 
 		for (i = 0; i < r->num_closid; i++)
-			d->cbm[i] = r->max_cbm;
+			d->ctrl_val[i] = r->no_ctrl;
 	}
 	cpu = get_cpu();
 	/* Update CBM on this cpu if it's in cpu_mask. */
 	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_cbm_update(&msr_param);
+		rdt_ctrl_update(&msr_param);
 	/* Update CBM on all other cpus in cpu_mask. */
-	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index f369cb8..c50f742 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -38,7 +38,7 @@ static bool cbm_validate(unsigned long var, struct rdt_resource *r)
 {
 	unsigned long first_bit, zero_bit;
 
-	if (var == 0 || var > r->max_cbm)
+	if (var == 0 || var > r->no_ctrl)
 		return false;
 
 	first_bit = find_first_bit(&var, r->cbm_len);
@@ -66,7 +66,7 @@ static int parse_cbm(char *buf, struct rdt_resource *r)
 		return ret;
 	if (!cbm_validate(data, r))
 		return -EINVAL;
-	r->tmp_cbms[r->num_tmp_cbms++] = data;
+	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
 
 	return 0;
 }
@@ -116,14 +116,14 @@ static int update_domains(struct rdt_resource *r, int closid)
 
 	list_for_each_entry(d, &r->domains, list) {
 		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
-		d->cbm[msr_param.low] = r->tmp_cbms[idx++];
+		d->ctrl_val[msr_param.low] = r->tmp_ctrl[idx++];
 	}
 	cpu = get_cpu();
 	/* Update CBM on this cpu if it's in cpu_mask. */
 	if (cpumask_test_cpu(cpu, cpu_mask))
-		rdt_cbm_update(&msr_param);
+		rdt_ctrl_update(&msr_param);
 	/* Update CBM on other cpus. */
-	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
 	put_cpu();
 
 	free_cpumask_var(cpu_mask);
@@ -155,13 +155,13 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
 	/* get scratch space to save all the masks while we validate input */
 	for_each_enabled_rdt_resource(r) {
-		r->tmp_cbms = kcalloc(r->num_domains, sizeof(*l3_cbms),
+		r->tmp_ctrl = kcalloc(r->num_domains, sizeof(*l3_cbms),
 				      GFP_KERNEL);
-		if (!r->tmp_cbms) {
+		if (!r->tmp_ctrl) {
 			ret = -ENOMEM;
 			goto out;
 		}
-		r->num_tmp_cbms = 0;
+		r->num_tmp_ctrl = 0;
 	}
 
 	while ((tok = strsep(&buf, "\n")) != NULL) {
@@ -187,7 +187,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
 	/* Did the parser find all the masks we need? */
 	for_each_enabled_rdt_resource(r) {
-		if (r->num_tmp_cbms != r->num_domains) {
+		if (r->num_tmp_ctrl != r->num_domains) {
 			ret = -EINVAL;
 			goto out;
 		}
@@ -202,8 +202,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 out:
 	rdtgroup_kn_unlock(of->kn);
 	for_each_enabled_rdt_resource(r) {
-		kfree(r->tmp_cbms);
-		r->tmp_cbms = NULL;
+		kfree(r->tmp_ctrl);
+		r->tmp_ctrl = NULL;
 	}
 	return ret ?: nbytes;
 }
@@ -217,7 +217,7 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 	list_for_each_entry(dom, &r->domains, list) {
 		if (sep)
 			seq_puts(s, ";");
-		seq_printf(s, "%d=%x", dom->id, dom->cbm[closid]);
+		seq_printf(s, "%d=%x", dom->id, dom->ctrl_val[closid]);
 		sep = true;
 	}
 	seq_puts(s, "\n");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
  2017-01-10 19:33 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface Vikas Shivappa
  2017-01-10 19:33 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 13:54   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

This patch does some changes to get ready to handle more resources like
Memory b/w allocation(MBA).

-Update the control registers only when user changes the controls(cbm for
Cache resources and Mem b/w for memory). Hence not sending IPIs on all
domains when user updates the control vals.
-Introduce next_enabled_resource rather than looping through all
resources while parsing each schemata line. The order of resources
should be anyways the same as the root schemata.
-Return error as soon as we detect a resource not entering all domain
values in schemata rather than waiting till we parse all resources.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         | 16 +++++++++++++
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 40 +++++++++++++++++---------------
 2 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9f66bf5..e1f8acb 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -156,6 +156,22 @@ enum {
 	     r++)							      \
 		if (r->enabled)
 
+/*
+ * Parameter r must be NULL or pointing to
+ * a valid rdt_resource_all entry.
+ * Points r to the next enabled RDT resource at the end.
+ */
+#define next_enabled_rdt_resource(r)					\
+do {									\
+	if (!r)								\
+		r = rdt_resources_all;					\
+	else								\
+		r++;							\
+	for (; r < rdt_resources_all + RDT_NUM_RESOURCES; r++)		\
+		if (r->enabled)						\
+			break;						\
+} while (0)
+
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
 	struct {
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index c50f742..054d771 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -94,6 +94,10 @@ static int parse_line(char *line, struct rdt_resource *r)
 			return -EINVAL;
 	}
 
+	/* Incorrect number of domains in the line */
+	if (r->num_tmp_ctrl != r->num_domains)
+		return -EINVAL;
+
 	/* Any garbage at the end of the line? */
 	if (line && line[0])
 		return -EINVAL;
@@ -114,9 +118,16 @@ static int update_domains(struct rdt_resource *r, int closid)
 	msr_param.high = msr_param.low + 1;
 	msr_param.res = r;
 
+	/*
+	 * Only update the domains that user has changed.
+	 * There by avoiding unnecessary IPIs.
+	 */
 	list_for_each_entry(d, &r->domains, list) {
-		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
-		d->ctrl_val[msr_param.low] = r->tmp_ctrl[idx++];
+		if (d->ctrl_val[msr_param.low] != r->tmp_ctrl[idx]) {
+			cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+			d->ctrl_val[msr_param.low] = r->tmp_ctrl[idx];
+		}
+		idx++;
 	}
 	cpu = get_cpu();
 	/* Update CBM on this cpu if it's in cpu_mask. */
@@ -164,30 +175,21 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 		r->num_tmp_ctrl = 0;
 	}
 
+	r = NULL;
 	while ((tok = strsep(&buf, "\n")) != NULL) {
 		resname = strsep(&tok, ":");
 		if (!tok) {
 			ret = -EINVAL;
 			goto out;
 		}
-		for_each_enabled_rdt_resource(r) {
-			if (!strcmp(resname, r->name) &&
-			    closid < r->num_closid) {
-				ret = parse_line(tok, r);
-				if (ret)
-					goto out;
-				break;
-			}
-		}
-		if (!r->name) {
-			ret = -EINVAL;
-			goto out;
-		}
-	}
 
-	/* Did the parser find all the masks we need? */
-	for_each_enabled_rdt_resource(r) {
-		if (r->num_tmp_ctrl != r->num_domains) {
+		next_enabled_rdt_resource(r);
+
+		if (!strcmp(resname, r->name)) {
+			ret = parse_line(tok, r);
+			if (ret)
+				goto out;
+		} else {
 			ret = -EINVAL;
 			goto out;
 		}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
                   ` (2 preceding siblings ...)
  2017-01-10 19:33 ` [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 13:59   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
Add supporting data structures to detect feature details which is done
in later patch using CPUID with EAX=10H, ECX= 3.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/include/asm/intel_rdt.h   | 10 +++++++++-
 arch/x86/kernel/cpu/intel_rdt.c    |  4 ++++
 arch/x86/kernel/cpu/scattered.c    |  1 +
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index eafee31..50cbdd0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -201,6 +201,8 @@
 #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
 
+#define X86_FEATURE_MBA         ( 7*32+18) /* Memory Bandwidth Allocation */
+
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
 #define X86_FEATURE_VNMI        ( 8*32+ 1) /* Intel Virtual NMI */
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index e1f8acb..35e76b4 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -180,7 +180,15 @@ enum {
 	unsigned int full;
 };
 
-/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+/* CPUID.(EAX=10H, ECX=ResID=3).EAX */
+union cpuid_0x10_3_eax {
+	struct {
+		unsigned int max_delay:12;
+	} split;
+	unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID).EDX */
 union cpuid_0x10_x_edx {
 	struct {
 		unsigned int cos_max:16;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index b2c037a..fced83c 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -184,6 +184,10 @@ static inline bool get_rdt_resources(void)
 		ret = true;
 	}
 
+	if (boot_cpu_has(X86_FEATURE_MBA)) {
+		ret = true;
+	}
+
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index d979406..23c2350 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -27,6 +27,7 @@ struct cpuid_bit {
 	{ X86_FEATURE_CAT_L3,		CPUID_EBX,  1, 0x00000010, 0 },
 	{ X86_FEATURE_CAT_L2,		CPUID_EBX,  2, 0x00000010, 0 },
 	{ X86_FEATURE_CDP_L3,		CPUID_ECX,  2, 0x00000010, 1 },
+	{ X86_FEATURE_MBA,		CPUID_EBX,  3, 0x00000010, 0 },
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
                   ` (3 preceding siblings ...)
  2017-01-10 19:33 ` [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 14:06   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

The MBA feature details are obtained via executing CPUID with EAX=10H
ECX= 3 and initialize the MBA structures from this info.

Add a new rdt resource 'MBA' to the global list of RDT resources.  Add
extensions to the generic RDT resource structure to store the MBA
specific feature details.  Parameters specific to delay values and delay
granularity are added to the RDT resource and domain structure.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         | 10 +++++
 arch/x86/kernel/cpu/intel_rdt.c          | 70 ++++++++++++++++++++++++++------
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  2 +-
 3 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 35e76b4..49ae832 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -11,6 +11,9 @@
 #define IA32_L3_QOS_CFG		0xc81
 #define IA32_L3_CBM_BASE	0xc90
 #define IA32_L2_CBM_BASE	0xd10
+#define IA32_MBE_THRTL_BASE	0xd50
+#define MAX_MBA_THRTL		100u
+#define MBE_IS_LINEAR		0x4
 
 #define L3_QOS_CDP_ENABLE	0x01ULL
 
@@ -74,6 +77,9 @@ struct rftype {
  * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @max_delay:		Max throttle delay
+ * @delay_gran:		Throttle delay granularity
+ * @delay_linear:		true if delay is in linear scale
  * @domains:			All domains for this resource
  * @num_domains:		Number of domains active
  * @msr_base:			Base MSR address for CBMs
@@ -92,6 +98,9 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			no_ctrl;
+	u32			max_delay;
+	u32			delay_gran;
+	u32			delay_linear;
 	struct list_head	domains;
 	int			num_domains;
 	int			msr_base;
@@ -141,6 +150,7 @@ enum {
 	RDT_RESOURCE_L3DATA,
 	RDT_RESOURCE_L3CODE,
 	RDT_RESOURCE_L2,
+	RDT_RESOURCE_MBA,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index fced83c..6736e1d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -76,6 +76,14 @@ struct rdt_resource rdt_resources_all[] = {
 		.cbm_idx_multi	= 1,
 		.cbm_idx_offset	= 0
 	},
+	{
+		.name		= "MB",
+		.domains	= domain_init(RDT_RESOURCE_MBA),
+		.msr_base	= IA32_MBE_THRTL_BASE,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset = 0
+	},
 };
 
 static int cbm_idx(struct rdt_resource *r, int closid)
@@ -130,6 +138,26 @@ static inline bool cache_alloc_hsw_probe(void)
 	return false;
 }
 
+static void rdt_get_mem_config(struct rdt_resource *r)
+{
+	union cpuid_0x10_3_eax eax;
+	union cpuid_0x10_x_edx edx;
+	u32 ebx, ecx;
+
+	cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full);
+	r->num_closid = edx.split.cos_max + 1;
+	r->max_delay = eax.split.max_delay + 1;
+	r->no_ctrl = 0;
+	if (ecx & MBE_IS_LINEAR)
+		r->delay_linear = true;
+
+	if (r->delay_linear)
+		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
+
+	r->capable = true;
+	r->enabled = true;
+}
+
 static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 {
 	union cpuid_0x10_1_eax eax;
@@ -185,6 +213,7 @@ static inline bool get_rdt_resources(void)
 	}
 
 	if (boot_cpu_has(X86_FEATURE_MBA)) {
+		rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
 		ret = true;
 	}
 
@@ -262,6 +291,32 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	return NULL;
 }
 
+static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
+{
+	int i;
+
+	d->ctrl_val = kmalloc_array(r->num_closid,
+				     sizeof(*d->ctrl_val), GFP_KERNEL);
+	if (!d->ctrl_val) {
+		kfree(d);
+		return -ENOMEM;
+	}
+
+	/*
+	 * Initialize the Control MSRs to having no control.
+	 * For Cache Allocation: Set all bits in cbm
+	 * For Memory Allocation: Set throttle_by to zero.
+	 */
+	for (i = 0; i < r->num_closid; i++) {
+		int idx = cbm_idx(r, i);
+
+		d->ctrl_val[i] = r->no_ctrl;
+		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
+	}
+
+	return 0;
+}
+
 /*
  * domain_add_cpu - Add a cpu to a resource's domain list.
  *
@@ -277,7 +332,7 @@ static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
  */
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
-	int i, id = get_cache_id(cpu, r->cache_level);
+	int id = get_cache_id(cpu, r->cache_level), ret;
 	struct list_head *add_pos = NULL;
 	struct rdt_domain *d;
 
@@ -298,18 +353,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
 	d->id = id;
 
-	d->ctrl_val = kmalloc_array(r->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
-	if (!d->ctrl_val) {
-		kfree(d);
+	ret = domain_setup_ctrlval(r, d);
+	if (ret)
 		return;
-	}
-
-	for (i = 0; i < r->num_closid; i++) {
-		int idx = cbm_idx(r, i);
-
-		d->ctrl_val[i] = r->no_ctrl;
-		wrmsrl(r->msr_base + idx, d->ctrl_val[i]);
-	}
 
 	cpumask_set_cpu(cpu, &d->cpu_mask);
 	list_add_tail(&d->list, add_pos);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index edc6195..53f1917 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -503,7 +503,7 @@ static int rdt_no_ctrl_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = of->kn->parent->priv;
 
-	seq_printf(seq, "%x\n", r->no_ctrl);
+	seq_printf(seq, "0x%x\n", r->no_ctrl);
 
 	return 0;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
                   ` (4 preceding siblings ...)
  2017-01-10 19:33 ` [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 14:14   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support " Vikas Shivappa
  2017-01-10 19:33 ` [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates " Vikas Shivappa
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

Add the files in info directory for MBA.
The files in the info directory are as follows :
 - num_closids: max number of closids for MBA which represents the max
 class of service user can configure.
 - max_thrtl_by: the max throttle by values.

Throttle by can have a linear scale and non linear scale.  In linear
scale, if a throttle_by value is 40%, it means that the memory b/w is
throttled 'by' 40% or in other words a max of 60% b/w is allowed to
pass. In non-linear scale, the throttle_by values are in 2^n
granularity. The h/w does not guarantee a curve of actual throttle w.r.t
the configured values. But if a throttle_by value of x > y, then x is
guaranteed to throttle more b/w than y.
 - throttle granularity: Shows the throttle granularity of the throttle
 values that can be configured.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  8 ++-
 arch/x86/kernel/cpu/intel_rdt.c          |  2 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 84 ++++++++++++++++++++++++++++++--
 3 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 49ae832..88725b6 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -5,7 +5,6 @@
 
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
-
 #include <asm/intel_rdt_common.h>
 
 #define IA32_L3_QOS_CFG		0xc81
@@ -77,6 +76,8 @@ struct rftype {
  * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @info_files:		resctrl info files for the resource
+ * @infofiles_len:		Number of info files
  * @max_delay:		Max throttle delay
  * @delay_gran:		Throttle delay granularity
  * @delay_linear:		true if delay is in linear scale
@@ -98,6 +99,8 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			no_ctrl;
+	struct rftype		*info_files;
+	int			infofiles_len;
 	u32			max_delay;
 	u32			delay_gran;
 	u32			delay_linear;
@@ -137,6 +140,9 @@ struct msr_param {
 	int			high;
 };
 
+void rdt_get_cache_infofile(struct rdt_resource *r);
+void rdt_get_mbe_infofile(struct rdt_resource *r);
+
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6736e1d..bdfbd1d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -154,6 +154,7 @@ static void rdt_get_mem_config(struct rdt_resource *r)
 	if (r->delay_linear)
 		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
 
+	rdt_get_mbe_infofile(r);
 	r->capable = true;
 	r->enabled = true;
 }
@@ -168,6 +169,7 @@ static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 	r->num_closid = edx.split.cos_max + 1;
 	r->cbm_len = eax.split.cbm_len + 1;
 	r->no_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	rdt_get_cache_infofile(r);
 	r->capable = true;
 	r->enabled = true;
 }
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 53f1917..9d9b7f4 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -517,9 +517,41 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 
 	return 0;
 }
+static int rdt_max_delay_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->max_delay);
+
+	return 0;
+}
+
+static int rdt_delay_gran_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	if (r->delay_linear)
+		seq_printf(seq, "%d\n", r->delay_gran);
+	else
+		seq_printf(seq, "power of 2\n");
+
+	return 0;
+}
+
+static int rdt_delay_linear_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->delay_linear);
+
+	return 0;
+}
 
 /* rdtgroup information files for one cache resource. */
-static struct rftype res_info_files[] = {
+static struct rftype res_cache_info_files[] = {
 	{
 		.name		= "num_closids",
 		.mode		= 0444,
@@ -540,11 +572,52 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+/* rdtgroup information files for MBE. */
+static struct rftype res_mbe_info_files[] = {
+	{
+		.name		= "num_closids",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closids_show,
+	},
+	{
+		.name		= "max_thrtl_by",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_max_delay_show,
+	},
+	{
+		.name		= "thrtl_by_gran",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_gran_show,
+	},
+	{
+		.name		= "thrtl_by_linear",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_linear_show,
+	},
+};
+
+void rdt_get_mbe_infofile(struct rdt_resource *r)
+{
+	r->info_files = &res_mbe_info_files[0];
+	r->infofiles_len = ARRAY_SIZE(res_mbe_info_files);
+}
+
+void rdt_get_cache_infofile(struct rdt_resource *r)
+{
+	r->info_files = &res_cache_info_files[0];
+	r->infofiles_len = ARRAY_SIZE(res_cache_info_files);
+}
+
 static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 {
 	struct kernfs_node *kn_subdir;
+	struct rftype *res_info_files;
 	struct rdt_resource *r;
-	int ret;
+	int ret, len;
 
 	/* create the directory */
 	kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL);
@@ -563,8 +636,11 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
 		ret = rdtgroup_kn_set_ugid(kn_subdir);
 		if (ret)
 			goto out_destroy;
-		ret = rdtgroup_add_files(kn_subdir, res_info_files,
-					 ARRAY_SIZE(res_info_files));
+
+		res_info_files = r->info_files;
+		len = r->infofiles_len;
+
+		ret = rdtgroup_add_files(kn_subdir, res_info_files, len);
 		if (ret)
 			goto out_destroy;
 		kernfs_activate(kn_subdir);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support for MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
                   ` (5 preceding siblings ...)
  2017-01-10 19:33 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 16:07   ` Thomas Gleixner
  2017-01-10 19:33 ` [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates " Vikas Shivappa
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

Add support to update the MBA throttle_by values for the domains.
The MBA throttle_by values are specified for each domain which is L3
cache. The schemata string is parsed and validated for the correct
throttle_by values.

The throttle_by granularity is 100-max_throttle_by if scale is linear
and 2^n if non-linear scale.  OS then updates the corresponding domain
PQOS_MSRs which are indexed from 0xD50 for MBA.  The schemata APIs for
parsing and validating the schemata input are changed to accommodate
handling of both cbm and throttle values.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  7 ++++
 arch/x86/kernel/cpu/intel_rdt.c          | 35 +++++++++++------
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 66 +++++++++++++++++++++++---------
 3 files changed, 79 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 88725b6..dbf4c6e 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -76,6 +76,8 @@ struct rftype {
  * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
  * @min_cbm_bits:		Minimum number of consecutive bits to be set
  *				in a cache bit mask
+ * @display_str:		Format string to show schemata
+ * @validate:			API to validate the ctrl values.
  * @info_files:		resctrl info files for the resource
  * @infofiles_len:		Number of info files
  * @max_delay:		Max throttle delay
@@ -99,6 +101,9 @@ struct rdt_resource {
 	int			cbm_len;
 	int			min_cbm_bits;
 	u32			no_ctrl;
+	char			*display_str;
+	int (*validate)		(char *buf, unsigned long *data,
+				  struct rdt_resource *r);
 	struct rftype		*info_files;
 	int			infofiles_len;
 	u32			max_delay;
@@ -142,6 +147,8 @@ struct msr_param {
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
 void rdt_get_mbe_infofile(struct rdt_resource *r);
+int thrtl_validate(char *buf, unsigned long *data, struct rdt_resource *r);
+int cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index bdfbd1d..9b0a00e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -138,7 +138,7 @@ static inline bool cache_alloc_hsw_probe(void)
 	return false;
 }
 
-static void rdt_get_mem_config(struct rdt_resource *r)
+static int rdt_get_mem_config(struct rdt_resource *r)
 {
 	union cpuid_0x10_3_eax eax;
 	union cpuid_0x10_x_edx edx;
@@ -155,11 +155,18 @@ static void rdt_get_mem_config(struct rdt_resource *r)
 		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
 
 	rdt_get_mbe_infofile(r);
+	r->validate = thrtl_validate;
+	r->display_str = kstrdup("%d=%d", GFP_KERNEL);
+	if (!r->display_str)
+		return -ENOMEM;
+
 	r->capable = true;
 	r->enabled = true;
+
+	return 0;
 }
 
-static void rdt_get_cache_config(int idx, struct rdt_resource *r)
+static int rdt_get_cache_config(int idx, struct rdt_resource *r)
 {
 	union cpuid_0x10_1_eax eax;
 	union cpuid_0x10_x_edx edx;
@@ -170,8 +177,16 @@ static void rdt_get_cache_config(int idx, struct rdt_resource *r)
 	r->cbm_len = eax.split.cbm_len + 1;
 	r->no_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
 	rdt_get_cache_infofile(r);
+	r->validate = cbm_validate;
+
+	r->display_str = kstrdup("%d=0x%x", GFP_KERNEL);
+	if (!r->display_str)
+		return -ENOMEM;
+
 	r->capable = true;
 	r->enabled = true;
+
+	return 0;
 }
 
 static void rdt_get_cdp_l3_config(int type)
@@ -190,15 +205,15 @@ static void rdt_get_cdp_l3_config(int type)
 	r->enabled = false;
 }
 
-static inline bool get_rdt_resources(void)
+static inline int get_rdt_resources(void)
 {
-	bool ret = false;
+	int ret = 0;
 
 	if (cache_alloc_hsw_probe())
-		return true;
+		return ret;
 
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
-		return false;
+		return -ENODEV;
 
 	if (boot_cpu_has(X86_FEATURE_CAT_L3)) {
 		rdt_get_cache_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
@@ -210,13 +225,11 @@ static inline bool get_rdt_resources(void)
 	}
 	if (boot_cpu_has(X86_FEATURE_CAT_L2)) {
 		/* CPUID 0x10.2 fields are same format at 0x10.1 */
-		rdt_get_cache_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
-		ret = true;
+		ret = rdt_get_cache_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
 	}
 
 	if (boot_cpu_has(X86_FEATURE_MBA)) {
-		rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
-		ret = true;
+		ret = rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
 	}
 
 	return ret;
@@ -431,7 +444,7 @@ static int __init intel_rdt_late_init(void)
 	struct rdt_resource *r;
 	int state, ret;
 
-	if (!get_rdt_resources())
+	if (get_rdt_resources())
 		return -ENODEV;
 
 	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
index 054d771..f2205d1 100644
--- a/arch/x86/kernel/cpu/intel_rdt_schemata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -29,46 +29,75 @@
 #include <asm/intel_rdt.h>
 
 /*
+ * Check whether MBE 'throttle by' value is correct.
+ *	As per the SDM, when the scale is linear the
+ *	throttle_by granularity is '100 - max_thrtl_by'
+ *	and when its non-linear it is 'power of 2'.
+ */
+int thrtl_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+{
+	u32 delay;
+	int ret;
+
+	ret = kstrtoul(buf, 10, data);
+	if (ret)
+		return ret;
+
+	delay = *data;
+	if (delay > r->max_delay ||
+	     (r->delay_linear && (delay % r->delay_gran)) ||
+	     (!r->delay_linear && !is_power_of_2(delay)))
+		return -EINVAL;
+
+	return 0;
+}
+
+/*
  * Check whether a cache bit mask is valid. The SDM says:
  *	Please note that all (and only) contiguous '1' combinations
  *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
  * Additionally Haswell requires at least two bits set.
  */
-static bool cbm_validate(unsigned long var, struct rdt_resource *r)
+int cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 {
-	unsigned long first_bit, zero_bit;
+	unsigned long first_bit, zero_bit, var;
+	int ret;
+
+	ret = kstrtoul(buf, 16, &var);
+	if (ret)
+		return ret;
 
 	if (var == 0 || var > r->no_ctrl)
-		return false;
+		return -EINVAL;
 
 	first_bit = find_first_bit(&var, r->cbm_len);
 	zero_bit = find_next_zero_bit(&var, r->cbm_len, first_bit);
 
 	if (find_next_bit(&var, r->cbm_len, zero_bit) < r->cbm_len)
-		return false;
+		return -EINVAL;
 
 	if ((zero_bit - first_bit) < r->min_cbm_bits)
-		return false;
-	return true;
+		return -EINVAL;
+
+	*data = var;
+
+	return 0;
 }
 
 /*
- * Read one cache bit mask (hex). Check that it is valid for the current
- * resource type.
+ * Read the user RDT control value into tempory buffer:
+ * Cache bit mask (hex) or Memory b/w throttle (decimal).
+ * Check that it is valid for the current resource type.
  */
-static int parse_cbm(char *buf, struct rdt_resource *r)
+static int parse_ctrls(char *buf, struct rdt_resource *r)
 {
 	unsigned long data;
-	int ret;
+	int ret = 0;
 
-	ret = kstrtoul(buf, 16, &data);
-	if (ret)
-		return ret;
-	if (!cbm_validate(data, r))
-		return -EINVAL;
+	ret = r->validate(buf, &data, r);
 	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -90,7 +119,7 @@ static int parse_line(char *line, struct rdt_resource *r)
 		id = strsep(&dom, "=");
 		if (kstrtoul(id, 10, &dom_id) || dom_id != d->id)
 			return -EINVAL;
-		if (parse_cbm(dom, r))
+		if (parse_ctrls(dom, r))
 			return -EINVAL;
 	}
 
@@ -219,7 +248,8 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 	list_for_each_entry(dom, &r->domains, list) {
 		if (sep)
 			seq_puts(s, ";");
-		seq_printf(s, "%d=%x", dom->id, dom->ctrl_val[closid]);
+
+		seq_printf(s, r->display_str, dom->id, dom->ctrl_val[closid]);
 		sep = true;
 	}
 	seq_puts(s, "\n");
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates for MBA
  2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
                   ` (6 preceding siblings ...)
  2017-01-10 19:33 ` [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support " Vikas Shivappa
@ 2017-01-10 19:33 ` Vikas Shivappa
  2017-01-16 16:13   ` Thomas Gleixner
  7 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-01-10 19:33 UTC (permalink / raw)
  To: vikas.shivappa, linux-kernel
  Cc: x86, hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck,
	fenghua.yu, h.peter.anvin, vikas.shivappa

During rmdir only reset the ctrl values for the rdtgroup's closid. This
is done so that that next time when the closid is reused they dont
reflect old values.

Remove the closid update during cpuonline in cqm as its
already in the CAT code. Since both cqm and CAT want the rmid and closid
to be zero when cpu is online, remove the PQR MSR write to zero during
cpuonline because the MSRs are at zero after cpu reset and also during
the first sched in they are updated.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c              |  1 -
 arch/x86/kernel/cpu/intel_rdt.c          |  1 -
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 18 +++++++++++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8c00dc0..baf7ade 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -1573,7 +1573,6 @@ static int intel_cqm_cpu_starting(unsigned int cpu)
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 	state->rmid = 0;
-	state->closid = 0;
 	state->rmid_usecnt = 0;
 
 	WARN_ON(c->x86_cache_max_rmid != cqm_max_rmid);
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 9b0a00e..0c25227 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -403,7 +403,6 @@ static void clear_closid(int cpu)
 
 	per_cpu(cpu_closid, cpu) = 0;
 	state->closid = 0;
-	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
 }
 
 static int intel_rdt_online_cpu(unsigned int cpu)
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 9d9b7f4..97fc129 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -856,7 +856,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 	return dentry;
 }
 
-static int reset_all_cbms(struct rdt_resource *r)
+static int reset_all_ctrls(struct rdt_resource *r, u32 sclosid, u32 eclosid)
 {
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
@@ -867,8 +867,8 @@ static int reset_all_cbms(struct rdt_resource *r)
 		return -ENOMEM;
 
 	msr_param.res = r;
-	msr_param.low = 0;
-	msr_param.high = r->num_closid;
+	msr_param.low = sclosid;
+	msr_param.high = eclosid;
 
 	/*
 	 * Disable resource control for this resource by setting all
@@ -878,7 +878,7 @@ static int reset_all_cbms(struct rdt_resource *r)
 	list_for_each_entry(d, &r->domains, list) {
 		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
 
-		for (i = 0; i < r->num_closid; i++)
+		for (i = sclosid; i < eclosid; i++)
 			d->ctrl_val[i] = r->no_ctrl;
 	}
 	cpu = get_cpu();
@@ -972,7 +972,7 @@ static void rdt_kill_sb(struct super_block *sb)
 
 	/*Put everything back to default values. */
 	for_each_enabled_rdt_resource(r)
-		reset_all_cbms(r);
+		reset_all_ctrls(r, 0, r->num_closid);
 	cdp_disable();
 	rmdir_all_sub();
 	static_branch_disable(&rdt_enable_key);
@@ -1067,6 +1067,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 {
 	int ret, cpu, closid = rdtgroup_default.closid;
 	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
 	cpumask_var_t tmpmask;
 
 	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
@@ -1095,6 +1096,13 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	rdt_update_closid(tmpmask, NULL);
 
+	/*
+	 * Put domain control values back to default for the
+	 * rdtgrp thats being removed.
+	 */
+	for_each_enabled_rdt_resource(r)
+		reset_all_ctrls(r, rdtgrp->closid, rdtgrp->closid + 1);
+
 	rdtgrp->flags = RDT_DELETED;
 	closid_free(rdtgrp->closid);
 	list_del(&rdtgrp->rdtgroup_list);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-10 19:33 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface Vikas Shivappa
@ 2017-01-16 13:43   ` Thomas Gleixner
  2017-01-18  0:51     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 13:43 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:

> Memory b/w allocation is part of Intel RDT(resource director technology)
> which lets user control the amount of memory b/w (L2 external b/w) per
> thread. This is done programming MSR interfaces like cache allocation
> technology and other RDT features.
> This patch adds documentation for Memory b/w allocation interface usage.

Sigh. I told you how often that 'This patch' is crap. We already know that
this is a patch. Read and finally act according to
Documentation/process/SubmittingPatches

> +Memory b/w throttle

Can we please spell out Bandwidth at least once? b/w can mean anything
(black/white, both ways ...)

> +-------------------
> +For Memory b/w resource, the portion of total memory b/w the user can
> +restrict or 'throttle by' is indicated by the thrtl_by values.
> +
> +Throttle by values could be linear scale or non-linear scale.  In linear
> +scale a thrtl_by value of say 20 would throttle the memory b/w by 20%
> +allowing only 80% max b/w. In nonlinear scale currently SDM specifies
> +throttle values in 2^n values. However the h/w does not guarantee a
> +specific curve for the amount of memory b/w that is actually throttled.
> +But for any thrtl_by value x > y, its guaranteed that x would throttle
> +more b/w than y.  The info directory specifies the max thrtl_by value
> +and thrtl_by granularity.

This interface is really crap. The natural way to express it is:

     Requested Bandwidth = X %

i.e. 100% is unthrottled.

The info file should tell the minimum bandwidth,the granularity value and
the scale mode.

The actual programming should just take a bandwidth percentage value
between 0 and 100. The written value is adjusted by the write function to
the granularity and minimum bandwidth, so a subsequent readout will tell
the effective value.

That's important because that allows scripts to work independent of the
actual hardware implementation with default bandwidth configurations and
then allows the user/admin to readout the effective values on a particular
machine. If someone wants to adjust them machine specific, that's possible
as well.

Aside of that this documentation should contain some information about the
limitations of that bandwidth control, i.e. the fact that this is a core
specific mechanism and using a high bandwidth and a low bandwidth setting
on two threads sharing a core will throttle the high bandwidth thread
inadvertently. That's really important to mention in the documentation
because that's going to bring interesting surprises for users.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA
  2017-01-10 19:33 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
@ 2017-01-16 13:45   ` Thomas Gleixner
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 13:45 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> @@ -71,14 +71,14 @@ struct rftype {
>   * @capable:			Is this feature available on this machine
>   * @name:			Name to use in "schemata" file
>   * @num_closid:			Number of CLOSIDs available
> - * @max_cbm:			Largest Cache Bit Mask allowed
> + * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.

no_ctrl is a horrible name. This should be default_ctrl or something similar.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA
  2017-01-10 19:33 ` [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA Vikas Shivappa
@ 2017-01-16 13:54   ` Thomas Gleixner
  2017-01-18  0:53     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 13:54 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> This patch does some changes to get ready to handle more resources like
> Memory b/w allocation(MBA).
> 
> -Update the control registers only when user changes the controls(cbm for
> Cache resources and Mem b/w for memory). Hence not sending IPIs on all
> domains when user updates the control vals.
> -Introduce next_enabled_resource rather than looping through all
> resources while parsing each schemata line. The order of resources
> should be anyways the same as the root schemata.
> -Return error as soon as we detect a resource not entering all domain
> values in schemata rather than waiting till we parse all resources.

That looks all like reasonable optimizations and I have a hard time to
understand why this is a prerequisite for the bandwidth support.

And each of these changes is independent so they should be in seperate
patches.

> +/*
> + * Parameter r must be NULL or pointing to
> + * a valid rdt_resource_all entry.
> + * Points r to the next enabled RDT resource at the end.
> + */
> +#define next_enabled_rdt_resource(r)					\
> +do {									\
> +	if (!r)								\
> +		r = rdt_resources_all;					\
> +	else								\
> +		r++;							\
> +	for (; r < rdt_resources_all + RDT_NUM_RESOURCES; r++)		\
> +		if (r->enabled)						\
> +			break;						\
> +} while (0)


This is crap, really. What the heck is wrong with a proper function?

static struct rdt_resource *get_next_enabled_resource(struct rdt_resource *r)
{
	....

	return r;
}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
  2017-01-10 19:33 ` [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
@ 2017-01-16 13:59   ` Thomas Gleixner
  2017-01-16 14:41     ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 13:59 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:

> Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
> Add supporting data structures to detect feature details which is done
> in later patch using CPUID with EAX=10H, ECX= 3.

So why is the $subject of this patch claiming that it provides the feature
detection?

> -/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
> +/* CPUID.(EAX=10H, ECX=ResID=3).EAX */
> +union cpuid_0x10_3_eax {
> +	struct {
> +		unsigned int max_delay:12;
> +	} split;

And the point of this struct is?

> +	unsigned int full;
> +};

> +	if (boot_cpu_has(X86_FEATURE_MBA)) {
> +		ret = true;
> +	}

Pointless brackets.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-01-10 19:33 ` [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
@ 2017-01-16 14:06   ` Thomas Gleixner
  2017-01-18  0:56     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 14:06 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> +static void rdt_get_mem_config(struct rdt_resource *r)
> +{
> +	union cpuid_0x10_3_eax eax;
> +	union cpuid_0x10_x_edx edx;
> +	u32 ebx, ecx;
> +
> +	cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full);
> +	r->num_closid = edx.split.cos_max + 1;
> +	r->max_delay = eax.split.max_delay + 1;
> +	r->no_ctrl = 0;
> +	if (ecx & MBE_IS_LINEAR)
> +		r->delay_linear = true;
> +
> +	if (r->delay_linear)
> +		r->delay_gran = MAX_MBA_THRTL - r->max_delay;

What's the point of this extra conditional?

	if (ecx & MBE_IS_LINEAR) {
		r->delay_linear = true;
		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
	}	

would be too obvious and easy to understand, right?

> +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
> +{
> +	int i;
> +
> +	d->ctrl_val = kmalloc_array(r->num_closid,
> +				     sizeof(*d->ctrl_val), GFP_KERNEL);
> +	if (!d->ctrl_val) {
> +		kfree(d);

Freeing memory in the error path of some other random function is just wrong.

	if (!d->ctrl_val)
		return -ENOMEM;

and deal with the fallout at the calling function.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-01-10 19:33 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
@ 2017-01-16 14:14   ` Thomas Gleixner
  2017-01-18  1:02     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 14:14 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:

> Add the files in info directory for MBA.
> The files in the info directory are as follows :
>  - num_closids: max number of closids for MBA which represents the max
>  class of service user can configure.
>  - max_thrtl_by: the max throttle by values.
> 
> Throttle by can have a linear scale and non linear scale.  In linear
> scale, if a throttle_by value is 40%, it means that the memory b/w is
> throttled 'by' 40% or in other words a max of 60% b/w is allowed to
> pass. In non-linear scale, the throttle_by values are in 2^n
> granularity. The h/w does not guarantee a curve of actual throttle w.r.t
> the configured values. But if a throttle_by value of x > y, then x is
> guaranteed to throttle more b/w than y.

This is ambigous because that is only correct when the effective values are
different. x=11 and y=12 with a granularity of 10 are resulting in the same
throttling.

> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -5,7 +5,6 @@
>  
>  #include <linux/kernfs.h>
>  #include <linux/jump_label.h>
> -
>  #include <asm/intel_rdt_common.h>

This white space is there on purpose to seperate linux and asm prefixed
includes. Stop making random white space changes.

> @@ -77,6 +76,8 @@ struct rftype {
>   * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>   *				in a cache bit mask
> + * @info_files:		resctrl info files for the resource
> + * @infofiles_len:		Number of info files

This wants to be a seperate patch as it changes the way how the info files
are set up. The implementation for the MBA stuff is a follow up patch.

>   * @max_delay:		Max throttle delay
>   * @delay_gran:		Throttle delay granularity
>   * @delay_linear:		true if delay is in linear scale
> @@ -98,6 +99,8 @@ struct rdt_resource {
>  	int			cbm_len;
>  	int			min_cbm_bits;
>  	u32			no_ctrl;
> +	struct rftype		*info_files;
> +	int			infofiles_len;
>  	u32			max_delay;
>  	u32			delay_gran;
>  	u32			delay_linear;
> @@ -137,6 +140,9 @@ struct msr_param {
>  	int			high;
>  };
>  
> +void rdt_get_cache_infofile(struct rdt_resource *r);
> +void rdt_get_mbe_infofile(struct rdt_resource *r);
> +
>  extern struct mutex rdtgroup_mutex;
>  
>  extern struct rdt_resource rdt_resources_all[];
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 6736e1d..bdfbd1d 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -154,6 +154,7 @@ static void rdt_get_mem_config(struct rdt_resource *r)
>  	if (r->delay_linear)
>  		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
>  
> +	rdt_get_mbe_infofile(r);
>  	r->capable = true;
>  	r->enabled = true;
>  }
> @@ -168,6 +169,7 @@ static void rdt_get_cache_config(int idx, struct rdt_resource *r)
>  	r->num_closid = edx.split.cos_max + 1;
>  	r->cbm_len = eax.split.cbm_len + 1;
>  	r->no_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
> +	rdt_get_cache_infofile(r);
>  	r->capable = true;
>  	r->enabled = true;
>  }
> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> index 53f1917..9d9b7f4 100644
> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -517,9 +517,41 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
>  
>  	return 0;
>  }
> +static int rdt_max_delay_show(struct kernfs_open_file *of,
> +			     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	seq_printf(seq, "%d\n", r->max_delay);
> +
> +	return 0;
> +}
> +
> +static int rdt_delay_gran_show(struct kernfs_open_file *of,
> +			     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	if (r->delay_linear)
> +		seq_printf(seq, "%d\n", r->delay_gran);
> +	else
> +		seq_printf(seq, "power of 2\n");
> +
> +	return 0;
> +}
> +
> +static int rdt_delay_linear_show(struct kernfs_open_file *of,
> +			     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	seq_printf(seq, "%d\n", r->delay_linear);
> +
> +	return 0;
> +}
>  
>  /* rdtgroup information files for one cache resource. */
> -static struct rftype res_info_files[] = {
> +static struct rftype res_cache_info_files[] = {
>  	{
>  		.name		= "num_closids",
>  		.mode		= 0444,
> @@ -540,11 +572,52 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
>  	},
>  };
>  
> +/* rdtgroup information files for MBE. */
> +static struct rftype res_mbe_info_files[] = {
> +	{
> +		.name		= "num_closids",
> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdt_num_closids_show,
> +	},
> +	{
> +		.name		= "max_thrtl_by",

You surely could not come up with a more cryptic file name, right? What's
so wrong with spelling out throttle? And the whole '_by' postfix here and
on the other files is pointless as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
  2017-01-16 13:59   ` Thomas Gleixner
@ 2017-01-16 14:41     ` Peter Zijlstra
  2017-01-16 16:16       ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2017-01-16 14:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Mon, Jan 16, 2017 at 02:59:11PM +0100, Thomas Gleixner wrote:
> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> 
> > Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
> > Add supporting data structures to detect feature details which is done
> > in later patch using CPUID with EAX=10H, ECX= 3.
> 
> So why is the $subject of this patch claiming that it provides the feature
> detection?
> 
> > -/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
> > +/* CPUID.(EAX=10H, ECX=ResID=3).EAX */
> > +union cpuid_0x10_3_eax {
> > +	struct {
> > +		unsigned int max_delay:12;
> > +	} split;
> 
> And the point of this struct is?

I suppose its there so we cannot forget adding it when we add more
bitfields in that word and to keep naming (full vs split) consistent wrt
other cpuid unions that do have multiple fields.

> 
> > +	unsigned int full;
> > +};

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support for MBA
  2017-01-10 19:33 ` [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support " Vikas Shivappa
@ 2017-01-16 16:07   ` Thomas Gleixner
  2017-01-18  1:03     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 16:07 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> + * @display_str:		Format string to show schemata
> + * @validate:			API to validate the ctrl values.
>   * @info_files:		resctrl info files for the resource
>   * @infofiles_len:		Number of info files
>   * @max_delay:		Max throttle delay
> @@ -99,6 +101,9 @@ struct rdt_resource {
>  	int			cbm_len;
>  	int			min_cbm_bits;
>  	u32			no_ctrl;
> +	char			*display_str;
> +	int (*validate)		(char *buf, unsigned long *data,
> +				  struct rdt_resource *r);

Again this display and validation change wants to be seperate from the
bandwidth stuff.

It's not rocket science to split patches into preparatory and
implementation parts.

> +	r->display_str = kstrdup("%d=%d", GFP_KERNEL);
> +	if (!r->display_str)
> +		return -ENOMEM;

And the point of this allocation is? To consume extra memory for a constant
string which is in const data anyway.

       r->display_str = "%d=%d";

does not need allcotion and consumes exactly the same amount of const data
as the above. Oh well...

> -static inline bool get_rdt_resources(void)
> +static inline int get_rdt_resources(void)
>  {

And the point of this change is? Lots of churn to return the same -ENODEV
value at the call site. So why are you trying to return other values
instead of the simple boolean success/fail decision?

>  /*
> + * Check whether MBE 'throttle by' value is correct.
> + *	As per the SDM, when the scale is linear the
> + *	throttle_by granularity is '100 - max_thrtl_by'
> + *	and when its non-linear it is 'power of 2'.

That's wrong. We really want to let the user set a bandwidth percentage
value from 0 - 100 %. And then adjust it to the proper value which the
hardware can provide. So the user value is independent from granularity,
linear and the max throttling allowed.

>  /*
> - * Read one cache bit mask (hex). Check that it is valid for the current
> - * resource type.
> + * Read the user RDT control value into tempory buffer:
> + * Cache bit mask (hex) or Memory b/w throttle (decimal).
> + * Check that it is valid for the current resource type.
>   */
> -static int parse_cbm(char *buf, struct rdt_resource *r)
> +static int parse_ctrls(char *buf, struct rdt_resource *r)
>  {
>  	unsigned long data;
> -	int ret;
> +	int ret = 0;

What's the purpose of initializing ret to 0 if the next action is assigning
ret the return value of the validate function?

> -	ret = kstrtoul(buf, 16, &data);
> -	if (ret)
> -		return ret;
> -	if (!cbm_validate(data, r))
> -		return -EINVAL;
> +	ret = r->validate(buf, &data, r);
>  	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
>  
> -	return 0;
> +	return ret;
>  }

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates for MBA
  2017-01-10 19:33 ` [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates " Vikas Shivappa
@ 2017-01-16 16:13   ` Thomas Gleixner
  2017-01-23 20:18     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 16:13 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 10 Jan 2017, Vikas Shivappa wrote:

> During rmdir only reset the ctrl values for the rdtgroup's closid. This
> is done so that that next time when the closid is reused they dont
> reflect old values.

How on earth is that related to MBA?

> Remove the closid update during cpuonline in cqm as its
> already in the CAT code. Since both cqm and CAT want the rmid and closid
> to be zero when cpu is online, remove the PQR MSR write to zero during
> cpuonline because the MSRs are at zero after cpu reset and also during

Are you sure that the MSRs are zero when the cpu goes offline and then
online again? That's not a reset.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect
  2017-01-16 14:41     ` Peter Zijlstra
@ 2017-01-16 16:16       ` Thomas Gleixner
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-16 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Mon, 16 Jan 2017, Peter Zijlstra wrote:

> On Mon, Jan 16, 2017 at 02:59:11PM +0100, Thomas Gleixner wrote:
> > On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> > 
> > > Detect MBA feature if CPUID.(EAX=10H, ECX=0):EBX.L2[bit 3] = 1.
> > > Add supporting data structures to detect feature details which is done
> > > in later patch using CPUID with EAX=10H, ECX= 3.
> > 
> > So why is the $subject of this patch claiming that it provides the feature
> > detection?
> > 
> > > -/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
> > > +/* CPUID.(EAX=10H, ECX=ResID=3).EAX */
> > > +union cpuid_0x10_3_eax {
> > > +	struct {
> > > +		unsigned int max_delay:12;
> > > +	} split;
> > 
> > And the point of this struct is?
> 
> I suppose its there so we cannot forget adding it when we add more
> bitfields in that word and to keep naming (full vs split) consistent wrt
> other cpuid unions that do have multiple fields.

Fair enough.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-16 13:43   ` Thomas Gleixner
@ 2017-01-18  0:51     ` Shivappa Vikas
  2017-01-18  9:01       ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-18  0:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>
>> Memory b/w allocation is part of Intel RDT(resource director technology)
>> which lets user control the amount of memory b/w (L2 external b/w) per
>> thread. This is done programming MSR interfaces like cache allocation
>> technology and other RDT features.
>> This patch adds documentation for Memory b/w allocation interface usage.
>
> Sigh. I told you how often that 'This patch' is crap. We already know that
> this is a patch. Read and finally act according to
> Documentation/process/SubmittingPatches
>
>> +Memory b/w throttle
>
> Can we please spell out Bandwidth at least once? b/w can mean anything
> (black/white, both ways ...)
>
>> +-------------------
>> +For Memory b/w resource, the portion of total memory b/w the user can
>> +restrict or 'throttle by' is indicated by the thrtl_by values.
>> +
>> +Throttle by values could be linear scale or non-linear scale.  In linear
>> +scale a thrtl_by value of say 20 would throttle the memory b/w by 20%
>> +allowing only 80% max b/w. In nonlinear scale currently SDM specifies
>> +throttle values in 2^n values. However the h/w does not guarantee a
>> +specific curve for the amount of memory b/w that is actually throttled.
>> +But for any thrtl_by value x > y, its guaranteed that x would throttle
>> +more b/w than y.  The info directory specifies the max thrtl_by value
>> +and thrtl_by granularity.
>
> This interface is really crap. The natural way to express it is:
>
>     Requested Bandwidth = X %

I wanted to do it this way which did seem more intuitive but the issue is with 
the non-linear scale which the hardware does not guarantee a particular 
percentage for a particular value. Or we don't know the curve for delay value vs. 
actual b/w throttled.

ex: in non linear scale , the granularity is 2^n.
Max : 512

Say a value of 256 is not guaranteed to have 50% or even follow a curve where we 
can calculate the corresponding percentage.

>
> i.e. 100% is unthrottled.
>
> The info file should tell the minimum bandwidth,the granularity value and
> the scale mode.
>
> The actual programming should just take a bandwidth percentage value
> between 0 and 100. The written value is adjusted by the write function to
> the granularity and minimum bandwidth, so a subsequent readout will tell
> the effective value.
>
> That's important because that allows scripts to work independent of the
> actual hardware implementation with default bandwidth configurations and
> then allows the user/admin to readout the effective values on a particular
> machine. If someone wants to adjust them machine specific, that's possible
> as well.
>
> Aside of that this documentation should contain some information about the
> limitations of that bandwidth control, i.e. the fact that this is a core
> specific mechanism and using a high bandwidth and a low bandwidth setting
> on two threads sharing a core will throttle the high bandwidth thread
> inadvertently.

Ok , will do.

Thanks,
Vikas

That's really important to mention in the documentation
> because that's going to bring interesting surprises for users.
>
> Thanks,
>
> 	tglx
>
>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA
  2017-01-16 13:54   ` Thomas Gleixner
@ 2017-01-18  0:53     ` Shivappa Vikas
  2017-01-18  9:05       ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-18  0:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>> This patch does some changes to get ready to handle more resources like
>> Memory b/w allocation(MBA).
>>
>> -Update the control registers only when user changes the controls(cbm for
>> Cache resources and Mem b/w for memory). Hence not sending IPIs on all
>> domains when user updates the control vals.
>> -Introduce next_enabled_resource rather than looping through all
>> resources while parsing each schemata line. The order of resources
>> should be anyways the same as the root schemata.
>> -Return error as soon as we detect a resource not entering all domain
>> values in schemata rather than waiting till we parse all resources.
>
> That looks all like reasonable optimizations and I have a hard time to
> understand why this is a prerequisite for the bandwidth support.
>
> And each of these changes is independent so they should be in seperate
> patches.

Ok will split them. Its not a pre requisite for MBA. Should i send as a seperate 
series ?

>
>> +/*
>> + * Parameter r must be NULL or pointing to
>> + * a valid rdt_resource_all entry.
>> + * Points r to the next enabled RDT resource at the end.
>> + */
>> +#define next_enabled_rdt_resource(r)					\
>> +do {									\
>> +	if (!r)								\
>> +		r = rdt_resources_all;					\
>> +	else								\
>> +		r++;							\
>> +	for (; r < rdt_resources_all + RDT_NUM_RESOURCES; r++)		\
>> +		if (r->enabled)						\
>> +			break;						\
>> +} while (0)
>
>
> This is crap, really. What the heck is wrong with a proper function?
>
> static struct rdt_resource *get_next_enabled_resource(struct rdt_resource *r)
> {
> 	....
>
> 	return r;
> }
>

Will fix,

Thanks,
Vikas

> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA
  2017-01-16 14:06   ` Thomas Gleixner
@ 2017-01-18  0:56     ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-18  0:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>> +static void rdt_get_mem_config(struct rdt_resource *r)
>> +{
>> +	union cpuid_0x10_3_eax eax;
>> +	union cpuid_0x10_x_edx edx;
>> +	u32 ebx, ecx;
>> +
>> +	cpuid_count(0x00000010, 3, &eax.full, &ebx, &ecx, &edx.full);
>> +	r->num_closid = edx.split.cos_max + 1;
>> +	r->max_delay = eax.split.max_delay + 1;
>> +	r->no_ctrl = 0;
>> +	if (ecx & MBE_IS_LINEAR)
>> +		r->delay_linear = true;
>> +
>> +	if (r->delay_linear)
>> +		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
>
> What's the point of this extra conditional?
>
> 	if (ecx & MBE_IS_LINEAR) {
> 		r->delay_linear = true;
> 		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
> 	}
>
> would be too obvious and easy to understand, right?

Will fix.

>
>> +static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
>> +{
>> +	int i;
>> +
>> +	d->ctrl_val = kmalloc_array(r->num_closid,
>> +				     sizeof(*d->ctrl_val), GFP_KERNEL);
>> +	if (!d->ctrl_val) {
>> +		kfree(d);
>
> Freeing memory in the error path of some other random function is just wrong.
>
> 	if (!d->ctrl_val)
> 		return -ENOMEM;
>
> and deal with the fallout at the calling function.

Will fix.. Thanks for pointing out.

Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-01-16 14:14   ` Thomas Gleixner
@ 2017-01-18  1:02     ` Shivappa Vikas
  2017-01-18  9:08       ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-18  1:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>
>> Add the files in info directory for MBA.
>> The files in the info directory are as follows :
>>  - num_closids: max number of closids for MBA which represents the max
>>  class of service user can configure.
>>  - max_thrtl_by: the max throttle by values.
>>
>> Throttle by can have a linear scale and non linear scale.  In linear
>> scale, if a throttle_by value is 40%, it means that the memory b/w is
>> throttled 'by' 40% or in other words a max of 60% b/w is allowed to
>> pass. In non-linear scale, the throttle_by values are in 2^n
>> granularity. The h/w does not guarantee a curve of actual throttle w.r.t
>> the configured values. But if a throttle_by value of x > y, then x is
>> guaranteed to throttle more b/w than y.
>
> This is ambigous because that is only correct when the effective values are
> different. x=11 and y=12 with a granularity of 10 are resulting in the same
> throttling.

The x and y are actual values written which i will spell out. The assumption is 
that only correct values are input though because we filterout the values which 
dont follow granularity, meaning return -EINVAL 
when someone tries to write 11 when granularity is 10. This was with the idea 
thats its easier for the user to understand whats actually written. Woudl that 
be reasonable or does it need a change ?
(Although the h/w does like you say , we can do a msr write for 11 etc ..)

>
>> --- a/arch/x86/include/asm/intel_rdt.h
>> +++ b/arch/x86/include/asm/intel_rdt.h
>> @@ -5,7 +5,6 @@
>>
>>  #include <linux/kernfs.h>
>>  #include <linux/jump_label.h>
>> -
>>  #include <asm/intel_rdt_common.h>
>
> This white space is there on purpose to seperate linux and asm prefixed
> includes. Stop making random white space changes.
>
>> @@ -77,6 +76,8 @@ struct rftype {
>>   * @no_ctrl:			Specifies max cache cbm or min mem b/w delay.
>>   * @min_cbm_bits:		Minimum number of consecutive bits to be set
>>   *				in a cache bit mask
>> + * @info_files:		resctrl info files for the resource
>> + * @infofiles_len:		Number of info files
>
> This wants to be a seperate patch as it changes the way how the info files
> are set up. The implementation for the MBA stuff is a follow up patch.

Ok , will split

>
>>   * @max_delay:		Max throttle delay
>>   * @delay_gran:		Throttle delay granularity
>>   * @delay_linear:		true if delay is in linear scale
>> @@ -98,6 +99,8 @@ struct rdt_resource {
>>  	int			cbm_len;
>>  	int			min_cbm_bits;
>>  	u32			no_ctrl;
>> +	struct rftype		*info_files;
>> +	int			infofiles_len;
>>  	u32			max_delay;
>>  	u32			delay_gran;
>>  	u32			delay_linear;
>> @@ -137,6 +140,9 @@ struct msr_param {
>>  	int			high;
>>  };
>>
>> +void rdt_get_cache_infofile(struct rdt_resource *r);
>> +void rdt_get_mbe_infofile(struct rdt_resource *r);
>> +
>>  extern struct mutex rdtgroup_mutex;
>>
>>  extern struct rdt_resource rdt_resources_all[];
>> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
>> index 6736e1d..bdfbd1d 100644
>> --- a/arch/x86/kernel/cpu/intel_rdt.c
>> +++ b/arch/x86/kernel/cpu/intel_rdt.c
>> @@ -154,6 +154,7 @@ static void rdt_get_mem_config(struct rdt_resource *r)
>>  	if (r->delay_linear)
>>  		r->delay_gran = MAX_MBA_THRTL - r->max_delay;
>>
>> +	rdt_get_mbe_infofile(r);
>>  	r->capable = true;
>>  	r->enabled = true;
>>  }
>> @@ -168,6 +169,7 @@ static void rdt_get_cache_config(int idx, struct rdt_resource *r)
>>  	r->num_closid = edx.split.cos_max + 1;
>>  	r->cbm_len = eax.split.cbm_len + 1;
>>  	r->no_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
>> +	rdt_get_cache_infofile(r);
>>  	r->capable = true;
>>  	r->enabled = true;
>>  }
>> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
>> index 53f1917..9d9b7f4 100644
>> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
>> @@ -517,9 +517,41 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
>>
>>  	return 0;
>>  }
>> +static int rdt_max_delay_show(struct kernfs_open_file *of,
>> +			     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	seq_printf(seq, "%d\n", r->max_delay);
>> +
>> +	return 0;
>> +}
>> +
>> +static int rdt_delay_gran_show(struct kernfs_open_file *of,
>> +			     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	if (r->delay_linear)
>> +		seq_printf(seq, "%d\n", r->delay_gran);
>> +	else
>> +		seq_printf(seq, "power of 2\n");
>> +
>> +	return 0;
>> +}
>> +
>> +static int rdt_delay_linear_show(struct kernfs_open_file *of,
>> +			     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	seq_printf(seq, "%d\n", r->delay_linear);
>> +
>> +	return 0;
>> +}
>>
>>  /* rdtgroup information files for one cache resource. */
>> -static struct rftype res_info_files[] = {
>> +static struct rftype res_cache_info_files[] = {
>>  	{
>>  		.name		= "num_closids",
>>  		.mode		= 0444,
>> @@ -540,11 +572,52 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
>>  	},
>>  };
>>
>> +/* rdtgroup information files for MBE. */
>> +static struct rftype res_mbe_info_files[] = {
>> +	{
>> +		.name		= "num_closids",
>> +		.mode		= 0444,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= rdt_num_closids_show,
>> +	},
>> +	{
>> +		.name		= "max_thrtl_by",
>
> You surely could not come up with a more cryptic file name, right? What's
> so wrong with spelling out throttle? And the whole '_by' postfix here and
> on the other files is pointless as well.

This is due to the issue i mention in reply to 1/8.. Can be changed to something 
better though. bw_restrict / bw_block (block is stopping..) ?

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support for MBA
  2017-01-16 16:07   ` Thomas Gleixner
@ 2017-01-18  1:03     ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-18  1:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>> + * @display_str:		Format string to show schemata
>> + * @validate:			API to validate the ctrl values.
>>   * @info_files:		resctrl info files for the resource
>>   * @infofiles_len:		Number of info files
>>   * @max_delay:		Max throttle delay
>> @@ -99,6 +101,9 @@ struct rdt_resource {
>>  	int			cbm_len;
>>  	int			min_cbm_bits;
>>  	u32			no_ctrl;
>> +	char			*display_str;
>> +	int (*validate)		(char *buf, unsigned long *data,
>> +				  struct rdt_resource *r);
>
> Again this display and validation change wants to be seperate from the
> bandwidth stuff.
>
> It's not rocket science to split patches into preparatory and
> implementation parts.

Will split.

>
>> +	r->display_str = kstrdup("%d=%d", GFP_KERNEL);
>> +	if (!r->display_str)
>> +		return -ENOMEM;
>
> And the point of this allocation is? To consume extra memory for a constant
> string which is in const data anyway.
>
>       r->display_str = "%d=%d";

Ok will fix to const string.

>
> does not need allcotion and consumes exactly the same amount of const data
> as the above. Oh well...
>
>> -static inline bool get_rdt_resources(void)
>> +static inline int get_rdt_resources(void)
>>  {
>
> And the point of this change is? Lots of churn to return the same -ENODEV
> value at the call site. So why are you trying to return other values
> instead of the simple boolean success/fail decision?
>
>>  /*
>> + * Check whether MBE 'throttle by' value is correct.
>> + *	As per the SDM, when the scale is linear the
>> + *	throttle_by granularity is '100 - max_thrtl_by'
>> + *	and when its non-linear it is 'power of 2'.
>
> That's wrong. We really want to let the user set a bandwidth percentage
> value from 0 - 100 %. And then adjust it to the proper value which the
> hardware can provide. So the user value is independent from granularity,
> linear and the max throttling allowed.
>
>>  /*
>> - * Read one cache bit mask (hex). Check that it is valid for the current
>> - * resource type.
>> + * Read the user RDT control value into tempory buffer:
>> + * Cache bit mask (hex) or Memory b/w throttle (decimal).
>> + * Check that it is valid for the current resource type.
>>   */
>> -static int parse_cbm(char *buf, struct rdt_resource *r)
>> +static int parse_ctrls(char *buf, struct rdt_resource *r)
>>  {
>>  	unsigned long data;
>> -	int ret;
>> +	int ret = 0;
>
> What's the purpose of initializing ret to 0 if the next action is assigning
> ret the return value of the validate function?

Will fix.

Thanks,
Vikas

>
>> -	ret = kstrtoul(buf, 16, &data);
>> -	if (ret)
>> -		return ret;
>> -	if (!cbm_validate(data, r))
>> -		return -EINVAL;
>> +	ret = r->validate(buf, &data, r);
>>  	r->tmp_ctrl[r->num_tmp_ctrl++] = data;
>>
>> -	return 0;
>> +	return ret;
>>  }
>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-18  0:51     ` Shivappa Vikas
@ 2017-01-18  9:01       ` Thomas Gleixner
  2017-01-23 18:57         ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-18  9:01 UTC (permalink / raw)
  To: Shivappa Vikas
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 17 Jan 2017, Shivappa Vikas wrote:
> On Mon, 16 Jan 2017, Thomas Gleixner wrote:
> > This interface is really crap. The natural way to express it is:
> > 
> >     Requested Bandwidth = X %
> 
> I wanted to do it this way which did seem more intuitive but the issue is with
> the non-linear scale which the hardware does not guarantee a particular
> percentage for a particular value. Or we don't know the curve for delay value
> vs. actual b/w throttled.
> 
> ex: in non linear scale , the granularity is 2^n.
> Max : 512
> 
> Say a value of 256 is not guaranteed to have 50% or even follow a curve where
> we can calculate the corresponding percentage.

The question is whether this non linear scale thing is just a first
implementation attempt and any sane hardware in the future will use the
percentage value (which is an approximation as well).

If that non-linear scale is not going to be prevalent, then we really can
live with the fallout of a particular CPU type.

If it's going to stay, then Intel should be able to provide simple tables
which give us the required information for a particular CPU model.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA
  2017-01-18  0:53     ` Shivappa Vikas
@ 2017-01-18  9:05       ` Thomas Gleixner
  0 siblings, 0 replies; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-18  9:05 UTC (permalink / raw)
  To: Shivappa Vikas
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 17 Jan 2017, Shivappa Vikas wrote:
> On Mon, 16 Jan 2017, Thomas Gleixner wrote:
> 
> > On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> > > This patch does some changes to get ready to handle more resources like
> > > Memory b/w allocation(MBA).
> > > 
> > > -Update the control registers only when user changes the controls(cbm for
> > > Cache resources and Mem b/w for memory). Hence not sending IPIs on all
> > > domains when user updates the control vals.
> > > -Introduce next_enabled_resource rather than looping through all
> > > resources while parsing each schemata line. The order of resources
> > > should be anyways the same as the root schemata.
> > > -Return error as soon as we detect a resource not entering all domain
> > > values in schemata rather than waiting till we parse all resources.
> > 
> > That looks all like reasonable optimizations and I have a hard time to
> > understand why this is a prerequisite for the bandwidth support.
> > 
> > And each of these changes is independent so they should be in seperate
> > patches.
> 
> Ok will split them. Its not a pre requisite for MBA. Should i send as a
> seperate series ?

Yes please. That way we can apply preparatory things first and then
concentrate on the new MBA functionality.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-01-18  1:02     ` Shivappa Vikas
@ 2017-01-18  9:08       ` Thomas Gleixner
  2017-01-23 19:45         ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-18  9:08 UTC (permalink / raw)
  To: Shivappa Vikas
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Tue, 17 Jan 2017, Shivappa Vikas wrote:
> On Mon, 16 Jan 2017, Thomas Gleixner wrote:
> 
> > On Tue, 10 Jan 2017, Vikas Shivappa wrote:
> > 
> > > Add the files in info directory for MBA.
> > > The files in the info directory are as follows :
> > >  - num_closids: max number of closids for MBA which represents the max
> > >  class of service user can configure.
> > >  - max_thrtl_by: the max throttle by values.
> > > 
> > > Throttle by can have a linear scale and non linear scale.  In linear
> > > scale, if a throttle_by value is 40%, it means that the memory b/w is
> > > throttled 'by' 40% or in other words a max of 60% b/w is allowed to
> > > pass. In non-linear scale, the throttle_by values are in 2^n
> > > granularity. The h/w does not guarantee a curve of actual throttle w.r.t
> > > the configured values. But if a throttle_by value of x > y, then x is
> > > guaranteed to throttle more b/w than y.
> > 
> > This is ambigous because that is only correct when the effective values are
> > different. x=11 and y=12 with a granularity of 10 are resulting in the same
> > throttling.
> 
> The x and y are actual values written which i will spell out. The assumption
> is that only correct values are input though because we filterout the values
> which dont follow granularity, meaning return -EINVAL when someone tries to
> write 11 when granularity is 10. This was with the idea thats its easier for
> the user to understand whats actually written. Woudl that be reasonable or
> does it need a change ?
> (Although the h/w does like you say , we can do a msr write for 11 etc ..)


> > > +/* rdtgroup information files for MBE. */
> > > +static struct rftype res_mbe_info_files[] = {
> > > +	{
> > > +		.name		= "num_closids",
> > > +		.mode		= 0444,
> > > +		.kf_ops		= &rdtgroup_kf_single_ops,
> > > +		.seq_show	= rdt_num_closids_show,
> > > +	},
> > > +	{
> > > +		.name		= "max_thrtl_by",
> > 
> > You surely could not come up with a more cryptic file name, right? What's
> > so wrong with spelling out throttle? And the whole '_by' postfix here and
> > on the other files is pointless as well.
> 
> This is due to the issue i mention in reply to 1/8.. Can be changed to

Err no. max_thrtl_by has nothing to do with 1/8. Of course you want to
expose the hardware information.

What's wrong with spelling out throttle and remove the '_by':

       max_throttle

That is readable, understandable and matches the documentation in the SDM.

It's not that hard.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-18  9:01       ` Thomas Gleixner
@ 2017-01-23 18:57         ` Shivappa Vikas
  2017-01-23 19:01           ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-23 18:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Shivappa Vikas, Vikas Shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Wed, 18 Jan 2017, Thomas Gleixner wrote:

> On Tue, 17 Jan 2017, Shivappa Vikas wrote:
>> On Mon, 16 Jan 2017, Thomas Gleixner wrote:
>>> This interface is really crap. The natural way to express it is:
>>>
>>>     Requested Bandwidth = X %
>>
>> I wanted to do it this way which did seem more intuitive but the issue is with
>> the non-linear scale which the hardware does not guarantee a particular
>> percentage for a particular value. Or we don't know the curve for delay value
>> vs. actual b/w throttled.
>>
>> ex: in non linear scale , the granularity is 2^n.
>> Max : 512
>>
>> Say a value of 256 is not guaranteed to have 50% or even follow a curve where
>> we can calculate the corresponding percentage.
>
> The question is whether this non linear scale thing is just a first
> implementation attempt and any sane hardware in the future will use the
> percentage value (which is an approximation as well).
>
> If that non-linear scale is not going to be prevalent, then we really can
> live with the fallout of a particular CPU type.
>
> If it's going to stay, then Intel should be able to provide simple tables
> which give us the required information for a particular CPU model.

By sample table - does this mean we can map a throttle value in non-linear 
scale to its percentage ?

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-23 18:57         ` Shivappa Vikas
@ 2017-01-23 19:01           ` Thomas Gleixner
  2017-01-24 22:10             ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-01-23 19:01 UTC (permalink / raw)
  To: Shivappa Vikas
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin

On Mon, 23 Jan 2017, Shivappa Vikas wrote:
> On Wed, 18 Jan 2017, Thomas Gleixner wrote:
> > If it's going to stay, then Intel should be able to provide simple tables
> > which give us the required information for a particular CPU model.
> 
> By sample table - does this mean we can map a throttle value in non-linear
> scale to its percentage ?

Exactly. And that map should be provided by Intel simply because it's a
royal pain in the neck if users have to map random power of 2 values to
something useful. And of course every user has to do the same thing on
their own. That'd be more than silly.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-01-18  9:08       ` Thomas Gleixner
@ 2017-01-23 19:45         ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-23 19:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Shivappa Vikas, Vikas Shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Wed, 18 Jan 2017, Thomas Gleixner wrote:

> On Tue, 17 Jan 2017, Shivappa Vikas wrote:
>> On Mon, 16 Jan 2017, Thomas Gleixner wrote:
>>
>>> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>>>
>>>> Add the files in info directory for MBA.
>>>> The files in the info directory are as follows :
>>>>  - num_closids: max number of closids for MBA which represents the max
>>>>  class of service user can configure.
>>>>  - max_thrtl_by: the max throttle by values.
>>>>
>>>> Throttle by can have a linear scale and non linear scale.  In linear
>>>> scale, if a throttle_by value is 40%, it means that the memory b/w is
>>>> throttled 'by' 40% or in other words a max of 60% b/w is allowed to
>>>> pass. In non-linear scale, the throttle_by values are in 2^n
>>>> granularity. The h/w does not guarantee a curve of actual throttle w.r.t
>>>> the configured values. But if a throttle_by value of x > y, then x is
>>>> guaranteed to throttle more b/w than y.
>>>
>>> This is ambigous because that is only correct when the effective values are
>>> different. x=11 and y=12 with a granularity of 10 are resulting in the same
>>> throttling.
>>
>> The x and y are actual values written which i will spell out. The assumption
>> is that only correct values are input though because we filterout the values
>> which dont follow granularity, meaning return -EINVAL when someone tries to
>> write 11 when granularity is 10. This was with the idea thats its easier for
>> the user to understand whats actually written. Woudl that be reasonable or
>> does it need a change ?
>> (Although the h/w does like you say , we can do a msr write for 11 etc ..)
>
>
>>>> +/* rdtgroup information files for MBE. */
>>>> +static struct rftype res_mbe_info_files[] = {
>>>> +	{
>>>> +		.name		= "num_closids",
>>>> +		.mode		= 0444,
>>>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>>>> +		.seq_show	= rdt_num_closids_show,
>>>> +	},
>>>> +	{
>>>> +		.name		= "max_thrtl_by",
>>>
>>> You surely could not come up with a more cryptic file name, right? What's
>>> so wrong with spelling out throttle? And the whole '_by' postfix here and
>>> on the other files is pointless as well.
>>
>> This is due to the issue i mention in reply to 1/8.. Can be changed to
>
> Err no. max_thrtl_by has nothing to do with 1/8. Of course you want to
> expose the hardware information.
>
> What's wrong with spelling out throttle and remove the '_by':
>
>       max_throttle

Always was under the impression max_throttle by default means the max b/w thats 
allowed. (if max_throttle is 90, user would think 90% is whats the most allowed 
to pass..). But this is really the delay value which was implemented which 
means the % of bytes that are 'not' allowed to pass (so 90 means , that 90% is whats 
restricted and 10 is what is allowed).

Thanks,
Vikas

>
> That is readable, understandable and matches the documentation in the SDM.
>
> It's not that hard.
>
> Thanks,
>
> 	tglx
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates for MBA
  2017-01-16 16:13   ` Thomas Gleixner
@ 2017-01-23 20:18     ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-23 20:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 16 Jan 2017, Thomas Gleixner wrote:

> On Tue, 10 Jan 2017, Vikas Shivappa wrote:
>
>> During rmdir only reset the ctrl values for the rdtgroup's closid. This
>> is done so that that next time when the closid is reused they dont
>> reflect old values.
>
> How on earth is that related to MBA?

This is not specific to MBA so i put it as x86/intel_rdt in the subject but will 
send this as part of seperate series which you mentioned.

>
>> Remove the closid update during cpuonline in cqm as its
>> already in the CAT code. Since both cqm and CAT want the rmid and closid
>> to be zero when cpu is online, remove the PQR MSR write to zero during
>> cpuonline because the MSRs are at zero after cpu reset and also during
>
> Are you sure that the MSRs are zero when the cpu goes offline and then
> online again? That's not a reset.

Thats right - I was removing this since at the first sched_in the new CLOSID 
would be updated anyways. I know this prevents from having a 0 CLOSid at the 
time all the hot plug code is run, but we already have run the __cpu_disable and 
all the hotplug code when going down on the last used CLOSid. (if the kernel 
thread that handles the offline/online should have a 0 CLOS then thats fine 
probably..)
So may be the right way is to actually zero the CLOS in PQR_MSR when we are 
going down/offline ?

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface
  2017-01-23 19:01           ` Thomas Gleixner
@ 2017-01-24 22:10             ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-01-24 22:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Shivappa Vikas, Vikas Shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, h.peter.anvin



On Mon, 23 Jan 2017, Thomas Gleixner wrote:

> On Mon, 23 Jan 2017, Shivappa Vikas wrote:
>> On Wed, 18 Jan 2017, Thomas Gleixner wrote:
>>> If it's going to stay, then Intel should be able to provide simple tables
>>> which give us the required information for a particular CPU model.
>>
>> By sample table - does this mean we can map a throttle value in non-linear
>> scale to its percentage ?
>
> Exactly. And that map should be provided by Intel simply because it's a
> royal pain in the neck if users have to map random power of 2 values to
> something useful. And of course every user has to do the same thing on
> their own. That'd be more than silly.

Ok. Right now we dont have this capability on any SKUs, so i will just do a call 
to say get_throttle_map() for non_linear scale and fail if there is a nonlinear 
scale and later we can update to API to actually return the map for SKUs that 
support it.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>
>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-04-08  0:33 [PATCH 0/8 V4] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
@ 2017-04-08  0:33 ` Vikas Shivappa
  0 siblings, 0 replies; 37+ messages in thread
From: Vikas Shivappa @ 2017-04-08  0:33 UTC (permalink / raw)
  To: vikas.shivappa, x86, linux-kernel
  Cc: hpa, tglx, mingo, ravi.v.shankar, tony.luck, fenghua.yu, vikas.shivappa

The files in the info directory for MBA are as follows:

 num_closids

 	The maximum number of CLOSids available for MBA

 min_bandwidth

 	The minimum memory bandwidth percentage value

 bandwidth_gran

 	The granularity of the bandwidth control in percent for the
	particular CPU SKU. Intermediate values entered are rounded off
	to the previous control step available. Available bandwidth
	control steps are minimum_bandwidth + N * bandwidth_gran.

 delay_linear

 	When set, the OS writes a linear percentage based value to the
	control MSRs ranging from minimum_bandwidth to 100 percent.

	This value is informational and has no influence on the values
	written to the schemata files. The values written to the
	schemata are always bandwidth percentage that is requested.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  1 +
 arch/x86/kernel/cpu/intel_rdt.c          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 64 ++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index c030637..32de56a 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -161,6 +161,7 @@ struct rdt_resource {
 };
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
+void rdt_get_mba_infofile(struct rdt_resource *r);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 86142cc..a4f56d9 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -197,6 +197,7 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
 			return false;
 	}
 	r->data_width = 3;
+	rdt_get_mba_infofile(r);
 
 	r->capable = true;
 	r->enabled = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index a3a0faa..d97ffc1 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -518,6 +518,36 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_min_bw_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->min_bw);
+
+	return 0;
+}
+
+static int rdt_bw_gran_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->bw_gran);
+
+	return 0;
+}
+
+static int rdt_delay_linear_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->delay_linear);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_cache_info_files[] = {
 	{
@@ -540,6 +570,40 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+/* rdtgroup information files for memory bandwidth. */
+static struct rftype res_mba_info_files[] = {
+	{
+		.name		= "num_closids",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closids_show,
+	},
+	{
+		.name		= "min_bandwidth",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_min_bw_show,
+	},
+	{
+		.name		= "bandwidth_gran",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_bw_gran_show,
+	},
+	{
+		.name		= "delay_linear",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_linear_show,
+	},
+};
+
+void rdt_get_mba_infofile(struct rdt_resource *r)
+{
+	r->info_files = res_mba_info_files;
+	r->nr_info_files = ARRAY_SIZE(res_mba_info_files);
+}
+
 void rdt_get_cache_infofile(struct rdt_resource *r)
 {
 	r->info_files = res_cache_info_files;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-04-03 21:57 [PATCH 0/8 V3] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
@ 2017-04-03 21:57 ` Vikas Shivappa
  0 siblings, 0 replies; 37+ messages in thread
From: Vikas Shivappa @ 2017-04-03 21:57 UTC (permalink / raw)
  To: vikas.shivappa, x86, linux-kernel
  Cc: hpa, tglx, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu,
	h.peter.anvin

The files in the info directory for MBA are as follows:

 num_closids

 	The maximum number of CLOSids available for MBA

 min_bandwidth

 	The minimum memory bandwidth percentage value

 bandwidth_gran

 	The granularity of the bandwidth control in percent for the
	particular CPU SKU. Intermediate values entered are rounded off
	to the previous control step available. Available bandwidth
	control steps are minimum_bandwidth + N * bandwidth_gran.

 delay_linear

 	When set, the OS writes a linear percentage based value to the
	control MSRs ranging from minimum_bandwidth to 100 percent.

	This value is informational and has no influence on the values
	written to the schemata files. The values written to the
	schemata are always bandwidth percentage that is requested.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  1 +
 arch/x86/kernel/cpu/intel_rdt.c          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 64 ++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index e58e1d4..b42fba5 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -155,6 +155,7 @@ struct rdt_resource {
 };
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
+void rdt_get_mba_infofile(struct rdt_resource *r);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index b50cb35..fd65548 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -197,6 +197,7 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
 			return false;
 	}
 	r->data_width = 3;
+	rdt_get_mba_infofile(r);
 
 	r->capable = true;
 	r->enabled = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index aee4a19..ce5b173 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -518,6 +518,36 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_min_bw_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->min_bw);
+
+	return 0;
+}
+
+static int rdt_bw_gran_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->bw_gran);
+
+	return 0;
+}
+
+static int rdt_delay_linear_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->delay_linear);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_cache_info_files[] = {
 	{
@@ -540,6 +570,40 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+/* rdtgroup information files for memory bandwidth. */
+static struct rftype res_mba_info_files[] = {
+	{
+		.name		= "num_closids",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closids_show,
+	},
+	{
+		.name		= "min_bandwidth",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_min_bw_show,
+	},
+	{
+		.name		= "bandwidth_gran",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_bw_gran_show,
+	},
+	{
+		.name		= "delay_linear",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_linear_show,
+	},
+};
+
+void rdt_get_mba_infofile(struct rdt_resource *r)
+{
+	r->info_files = res_mba_info_files;
+	r->nr_info_files = ARRAY_SIZE(res_mba_info_files);
+}
+
 void rdt_get_cache_infofile(struct rdt_resource *r)
 {
 	r->info_files = res_cache_info_files;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-03-01 16:05   ` Thomas Gleixner
@ 2017-03-10 23:26     ` Shivappa Vikas
  0 siblings, 0 replies; 37+ messages in thread
From: Shivappa Vikas @ 2017-03-10 23:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen



On Wed, 1 Mar 2017, Thomas Gleixner wrote:

> On Fri, 17 Feb 2017, Vikas Shivappa wrote:
>
>> Add files in info directory for MBA.
>> The files in the info directory are as follows :
>> - num_closids: max number of closids for MBA which represents the max
>> class of service user can configure.
>> - min_bw: the minimum memory bandwidth(b/w) values in percentage.
>>
>> OS maps the b/w percentage values to memory b/w throttle delay values
>> and configures them via MSR interface by writing to the QOS_MSRs. These
>> delay values can have a linear or nonlinear scale.
>>
>> - bw_gran: The memory b/w granularity that can be configured.
>> For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
>> values are 10,20,30...
>
> This is unreadable. It's possible to structure ASCII text cleanly.
>
> x86/rdt: Add info directory files for MBA
>
> The info directory for MBA contains the following files:
>
> num_closids
>
> 	The maximum number of class of service slots available for MBA
>
> min_bandwidth
>
> 	The minimum memory bandwidth percentage value
>
> bandwidth_gran
>
> 	The granularity of the bandwidth control for the particular
> 	hardware in percent. The available bandwidth control steps are:
> 	min_bw + N * bw_gran Intermediate values are rounded to the next

Is this next or pervious? Meaning when 12 is requested on a 10 granularity , we 
give 10 ?

> 	control step available on the hardware.
>
> delay_linear
>
> 	If set, the control registers take a linear percentage based value
> 	between min_bandwidth and 100 percent.
>
> 	If not set, the control registers take a power of 2 based value
> 	which is mapped by the kernel to percentage based values.
>
> 	This file is of pure informational nature and has no influence on
> 	the values which are written to the schemata files. These are
> 	always percentage based.

Will update the changelogs

>
> Note, that this uses the actual file names and not some random
> abbreviations thereof. It also documents delay_linear and gets rid of the
> implementation details of QOS_MSRs. They are irrelevant here.
>
> And exactly this information wants to go into Documentation/... preferably
> in exactly this patch and not in a disconnected one which describes stuff
> differently for whatever reasons.
>
>> +static int rdt_min_bw_show(struct kernfs_open_file *of,
>> +			     struct seq_file *seq, void *v)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +
>> +	seq_printf(seq, "%d\n", r->min_bw);
>> +
>
> Can you please get rid of these pointless extra new lines before the
> 'return 0;' ? They are just eating screen estate and do not make the code
> more readable.

All the rest of the show functions have that line before return like the 
rdt_min_cbm_bits_show etc.  I have tried to 
always keep a line before return like the old cqm/rapl - if thats ok

>
>> +/* rdtgroup information files for MBE. */
>
> What is MBE?
>
>> +static struct rftype res_mbe_info_files[] = {
>
> Randomizing names make the code more secure or what are you trying to
> achieve?
>
>> +void rdt_get_mba_infofile(struct rdt_resource *r)
>> +{
>> +	r->info_files = &res_mbe_info_files[0];
>
> See other mail.
>

Will fix the MBE to MBA and the pointer init.

Thanks,
Vikas

> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
@ 2017-03-01 16:05   ` Thomas Gleixner
  2017-03-10 23:26     ` Shivappa Vikas
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2017-03-01 16:05 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen

On Fri, 17 Feb 2017, Vikas Shivappa wrote:

> Add files in info directory for MBA.
> The files in the info directory are as follows :
> - num_closids: max number of closids for MBA which represents the max
> class of service user can configure.
> - min_bw: the minimum memory bandwidth(b/w) values in percentage.
> 
> OS maps the b/w percentage values to memory b/w throttle delay values
> and configures them via MSR interface by writing to the QOS_MSRs. These
> delay values can have a linear or nonlinear scale.
> 
> - bw_gran: The memory b/w granularity that can be configured.
> For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
> values are 10,20,30...

This is unreadable. It's possible to structure ASCII text cleanly.

x86/rdt: Add info directory files for MBA

 The info directory for MBA contains the following files:

 num_closids

	The maximum number of class of service slots available for MBA

 min_bandwidth
 
	The minimum memory bandwidth percentage value

 bandwidth_gran

	The granularity of the bandwidth control for the particular
 	hardware in percent. The available bandwidth control steps are:
 	min_bw + N * bw_gran Intermediate values are rounded to the next
 	control step available on the hardware.

 delay_linear

	If set, the control registers take a linear percentage based value
	between min_bandwidth and 100 percent.

	If not set, the control registers take a power of 2 based value
	which is mapped by the kernel to percentage based values.

	This file is of pure informational nature and has no influence on
	the values which are written to the schemata files. These are
	always percentage based.

Note, that this uses the actual file names and not some random
abbreviations thereof. It also documents delay_linear and gets rid of the
implementation details of QOS_MSRs. They are irrelevant here. 

And exactly this information wants to go into Documentation/... preferably
in exactly this patch and not in a disconnected one which describes stuff
differently for whatever reasons.
  
> +static int rdt_min_bw_show(struct kernfs_open_file *of,
> +			     struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +
> +	seq_printf(seq, "%d\n", r->min_bw);
> +

Can you please get rid of these pointless extra new lines before the
'return 0;' ? They are just eating screen estate and do not make the code
more readable.

> +/* rdtgroup information files for MBE. */

What is MBE?

> +static struct rftype res_mbe_info_files[] = {

Randomizing names make the code more secure or what are you trying to
achieve?

> +void rdt_get_mba_infofile(struct rdt_resource *r)
> +{
> +	r->info_files = &res_mbe_info_files[0];

See other mail.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA
  2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
@ 2017-02-17 19:58 ` Vikas Shivappa
  2017-03-01 16:05   ` Thomas Gleixner
  0 siblings, 1 reply; 37+ messages in thread
From: Vikas Shivappa @ 2017-02-17 19:58 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, vikas.shivappa

Add files in info directory for MBA.
The files in the info directory are as follows :
- num_closids: max number of closids for MBA which represents the max
class of service user can configure.
- min_bw: the minimum memory bandwidth(b/w) values in percentage.

OS maps the b/w percentage values to memory b/w throttle delay values
and configures them via MSR interface by writing to the QOS_MSRs. These
delay values can have a linear or nonlinear scale.

- bw_gran: The memory b/w granularity that can be configured.
For ex: If the granularity is 10% and min_bw is 10, valid bandwidth
values are 10,20,30...

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  2 +
 arch/x86/kernel/cpu/intel_rdt.c          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 64 ++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 36abed2..24de64c 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -149,6 +149,8 @@ struct msr_param {
 };
 
 void rdt_get_cache_infofile(struct rdt_resource *r);
+void rdt_get_mba_infofile(struct rdt_resource *r);
+
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 481ff32..353c476b4 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -176,6 +176,7 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
 		if (rdt_get_mb_table(r))
 			return false;
 	}
+	rdt_get_mba_infofile(r);
 
 	r->capable = true;
 	r->enabled = true;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 0a70e87..b445ee8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -518,6 +518,36 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_min_bw_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->min_bw);
+
+	return 0;
+}
+
+static int rdt_bw_gran_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->bw_gran);
+
+	return 0;
+}
+
+static int rdt_delay_linear_show(struct kernfs_open_file *of,
+			     struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->delay_linear);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_cache_info_files[] = {
 	{
@@ -540,6 +570,40 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 	},
 };
 
+/* rdtgroup information files for MBE. */
+static struct rftype res_mbe_info_files[] = {
+	{
+		.name		= "num_closids",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closids_show,
+	},
+	{
+		.name		= "min_bandwidth",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_min_bw_show,
+	},
+	{
+		.name		= "bandwidth_gran",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_bw_gran_show,
+	},
+	{
+		.name		= "delay_linear",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_delay_linear_show,
+	},
+};
+
+void rdt_get_mba_infofile(struct rdt_resource *r)
+{
+	r->info_files = &res_mbe_info_files[0];
+	r->infofiles_len = ARRAY_SIZE(res_mbe_info_files);
+}
+
 void rdt_get_cache_infofile(struct rdt_resource *r)
 {
 	r->info_files = &res_cache_info_files[0];
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-04-08  0:34 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-10 19:33 [PATCH 0/8 V1] x86/intel_rdt: Memory b/w Allocation support Vikas Shivappa
2017-01-10 19:33 ` [PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation user interface Vikas Shivappa
2017-01-16 13:43   ` Thomas Gleixner
2017-01-18  0:51     ` Shivappa Vikas
2017-01-18  9:01       ` Thomas Gleixner
2017-01-23 18:57         ` Shivappa Vikas
2017-01-23 19:01           ` Thomas Gleixner
2017-01-24 22:10             ` Shivappa Vikas
2017-01-10 19:33 ` [PATCH 2/8] x86/intel_rdt/mba: Generalize the naming to get ready for MBA Vikas Shivappa
2017-01-16 13:45   ` Thomas Gleixner
2017-01-10 19:33 ` [PATCH 3/8] x86/intel_rdt/mba: Improvements to handle more RDT resources like MBA Vikas Shivappa
2017-01-16 13:54   ` Thomas Gleixner
2017-01-18  0:53     ` Shivappa Vikas
2017-01-18  9:05       ` Thomas Gleixner
2017-01-10 19:33 ` [PATCH 4/8] x86/intel_rdt/mba: Memory b/w allocation feature detect Vikas Shivappa
2017-01-16 13:59   ` Thomas Gleixner
2017-01-16 14:41     ` Peter Zijlstra
2017-01-16 16:16       ` Thomas Gleixner
2017-01-10 19:33 ` [PATCH 5/8] x86/intel_rct/mba: Add MBA structures and initialize MBA Vikas Shivappa
2017-01-16 14:06   ` Thomas Gleixner
2017-01-18  0:56     ` Shivappa Vikas
2017-01-10 19:33 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
2017-01-16 14:14   ` Thomas Gleixner
2017-01-18  1:02     ` Shivappa Vikas
2017-01-18  9:08       ` Thomas Gleixner
2017-01-23 19:45         ` Shivappa Vikas
2017-01-10 19:33 ` [PATCH 7/8] x86/intel_rdt/mba: Add schemata file support " Vikas Shivappa
2017-01-16 16:07   ` Thomas Gleixner
2017-01-18  1:03     ` Shivappa Vikas
2017-01-10 19:33 ` [PATCH 8/8] x86/intel_rdt: rmdir,umount and hotcpu updates " Vikas Shivappa
2017-01-16 16:13   ` Thomas Gleixner
2017-01-23 20:18     ` Shivappa Vikas
2017-02-17 19:58 [PATCH 0/8 V2] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
2017-02-17 19:58 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
2017-03-01 16:05   ` Thomas Gleixner
2017-03-10 23:26     ` Shivappa Vikas
2017-04-03 21:57 [PATCH 0/8 V3] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
2017-04-03 21:57 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa
2017-04-08  0:33 [PATCH 0/8 V4] x86/intel_rdt: Intel Memory bandwidth allocation Vikas Shivappa
2017-04-08  0:33 ` [PATCH 6/8] x86/intel_rdt/mba: Add info directory files for MBA Vikas Shivappa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).