* [PATCH v5 00/18] Intel Cache Allocation Technology
@ 2016-10-22 13:19 Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 01/18] Documentation, ABI: Add a document entry for cache id Fenghua Yu
                   ` (18 more replies)
  0 siblings, 19 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

This version should cover all comments from Thomas.

The patches are in the same order as v4, with small commit message
changes in patch #16.

0001-Documentation-ABI-Add-a-document-entry-for-cache-id.patch
0002-cacheinfo-Introduce-cache-id.patch
0003-x86-intel_cacheinfo-Enable-cache-id-in-x86.patch

	These three define an "id" for each cache ... we need a "name"
	for a cache so we can say what restrictions to apply to each
	cache in the system.  All you will see at this point is an
	extra "id" file in each /sys/devices/system/cpu/cpu*/cache/index*/
	directory.
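	For example (output is illustrative; index3 is typically the
	L3 and the id values depend on your cache topology):
	$ cat /sys/devices/system/cpu/cpu0/cache/index3/id
	0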

0004-x86-intel_rdt-Feature-discovery.patch

	Look at CPUID for the features related to cache allocation.
	At this point /proc/cpuinfo shows extra flags for the features
	found on your system.
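	For example (flag names as given in the documentation patch
	in this series; what you see depends on your CPU):
	$ grep -ow 'rdt\|cat_l3\|cdp_l3' /proc/cpuinfo | sort -u
	cat_l3
	cdp_l3
	rdt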

0005-Documentation-x86-Documentation-for-Intel-resource-a.patch

	The documentation patch could be anywhere in this sequence. We
	put it in early so you can read it to see how to use the
	interface.

0006-x86-intel_rdt-Add-CONFIG-Makefile-and-basic-initiali.patch

	Add CONFIG_INTEL_RDT_A (default "n" ... you'll have to set
	it for this patch, and all the following ones, to do anything).
	Template driver here just checks for features and spams
	the console with one line for each.

0007-x86-intel_rdt-Add-Haswell-feature-discovery.patch

	There are some Haswell systems that support cache allocation,
	but they were made before the CPUID bits were fully defined.
	So we check by probing the CBM base MSR to see if CLOSID
	bits stick. Unless you have one of these Haswells, you won't
	see any difference here.

0008-x86-intel_rdt-Pick-up-L3-L2-RDT-parameters-from-CPUID.patch

	This is all new code, not seen in the previous versions of this
	patch series. L3 and L2 cache allocations are just the first of
	several resource control features. Define rdt_resource structure
	that contains all the useful things we need to know about a
	resource. Pick up the parameters for the resource from CPUID.
	The console spam strings change format here.
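	For example, the console line from patch 6 becomes something
	like this (illustrative; one line per capable resource):
	intel_rdt: Intel RDT L3 allocation detected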

0009-x86-cqm-Move-PQR_ASSOC-management-code-into-generic-.patch

	The PQR_ASSOC MSR has a field for the CLOSID (which we need
	in order to define which allocation rules are in effect). But
	it also contains the RMID (used by CQM and MBM perf monitoring).
	The perf code got here first, and defined structures that
	make it easy for the two systems to co-exist without stomping
	on each other. This patch moves the relevant parts into a
	common header file and changes the scope from "static" to
	global so we can access them. No visible change.

0010-x86-intel_rdt-Build-structures-for-each-resource-bas.patch

	For each enabled resource, we build a list of "rdt_domains" based
	on hotplug cpu notifications. Since we only have L3 at this point,
	this is just a list of L3 caches (named by the "id" established
	in the first three patches). As each cache is found we initialize
	the array of CBMs (cache bit masks). No visible change here.

0011-x86-intel_rdt-Add-basic-resctrl-filesystem-support.patch

	Our interface is a kernfs backed file system. Establish the
	mount point, and provide mount/unmount functionality.
	At this point "/sys/fs/resctrl" appears. You can mount and
	unmount the resctrl file system (if your system supports
	code/data prioritization, you can use the "cdp" mount option).
	The file system is empty and doesn't allow creation of any
	files or subdirectories.
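	For example (the "cdp" option is only accepted on systems
	with code/data prioritization):
	# mount -t resctrl resctrl /sys/fs/resctrl
	# umount /sys/fs/resctrl
	# mount -t resctrl -o cdp resctrl /sys/fs/resctrl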

0012-x86-intel_rdt-Add-info-files-to-resctrl-file-system.patch

	Parameters for each resource are buried in CPUID leaf 0x10.
	This isn't very user friendly for scripts and applications
	that want to configure resource allocation. Create an
	"info" directory, with a subdirectory for each resource
	containing a couple of useful parameters. Visible change:
	$ ls -l /sys/fs/resctrl/info/L3
	total 0
	-r--r--r-- 1 root root 0 Oct  7 11:20 cbm_val
	-r--r--r-- 1 root root 0 Oct  7 11:20 num_closid

0013-x86-intel_rdt-Add-mkdir-to-resctrl-file-system.patch

	Each resource group is represented by a directory in the
	resctrl file system. The root directory is the default group.
	Use "mkdir" to create new groups and "rmdir" to remove them.
	The maximum number of groups is defined by the effective
	number of CLOSIDs.
	Visible change: If you have CDP (and enable it with the "cdp"
	mount option) you will find that you can only create half
	as many groups as without it (e.g. 8 vs. 16 on Broadwell, but
	the default group uses one ... so actually 7 vs. 15).
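	For example, create two groups and remove one again:
	# cd /sys/fs/resctrl
	# mkdir p0 p1
	# rmdir p1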

0014-x86-intel_rdt-Add-cpus-file.patch

	One of the control mechanisms for a resource group is the
	logical CPU. Initially all CPUs are assigned to the default
	group. They can be reassigned to other groups by writing
	a cpumask to the "cpus" file. See the documentation for what
	this means.
	Visible change: "cpus" file in the root, and automatically
	in each created subdirectory. You can "echo" masks to these
	files and watch as CPUs added to one group are removed from
	whatever group they previously belonged to. Removing a directory
	will give all CPUs owned by it back to the default (root)
	group.
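	For example, to move CPUs 2 and 3 into group "p0" (the mask
	value is illustrative; the file takes a hex cpumask):
	# echo c > p0/cpus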

0015-x86-intel_rdt-Add-tasks-files.patch

	Tasks can be assigned to resource groups by writing their PID
	to a "tasks" file (which removes the task from its previous
	group). Forked/cloned tasks inherit the group from their
	parent. You cannot remove a group (directory) that has any
	tasks assigned.
	Visible change: "tasks" files appear. E.g. (we see two tasks
	in the group, our shell, and the "cat" that it spawned).
	# echo $$ > p0/tasks; cat p0/tasks
	268890
	268914

0016-x86-intel_rdt-Add-schemata-file.patch

	The "schemata" file in each group/directory defines what
	access tasks controlled by this resource are permitted.
	One line per resource type. Fields for each instance of
	the resource. You redefine the access by wrting to the
	file in the same format.
	Visible change: "schemata" file which starts out with maximum
	allowed resources. E.g.
	$ cat schemata
	L3:0=fffff;1=fffff
	Now restrict this group to just 20% of L3 on the first cache,
	but allow 50% on the second:
	# echo "L3:0=f;1=3ff" > schemata

0017-x86-intel_rdt-Add-scheduler-hook.patch

	When context switching we check if we are changing resource
	groups for the new process, and update the PQR_ASSOC MSR with
	the new CLOSID if needed.
	Visible change: Everything should be working now. Tasks run with
	the permitted access to L3 cache.
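	A minimal sketch of the idea (not the exact patch code; the
	helper name and the task_struct "closid" field are assumptions
	based on this description):

	static inline void intel_rdt_sched_in(struct task_struct *next)
	{
		struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);

		/* CLOSID 0 means the default group's masks apply */
		if (state->closid != next->closid) {
			state->closid = next->closid;
			/* low bits: RMID (perf), upper 32 bits: CLOSID */
			wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, next->closid);
		}
	}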

0018-MAINTAINERS-Add-maintainer-for-Intel-RDT-resource-al.patch

	New files ... need a maintainer. Fenghua has the job.

Fenghua Yu (14):
  cacheinfo: Introduce cache id
  x86/intel_cacheinfo: Enable cache id in cache info
  x86/intel_rdt: Feature discovery
  Documentation, x86: Documentation for Intel resource allocation user
    interface
  x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
  x86/intel_rdt: Add Haswell feature discovery
  x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
  x86/cqm: Move PQR_ASSOC management code into generic code used by both
    CQM and CAT
  x86/intel_rdt: Add basic resctrl filesystem support
  x86/intel_rdt: Add "info" files to resctrl file system
  x86/intel_rdt: Add mkdir to resctrl file system
  x86/intel_rdt: Add tasks files
  x86/intel_rdt: Add scheduler hook
  MAINTAINERS: Add maintainer for Intel RDT resource allocation

Tony Luck (4):
  Documentation, ABI: Add a document entry for cache id
  x86/intel_rdt: Build structures for each resource based on cache
    topology
  x86/intel_rdt: Add cpus file
  x86/intel_rdt: Add schemata file

 Documentation/ABI/testing/sysfs-devices-system-cpu |   16 +
 Documentation/x86/intel_rdt_ui.txt                 |  167 ++++
 MAINTAINERS                                        |    8 +
 arch/x86/Kconfig                                   |   12 +
 arch/x86/events/intel/cqm.c                        |   23 +-
 arch/x86/include/asm/cpufeatures.h                 |    5 +
 arch/x86/include/asm/intel_rdt.h                   |  219 +++++
 arch/x86/include/asm/intel_rdt_common.h            |   27 +
 arch/x86/kernel/cpu/Makefile                       |    2 +
 arch/x86/kernel/cpu/intel_cacheinfo.c              |   20 +
 arch/x86/kernel/cpu/intel_rdt.c                    |  382 ++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c           | 1013 ++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_schemata.c           |  245 +++++
 arch/x86/kernel/cpu/scattered.c                    |    3 +
 arch/x86/kernel/process_32.c                       |    4 +
 arch/x86/kernel/process_64.c                       |    4 +
 drivers/base/cacheinfo.c                           |    5 +
 include/linux/cacheinfo.h                          |    3 +
 include/linux/sched.h                              |    3 +
 include/uapi/linux/magic.h                         |    1 +
 20 files changed, 2141 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/x86/intel_rdt_ui.txt
 create mode 100644 arch/x86/include/asm/intel_rdt.h
 create mode 100644 arch/x86/include/asm/intel_rdt_common.h
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_schemata.c

-- 
2.5.0


* [PATCH v5 01/18] Documentation, ABI: Add a document entry for cache id
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:25   ` [tip:x86/cache] Documentation, ABI: Document the new sysfs files for cpu cache ids tip-bot for Tony Luck
  2016-10-22 13:19 ` [PATCH v5 02/18] cacheinfo: Introduce cache id Fenghua Yu
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Tony Luck <tony.luck@intel.com>

Add an ABI document entry for /sys/devices/system/cpu/cpu*/cache/index*/id.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 4987417..2a4a423 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -272,6 +272,22 @@ Description:	Parameters for the CPU cache attributes
 				     the modified cache line is written to main
 				     memory only when it is replaced
 
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/id
+Date:		September 2016
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Cache id
+
+		The id provides a unique number for a specific instance of
+		a cache of a particular type. E.g. there may be a level
+		3 unified cache on each socket in a server and we may
+		assign them ids 0, 1, 2, ...
+
+		Note that id values can be non-contiguous. E.g. level 1
+		caches typically exist per core, but there may not be a
+		power of two cores on a socket, so these caches may be
+		numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
+
 What:		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
-- 
2.5.0


* [PATCH v5 02/18] cacheinfo: Introduce cache id
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 01/18] Documentation, ABI: Add a document entry for cache id Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:25   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 03/18] x86/intel_cacheinfo: Enable cache id in cache info Fenghua Yu
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Cache management software needs an id for each instance of a cache of
a particular type.

The current cacheinfo structure does not provide any information about
the underlying hardware so there is no way to expose it.

Hardware with cache management features provides means (cpuid, enumeration
etc.) to retrieve the hardware id of a particular cache instance. Cache
instances which share hardware have the same hardware id.

Add an 'id' field to struct cacheinfo to store this information. Expose
this information under the /sys/devices/system/cpu/cpu*/cache/index*/
directory as well.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/base/cacheinfo.c  | 5 +++++
 include/linux/cacheinfo.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index e9fd32e..00a9688 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev,		\
 	return sprintf(buf, "%u\n", this_leaf->object);		\
 }
 
+show_one(id, id);
 show_one(level, level);
 show_one(coherency_line_size, coherency_line_size);
 show_one(number_of_sets, number_of_sets);
@@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
 	return n;
 }
 
+static DEVICE_ATTR_RO(id);
 static DEVICE_ATTR_RO(level);
 static DEVICE_ATTR_RO(type);
 static DEVICE_ATTR_RO(coherency_line_size);
@@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
 static DEVICE_ATTR_RO(physical_line_partition);
 
 static struct attribute *cache_default_attrs[] = {
+	&dev_attr_id.attr,
 	&dev_attr_type.attr,
 	&dev_attr_level.attr,
 	&dev_attr_shared_cpu_map.attr,
@@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
 	const struct cpumask *mask = &this_leaf->shared_cpu_map;
 	umode_t mode = attr->mode;
 
+	if ((attr == &dev_attr_id.attr) && (this_leaf->attributes & CACHE_ID))
+		return mode;
 	if ((attr == &dev_attr_type.attr) && this_leaf->type)
 		return mode;
 	if ((attr == &dev_attr_level.attr) && this_leaf->level)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2189935..0bcbb67 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -18,6 +18,7 @@ enum cache_type {
 
 /**
  * struct cacheinfo - represent a cache leaf node
+ * @id: This cache's id. It is unique among caches with the same (type, level).
  * @type: type of the cache - data, inst or unified
  * @level: represents the hierarchy in the multi-level cache
  * @coherency_line_size: size of each cache line usually representing
@@ -44,6 +45,7 @@ enum cache_type {
  * keeping, the remaining members form the core properties of the cache
  */
 struct cacheinfo {
+	unsigned int id;
 	enum cache_type type;
 	unsigned int level;
 	unsigned int coherency_line_size;
@@ -61,6 +63,7 @@ struct cacheinfo {
 #define CACHE_WRITE_ALLOCATE	BIT(3)
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
+#define CACHE_ID		BIT(4)
 
 	struct device_node *of_node;
 	bool disable_sysfs;
-- 
2.5.0


* [PATCH v5 03/18] x86/intel_cacheinfo: Enable cache id in cache info
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 01/18] Documentation, ABI: Add a document entry for cache id Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 02/18] cacheinfo: Introduce cache id Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:26   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 04/18] x86/intel_rdt: Feature discovery Fenghua Yu
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

Also see the documentation of the CPUID instruction in the "Intel 64
and IA-32 Architectures Software Developer's Manual".
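
As a worked example of the code below: if the L3 leaf reports
EAX[25:14] = 15 (i.e. 1 + 15 = 16 threads share the cache), then
get_count_order(16) = 4 and the cache id is apicid >> 4, so all 16
logical CPUs sharing that L3 report the same id.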

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
 	union _cpuid4_leaf_ecx ecx;
+	unsigned int id;
 	unsigned long size;
 	struct amd_northbridge *nb;
 };
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_leaf_init(struct cacheinfo *this_leaf,
 			 struct _cpuid4_info_regs *base)
 {
+	this_leaf->id = base->id;
+	this_leaf->attributes = CACHE_ID;
 	this_leaf->level = base->eax.split.level;
 	this_leaf->type = cache_type_map[base->eax.split.type];
 	this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The max shared threads number comes from CPUID.4:EAX[25:14] with input
+ * ECX as cache index. Then right shift apicid by the number's order to get
+ * cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	unsigned long num_threads_sharing;
+	int index_msb;
+
+	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+	index_msb = get_count_order(num_threads_sharing);
+	id4_regs->id = c->apicid >> index_msb;
+}
+
 static int __populate_cache_leaves(unsigned int cpu)
 {
 	unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
 		ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
 		if (ret)
 			return ret;
+		get_cache_id(cpu, &id4_regs);
 		ci_leaf_init(this_leaf++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
-- 
2.5.0


* [PATCH v5 04/18] x86/intel_rdt: Feature discovery
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (2 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 03/18] x86/intel_cacheinfo: Enable cache id in cache info Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 14:15   ` Borislav Petkov
  2016-10-26 21:26   ` [tip:x86/cache] x86/cpufeature: Add RDT CPUID feature bits tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 05/18] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 5 +++++
 arch/x86/kernel/cpu/scattered.c    | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 1188bc8..21f4a7e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -195,6 +195,10 @@
 
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 
+#define X86_FEATURE_CAT_L3	( 7*32+16) /* Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2	( 7*32+17) /* Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3	( 7*32+18) /* Code and Data Prioritization L3 */
+
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
 #define X86_FEATURE_VNMI        ( 8*32+ 1) /* Intel Virtual NMI */
@@ -219,6 +223,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT_A	( 9*32+15) /* Resource Director Technology Allocation */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 8cb57df..11f39a2 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -34,6 +34,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 		{ X86_FEATURE_INTEL_PT,		CR_EBX,25, 0x00000007, 0 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
 		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
+		{ X86_FEATURE_CAT_L3,		CR_EBX, 1, 0x00000010, 0 },
+		{ X86_FEATURE_CAT_L2,		CR_EBX, 2, 0x00000010, 0 },
+		{ X86_FEATURE_CDP_L3,		CR_ECX, 2, 0x00000010, 1 },
 		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
 		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
 		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
-- 
2.5.0


* [PATCH v5 05/18] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (3 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 04/18] x86/intel_rdt: Feature discovery Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization Fenghua Yu
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The documentation describes the user interface for allocating resources
with Intel RDT.

Please note that the documentation covers the generic user interface.
The current patch set only implements CAT L3; CAT L2 code will be sent
later.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 167 +++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)
 create mode 100644 Documentation/x86/intel_rdt_ui.txt

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
new file mode 100644
index 0000000..b9f634c
--- /dev/null
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -0,0 +1,167 @@
+User Interface for Resource Allocation in Intel Resource Director Technology
+
+Copyright (C) 2016 Intel Corporation
+
+Fenghua Yu <fenghua.yu@intel.com>
+Tony Luck <tony.luck@intel.com>
+
+This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
+X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
+
+To use the feature mount the file system:
+
+ # mount -t resctrl resctrl [-o cdp] /sys/fs/resctrl
+
+mount options are:
+
+"cdp": Enable code/data prioritization in L3 cache allocations.
+
+
+Resource groups
+---------------
+Resource groups are represented as directories in the resctrl file
+system. The default group is the root directory. Other groups may be
+created as desired by the system administrator using the "mkdir(1)"
+command, and removed using "rmdir(1)".
+
+There are three files associated with each group:
+
+"tasks": A list of tasks that belongs to this group. Tasks can be
+	added to a group by writing the task ID to the "tasks" file
+	(which will automatically remove them from the previous
+	group to which they belonged). New tasks created by fork(2)
+	and clone(2) are added to the same group as their parent.
+	If a pid is not in any sub partition, it is in root partition
+	(i.e. default partition).
+
+"cpus": A bitmask of logical CPUs assigned to this group. Writing
+	a new mask can add/remove CPUs from this group. Added CPUs
+	are removed from their previous group. Removed ones are
+	given to the default (root) group. You cannot remove CPUs
+	from the default group.
+
+"schemata": A list of all the resources available to this group.
+	Each resource has its own line and format - see below for
+	details.
+
+When a task is running the following rules define which resources
+are available to it:
+
+1) If the task is a member of a non-default group, then the schemata
+for that group is used.
+
+2) Else if the task belongs to the default group, but is running on a
+CPU that is assigned to some specific group, then the schemata for
+the CPU's group is used.
+
+3) Otherwise the schemata for the default group is used.
+
+
+Schemata files - general concepts
+---------------------------------
+Each line in the file describes one resource. The line starts with
+the name of the resource, followed by specific values to be applied
+in each of the instances of that resource on the system.
+
+Cache IDs
+---------
+On current generation systems there is one L3 cache per socket and L2
+caches are generally just shared by the hyperthreads on a core, but this
+isn't an architectural requirement. We could have multiple separate L3
+caches on a socket, or multiple cores could share an L2 cache. So instead
+of using "socket" or "core" to define the set of logical cpus sharing
+a resource we use a "Cache ID". At a given cache level this will be a
+unique number across the whole system (but it isn't guaranteed to be a
+contiguous sequence, there may be gaps).  To find the ID for each logical
+CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id
+
+Cache Bit Masks (CBM)
+---------------------
+For cache resources we describe the portion of the cache that is available
+for allocation using a bitmask. The maximum value of the mask is defined
+by each cpu model (and may be different for different cache levels). It
+is found using CPUID, but is also provided in the "info" directory of
+the resctrl file system in "info/{resource}/max_cbm_val". X86 hardware
+requires that these masks have all the '1' bits in a contiguous block. So
+0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
+and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
+of the capacity of the cache. You could partition the cache into four
+equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
+
+
+L3 details (code and data prioritization disabled)
+--------------------------------------------------
+With CDP disabled the L3 schemata format is:
+
+	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+L3 details (CDP enabled via mount option to resctrl)
+----------------------------------------------------
+When CDP is enabled L3 control is split into two separate resources
+so you can specify independent masks for code and data like this:
+
+	L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+L2 details
+----------
+L2 cache does not support code and data prioritization, so the
+schemata format is always:
+
+	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+Example 1
+---------
+On a two socket machine (one L3 cache per socket) with just four bits
+for cache bit masks
+
+# mount -t resctrl resctrl /sys/fs/resctrl
+# cd /sys/fs/resctrl
+# mkdir p0 p1
+# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
+# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
+
+The default resource group is unmodified, so we have access to all parts
+of all caches (its schemata file reads "L3:0=f;1=f").
+
+Tasks that are under the control of group "p0" may only allocate from the
+"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
+Tasks in group "p1" use the "lower" 50% of cache on both sockets.
+
+Example 2
+---------
+Again two sockets, but this time with a more realistic 20-bit mask.
+
+Consider two real-time tasks: pid 1234 running on processor 0 and pid 5678
+running on processor 1 of socket 0, on a 2-socket, dual-core machine. To
+avoid noisy neighbors, each of the two real-time tasks exclusively occupies
+one quarter of the L3 cache on socket 0.
+
+# mount -t resctrl resctrl /sys/fs/resctrl
+# cd /sys/fs/resctrl
+
+First we reset the schemata for the default group so that the "upper"
+50% of the L3 cache on socket 0 cannot be used by ordinary tasks:
+
+# echo "L3:0=3ff;1=fffff" > schemata
+
+Next we make a resource group for our first real time task and give
+it access to the "top" 25% of the cache on socket 0.
+
+# mkdir p0
+# echo "L3:0=f8000;1=fffff" > p0/schemata
+
+Finally we move our first real time task into this resource group. We
+also use taskset(1) to ensure the task always runs on a dedicated CPU
+on socket 0. Most uses of resource groups will also constrain which
+processors tasks run on.
+
+# echo 1234 > p0/tasks
+# taskset -cp 1 1234
+
+Ditto for the second real time task (with the remaining 25% of cache):
+
+# mkdir p1
+# echo "L3:0=7c00;1=fffff" > p1/schemata
+# echo 5678 > p1/tasks
+# taskset -cp 2 5678
-- 
2.5.0


* [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (4 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 05/18] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 20:43   ` Thomas Gleixner
  2016-10-26 21:27   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 07/18] x86/intel_rdt: Add Haswell feature discovery Fenghua Yu
                   ` (12 subsequent siblings)
  18 siblings, 2 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Introduce CONFIG_INTEL_RDT_A (default: no, dependent on X86 and
CPU_SUP_INTEL) to control inclusion of Resource Director Technology in
the build.

A simple init() routine checks which features are present and, for
each feature found, prints a one-line pr_info() summary.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/Kconfig                | 12 +++++++++
 arch/x86/kernel/cpu/Makefile    |  2 ++
 arch/x86/kernel/cpu/intel_rdt.c | 54 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..770fb5f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -407,6 +407,18 @@ config GOLDFISH
        def_bool y
        depends on X86_GOLDFISH
 
+config INTEL_RDT_A
+	bool "Intel Resource Director Technology Allocation support"
+	default n
+	depends on X86 && CPU_SUP_INTEL
+	help
+	  Select to enable resource allocation which is a sub-feature of
+	  Intel Resource Director Technology (RDT). More information about
+	  RDT can be found in the Intel x86 Architecture Software
+	  Developer Manual.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
 	bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..cf4bfd0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
+obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o
+
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
 obj-$(CONFIG_MICROCODE)			+= microcode/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..7d7aebe
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,54 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ *    Fenghua Yu <fenghua.yu@intel.com>
+ *    Tony Luck <tony.luck@intel.com>
+ *    Vikas Shivappa <vikas.shivappa@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+
+static inline bool get_rdt_resources(void)
+{
+	bool ret = false;
+
+	if (!boot_cpu_has(X86_FEATURE_RDT_A))
+		return false;
+	if (boot_cpu_has(X86_FEATURE_CAT_L3))
+		ret = true;
+
+	return ret;
+}
+
+static int __init intel_rdt_late_init(void)
+{
+	if (!get_rdt_resources())
+		return -ENODEV;
+
+	pr_info("Intel RDT cache allocation detected\n");
+	if (boot_cpu_has(X86_FEATURE_CDP_L3))
+		pr_info("Intel RDT code data prioritization detected\n");
+
+	return 0;
+}
+
+late_initcall(intel_rdt_late_init);
-- 
2.5.0


* [PATCH v5 07/18] x86/intel_rdt: Add Haswell feature discovery
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (5 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:27   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 08/18] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID Fenghua Yu
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Some Haswell generation CPUs support RDT, but they don't enumerate this
using CPUID.  Use wrmsr_safe() and rdmsr() to probe the MSRs on cpu
model 63 (INTEL_FAM6_HASWELL_X) and check whether the written bits stick.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/events/intel/cqm.c             |  2 +-
 arch/x86/include/asm/intel_rdt.h        |  6 ++++
 arch/x86/include/asm/intel_rdt_common.h |  6 ++++
 arch/x86/kernel/cpu/intel_rdt.c         | 49 ++++++++++++++++++++++++++++++---
 4 files changed, 58 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_rdt.h
 create mode 100644 arch/x86/include/asm/intel_rdt_common.h

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8f82b02..df86874 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
+#include <asm/intel_rdt_common.h>
 #include "../perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..3aca86d
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_H
+#define _ASM_X86_INTEL_RDT_H
+
+#define IA32_L3_CBM_BASE	0xc90
+
+#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
new file mode 100644
index 0000000..e6e15cf
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_COMMON_H
+#define _ASM_X86_INTEL_RDT_COMMON_H
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7d7aebe..f8e35cf 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -27,16 +27,57 @@
 #include <linux/slab.h>
 #include <linux/err.h>
 
+#include <asm/intel_rdt_common.h>
+#include <asm/intel-family.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as they do not have CPUID enumeration support for Cache allocation.
+ * The check for Vendor/Family/Model is not enough to guarantee that
+ * the MSRs won't #GP fault because only the following SKUs support
+ * CAT:
+ *	Intel(R) Xeon(R)  CPU E5-2658  v3  @  2.20GHz
+ *	Intel(R) Xeon(R)  CPU E5-2648L v3  @  1.80GHz
+ *	Intel(R) Xeon(R)  CPU E5-2628L v3  @  2.00GHz
+ *	Intel(R) Xeon(R)  CPU E5-2618L v3  @  2.30GHz
+ *	Intel(R) Xeon(R)  CPU E5-2608L v3  @  2.00GHz
+ *	Intel(R) Xeon(R)  CPU E5-2658A v3  @  2.20GHz
+ *
+ * Probe by trying to write the first of the L3 cache mask registers
+ * and checking that the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86 == 6 &&
+	    boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+		u32 l, h, max_cbm = BIT_MASK(20) - 1;
+
+		if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
+			return false;
+		rdmsr(IA32_L3_CBM_BASE, l, h);
+
+		/* If all the bits were set in MSR, return success */
+		return l == max_cbm;
+	}
+
+	return false;
+}
+
 static inline bool get_rdt_resources(void)
 {
-	bool ret = false;
+	if (cache_alloc_hsw_probe())
+		return true;
 
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
 		return false;
-	if (boot_cpu_has(X86_FEATURE_CAT_L3))
-		ret = true;
+	if (!boot_cpu_has(X86_FEATURE_CAT_L3))
+		return false;
 
-	return ret;
+	return true;
 }
 
 static int __init intel_rdt_late_init(void)
-- 
2.5.0


* [PATCH v5 08/18] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (6 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 07/18] x86/intel_rdt: Add Haswell feature discovery Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:28   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 09/18] x86/cqm: Move PQR_ASSOC management code into generic code used by both CQM and CAT Fenghua Yu
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Define struct rdt_resource to hold all the parameterized
values for an RDT resource. Fill in some of those values
from CPUID leaf 0x10 (on Haswell we hard code them).
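
As a worked example of the decode below: if CPUID.(EAX=10H, ECX=1)
returns cbm_len = 19 in EAX[4:0] and cos_max = 15 in EDX[15:0], the
L3 resource gets cbm_len = 19 + 1 = 20, max_cbm = (1 << 20) - 1 =
0xfffff and num_closid = 15 + 1 = 16.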

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |  68 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 111 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 3aca86d..9780409 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -2,5 +2,73 @@
 #define _ASM_X86_INTEL_RDT_H
 
 #define IA32_L3_CBM_BASE	0xc90
+#define IA32_L2_CBM_BASE	0xd10
 
+/**
+ * struct rdt_resource - attributes of an RDT resource
+ * @enabled:			Is this feature enabled on this machine
+ * @capable:			Is this feature available on this machine
+ * @name:			Name to use in "schemata" file
+ * @num_closid:			Number of CLOSIDs available
+ * @max_cbm:			Largest Cache Bit Mask allowed
+ * @min_cbm_bits:		Minimum number of consecutive bits to be set
+ *				in a cache bit mask
+ * @domains:			All domains for this resource
+ * @num_domains:		Number of domains active
+ * @msr_base:			Base MSR address for CBMs
+ * @tmp_cbms:			Scratch space when updating schemata
+ * @cache_level:		Which cache level defines scope of this domain
+ * @cbm_idx_multi:		Multiplier of CBM index
+ * @cbm_idx_offset:		Offset of CBM index. CBM index is computed by:
+ *				closid * cbm_idx_multi + cbm_idx_offset
+ */
+struct rdt_resource {
+	bool			enabled;
+	bool			capable;
+	char			*name;
+	int			num_closid;
+	int			cbm_len;
+	int			min_cbm_bits;
+	u32			max_cbm;
+	struct list_head	domains;
+	int			num_domains;
+	int			msr_base;
+	u32			*tmp_cbms;
+	int			cache_level;
+	int			cbm_idx_multi;
+	int			cbm_idx_offset;
+};
+
+extern struct rdt_resource rdt_resources_all[];
+
+enum {
+	RDT_RESOURCE_L3,
+	RDT_RESOURCE_L3DATA,
+	RDT_RESOURCE_L3CODE,
+	RDT_RESOURCE_L2,
+
+	/* Must be the last */
+	RDT_NUM_RESOURCES,
+};
+
+#define for_each_capable_rdt_resource(r)				      \
+	for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+	     r++) 							      \
+		if (r->capable)
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
+union cpuid_0x10_1_eax {
+	struct {
+		unsigned int cbm_len:5;
+	} split;
+	unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+union cpuid_0x10_1_edx {
+	struct {
+		unsigned int cos_max:16;
+	} split;
+	unsigned int full;
+};
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f8e35cf..157dc8d0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -31,6 +31,47 @@
 #include <asm/intel-family.h>
 #include <asm/intel_rdt.h>
 
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+
+struct rdt_resource rdt_resources_all[] = {
+	{
+		.name		= "L3",
+		.domains	= domain_init(RDT_RESOURCE_L3),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset	= 0
+	},
+	{
+		.name		= "L3DATA",
+		.domains	= domain_init(RDT_RESOURCE_L3DATA),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 2,
+		.cbm_idx_offset	= 0
+	},
+	{
+		.name		= "L3CODE",
+		.domains	= domain_init(RDT_RESOURCE_L3CODE),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 2,
+		.cbm_idx_offset	= 1
+	},
+	{
+		.name		= "L2",
+		.domains	= domain_init(RDT_RESOURCE_L2),
+		.msr_base	= IA32_L2_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 2,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset	= 0
+	},
+};
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -54,6 +95,7 @@ static inline bool cache_alloc_hsw_probe(void)
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
 	    boot_cpu_data.x86 == 6 &&
 	    boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+		struct rdt_resource *r  = &rdt_resources_all[RDT_RESOURCE_L3];
 		u32 l, h, max_cbm = BIT_MASK(20) - 1;
 
 		if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
@@ -61,33 +103,88 @@ static inline bool cache_alloc_hsw_probe(void)
 		rdmsr(IA32_L3_CBM_BASE, l, h);
 
 		/* If all the bits were set in MSR, return success */
-		return l == max_cbm;
+		if (l != max_cbm)
+			return false;
+
+		r->num_closid = 4;
+		r->cbm_len = 20;
+		r->max_cbm = max_cbm;
+		r->min_cbm_bits = 2;
+		r->capable = true;
+		r->enabled = true;
+
+		return true;
 	}
 
 	return false;
 }
 
+static void rdt_get_config(int idx, struct rdt_resource *r)
+{
+	union cpuid_0x10_1_eax eax;
+	union cpuid_0x10_1_edx edx;
+	u32 ebx, ecx;
+
+	cpuid_count(0x00000010, idx, &eax.full, &ebx, &ecx, &edx.full);
+	r->num_closid = edx.split.cos_max + 1;
+	r->cbm_len = eax.split.cbm_len + 1;
+	r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	r->capable = true;
+	r->enabled = true;
+}
+
+static void rdt_get_cdp_l3_config(int type)
+{
+	struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+	struct rdt_resource *r = &rdt_resources_all[type];
+
+	r->num_closid = r_l3->num_closid / 2;
+	r->cbm_len = r_l3->cbm_len;
+	r->max_cbm = r_l3->max_cbm;
+	r->capable = true;
+	/*
+	 * By default, CDP is disabled. CDP can be enabled by mount parameter
+	 * "cdp" during resctrl file system mount time.
+	 */
+	r->enabled = false;
+}
+
 static inline bool get_rdt_resources(void)
 {
+	bool ret = false;
+
 	if (cache_alloc_hsw_probe())
 		return true;
 
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
 		return false;
-	if (!boot_cpu_has(X86_FEATURE_CAT_L3))
-		return false;
 
-	return true;
+	if (boot_cpu_has(X86_FEATURE_CAT_L3)) {
+		rdt_get_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
+		if (boot_cpu_has(X86_FEATURE_CDP_L3)) {
+			rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA);
+			rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE);
+		}
+		ret = true;
+	}
+	if (boot_cpu_has(X86_FEATURE_CAT_L2)) {
+		/* CPUID 0x10.2 fields are the same format as 0x10.1 */
+		rdt_get_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
+		ret = true;
+	}
+
+	return ret;
 }
 
 static int __init intel_rdt_late_init(void)
 {
+	struct rdt_resource *r;
+
 	if (!get_rdt_resources())
 		return -ENODEV;
 
-	pr_info("Intel RDT cache allocation detected\n");
-	if (boot_cpu_has(X86_FEATURE_CDP_L3))
-		pr_info("Intel RDT code data prioritization detected\n");
+	for_each_capable_rdt_resource(r)
+		pr_info("Intel RDT %s allocation detected\n", r->name);
 
 	return 0;
 }
-- 
2.5.0


* [PATCH v5 09/18] x86/cqm: Move PQR_ASSOC management code into generic code used by both CQM and CAT
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (7 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 08/18] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 21:29   ` [tip:x86/cache] x86/cqm: Share PQR_ASSOC related data between " tip-bot for Fenghua Yu
  2016-10-22 13:19 ` [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology Fenghua Yu
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

A CLOSID goes into effect when written to the PQR_ASSOC MSR. But this MSR
also contains the RMID used for performance monitoring of cache occupancy
and memory bandwidth. Move the management code for this MSR out of
arch/x86/events/intel/cqm.c and into generic RDT code so we can coordinate
updates to the MSR.
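
Once the state is shared, every update to the MSR looks like this
sketch (per the layout described in the struct comment: RMID in the
low bits, CLOSID in the upper 32 bits):

	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);

	/* wrmsr(msr, lo, hi): lo carries the RMID, hi the CLOSID */
	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, state->closid);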

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/events/intel/cqm.c             | 21 +--------------------
 arch/x86/include/asm/intel_rdt_common.h | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index df86874..0c45cc8 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
 
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:		The cached Resource Monitoring ID
- * @closid:		The cached Class Of Service ID
- * @rmid_usecnt:	The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-	u32			rmid;
-	u32			closid;
-	int			rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e6e15cf..b31081b 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -3,4 +3,25 @@
 
 #define MSR_IA32_PQR_ASSOC	0x0c8f
 
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:		The cached Resource Monitoring ID
+ * @closid:		The cached Class Of Service ID
+ * @rmid_usecnt:	The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+	u32			rmid;
+	u32			closid;
+	int			rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
 #endif /* _ASM_X86_INTEL_RDT_COMMON_H */
-- 
2.5.0


* [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (8 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 09/18] x86/cqm: Move PQR_ASSOC management code into generic code used by both CQM and CAT Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 13:02   ` Thomas Gleixner
  2016-10-22 13:19 ` [PATCH v5 11/18] x86/intel_rdt: Add basic resctrl filesystem support Fenghua Yu
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Tony Luck <tony.luck@intel.com>

We use the cpu hotplug notifier to catch each cpu in turn and look at
its cache topology w.r.t. each of the capable resources. As we discover
new domains, we initialize the bitmask array for each to the default
(full access) value.
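
For example, on a two socket server with one L3 cache per socket the
L3 resource ends up with two rdt_domain entries on its "domains" list
(ids 0 and 1, matching the cache ids), each with its cbm[] array
initialized to the full mask r->max_cbm.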

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |  35 ++++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 175 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 210 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9780409..c0d0a6e 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -39,6 +39,34 @@ struct rdt_resource {
 	int			cbm_idx_offset;
 };
 
+/**
+ * struct rdt_domain - group of cpus sharing an RDT resource
+ * @list:	all instances of this resource
+ * @id:		unique id for this instance
+ * @cpu_mask:	which cpus share this resource
+ * @cbm:	array of cache bit masks (indexed by CLOSID)
+ */
+struct rdt_domain {
+	struct list_head	list;
+	int			id;
+	struct cpumask		cpu_mask;
+	u32			*cbm;
+};
+
+/**
+ * struct msr_param - set a range of MSRs from a domain
+ * @res:       The resource to use
+ * @low:       Beginning index from base MSR
+ * @high:      End index
+ */
+struct msr_param {
+	struct rdt_resource	*res;
+	int			low;
+	int			high;
+};
+
+extern struct mutex rdtgroup_mutex;
+
 extern struct rdt_resource rdt_resources_all[];
 
 enum {
@@ -56,6 +84,11 @@ enum {
 	     r++) 							      \
 		if (r->capable)
 
+#define for_each_enabled_rdt_resource(r)				      \
+	for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+	     r++)							      \
+		if (r->enabled)
+
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
 	struct {
@@ -71,4 +104,6 @@ union cpuid_0x10_1_edx {
 	} split;
 	unsigned int full;
 };
+
+void rdt_cbm_update(void *arg);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 157dc8d0..4ca9718 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -26,11 +26,16 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpuhotplug.h>
 
 #include <asm/intel_rdt_common.h>
 #include <asm/intel-family.h>
 #include <asm/intel_rdt.h>
 
+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
 #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
 
 struct rdt_resource rdt_resources_all[] = {
@@ -72,6 +77,11 @@ struct rdt_resource rdt_resources_all[] = {
 	},
 };
 
+static int cbm_idx(struct rdt_resource *r, int closid)
+{
+	return closid * r->cbm_idx_multi + r->cbm_idx_offset;
+}
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -176,13 +186,178 @@ static inline bool get_rdt_resources(void)
 	return ret;
 }
 
+static int get_cache_id(int cpu, int level)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+	int i;
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == level)
+			return ci->info_list[i].id;
+	}
+
+	return -1;
+}
+
+void rdt_cbm_update(void *arg)
+{
+	struct msr_param *m = (struct msr_param *)arg;
+	struct rdt_resource *r = m->res;
+	int i, cpu = smp_processor_id();
+	struct rdt_domain *d;
+
+	list_for_each_entry(d, &r->domains, list) {
+		/* Find the domain that contains this CPU */
+		if (cpumask_test_cpu(cpu, &d->cpu_mask))
+			goto found;
+	}
+	pr_info_once("cpu %d not found in any domain for resource %s\n",
+		     cpu, r->name);
+
+	return;
+
+found:
+	for (i = m->low; i < m->high; i++) {
+		int idx = cbm_idx(r, i);
+
+		wrmsrl(r->msr_base + idx, d->cbm[i]);
+	}
+}
+
+static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+					  struct list_head **pos)
+{
+	struct rdt_domain *d;
+	struct list_head *l;
+
+	if (id < 0)
+		return ERR_PTR(id);
+
+	list_for_each(l, &r->domains) {
+		d = list_entry(l, struct rdt_domain, list);
+		/* When id is found, return its domain. */
+		if (id == d->id)
+			return d;
+		/* Stop searching once we pass id's position in the sorted list. */
+		if (id < d->id)
+			break;
+	}
+	/*
+	 * No matching id was found in this resource's domains. Record
+	 * the position at which the new domain should be inserted. The
+	 * position is not used when removing a domain.
+	 */
+	*pos = l;
+
+	return NULL;
+}
+
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+	int i, id = get_cache_id(cpu, r->cache_level);
+	struct list_head *add_pos = NULL;
+	struct rdt_domain *d;
+
+	d = rdt_find_domain(r, id, &add_pos);
+	if (IS_ERR(d)) {
+		pr_warn("Could't find cache id for cpu %d\n", cpu);
+		return;
+	}
+
+	if (d) {
+		cpumask_set_cpu(cpu, &d->cpu_mask);
+		return;
+	}
+
+	if (!add_pos) {
+		pr_warn("Couldn't add cpu %d in %s domain\n", cpu, r->name);
+		return;
+	}
+
+	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+	if (!d)
+		return;
+
+	d->id = id;
+	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
+	if (!d->cbm) {
+		pr_warn("Failed to alloc CBM array for cpu %d\n", cpu);
+		kfree(d);
+		return;
+	}
+	for (i = 0; i < r->num_closid; i++) {
+		int idx = cbm_idx(r, i);
+
+		d->cbm[i] = r->max_cbm;
+		wrmsrl(r->msr_base + idx, d->cbm[i]);
+	}
+	cpumask_set_cpu(cpu, &d->cpu_mask);
+	list_add_tail(&d->list, add_pos);
+	r->num_domains++;
+}
+
+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+	int id = get_cache_id(cpu, r->cache_level);
+	struct list_head *pos;
+	struct rdt_domain *d;
+
+	d = rdt_find_domain(r, id, &pos);
+	if (IS_ERR_OR_NULL(d)) {
+		pr_warn("Could't find cache id for cpu %d\n", cpu);
+		return;
+	}
+
+	cpumask_clear_cpu(cpu, &d->cpu_mask);
+	if (cpumask_empty(&d->cpu_mask)) {
+		r->num_domains--;
+		kfree(d->cbm);
+		list_del(&d->list);
+		kfree(d);
+	}
+}
+
+static int intel_rdt_online_cpu(unsigned int cpu)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	struct rdt_resource *r;
+
+	mutex_lock(&rdtgroup_mutex);
+	for_each_capable_rdt_resource(r)
+		domain_add_cpu(cpu, r);
+	state->closid = 0;
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int intel_rdt_offline_cpu(unsigned int cpu)
+{
+	struct rdt_resource *r;
+
+	mutex_lock(&rdtgroup_mutex);
+	for_each_capable_rdt_resource(r)
+		domain_remove_cpu(cpu, r);
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 static int __init intel_rdt_late_init(void)
 {
 	struct rdt_resource *r;
+	int state;
 
 	if (!get_rdt_resources())
 		return -ENODEV;
 
+	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				  "x86/rdt/cat:online:",
+				  intel_rdt_online_cpu, intel_rdt_offline_cpu);
+	if (state < 0)
+		return state;
+
 	for_each_capable_rdt_resource(r)
 		pr_info("Intel RDT %s allocation detected\n", r->name);
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 11/18] x86/intel_rdt: Add basic resctrl filesystem support
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (9 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 13:52   ` Thomas Gleixner
  2016-10-22 13:19 ` [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system Fenghua Yu
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Use kernfs as the basis for our user interface filesystem. This patch
supports mount/umount, and one mount parameter "cdp" to enable code/data
prioritization (though all we do at this point is ensure that the system
can support CDP).  The file system is not populated yet in this patch.
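
An illustrative usage sketch ("/sys/fs/resctrl" is the mount point
created by this patch; "cdp" is the only mount option so far):

	# mount -t resctrl resctrl /sys/fs/resctrl
	# mount -t resctrl -o cdp resctrl /sys/fs/resctrl

The second form enables code/data prioritization and fails with
-EINVAL on systems that are not CDP capable.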

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  24 +++
 arch/x86/kernel/cpu/Makefile             |   2 +-
 arch/x86/kernel/cpu/intel_rdt.c          |   8 +-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 264 +++++++++++++++++++++++++++++++
 include/uapi/linux/magic.h               |   1 +
 5 files changed, 297 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index c0d0a6e..8e1d002 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -1,10 +1,30 @@
 #ifndef _ASM_X86_INTEL_RDT_H
 #define _ASM_X86_INTEL_RDT_H
 
+#include <linux/jump_label.h>
+
+#define IA32_L3_QOS_CFG		0xc81
 #define IA32_L3_CBM_BASE	0xc90
 #define IA32_L2_CBM_BASE	0xd10
 
 /**
+ * struct rdtgroup - store rdtgroup's data in resctrl file system.
+ * @kn:				kernfs node
+ * @rdtgroup_list:		linked list for all rdtgroups
+ * @closid:			closid for this rdtgroup
+ */
+struct rdtgroup {
+	struct kernfs_node	*kn;
+	struct list_head	rdtgroup_list;
+	int			closid;
+};
+
+/* List of all resource groups */
+extern struct list_head rdt_all_groups;
+
+int __init rdtgroup_init(void);
+
+/**
  * struct rdt_resource - attributes of an RDT resource
  * @enabled:			Is this feature enabled on this machine
  * @capable:			Is this feature available on this machine
@@ -68,6 +88,10 @@ struct msr_param {
 extern struct mutex rdtgroup_mutex;
 
 extern struct rdt_resource rdt_resources_all[];
+extern struct rdtgroup rdtgroup_default;
+DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
+
+int __init rdtgroup_init(void);
 
 enum {
 	RDT_RESOURCE_L3,
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index cf4bfd0..b4334e8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
-obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o
+obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o intel_rdt_rdtgroup.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 4ca9718..c07e03a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -347,7 +347,7 @@ static int intel_rdt_offline_cpu(unsigned int cpu)
 static int __init intel_rdt_late_init(void)
 {
 	struct rdt_resource *r;
-	int state;
+	int state, ret;
 
 	if (!get_rdt_resources())
 		return -ENODEV;
@@ -358,6 +358,12 @@ static int __init intel_rdt_late_init(void)
 	if (state < 0)
 		return state;
 
+	ret = rdtgroup_init();
+	if (ret) {
+		cpuhp_remove_state(state);
+		return ret;
+	}
+
 	for_each_capable_rdt_resource(r)
 		pr_info("Intel RDT %s allocation detected\n", r->name);
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
new file mode 100644
index 0000000..6c9061d
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -0,0 +1,264 @@
+/*
+ * User interface for Resource Allocation in Resource Director Technology (RDT)
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Author: Fenghua Yu <fenghua.yu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/fs.h>
+#include <linux/sysfs.h>
+#include <linux/kernfs.h>
+#include <linux/slab.h>
+
+#include <uapi/linux/magic.h>
+
+#include <asm/intel_rdt.h>
+
+DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
+struct kernfs_root *rdt_root;
+struct rdtgroup rdtgroup_default;
+LIST_HEAD(rdt_all_groups);
+
+static void l3_qos_cfg_update(void *arg)
+{
+	bool enable = *(bool *)arg;
+
+	wrmsrl(IA32_L3_QOS_CFG, enable);
+}
+
+static int set_l3_qos_cfg(struct rdt_resource *r, bool enable)
+{
+	struct rdt_domain *d;
+	cpumask_var_t cpu_mask;
+	int cpu;
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	list_for_each_entry(d, &r->domains, list) {
+		/* Pick one CPU from each domain instance to update MSR */
+		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+	}
+	cpu = get_cpu();
+	/* Update QOS_CFG MSR on this cpu if it's in cpu_mask. */
+	if (cpumask_test_cpu(cpu, cpu_mask))
+		l3_qos_cfg_update(&enable);
+	/* Update QOS_CFG MSR on all other cpus in cpu_mask. */
+	smp_call_function_many(cpu_mask, l3_qos_cfg_update, &enable, 1);
+	put_cpu();
+
+	free_cpumask_var(cpu_mask);
+
+	return 0;
+}
+
+static int enable_cdp(void)
+{
+	struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+	struct rdt_resource *r_l3data = &rdt_resources_all[RDT_RESOURCE_L3DATA];
+	struct rdt_resource *r_l3code = &rdt_resources_all[RDT_RESOURCE_L3CODE];
+
+	if (!r_l3->capable || !r_l3data->capable || !r_l3code->capable)
+		return -EINVAL;
+
+	r_l3->enabled = false;
+	r_l3data->enabled = true;
+	r_l3code->enabled = true;
+
+	return 0;
+}
+
+static int parse_rdtgroupfs_options(char *data)
+{
+	char *token, *o = data;
+	int ret = 0;
+
+	while ((token = strsep(&o, ",")) != NULL) {
+		if (!*token)
+			return -EINVAL;
+
+		if (!strcmp(token, "cdp"))
+			ret = enable_cdp();
+	}
+
+	return ret;
+}
+
+static struct dentry *rdt_mount(struct file_system_type *fs_type,
+				int flags, const char *unused_dev_name,
+				void *data)
+{
+	struct dentry *dentry;
+	int ret;
+
+	mutex_lock(&rdtgroup_mutex);
+	/*
+	 * resctrl file system can only be mounted once.
+	 */
+	if (static_branch_unlikely(&rdt_enable_key)) {
+		dentry = ERR_PTR(-EBUSY);
+		goto out;
+	}
+
+	ret = parse_rdtgroupfs_options(data);
+	if (ret) {
+		dentry = ERR_PTR(ret);
+		goto out;
+	}
+
+	dentry = kernfs_mount(fs_type, flags, rdt_root,
+			      RDTGROUP_SUPER_MAGIC, NULL);
+	if (IS_ERR(dentry))
+		goto out;
+
+	if (rdt_resources_all[RDT_RESOURCE_L3DATA].enabled &&
+	    rdt_resources_all[RDT_RESOURCE_L3CODE].enabled) {
+		ret = set_l3_qos_cfg(&rdt_resources_all[RDT_RESOURCE_L3], true);
+		if (ret)
+			goto out;
+	}
+	static_branch_enable(&rdt_enable_key);
+
+out:
+	mutex_unlock(&rdtgroup_mutex);
+
+	return dentry;
+}
+
+static int reset_all_cbms(struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+	struct msr_param msr_param;
+	cpumask_var_t cpu_mask;
+	int i, cpu;
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	msr_param.res = r;
+	msr_param.low = 0;
+	msr_param.high = r->num_closid;
+
+	/*
+	 * Reset all CBMs in each domain to the max value and pick one
+	 * CPU from each domain into cpu_mask.
+	 */
+	list_for_each_entry(d, &r->domains, list) {
+		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+
+		for (i = 0; i < r->num_closid; i++)
+			d->cbm[i] = r->max_cbm;
+	}
+	cpu = get_cpu();
+	/* Update CBM on this cpu if it's in cpu_mask. */
+	if (cpumask_test_cpu(cpu, cpu_mask))
+		rdt_cbm_update(&msr_param);
+	/* Update CBM on all other cpus in cpu_mask. */
+	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	put_cpu();
+
+	free_cpumask_var(cpu_mask);
+
+	return 0;
+}
+
+static void rdt_kill_sb(struct super_block *sb)
+{
+	struct rdt_resource *r;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	/* Put everything back to default values. */
+	for_each_enabled_rdt_resource(r)
+		reset_all_cbms(r);
+	r = &rdt_resources_all[RDT_RESOURCE_L3];
+	r->enabled = r->capable;
+	if (boot_cpu_has(X86_FEATURE_CDP_L3)) {
+		rdt_resources_all[RDT_RESOURCE_L3DATA].enabled = false;
+		rdt_resources_all[RDT_RESOURCE_L3CODE].enabled = false;
+		set_l3_qos_cfg(r, false);
+	}
+
+	static_branch_disable(&rdt_enable_key);
+	kernfs_kill_sb(sb);
+	mutex_unlock(&rdtgroup_mutex);
+}
+
+static struct file_system_type rdt_fs_type = {
+	.name    = "resctrl",
+	.mount   = rdt_mount,
+	.kill_sb = rdt_kill_sb,
+};
+
+static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+};
+
+static int __init rdtgroup_setup_root(void)
+{
+	rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
+				      KERNFS_ROOT_CREATE_DEACTIVATED,
+				      &rdtgroup_default);
+	if (IS_ERR(rdt_root))
+		return PTR_ERR(rdt_root);
+
+	mutex_lock(&rdtgroup_mutex);
+
+	rdtgroup_default.closid = 0;
+	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
+
+	rdtgroup_default.kn = rdt_root->kn;
+	kernfs_activate(rdtgroup_default.kn);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+/*
+ * rdtgroup_init - rdtgroup initialization
+ *
+ * Setup resctrl file system including set up root, create mount point,
+ * register rdtgroup filesystem, and initialize files under root directory.
+ *
+ * Return: 0 on success or -errno
+ */
+int __init rdtgroup_init(void)
+{
+	int ret = 0;
+
+	ret = rdtgroup_setup_root();
+	if (ret)
+		return ret;
+
+	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
+	if (ret)
+		goto cleanup_root;
+
+	ret = register_filesystem(&rdt_fs_type);
+	if (ret)
+		goto cleanup_mountpoint;
+
+	return 0;
+
+cleanup_mountpoint:
+	sysfs_remove_mount_point(fs_kobj, "resctrl");
+cleanup_root:
+	kernfs_destroy_root(rdt_root);
+
+	return ret;
+}
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 9bd5594..e230af2 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -57,6 +57,7 @@
 #define CGROUP_SUPER_MAGIC	0x27e0eb
 #define CGROUP2_SUPER_MAGIC	0x63677270
 
+#define RDTGROUP_SUPER_MAGIC	0x7655821
 
 #define STACK_END_MAGIC		0x57AC6E9D
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (10 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 11/18] x86/intel_rdt: Add basic resctrl filesystem support Fenghua Yu
@ 2016-10-22 13:19 ` Fenghua Yu
  2016-10-26 14:45   ` Thomas Gleixner
  2016-10-22 13:20 ` [PATCH v5 13/18] x86/intel_rdt: Add mkdir " Fenghua Yu
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

For the convenience of applications we make the decoded values of some
of the CPUID enumeration available in read-only (0444) files.
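
An illustrative read of the new files (the "L3" directory name comes
from the resource name; the values are hypothetical, for a system with
16 CLOSIDs and a 20-bit CBM):

	# cat /sys/fs/resctrl/info/L3/num_closid
	16
	# cat /sys/fs/resctrl/info/L3/cbm_val
	fffff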

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |  24 ++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 186 +++++++++++++++++++++++++++++++
 2 files changed, 210 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 8e1d002..39ed561 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -25,6 +25,30 @@ extern struct list_head rdt_all_groups;
 int __init rdtgroup_init(void);
 
 /**
+ * struct rftype - describe each file in the resctrl file system
+ * @name: file name
+ * @mode: access mode
+ * @kf_ops: operations
+ * @seq_show: show content of the file
+ * @write: write to the file
+ */
+struct rftype {
+	char			*name;
+	umode_t			mode;
+	struct kernfs_ops	*kf_ops;
+
+	int (*seq_show)(struct kernfs_open_file *of,
+			struct seq_file *sf, void *v);
+	/*
+	 * write() is the generic write callback which maps directly to
+	 * kernfs write operation and overrides all other operations.
+	 * Maximum write size is determined by ->max_write_len.
+	 */
+	ssize_t (*write)(struct kernfs_open_file *of,
+			 char *buf, size_t nbytes, loff_t off);
+};
+
+/**
  * struct rdt_resource - attributes of an RDT resource
  * @enabled:			Is this feature enabled on this machine
  * @capable:			Is this feature available on this machine
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 6c9061d..dd0d584 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -23,6 +23,8 @@
 #include <linux/fs.h>
 #include <linux/sysfs.h>
 #include <linux/kernfs.h>
+#include <linux/seq_file.h>
+#include <linux/sched.h>
 #include <linux/slab.h>
 
 #include <uapi/linux/magic.h>
@@ -34,6 +36,176 @@ struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
 
+/* Kernel fs node for "info" directory under root */
+static struct kernfs_node *kn_info;
+
+/* set uid and gid of rdtgroup dirs and files to that of the creator */
+static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+{
+	struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
+				.ia_uid = current_fsuid(),
+				.ia_gid = current_fsgid(), };
+
+	if (uid_eq(iattr.ia_uid, GLOBAL_ROOT_UID) &&
+	    gid_eq(iattr.ia_gid, GLOBAL_ROOT_GID))
+		return 0;
+
+	return kernfs_setattr(kn, &iattr);
+}
+
+static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
+{
+	struct kernfs_node *kn;
+	int ret;
+
+	kn = __kernfs_create_file(parent_kn, rft->name, rft->mode,
+				  0, rft->kf_ops, rft, NULL, NULL);
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret) {
+		kernfs_remove(kn);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts,
+			      int len)
+{
+	struct rftype *rft;
+	int ret;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	for (rft = rfts; rft < rfts + len; rft++) {
+		ret = rdtgroup_add_file(kn, rft);
+		if (ret)
+			goto error;
+	}
+
+	return 0;
+error:
+	pr_warn("%s: failed to add %s, err=%d\n", __func__, rft->name, ret);
+	while (--rft >= rfts)
+		kernfs_remove_by_name(kn, rft->name);
+	return ret;
+}
+
+static int rdtgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+	struct kernfs_open_file *of = m->private;
+	struct rftype *rft = of->kn->priv;
+
+	if (rft->seq_show)
+		return rft->seq_show(of, m, arg);
+	return 0;
+}
+
+static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
+				   size_t nbytes, loff_t off)
+{
+	struct rftype *rft = of->kn->priv;
+
+	if (rft->write)
+		return rft->write(of, buf, nbytes, off);
+
+	return -EINVAL;
+}
+
+static struct kernfs_ops rdtgroup_kf_single_ops = {
+	.atomic_write_len	= PAGE_SIZE,
+	.write			= rdtgroup_file_write,
+	.seq_show		= rdtgroup_seqfile_show,
+};
+
+static int rdt_num_closid_show(struct kernfs_open_file *of,
+			       struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%d\n", r->num_closid);
+
+	return 0;
+}
+
+static int rdt_cbm_val_show(struct kernfs_open_file *of,
+			    struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+
+	seq_printf(seq, "%x\n", r->max_cbm);
+
+	return 0;
+}
+
+/* rdtgroup information files for one cache resource. */
+static struct rftype res_info_files[] = {
+	{
+		.name		= "num_closid",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_num_closid_show,
+	},
+	{
+		.name		= "cbm_val",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_cbm_val_show,
+	},
+};
+
+static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
+{
+	struct kernfs_node *kn_subdir;
+	struct rdt_resource *r;
+	int ret;
+
+	/* create the directory */
+	kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL);
+	if (IS_ERR(kn_info))
+		return PTR_ERR(kn_info);
+	kernfs_get(kn_info);
+
+	for_each_enabled_rdt_resource(r) {
+		kn_subdir = kernfs_create_dir(kn_info, r->name,
+					      kn_info->mode, r);
+		if (IS_ERR(kn_subdir)) {
+			ret = PTR_ERR(kn_subdir);
+			goto out_destroy;
+		}
+		kernfs_get(kn_subdir);
+		ret = rdtgroup_kn_set_ugid(kn_subdir);
+		if (ret)
+			goto out_destroy;
+		ret = rdtgroup_add_files(kn_subdir, res_info_files,
+					 ARRAY_SIZE(res_info_files));
+		if (ret)
+			goto out_destroy;
+		kernfs_activate(kn_subdir);
+	}
+
+	/*
+	 * This extra ref will be put in kernfs_remove() and guarantees
+	 * that @kn_info is always accessible.
+	 */
+	kernfs_get(kn_info);
+
+	ret = rdtgroup_kn_set_ugid(kn_info);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn_info);
+
+	return 0;
+
+out_destroy:
+	kernfs_remove(kn_info);
+	return ret;
+}
+
 static void l3_qos_cfg_update(void *arg)
 {
 	bool enable = *(bool *)arg;
@@ -121,6 +293,10 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 		goto out;
 	}
 
+	ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
+	if (ret)
+		goto out;
+
 	dentry = kernfs_mount(fs_type, flags, rdt_root,
 			      RDTGROUP_SUPER_MAGIC, NULL);
 	if (IS_ERR(dentry))
@@ -177,6 +353,14 @@ static int reset_all_cbms(struct rdt_resource *r)
 	return 0;
 }
 
+/*
+ * Forcibly remove all subdirectories under the root.
+ */
+static void rmdir_all_sub(void)
+{
+	kernfs_remove(kn_info);
+}
+
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
@@ -194,6 +378,7 @@ static void rdt_kill_sb(struct super_block *sb)
 		set_l3_qos_cfg(r, false);
 	}
 
+	rmdir_all_sub();
 	static_branch_disable(&rdt_enable_key);
 	kernfs_kill_sb(sb);
 	mutex_unlock(&rdtgroup_mutex);
@@ -224,6 +409,7 @@ static int __init rdtgroup_setup_root(void)
 	rdtgroup_default.kn = rdt_root->kn;
 	kernfs_activate(rdtgroup_default.kn);
 
+out:
 	mutex_unlock(&rdtgroup_mutex);
 
 	return 0;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 13/18] x86/intel_rdt: Add mkdir to resctrl file system
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (11 preceding siblings ...)
  2016-10-22 13:19 ` [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-26 15:01   ` Thomas Gleixner
  2016-10-22 13:20 ` [PATCH v5 14/18] x86/intel_rdt: Add cpus file Fenghua Yu
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Resource control groups are represented as directories in the resctrl
file system. The root directory describes the default resources available
to tasks that have not been assigned specific resources. Other directories
can be created at the root level to make new resource groups. It is not
permitted to make directories within other directories.

Hardware uses a CLOSID (Class of service ID) to determine which resource
limits are currently in effect. The exact number available is enumerated
by CPUID leaf 0x10, but on current implementations it is a small number.
We implement a simple bitmask allocator for CLOSIDs.

Each resource control group uses one CLOSID, which limits the total number
of directories that can be created.

Resource groups can be removed using rmdir.
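
As an illustrative sketch of the allocator arithmetic (assuming 16
CLOSIDs): closid_init() leaves closid_free_map = 0xfffe because bit 0
is reserved for the default group; the first closid_alloc() returns
ffs(0xfffe) - 1 = 1; closid_free(1) sets bit 1 again. From user space
that corresponds to:

	# cd /sys/fs/resctrl
	# mkdir p0		# binds a free CLOSID to group "p0"
	# rmdir p0		# releases that CLOSID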

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |   9 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 247 +++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 39ed561..a6c7d94 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -12,13 +12,20 @@
  * @kn:				kernfs node
  * @rdtgroup_list:		linked list for all rdtgroups
  * @closid:			closid for this rdtgroup
+ * @flags:			status bits
+ * @waitcount:			how many cpus expect to find this
  */
 struct rdtgroup {
 	struct kernfs_node	*kn;
 	struct list_head	rdtgroup_list;
 	int			closid;
+	int			flags;
+	atomic_t		waitcount;
 };
 
+/* rdtgroup.flags */
+#define	RDT_DELETED		1
+
 /* List of all resource groups */
 extern struct list_head rdt_all_groups;
 
@@ -154,4 +161,6 @@ union cpuid_0x10_1_edx {
 };
 
 void rdt_cbm_update(void *arg);
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+void rdtgroup_kn_unlock(struct kernfs_node *kn);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index dd0d584..3a87ae2 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -26,10 +26,12 @@
 #include <linux/seq_file.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 #include <uapi/linux/magic.h>
 
 #include <asm/intel_rdt.h>
+#include <asm/intel_rdt_common.h>
 
 DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
 struct kernfs_root *rdt_root;
@@ -39,6 +41,71 @@ LIST_HEAD(rdt_all_groups);
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
+/*
+ * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
+ * we can keep a bitmap of free CLOSIDs in a single integer.
+ *
+ * Using a global CLOSID across all resources has some advantages and
+ * some drawbacks:
+ * + We can simply set "current->closid" to assign a task to a resource
+ *   group.
+ * + Context switch code can avoid extra memory references deciding which
+ *   CLOSID to load into the PQR_ASSOC MSR
+ * - We give up some options in configuring resource groups across multi-socket
+ *   systems.
+ * - Our choices on how to configure each resource become progressively more
+ *   limited as the number of resources grows.
+ */
+static int closid_free_map;
+
+static void closid_init(void)
+{
+	struct rdt_resource *r;
+	int rdt_max_closid;
+
+	/* Compute rdt_max_closid across all resources */
+	rdt_max_closid = 0;
+	for_each_enabled_rdt_resource(r)
+		rdt_max_closid = max(rdt_max_closid, r->num_closid);
+
+	/*
+	 * CDP is "special". Because we share the L3 CBM MSR array
+	 * between L3DATA and L3CODE, we must not use a CLOSID larger
+	 * than they support. Just check against L3DATA because it
+	 * is the same as L3CODE.
+	 */
+	r = &rdt_resources_all[RDT_RESOURCE_L3DATA];
+	if (r->enabled)
+		rdt_max_closid = min(rdt_max_closid, r->num_closid);
+
+	if (rdt_max_closid > 32) {
+		pr_warn("Only using 32 of %d CLOSIDs\n", rdt_max_closid);
+		rdt_max_closid = 32;
+	}
+
+	closid_free_map = BIT_MASK(rdt_max_closid) - 1;
+
+	/* CLOSID 0 is always reserved for the default group */
+	closid_free_map &= ~1;
+}
+
+int closid_alloc(void)
+{
+	int closid = ffs(closid_free_map);
+
+	if (closid == 0)
+		return -ENOSPC;
+	closid--;
+	closid_free_map &= ~(1 << closid);
+
+	return closid;
+}
+
+static void closid_free(int closid)
+{
+	closid_free_map |= 1 << closid;
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
@@ -271,6 +338,54 @@ static int parse_rdtgroupfs_options(char *data)
 	return ret;
 }
 
+/*
+ * We don't allow rdtgroup directories to be created anywhere
+ * except the root directory. Thus when looking for the rdtgroup
+ * structure for a kernfs node we are either looking at a directory,
+ * in which case the rdtgroup structure is pointed at by the "priv"
+ * field, otherwise we have a file, and need only look to the parent
+ * to find the rdtgroup.
+ */
+static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
+{
+	if (kernfs_type(kn) == KERNFS_DIR)
+		return kn->priv;
+	else
+		return kn->parent->priv;
+}
+
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+	atomic_inc(&rdtgrp->waitcount);
+	kernfs_break_active_protection(kn);
+
+	mutex_lock(&rdtgroup_mutex);
+
+	/* Was this group deleted while we waited? */
+	if (rdtgrp->flags & RDT_DELETED)
+		return NULL;
+
+	return rdtgrp;
+}
+
+void rdtgroup_kn_unlock(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+	    (rdtgrp->flags & RDT_DELETED)) {
+		kernfs_unbreak_active_protection(kn);
+		kernfs_put(kn);
+		kfree(rdtgrp);
+	} else {
+		kernfs_unbreak_active_protection(kn);
+	}
+}
+
 static struct dentry *rdt_mount(struct file_system_type *fs_type,
 				int flags, const char *unused_dev_name,
 				void *data)
@@ -293,6 +408,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 		goto out;
 	}
 
+	closid_init();
+
 	ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
 	if (ret)
 		goto out;
@@ -354,10 +471,39 @@ static int reset_all_cbms(struct rdt_resource *r)
 }
 
 /*
+ * MSR_IA32_PQR_ASSOC is scoped per logical CPU, so all updates
+ * are always in thread context.
+ */
+static void rdt_reset_pqr_assoc_closid(void *v)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+
+	state->closid = 0;
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+}
+
+/*
  * Forcibly remove all subdirectories under the root.
  */
 static void rmdir_all_sub(void)
 {
+	struct rdtgroup *rdtgrp, *tmp;
+
+	get_cpu();
+	/* Reset PQR_ASSOC MSR on this cpu. */
+	rdt_reset_pqr_assoc_closid(NULL);
+	/* Reset PQR_ASSOC MSR on the rest of cpus. */
+	smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid,
+			       NULL, 1);
+	put_cpu();
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp == &rdtgroup_default)
+			continue;
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+		kfree(rdtgrp);
+	}
 	kernfs_remove(kn_info);
 }
 
@@ -390,7 +536,108 @@ static struct file_system_type rdt_fs_type = {
 	.kill_sb = rdt_kill_sb,
 };
 
+static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+			  umode_t mode)
+{
+	struct rdtgroup *parent, *rdtgrp;
+	struct kernfs_node *kn;
+	int ret, closid;
+
+	/* Only allow mkdir in the root directory */
+	if (parent_kn != rdtgroup_default.kn)
+		return -EPERM;
+
+	/* Do not accept '\n' to avoid an unparsable situation. */
+	if (strchr(name, '\n'))
+		return -EINVAL;
+
+	parent = rdtgroup_kn_lock_live(parent_kn);
+	if (!parent) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	ret = closid_alloc();
+	if (ret < 0)
+		goto out_unlock;
+	closid = ret;
+
+	/* allocate the rdtgroup. */
+	rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
+	if (!rdtgrp) {
+		ret = -ENOSPC;
+		goto out_closid_free;
+	}
+	rdtgrp->closid = closid;
+	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
+
+	/* kernfs creates the directory for rdtgrp */
+	kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out_cancel_ref;
+	}
+	rdtgrp->kn = kn;
+
+	/*
+	 * kernfs_remove() will drop the reference count on "kn" which
+	 * will free it. But we still need it to stick around for the
+	 * rdtgroup_kn_unlock(kn) call below. Take one extra reference
+	 * here, which will be dropped inside rdtgroup_kn_unlock().
+	 */
+	kernfs_get(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn);
+
+	ret = 0;
+	goto out_unlock;
+
+out_destroy:
+	kernfs_remove(rdtgrp->kn);
+out_cancel_ref:
+	list_del(&rdtgrp->rdtgroup_list);
+	kfree(rdtgrp);
+out_closid_free:
+	closid_free(closid);
+out_unlock:
+	rdtgroup_kn_unlock(parent_kn);
+	return ret;
+}
+
+static int rdtgroup_rmdir(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(kn);
+		return -ENOENT;
+	}
+
+	rdtgrp->flags = RDT_DELETED;
+	closid_free(rdtgrp->closid);
+	list_del(&rdtgrp->rdtgroup_list);
+
+	/*
+	 * one extra hold on this, will drop when we kfree(rdtgrp)
+	 * in rdtgroup_kn_unlock()
+	 */
+	kernfs_get(kn);
+	kernfs_remove(rdtgrp->kn);
+
+	rdtgroup_kn_unlock(kn);
+
+	return ret;
+}
+
 static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+	.mkdir	= rdtgroup_mkdir,
+	.rmdir	= rdtgroup_rmdir,
 };
 
 static int __init rdtgroup_setup_root(void)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 14/18] x86/intel_rdt: Add cpus file
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (12 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 13/18] x86/intel_rdt: Add mkdir " Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-26 17:57   ` Thomas Gleixner
  2016-10-22 13:20 ` [PATCH v5 15/18] x86/intel_rdt: Add tasks files Fenghua Yu
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Tony Luck <tony.luck@intel.com>

Now we populate each directory with a read/write (mode 0644) file
named "cpus". This is used to over-ride the resources available
to processes in the default resource group when running on specific
CPUs.  Each "cpus" file reads as a cpumask showing which CPUs belong
to this resource group. Initially all online CPUs are assigned to
the default group. They can be added to other groups by writing a
cpumask to the "cpus" file in the directory for the resource group
(which will remove them from the previous group to which they were
assigned). CPU online/offline operations will delete CPUs that go
offline from whatever group they are in and add new CPUs to the
default group.

If there are CPUs assigned to a group when the directory is removed,
they are returned to the default group.
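
An illustrative session (hypothetical 8-CPU system; masks use the
usual kernel hex cpumask format):

	# cat /sys/fs/resctrl/cpus
	ff			# default group owns all CPUs
	# mkdir /sys/fs/resctrl/p0
	# echo 3 > /sys/fs/resctrl/p0/cpus
	# cat /sys/fs/resctrl/cpus
	fc			# CPUs 0-1 moved to group "p0"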

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |   5 ++
 arch/x86/kernel/cpu/intel_rdt.c          |  10 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 130 ++++++++++++++++++++++++++++++-
 3 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index a6c7d94..3e8450f 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -12,13 +12,16 @@
  * @kn:				kernfs node
  * @rdtgroup_list:		linked list for all rdtgroups
  * @closid:			closid for this rdtgroup
+ * @cpu_mask:			CPUs assigned to this rdtgroup
  * @flags:			status bits
  * @waitcount:			how many cpus expect to find this
+ *				group when they acquire rdtgroup_mutex
  */
 struct rdtgroup {
 	struct kernfs_node	*kn;
 	struct list_head	rdtgroup_list;
 	int			closid;
+	struct cpumask		cpu_mask;
 	int			flags;
 	atomic_t		waitcount;
 };
@@ -160,6 +163,8 @@ union cpuid_0x10_1_edx {
 	unsigned int full;
 };
 
+DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
+
 void rdt_cbm_update(void *arg);
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index c07e03a..d2d77cb 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -36,6 +36,8 @@
 /* Mutex to protect rdtgroup access. */
 DEFINE_MUTEX(rdtgroup_mutex);
 
+DEFINE_PER_CPU_READ_MOSTLY(int, cpu_closid);
+
 #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
 
 struct rdt_resource rdt_resources_all[] = {
@@ -323,8 +325,11 @@ static int intel_rdt_online_cpu(unsigned int cpu)
 	struct rdt_resource *r;
 
 	mutex_lock(&rdtgroup_mutex);
+	per_cpu(cpu_closid, cpu) = 0;
 	for_each_capable_rdt_resource(r)
 		domain_add_cpu(cpu, r);
+	/* The cpu is set in default rdtgroup after online. */
+	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
 	state->closid = 0;
 	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
 	mutex_unlock(&rdtgroup_mutex);
@@ -334,11 +339,16 @@ static int intel_rdt_online_cpu(unsigned int cpu)
 
 static int intel_rdt_offline_cpu(unsigned int cpu)
 {
+	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 
 	mutex_lock(&rdtgroup_mutex);
 	for_each_capable_rdt_resource(r)
 		domain_remove_cpu(cpu, r);
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask))
+			break;
+	}
 	mutex_unlock(&rdtgroup_mutex);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 3a87ae2..9957b50 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -20,6 +20,7 @@
 
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
+#include <linux/cpu.h>
 #include <linux/fs.h>
 #include <linux/sysfs.h>
 #include <linux/kernfs.h>
@@ -188,6 +189,111 @@ static struct kernfs_ops rdtgroup_kf_single_ops = {
 	.seq_show		= rdtgroup_seqfile_show,
 };
 
+static int rdtgroup_cpus_show(struct kernfs_open_file *of,
+			      struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+	if (rdtgrp)
+		seq_printf(s, "%*pb\n", cpumask_pr_args(&rdtgrp->cpu_mask));
+	else
+		ret = -ENOENT;
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret;
+}
+
+static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp, *r;
+	cpumask_var_t tmpmask, newmask;
+	int ret, cpu;
+
+	if (!buf)
+		return -EINVAL;
+
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+	if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) {
+		free_cpumask_var(tmpmask);
+		return -ENOMEM;
+	}
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	ret = cpumask_parse(buf, newmask);
+	if (ret)
+		goto unlock;
+
+	get_online_cpus();
+	/* check that user didn't specify any offline cpus */
+	cpumask_andnot(tmpmask, newmask, cpu_online_mask);
+	if (cpumask_weight(tmpmask)) {
+		ret = -EINVAL;
+		goto end;
+	}
+
+	/* Check whether cpus are dropped from this group */
+	cpumask_andnot(tmpmask, &rdtgrp->cpu_mask, newmask);
+	if (cpumask_weight(tmpmask)) {
+		/* Can't drop from default group */
+		if (rdtgrp == &rdtgroup_default) {
+			ret = -EINVAL;
+			goto end;
+		}
+		/* Give any dropped cpus to rdtgroup_default */
+		cpumask_or(&rdtgroup_default.cpu_mask,
+			   &rdtgroup_default.cpu_mask, tmpmask);
+		for_each_cpu(cpu, tmpmask)
+			per_cpu(cpu_closid, cpu) = 0;
+	}
+
+	/*
+	 * If we added cpus, remove them from previous group that owned them
+	 * and update per-cpu closid
+	 */
+	cpumask_andnot(tmpmask, newmask, &rdtgrp->cpu_mask);
+	if (cpumask_weight(tmpmask)) {
+		list_for_each_entry(r, &rdt_all_groups, rdtgroup_list) {
+			if (r == rdtgrp)
+				continue;
+			cpumask_andnot(&r->cpu_mask, &r->cpu_mask, tmpmask);
+		}
+		for_each_cpu(cpu, tmpmask)
+			per_cpu(cpu_closid, cpu) = rdtgrp->closid;
+	}
+
+	/* Done pushing/pulling - update this group with new mask */
+	cpumask_copy(&rdtgrp->cpu_mask, newmask);
+
+end:
+	put_online_cpus();
+unlock:
+	rdtgroup_kn_unlock(of->kn);
+	free_cpumask_var(tmpmask);
+	free_cpumask_var(newmask);
+
+	return ret ?: nbytes;
+}
+
+/* Files in each rdtgroup */
+static struct rftype rdtgroup_base_files[] = {
+	{
+		.name		= "cpus",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_cpus_write,
+		.seq_show	= rdtgroup_cpus_show,
+	},
+};
+
 static int rdt_num_closid_show(struct kernfs_open_file *of,
 			       struct seq_file *seq, void *v)
 {
@@ -591,6 +697,11 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (ret)
 		goto out_destroy;
 
+	ret = rdtgroup_add_files(kn, rdtgroup_base_files,
+				 ARRAY_SIZE(rdtgroup_base_files));
+	if (ret)
+		goto out_destroy;
+
 	kernfs_activate(kn);
 
 	ret = 0;
@@ -611,7 +722,7 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 static int rdtgroup_rmdir(struct kernfs_node *kn)
 {
 	struct rdtgroup *rdtgrp;
-	int ret = 0;
+	int cpu, ret = 0;
 
 	rdtgrp = rdtgroup_kn_lock_live(kn);
 	if (!rdtgrp) {
@@ -619,6 +730,12 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 		return -ENOENT;
 	}
 
+	/* Give any CPUs back to the default group */
+	cpumask_or(&rdtgroup_default.cpu_mask,
+		   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+	for_each_cpu(cpu, &rdtgrp->cpu_mask)
+		per_cpu(cpu_closid, cpu) = 0;
+
 	rdtgrp->flags = RDT_DELETED;
 	closid_free(rdtgrp->closid);
 	list_del(&rdtgrp->rdtgroup_list);
@@ -642,6 +759,8 @@ static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
 
 static int __init rdtgroup_setup_root(void)
 {
+	int ret;
+
 	rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
 				      KERNFS_ROOT_CREATE_DEACTIVATED,
 				      &rdtgroup_default);
@@ -653,13 +772,20 @@ static int __init rdtgroup_setup_root(void)
 	rdtgroup_default.closid = 0;
 	list_add(&rdtgroup_default.rdtgroup_list, &rdt_all_groups);
 
+	ret = rdtgroup_add_files(rdt_root->kn, rdtgroup_base_files,
+				 ARRAY_SIZE(rdtgroup_base_files));
+	if (ret) {
+		kernfs_destroy_root(rdt_root);
+		goto out;
+	}
+
 	rdtgroup_default.kn = rdt_root->kn;
 	kernfs_activate(rdtgroup_default.kn);
 
 out:
 	mutex_unlock(&rdtgroup_mutex);
 
-	return 0;
+	return ret;
 }
 
 /*
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 15/18] x86/intel_rdt: Add tasks files
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (13 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 14/18] x86/intel_rdt: Add cpus file Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-26 15:27   ` Thomas Gleixner
  2016-10-22 13:20 ` [PATCH v5 16/18] x86/intel_rdt: Add schemata file Fenghua Yu
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The root directory and all subdirectories are automatically populated
with a read/write (mode 0644) file named "tasks". When read it will
show all the task IDs assigned to the resource group. Tasks can be
added (one at a time) to a group by writing the task ID to the file,
as in the example below.
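
An illustrative example ("p0" is a hypothetical group directory and
1234 a hypothetical task ID):

	# echo 1234 > /sys/fs/resctrl/p0/tasks
	# cat /sys/fs/resctrl/p0/tasks
	1234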

Membership in a resource group is indicated by a new field in the
task_struct "int closid" which holds the CLOSID for each task. The
default resource group uses CLOSID=0 which means that all existing
tasks when the resctrl file system is mounted belong to the default
group.

A resource group cannot be removed while there are tasks assigned
to it.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 180 +++++++++++++++++++++++++++++++
 include/linux/sched.h                    |   3 +
 2 files changed, 183 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 9957b50..9b11934 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -28,6 +28,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/cpu.h>
+#include <linux/task_work.h>
 
 #include <uapi/linux/magic.h>
 
@@ -283,6 +284,162 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+struct task_move_callback {
+	struct callback_head	work;
+	struct rdtgroup		*rdtgrp;
+};
+
+static void move_myself(struct callback_head *head)
+{
+	struct task_move_callback *callback;
+	struct rdtgroup *rdtgrp;
+
+	callback = container_of(head, struct task_move_callback, work);
+	rdtgrp = callback->rdtgrp;
+
+	/*
+	 * If resource group was deleted before this task work callback
+	 * was invoked, then assign the task to root group and free the
+	 * resource group.
+	 */
+	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+	    (rdtgrp->flags & RDT_DELETED)) {
+		current->closid = 0;
+		kfree(rdtgrp);
+	}
+
+	kfree(callback);
+}
+
+static int __rdtgroup_move_task(struct task_struct *tsk,
+				struct rdtgroup *rdtgrp)
+{
+	struct task_move_callback *callback;
+	int ret;
+
+	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
+	if (!callback)
+		return -ENOMEM;
+	callback->work.func = move_myself;
+	callback->rdtgrp = rdtgrp;
+
+	/*
+	 * Take a refcount, so rdtgrp cannot be freed before the
+	 * callback has been invoked.
+	 */
+	atomic_inc(&rdtgrp->waitcount);
+	ret = task_work_add(tsk, &callback->work, true);
+	if (ret) {
+		/*
+		 * Task is exiting. Drop the refcount and free the callback.
+		 * No need to check the refcount as the group cannot be
+		 * deleted before the write function unlocks rdtgroup_mutex.
+		 */
+		atomic_dec(&rdtgrp->waitcount);
+		kfree(callback);
+	} else {
+		tsk->closid = rdtgrp->closid;
+	}
+	return ret;
+}
+
+static int rdtgroup_task_write_permission(struct task_struct *task,
+					  struct kernfs_open_file *of)
+{
+	const struct cred *cred = current_cred();
+	const struct cred *tcred = get_task_cred(task);
+	int ret = 0;
+
+	/*
+	 * Even if we're attaching all tasks in the thread group, we only
+	 * need to check permissions on one of them.
+	 */
+	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+	    !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->euid, tcred->suid))
+		ret = -EPERM;
+
+	put_cred(tcred);
+	return ret;
+}
+
+static int rdtgroup_move_task(pid_t pid, struct rdtgroup *rdtgrp,
+			      struct kernfs_open_file *of)
+{
+	struct task_struct *tsk;
+	int ret;
+
+	rcu_read_lock();
+	if (pid) {
+		tsk = find_task_by_vpid(pid);
+		if (!tsk) {
+			rcu_read_unlock();
+			return -ESRCH;
+		}
+	} else {
+		tsk = current;
+	}
+
+	get_task_struct(tsk);
+	rcu_read_unlock();
+
+	ret = rdtgroup_task_write_permission(tsk, of);
+	if (!ret)
+		ret = __rdtgroup_move_task(tsk, rdtgrp);
+
+	put_task_struct(tsk);
+	return ret;
+}
+
+static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
+				    char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp;
+	pid_t pid;
+	int ret = 0;
+
+	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
+		return -EINVAL;
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+
+	if (rdtgrp)
+		ret = rdtgroup_move_task(pid, rdtgrp, of);
+	else
+		ret = -ENOENT;
+
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret ?: nbytes;
+}
+
+static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
+{
+	struct task_struct *p;
+
+	rcu_read_lock();
+	for_each_process(p) {
+		if (p->closid == r->closid)
+			seq_printf(s, "%d\n", p->pid);
+	}
+	rcu_read_unlock();
+}
+
+static int rdtgroup_tasks_show(struct kernfs_open_file *of,
+			       struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (rdtgrp)
+		show_rdt_tasks(rdtgrp, s);
+	else
+		ret = -ENOENT;
+	rdtgroup_kn_unlock(of->kn);
+
+	return ret;
+}
+
 /* Files in each rdtgroup */
 static struct rftype rdtgroup_base_files[] = {
 	{
@@ -292,6 +449,13 @@ static struct rftype rdtgroup_base_files[] = {
 		.write		= rdtgroup_cpus_write,
 		.seq_show	= rdtgroup_cpus_show,
 	},
+	{
+		.name		= "tasks",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_tasks_write,
+		.seq_show	= rdtgroup_tasks_show,
+	},
 };
 
 static int rdt_num_closid_show(struct kernfs_open_file *of,
@@ -594,6 +758,13 @@ static void rdt_reset_pqr_assoc_closid(void *v)
 static void rmdir_all_sub(void)
 {
 	struct rdtgroup *rdtgrp, *tmp;
+	struct task_struct *p;
+
+	/* move all tasks to default resource group */
+	read_lock(&tasklist_lock);
+	for_each_process(p)
+		p->closid = 0;
+	read_unlock(&tasklist_lock);
 
 	get_cpu();
 	/* Reset PQR_ASSOC MSR on this cpu. */
@@ -723,6 +894,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 {
 	struct rdtgroup *rdtgrp;
 	int cpu, ret = 0;
+	struct task_struct *p;
 
 	rdtgrp = rdtgroup_kn_lock_live(kn);
 	if (!rdtgrp) {
@@ -730,6 +902,14 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 		return -ENOENT;
 	}
 
+	/* Give any tasks back to the default group */
+	read_lock(&tasklist_lock);
+	for_each_process(p) {
+		if (p->closid == rdtgrp->closid)
+			p->closid = 0;
+	}
+	read_unlock(&tasklist_lock);
+
 	/* Give any CPUs back to the default group */
 	cpumask_or(&rdtgroup_default.cpu_mask,
 		   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..c8f4152 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1791,6 +1791,9 @@ struct task_struct {
 	/* cg_list protected by css_set_lock and tsk->alloc_lock */
 	struct list_head cg_list;
 #endif
+#ifdef CONFIG_INTEL_RDT_A
+	int closid;
+#endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v5 16/18] x86/intel_rdt: Add schemata file
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (14 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 15/18] x86/intel_rdt: Add tasks files Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-22 13:20 ` [PATCH v5 17/18] x86/intel_rdt: Add scheduler hook Fenghua Yu
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Tony Luck <tony.luck@intel.com>

Last of the per-resource-group files, also mode 0644. This one shows
the resources available to the group. Syntax depends on whether the
"cdp" mount option was given. With code/data prioritization disabled
it is simply a list of masks for each cache domain. Initial value
allows access to all of the L3 cache on all domains. E.g. on a 2 socket
Broadwell:
        L3:0=fffff;1=fffff
With CDP enabled, separate masks for data and instructions are provided:
        L3DATA:0=fffff;1=fffff
        L3CODE:0=fffff;1=fffff
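
Writing uses the same syntax. An illustrative sketch (the masks are
hypothetical; domain ids must appear in the same order as shown when
reading):

	# echo "L3:0=3ff;1=fffff" > /sys/fs/resctrl/p0/schemata

Each mask must be a contiguous run of '1' bits no wider than the
cbm_val reported in the info directory.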

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         |   6 +
 arch/x86/kernel/cpu/Makefile             |   2 +-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |   7 +
 arch/x86/kernel/cpu/intel_rdt_schemata.c | 245 +++++++++++++++++++++++++++++++
 4 files changed, 259 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_schemata.c

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 3e8450f..f847189 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -71,6 +71,7 @@ struct rftype {
  * @num_domains:		Number of domains active
  * @msr_base:			Base MSR address for CBMs
  * @tmp_cbms:			Scratch space when updating schemata
+ * @num_tmp_cbms:		Number of CBMs in tmp_cbms
  * @cache_level:		Which cache level defines scope of this domain
  * @cbm_idx_multi:		Multiplier of CBM index
  * @cbm_idx_offset:		Offset of CBM index. CBM index is computed by:
@@ -88,6 +89,7 @@ struct rdt_resource {
 	int			num_domains;
 	int			msr_base;
 	u32			*tmp_cbms;
+	int			num_tmp_cbms;
 	int			cache_level;
 	int			cbm_idx_multi;
 	int			cbm_idx_offset;
@@ -168,4 +170,8 @@ DECLARE_PER_CPU_READ_MOSTLY(int, cpu_closid);
 void rdt_cbm_update(void *arg);
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
+ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off);
+int rdtgroup_schemata_show(struct kernfs_open_file *of,
+			   struct seq_file *s, void *v);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index b4334e8..c9f8c81 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
-obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o intel_rdt_rdtgroup.o
+obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_schemata.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 9b11934..4a6b3b8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -456,6 +456,13 @@ static struct rftype rdtgroup_base_files[] = {
 		.write		= rdtgroup_tasks_write,
 		.seq_show	= rdtgroup_tasks_show,
 	},
+	{
+		.name		= "schemata",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_schemata_write,
+		.seq_show	= rdtgroup_schemata_show,
+	},
 };
 
 static int rdt_num_closid_show(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/intel_rdt_schemata.c b/arch/x86/kernel/cpu/intel_rdt_schemata.c
new file mode 100644
index 0000000..7a761ed
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_schemata.c
@@ -0,0 +1,245 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ *    Fenghua Yu <fenghua.yu@intel.com>
+ *    Tony Luck <tony.luck@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/kernfs.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * Check whether a cache bit mask is valid. The SDM says:
+ *	Please note that all (and only) contiguous '1' combinations
+ *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
+ * Additionally Haswell requires at least two bits set.
+ */
+static bool cbm_validate(unsigned long var, struct rdt_resource *r)
+{
+	unsigned long first_bit, zero_bit;
+
+	if (var == 0 || var > r->max_cbm)
+		return false;
+
+	first_bit = find_first_bit(&var, r->cbm_len);
+	zero_bit = find_next_zero_bit(&var, r->cbm_len, first_bit);
+
+	if (find_next_bit(&var, r->cbm_len, zero_bit) < r->cbm_len)
+		return false;
+
+	if ((zero_bit - first_bit) < r->min_cbm_bits)
+		return false;
+	return true;
+}
+
+/*
+ * Read one cache bit mask (hex). Check that it is valid for the current
+ * resource type.
+ */
+static int parse_cbm(char *buf, struct rdt_resource *r)
+{
+	unsigned long data;
+	int ret;
+
+	ret = kstrtoul(buf, 16, &data);
+	if (ret)
+		return ret;
+	if (!cbm_validate(data, r))
+		return -EINVAL;
+	r->tmp_cbms[r->num_tmp_cbms++] = data;
+
+	return 0;
+}
+
+/*
+ * For each domain in this resource we expect to find a series of:
+ *	id=mask
+ * separated by ";". The "id" is in decimal, and must appear in the
+ * right order.
+ */
+static int parse_line(char *line, struct rdt_resource *r)
+{
+	struct rdt_domain *d;
+	char *dom = NULL, *id;
+	unsigned long dom_id;
+
+	list_for_each_entry(d, &r->domains, list) {
+		dom = strsep(&line, ";");
+		if (!dom)
+			return -EINVAL;
+		id = strsep(&dom, "=");
+		if (kstrtoul(id, 10, &dom_id) || dom_id != d->id)
+			return -EINVAL;
+		if (parse_cbm(dom, r))
+			return -EINVAL;
+	}
+
+	/* Any garbage at the end of the line? */
+	if (line && line[0])
+		return -EINVAL;
+	return 0;
+}
+
+static int update_domains(struct rdt_resource *r, int closid)
+{
+	int cpu, idx = 0;
+	struct rdt_domain *d;
+	struct msr_param msr_param;
+	cpumask_var_t cpu_mask;
+
+	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	msr_param.low = closid;
+	msr_param.high = msr_param.low + 1;
+	msr_param.res = r;
+
+	list_for_each_entry(d, &r->domains, list) {
+		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
+		d->cbm[msr_param.low] = r->tmp_cbms[idx++];
+	}
+	cpu = get_cpu();
+	/* Update CBM on this cpu if it's in cpu_mask. */
+	if (cpumask_test_cpu(cpu, cpu_mask))
+		rdt_cbm_update(&msr_param);
+	/* Update CBM on other cpus. */
+	smp_call_function_many(cpu_mask, rdt_cbm_update, &msr_param, 1);
+	put_cpu();
+
+	free_cpumask_var(cpu_mask);
+
+	return 0;
+}
+
+ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off)
+{
+	char *tok, *resname;
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	int closid, ret = 0;
+	u32 *l3_cbms = NULL;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+	buf[nbytes - 1] = '\0';
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+
+	closid = rdtgrp->closid;
+
+	/* get scratch space to save all the masks while we validate input */
+	for_each_enabled_rdt_resource(r) {
+		r->tmp_cbms = kcalloc(r->num_domains, sizeof(*l3_cbms),
+				      GFP_KERNEL);
+		if (!r->tmp_cbms) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		r->num_tmp_cbms = 0;
+	}
+
+	while ((tok = strsep(&buf, "\n")) != NULL) {
+		resname = strsep(&tok, ":");
+		if (!tok) {
+			ret = -EINVAL;
+			goto out;
+		}
+		for_each_enabled_rdt_resource(r) {
+			if (!strcmp(resname, r->name) &&
+			    closid < r->num_closid) {
+				ret = parse_line(tok, r);
+				if (ret)
+					goto out;
+				break;
+			}
+		}
+		if (!r->name) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	/* Did the parser find all the masks we need? */
+	for_each_enabled_rdt_resource(r) {
+		if (r->num_tmp_cbms != r->num_domains) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+	for_each_enabled_rdt_resource(r) {
+		ret = update_domains(r, closid);
+		if (ret)
+			goto out;
+	}
+
+out:
+	rdtgroup_kn_unlock(of->kn);
+	for_each_enabled_rdt_resource(r) {
+		kfree(r->tmp_cbms);
+		r->tmp_cbms = NULL;
+	}
+	return ret ?: nbytes;
+}
+
+static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
+{
+	struct rdt_domain *dom;
+	bool sep = false;
+
+	seq_printf(s, "%s:", r->name);
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(s, ";");
+		seq_printf(s, "%d=%x", dom->id, dom->cbm[closid]);
+		sep = true;
+	}
+	seq_puts(s, "\n");
+}
+
+int rdtgroup_schemata_show(struct kernfs_open_file *of,
+			   struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	int closid, ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (rdtgrp) {
+		closid = rdtgrp->closid;
+		for_each_enabled_rdt_resource(r) {
+			if (closid < r->num_closid)
+				show_doms(s, r, closid);
+		}
+	} else {
+		ret = -ENOENT;
+	}
+	rdtgroup_kn_unlock(of->kn);
+	return ret;
+}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread
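
For reference, the contiguity rule that cbm_validate() enforces above can be
illustrated with a small self-contained user-space sketch. This is only an
illustration: cbm_ok() and its cbm_len/min_bits arguments are made-up names
standing in for r->cbm_len and r->min_cbm_bits (the latter is 2 on Haswell,
per the comment in the patch).

#include <stdbool.h>
#include <stdio.h>

/* A mask is valid iff its set bits form one contiguous run of at
 * least min_bits within the low cbm_len bits. */
static bool cbm_ok(unsigned long mask, int cbm_len, int min_bits)
{
	unsigned long max = (1UL << cbm_len) - 1;
	int first, width;

	if (mask == 0 || mask > max)
		return false;

	first = __builtin_ctzl(mask);		/* lowest set bit */
	width = (int)(sizeof(long) * 8) - __builtin_clzl(mask) - first;

	/* Contiguous iff every bit inside the span is set */
	if (mask != (((1UL << width) - 1) << first))
		return false;

	return width >= min_bits;
}

int main(void)
{
	printf("%d\n", cbm_ok(0x3c, 20, 2));	/* 1: 003C is one run */
	printf("%d\n", cbm_ok(0x05, 20, 2));	/* 0: bits have a gap */
	return 0;
}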

* [PATCH v5 17/18] x86/intel_rdt: Add scheduler hook
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (15 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 16/18] x86/intel_rdt: Add schemata file Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-22 13:20 ` [PATCH v5 18/18] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
  2016-10-26 21:39 ` [PATCH v5 00/18] Intel Cache Allocation Technology Thomas Gleixner
  18 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Hook the x86 scheduler code to update closid based on whether the current
task is assigned to a specific closid or running on a CPU assigned to a
specific closid.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/intel_rdt.h         | 42 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c          |  1 -
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  3 +++
 arch/x86/kernel/process_32.c             |  4 +++
 arch/x86/kernel/process_64.c             |  4 +++
 5 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index f847189..100e53c 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -1,8 +1,12 @@
 #ifndef _ASM_X86_INTEL_RDT_H
 #define _ASM_X86_INTEL_RDT_H
 
+#ifdef CONFIG_INTEL_RDT_A
+
 #include <linux/jump_label.h>
 
+#include <asm/intel_rdt_common.h>
+
 #define IA32_L3_QOS_CFG		0xc81
 #define IA32_L3_CBM_BASE	0xc90
 #define IA32_L2_CBM_BASE	0xd10
@@ -174,4 +178,42 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
+
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to the IA32_PQR_ASSOC MSR
+ *
+ * The following considerations are made so that this has minimal impact
+ * on the scheduler hot path:
+ * - This will stay a no-op unless we are running on an Intel SKU
+ *   which supports resource control and it is enabled by mounting
+ *   the resctrl file system.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ *   when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+	if (static_branch_likely(&rdt_enable_key)) {
+		struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+		int closid;
+
+		/*
+		 * If this task has a closid assigned, use it.
+		 * Else use the closid assigned to this cpu.
+		 */
+		closid = current->closid;
+		if (closid == 0)
+			closid = this_cpu_read(cpu_closid);
+
+		if (closid != state->closid) {
+			state->closid = closid;
+			wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
+		}
+	}
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
+#endif /* CONFIG_INTEL_RDT_A */
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index d2d77cb..fd8a75a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -29,7 +29,6 @@
 #include <linux/cacheinfo.h>
 #include <linux/cpuhotplug.h>
 
-#include <asm/intel_rdt_common.h>
 #include <asm/intel-family.h>
 #include <asm/intel_rdt.h>
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 4a6b3b8..fca56ab 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -308,6 +308,9 @@ static void move_myself(struct callback_head *head)
 		kfree(rdtgrp);
 	}
 
+	/* Update PQR_ASSOC MSR to make the resource group take effect */
+	intel_rdt_sched_in();
+
 	kfree(callback);
 }
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index bd7be8e..efe7f9f 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -54,6 +54,7 @@
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
+#include <asm/intel_rdt.h>
 
 void __show_regs(struct pt_regs *regs, int all)
 {
@@ -299,5 +300,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
+	/* Load the Intel cache allocation PQR MSR. */
+	intel_rdt_sched_in();
+
 	return prev_p;
 }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3760b3..acd7d6f 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -50,6 +50,7 @@
 #include <asm/switch_to.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/vdso.h>
+#include <asm/intel_rdt.h>
 
 __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
 
@@ -473,6 +474,9 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 			loadsegment(ss, __KERNEL_DS);
 	}
 
+	/* Load the Intel cache allocation PQR MSR. */
+	intel_rdt_sched_in();
+
 	return prev_p;
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread
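
As background for the wrmsr() in intel_rdt_sched_in() above: IA32_PQR_ASSOC
carries the RMID in its low bits and the CLOSID in bits 63:32 (SDM vol 3,
section 17.17), and the kernel's two-argument wrmsr(msr, lo, hi) form writes
lo to EAX and hi to EDX. A small illustrative sketch of the composition
(user-space; pqr_assoc_val() is a made-up helper, not kernel code):

#include <stdint.h>
#include <stdio.h>

/* wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid) therefore replaces
 * only the CLOSID half while preserving the cached RMID. */
static uint64_t pqr_assoc_val(uint32_t rmid, uint32_t closid)
{
	return ((uint64_t)closid << 32) | rmid;
}

int main(void)
{
	/* RMID 5 with CLOSID 3 -> 0x0000000300000005 */
	printf("0x%016llx\n", (unsigned long long)pqr_assoc_val(5, 3));
	return 0;
}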

* [PATCH v5 18/18] MAINTAINERS: Add maintainer for Intel RDT resource allocation
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (16 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 17/18] x86/intel_rdt: Add scheduler hook Fenghua Yu
@ 2016-10-22 13:20 ` Fenghua Yu
  2016-10-26 21:39 ` [PATCH v5 00/18] Intel Cache Allocation Technology Thomas Gleixner
  18 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-22 13:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

We create five new files for Intel RDT resource allocation:
arch/x86/kernel/cpu/intel_rdt.c
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
arch/x86/kernel/cpu/intel_rdt_schemata.c
arch/x86/include/asm/intel_rdt.h
Documentation/x86/intel_rdt_ui.txt

Fenghua Yu will maintain this code.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b6c28e1..e23eea9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10106,6 +10106,14 @@ L:	linux-rdma@vger.kernel.org
 S:	Supported
 F:	drivers/infiniband/sw/rdmavt
 
+RDT - RESOURCE ALLOCATION
+M:	Fenghua Yu <fenghua.yu@intel.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	arch/x86/kernel/cpu/intel_rdt*
+F:	arch/x86/include/asm/intel_rdt*
+F:	Documentation/x86/intel_rdt*
+
 READ-COPY UPDATE (RCU)
 M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
 M:	Josh Triplett <josh@joshtriplett.org>
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-22 13:19 ` [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology Fenghua Yu
@ 2016-10-26 13:02   ` Thomas Gleixner
  2016-10-26 16:06     ` Luck, Tony
  2016-10-26 21:14     ` Fenghua Yu
  0 siblings, 2 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 13:02 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +void rdt_cbm_update(void *arg)
> +{
> +	struct msr_param *m = (struct msr_param *)arg;
> +	struct rdt_resource *r = m->res;
> +	int i, cpu = smp_processor_id();
> +	struct rdt_domain *d;
> +
> +	list_for_each_entry(d, &r->domains, list) {

> +static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> +					  struct list_head **pos)
> +{
> +	struct rdt_domain *d;
> +	struct list_head *l;
> +
> +	if (id < 0)
> +		return ERR_PTR(id);
> +
> +	list_for_each(l, &r->domains) {
> +		d = list_entry(l, struct rdt_domain, list);

So above you converted to list_for_each_entry(). Is there a sensible
reason, aside from being sloppy, why this is still using list_for_each()?

> +		/* When id is found, return its domain. */
> +		if (id == d->id)
> +			return d;
> +		/* Stop searching when finding id's position in sorted list. */

What is the reason that this needs to be in a sorted list?

I haven't found one so far. And if there is none, then this can use a hlist.

> +		if (id < d->id)
> +			break;
> +	}
> +	/*
> +	 * No id is found in resource domains. Record the position
> +	 * that the new domain will be added. The position is not used
> +	 * when removing a domain.

This comment makes no sense. If you want to document that a caller does not
require the @pos argument, then you really should make it optional and do

	if (pos)
		*pos = l;

But before doing that blindly, you want to explain why sorting is required
at all.

> +	 */
> +	*pos = l;
> +
> +	return NULL;
> +}
> +
> +static void domain_add_cpu(int cpu, struct rdt_resource *r)
> +{
> +	int i, id = get_cache_id(cpu, r->cache_level);
> +	struct list_head *add_pos = NULL;
> +	struct rdt_domain *d;
> +
> +	d = rdt_find_domain(r, id, &add_pos);
> +	if (IS_ERR(d)) {
> +		pr_warn("Could't find cache id for cpu %d\n", cpu);
> +		return;
> +	}
> +
> +	if (d) {
> +		cpumask_set_cpu(cpu, &d->cpu_mask);
> +		return;
> +	}
> +
> +	if (!add_pos) {
> +		pr_warn("Couldn't add cpu %d in %s domain\n", cpu, r->name);

Errm, how can add_pos ever be NULL if you get here? Not at all AFAICT.

> +		return;
> +	}
> +
> +	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!d)
> +		return;
> +
> +	d->id = id;

Please move this after the allocation. This random code ordering just makes
reading hard as one expects that d->id is a prerequisite for the
allocation.

> +	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
> +	if (!d->cbm) {
> +		pr_warn("Failed to alloc CBM array for cpu %d\n", cpu);
> +		kfree(d);
> +		return;
> +	}

New line please. Visually seperating logical code blocks enhances
readability.

> +	for (i = 0; i < r->num_closid; i++) {

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 11/18] x86/intel_rdt: Add basic resctrl filesystem support
  2016-10-22 13:19 ` [PATCH v5 11/18] x86/intel_rdt: Add basic resctrl filesystem support Fenghua Yu
@ 2016-10-26 13:52   ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 13:52 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +static void l3_qos_cfg_update(void *arg)
> +{
> +	bool enable = *(bool *)arg;
> +
> +	wrmsrl(IA32_L3_QOS_CFG, enable);

Bah. What's wrong with

	bool *enable = arg;

	wrmsrl(IA32_L3_QOS_CFG, *enable);

That does not have this cute type cast, right?

> +static int set_l3_qos_cfg(struct rdt_resource *r, bool enable)
> +{
> +	struct rdt_domain *d;
> +	cpumask_var_t cpu_mask;
> +	int cpu;

That's the last time I ask for consistent variable ordering politely:

	cpumask_var_t cpu_mask;
	struct rdt_domain *d;
	int cpu;

> +static struct dentry *rdt_mount(struct file_system_type *fs_type,
> +				int flags, const char *unused_dev_name,
> +				void *data)
> +{
> +	struct dentry *dentry;
> +	int ret;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	/*
> +	 * resctrl file system can only be mounted once.
> +	 */
> +	if (static_branch_unlikely(&rdt_enable_key)) {
> +		dentry = ERR_PTR(-EBUSY);
> +		goto out;
> +	}
> +
> +	ret = parse_rdtgroupfs_options(data);
> +	if (ret) {
> +		dentry = ERR_PTR(ret);
> +		goto out;
> +	}
> +
> +	dentry = kernfs_mount(fs_type, flags, rdt_root,
> +			      RDTGROUP_SUPER_MAGIC, NULL);
> +	if (IS_ERR(dentry))
> +		goto out;
> +
> +	if (rdt_resources_all[RDT_RESOURCE_L3DATA].enabled &&
> +	    rdt_resources_all[RDT_RESOURCE_L3CODE].enabled) {
> +		ret = set_l3_qos_cfg(&rdt_resources_all[RDT_RESOURCE_L3], true);
> +		if (ret)
> +			goto out;

So you skip the key enable, but the mount persists and the return value of
that function is not propagated. 

> +	}
> +	static_branch_enable(&rdt_enable_key);
> +
> +out:
> +	mutex_unlock(&rdtgroup_mutex);
> +
> +	return dentry;
> +}
> +
> +static int reset_all_cbms(struct rdt_resource *r)
> +{
> +	struct rdt_domain *d;
> +	struct msr_param msr_param;
> +	cpumask_var_t cpu_mask;
> +	int i, cpu;

See above and everywhere ....

> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	msr_param.res = r;
> +	msr_param.low = 0;
> +	msr_param.high = r->num_closid;
> +
> +	/*
> +	 * Reset each domain's all CBMs to max value and copy the domain's

What does 'reset CBMs to max value' mean? I know what it means, as do you
and a few others. 3 months from now we'll all be scratching our heads...

> +static void rdt_kill_sb(struct super_block *sb)
> +{
> +	struct rdt_resource *r;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	/* Put everything back to default values. */
> +	for_each_enabled_rdt_resource(r)
> +		reset_all_cbms(r);
> +	r = &rdt_resources_all[RDT_RESOURCE_L3];
> +	r->enabled = r->capable;
> +	if (boot_cpu_has(X86_FEATURE_CDP_L3)) {

Why?

	if (rdt_resources_all[RDT_RESOURCE_L3DATA].enabled) {

only executes that if the thing was enabled.

> +		rdt_resources_all[RDT_RESOURCE_L3DATA].enabled = false;
> +		rdt_resources_all[RDT_RESOURCE_L3CODE].enabled = false;
> +		set_l3_qos_cfg(r, false);
> +	}
> +
> +	static_branch_disable(&rdt_enable_key);
> +	kernfs_kill_sb(sb);
> +	mutex_unlock(&rdtgroup_mutex);
> +}

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 04/18] x86/intel_rdt: Feature discovery
  2016-10-22 13:19 ` [PATCH v5 04/18] x86/intel_rdt: Feature discovery Fenghua Yu
@ 2016-10-26 14:15   ` Borislav Petkov
  2016-10-26 14:28     ` Thomas Gleixner
  2016-10-26 21:26   ` [tip:x86/cache] x86/cpufeature: Add RDT CPUID feature bits tip-bot for Fenghua Yu
  1 sibling, 1 reply; 49+ messages in thread
From: Borislav Petkov @ 2016-10-26 14:15 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

Subject: [PATCH v5 04/18] x86/intel_rdt: Feature discovery

Subject should start with a verb. I.e.,

"x86/cpufeature: Add RDT CPUID feature bits"

On Sat, Oct 22, 2016 at 06:19:51AM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> Check CPUID leaves for all the Resource Director Technology (RDT)
> Cache Allocation Technology (CAT) bits.
> 
> Presence of allocation features:
>   CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A
> 
> L2 and L3 caches are each separately enabled:
>   CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
>   CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2
> 
> L3 cache may support independent control of allocation for
> code and data (CDP = Code/Data Prioritization):
>   CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 5 +++++
>  arch/x86/kernel/cpu/scattered.c    | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 1188bc8..21f4a7e 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -195,6 +195,10 @@
>  
>  #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
>  

No newlines between consecutive bits.

> +#define X86_FEATURE_CAT_L3	( 7*32+16) /* Cache Allocation Technology L3 */
> +#define X86_FEATURE_CAT_L2	( 7*32+17) /* Cache Allocation Technology L2 */
> +#define X86_FEATURE_CDP_L3	( 7*32+18) /* Code and Data Prioritization L3 */

Also, this patch or even better, the whole patchset needs to be at least
test-merged with tip/master to check for conflicting changes. Like this
one here:

#define X86_FEATURE_INTEL_PT    ( 7*32+15) /* Intel Processor Trace */
#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
				   ^^^^^^^

#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
				   ^^^^^^

#define X86_FEATURE_CAT_L3      ( 7*32+16) /* Cache Allocation Technology L3 */
				  ^^^^^^^
#define X86_FEATURE_CAT_L2      ( 7*32+17) /* Cache Allocation Technology L2 */
				  ^^^^^^^

#define X86_FEATURE_CDP_L3      ( 7*32+18) /* Code and Data Prioritization L3 */


>  /* Virtualization flags: Linux defined, word 8 */
>  #define X86_FEATURE_TPR_SHADOW  ( 8*32+ 0) /* Intel TPR Shadow */
>  #define X86_FEATURE_VNMI        ( 8*32+ 1) /* Intel Virtual NMI */
> @@ -219,6 +223,7 @@
>  #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
>  #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
>  #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
> +#define X86_FEATURE_RDT_A	( 9*32+15) /* Resource Director Technology Allocation */
>  #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
>  #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
>  #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 8cb57df..11f39a2 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -34,6 +34,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
>  		{ X86_FEATURE_INTEL_PT,		CR_EBX,25, 0x00000007, 0 },
>  		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
>  		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
> +		{ X86_FEATURE_CAT_L3,		CR_EBX, 1, 0x00000010, 0 },
> +		{ X86_FEATURE_CAT_L2,		CR_EBX, 2, 0x00000010, 0 },
> +		{ X86_FEATURE_CDP_L3,		CR_ECX, 2, 0x00000010, 1 },
>  		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
>  		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
>  		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
> -- 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply	[flat|nested] 49+ messages in thread
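
The enumeration being wired up in this patch can also be checked from user
space. A minimal sketch, assuming a toolchain whose cpuid.h provides
__get_cpuid_count() (the leaf/bit positions are the ones listed in the
changelog above):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=7H, ECX=0):EBX[bit 15] - RDT allocation at all? */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
	    !(ebx & (1 << 15))) {
		puts("RDT_A not supported");
		return 0;
	}

	/* CPUID.(EAX=10H, ECX=0):EBX - which caches support CAT */
	__get_cpuid_count(0x10, 0, &eax, &ebx, &ecx, &edx);
	printf("CAT_L3: %s\n", (ebx & (1 << 1)) ? "yes" : "no");
	printf("CAT_L2: %s\n", (ebx & (1 << 2)) ? "yes" : "no");

	/* CPUID.(EAX=10H, ECX=1):ECX[bit 2] - CDP on L3 */
	__get_cpuid_count(0x10, 1, &eax, &ebx, &ecx, &edx);
	printf("CDP_L3: %s\n", (ecx & (1 << 2)) ? "yes" : "no");
	return 0;
}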

* Re: [PATCH v5 04/18] x86/intel_rdt: Feature discovery
  2016-10-26 14:15   ` Borislav Petkov
@ 2016-10-26 14:28     ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 14:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Wed, 26 Oct 2016, Borislav Petkov wrote:
> > +#define X86_FEATURE_CAT_L3	( 7*32+16) /* Cache Allocation Technology L3 */
> > +#define X86_FEATURE_CAT_L2	( 7*32+17) /* Cache Allocation Technology L2 */
> > +#define X86_FEATURE_CDP_L3	( 7*32+18) /* Code and Data Prioritization L3 */
> 
> Also, this patch or even better, the whole patchset needs to be at least
> test-merged with tip/master to check for conflicting changes. Like this
> one here:
> 
> #define X86_FEATURE_INTEL_PT    ( 7*32+15) /* Intel Processor Trace */
> #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
> #define X86_FEATURE_CAT_L3      ( 7*32+16) /* Cache Allocation Technology L3 */

That's an easy one to fixup and I prefer a clean patch against Linus tree
at the moment.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-22 13:19 ` [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system Fenghua Yu
@ 2016-10-26 14:45   ` Thomas Gleixner
  2016-10-26 15:48     ` Luck, Tony
  2016-10-27 18:17     ` Fenghua Yu
  0 siblings, 2 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 14:45 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +static int rdtgroup_add_files(struct kernfs_node *kn, struct rftype *rfts,
> +			      int len)
> +{
> +	struct rftype *rft;
> +	int ret;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	for (rft = rfts; rft < rfts + len; rft++) {
> +		ret = rdtgroup_add_file(kn, rft);
> +		if (ret)
> +			goto error;
> +	}
> +
> +	return 0;
> +error:
> +	pr_warn("%s: failed to add %s, err=%d\n", __func__, rft->name, ret);

Please stop this silly __func__ nonsense. You already have a prefix and if
you write out a proper sentence describing the problem then it's way more
helpful than __func__

	pr_warn("Failed to add file %s, err %d\n" ....

Tells exactly what happened and for a sysadmin this is information enough
as he does not care at all in which function that happens. For the
developer who is poked by that admin it's easy enough to find the function
via the string.

> +/* rdtgroup information files for one cache resource. */
> +static struct rftype res_info_files[] = {
> +	{
> +		.name		= "num_closid",

num_closids please

> +		.mode		= 0444,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdt_num_closid_show,
> +	},
> +	{
> +		.name		= "cbm_val",

cbm_val? Is that a value? No, it's the valid bitmask which you can set. So
cbm_mask or something else which is unambiguous is what you want here.

This is a user space interface and we really must make it as intuitive as
possible.

> +/*
> + * Forcibly remove all subdirectories under root.
> + */
> +static void rmdir_all_sub(void)
> +{
> +	kernfs_remove(kn_info);

What clears kn_info?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 13/18] x86/intel_rdt: Add mkdir to resctrl file system
  2016-10-22 13:20 ` [PATCH v5 13/18] x86/intel_rdt: Add mkdir " Fenghua Yu
@ 2016-10-26 15:01   ` Thomas Gleixner
  2016-10-28 17:51     ` Fenghua Yu
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 15:01 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +/*
> + * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
> + * we can keep a bitmap of free CLOSIDs in a single integer.
> + *
> + * Using a global CLOSID across all resources has some advantages and
> + * some drawbacks:
> + * + We can simply set "current->closid" to assign a task to a resource
> + *   group.
> + * + Context switch code can avoid extra memory references deciding which
> + *   CLOSID to load into the PQR_ASSOC MSR
> + * - We give up some options in configuring resource groups across multi-socket
> + *   systems.
> + * - Our choices on how to configure each resource become progressively more
> + *   limited as the number of resources grows.
> + */
> +static int closid_free_map;
> +
> +static void closid_init(void)
> +{
> +	struct rdt_resource *r;
> +	int rdt_max_closid;
> +
> +	/* Compute rdt_max_closid across all resources */
> +	rdt_max_closid = 0;
> +	for_each_enabled_rdt_resource(r)
> +		rdt_max_closid = max(rdt_max_closid, r->num_closid);

So you decided to silently ignore my objections against this approach. Fine
with me, but that does not solve the problem at all.

Once more:

On a system with L2 and L3 CAT it does not make any sense at all to expose
the closids which exceed the L2 space, simply because using them wrecks
any L2 partitioning done in the valid L2 space.

If you really want to allow that, then:

   1) It must be an opt-in at mount time

   2) It must be documented clearly along with the mount option

> +	/*
> +	 * CDP is "special". Because we share the L3 CBM MSR array
> +	 * between L3DATA and L3CODE, we must not use a CLOSID larger
> +	 * than they support. Just check against L3DATA because it
> +	 * is the same as L3CODE.
> +	 */
> +	r = &rdt_resources_all[RDT_RESOURCE_L3DATA];
> +	if (r->enabled)
> +		rdt_max_closid = min(rdt_max_closid, r->num_closid);

This explicit special casing is crap, really.

	for_each_enabled_rdt_resource(r)
		rdt_max_closid = max(rdt_max_closid, r->num_closid);

	for_each_enabled_rdt_resource(r) {
		if (!relaxed_max_closid || r->force_min_closid)
			rdt_max_closid = min(rdt_max_closid, r->num_closid);
	}

Handles all cases without 'CDP is special' and whatever nonsense Intel will
come up with in the future. All you need to do is add that force_min_closid
field into the resource struct and set it for l3data and l3code.

relaxed_max_closid is set at mount time by an appropriate mount option.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread
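
The 'trivial allocator' named in the comment quoted above reduces to
find-first-set on a free bitmap. A self-contained user-space sketch of the
idea (illustrative only; it assumes, based on the default-group handling
elsewhere in the series, that CLOSID 0 stays reserved):

#include <stdio.h>
#include <strings.h>	/* ffs() */

static unsigned int closid_free_map;

static void closid_init(int num_closid)
{
	closid_free_map = (1u << num_closid) - 1;
	closid_free_map &= ~1u;	/* CLOSID 0 stays with the default group */
}

static int closid_alloc(void)
{
	int closid = ffs(closid_free_map);

	if (!closid)
		return -1;	/* all CLOSIDs in use */
	closid--;
	closid_free_map &= ~(1u << closid);
	return closid;
}

static void closid_free(int closid)
{
	closid_free_map |= 1u << closid;
}

int main(void)
{
	closid_init(4);
	printf("%d %d %d\n", closid_alloc(), closid_alloc(), closid_alloc());
	closid_free(2);
	printf("%d\n", closid_alloc());	/* 2 is handed out again */
	return 0;
}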

* Re: [PATCH v5 15/18] x86/intel_rdt: Add tasks files
  2016-10-22 13:20 ` [PATCH v5 15/18] x86/intel_rdt: Add tasks files Fenghua Yu
@ 2016-10-26 15:27   ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 15:27 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
> +{
> +	struct task_struct *p;
> +
> +	rcu_read_lock();
> +	for_each_process(p) {

So this is restricted to processes, but with the write function I can put a
single thread into a group ...

for_each_process_thread() is what you want.

>  static int rdt_num_closid_show(struct kernfs_open_file *of,
> @@ -594,6 +758,13 @@ static void rdt_reset_pqr_assoc_closid(void *v)
>  static void rmdir_all_sub(void)
>  {
>  	struct rdtgroup *rdtgrp, *tmp;
> +	struct task_struct *p;
> +
> +	/* move all tasks to default resource group */
> +	read_lock(&tasklist_lock);
> +	for_each_process(p)
> +		p->closid = 0;

Same issue as above ....

> +	read_unlock(&tasklist_lock);
>  
>  	get_cpu();
>  	/* Reset PQR_ASSOC MSR on this cpu. */
> @@ -723,6 +894,7 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
>  {
>  	struct rdtgroup *rdtgrp;
>  	int cpu, ret = 0;
> +	struct task_struct *p;
>  
>  	rdtgrp = rdtgroup_kn_lock_live(kn);
>  	if (!rdtgrp) {
> @@ -730,6 +902,14 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
>  		return -ENOENT;
>  	}
>  
> +	/* Give any tasks back to the default group */
> +	read_lock(&tasklist_lock);
> +	for_each_process(p) {
> +		if (p->closid == rdtgrp->closid)
> +			p->closid = 0;

Once more.

> +	}
> +	read_unlock(&tasklist_lock);
> +

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-26 14:45   ` Thomas Gleixner
@ 2016-10-26 15:48     ` Luck, Tony
  2016-10-26 17:33       ` Thomas Gleixner
  2016-10-27 18:17     ` Fenghua Yu
  1 sibling, 1 reply; 49+ messages in thread
From: Luck, Tony @ 2016-10-26 15:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yu, Fenghua, Anvin, H Peter, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Hansen, Dave, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Shankar, Ravi V, Prakhya,
	Sai Praneeth, Vikas Shivappa, linux-kernel, x86


> 
>> +        .mode        = 0444,
>> +        .kf_ops        = &rdtgroup_kf_single_ops,
>> +        .seq_show    = rdt_num_closid_show,
>> +    },
>> +    {
>> +        .name        = "cbm_val",
> 
> cbm_val? Is that a value? No, it's the valid bitmask which you can set. So
> cmb_mask or something else which is unambiguous is what you want here.
> 
> This is a user space interface and we really must make it as intuitive as
> possible.

It's a bit mask for cache resources. But this interface will also control resources that are not caches, so I'd like to avoid "cbm" in the name. Perhaps "max_val"?

-Tony

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-26 13:02   ` Thomas Gleixner
@ 2016-10-26 16:06     ` Luck, Tony
  2016-10-26 17:31       ` Thomas Gleixner
  2016-10-26 21:14     ` Fenghua Yu
  1 sibling, 1 reply; 49+ messages in thread
From: Luck, Tony @ 2016-10-26 16:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Yu, Fenghua, Anvin, H Peter, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Hansen, Dave, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Shankar, Ravi V, Prakhya,
	Sai Praneeth, Vikas Shivappa, linux-kernel, x86

Order is visible to users when we print entries in the schemata file, and when we validate input that they write (we require that they provide all masks in the same order as this list).

If we hot remove a socket, it disappears from the list, and from the schemata file. When we put in a replacement it reappears. I didn't want the user to see:
L3:0=fffff;2=fffff;3=fffff;1=fffff
after hot replace socket 1, hence the sort.

Sent from my iPhone

> On Oct 26, 2016, at 06:05, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Sat, 22 Oct 2016, Fenghua Yu wrote:
>> +void rdt_cbm_update(void *arg)
>> +{
>> +    struct msr_param *m = (struct msr_param *)arg;
>> +    struct rdt_resource *r = m->res;
>> +    int i, cpu = smp_processor_id();
>> +    struct rdt_domain *d;
>> +
>> +    list_for_each_entry(d, &r->domains, list) {
> 
>> +static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
>> +                      struct list_head **pos)
>> +{
>> +    struct rdt_domain *d;
>> +    struct list_head *l;
>> +
>> +    if (id < 0)
>> +        return ERR_PTR(id);
>> +
>> +    list_for_each(l, &r->domains) {
>> +        d = list_entry(l, struct rdt_domain, list);
> 
> So above you converted to list_for_each_entry(). Is there a sensible
> reason, aside of being sloppy, why is this still using list_for_each()?
> 
>> +        /* When id is found, return its domain. */
>> +        if (id == d->id)
>> +            return d;
>> +        /* Stop searching when finding id's position in sorted list. */
> 
> What is the reason that this needs to be in a sorted list?
> 
> I haven't found one so far. And if there is none, then this can use a hlist.
> 
>> +        if (id < d->id)
>> +            break;
>> +    }
>> +    /*
>> +     * No id is found in resource domains. Record the position
>> +     * that the new domain will be added. The position is not used
>> +     * when removing a domain.
> 
> This comment makes no sense. If you want to document that a caller does not
> require the @pos argument, then you really should make it optional and do
> 
>    if (pos)
>        *pos = l;
> 
> But before doing that blindly, you want to explain why sorting is required
> at all.
> 
>> +     */
>> +    *pos = l;
>> +
>> +    return NULL;
>> +}
>> +
>> +static void domain_add_cpu(int cpu, struct rdt_resource *r)
>> +{
>> +    int i, id = get_cache_id(cpu, r->cache_level);
>> +    struct list_head *add_pos = NULL;
>> +    struct rdt_domain *d;
>> +
>> +    d = rdt_find_domain(r, id, &add_pos);
>> +    if (IS_ERR(d)) {
>> +        pr_warn("Couldn't find cache id for cpu %d\n", cpu);
>> +        return;
>> +    }
>> +
>> +    if (d) {
>> +        cpumask_set_cpu(cpu, &d->cpu_mask);
>> +        return;
>> +    }
>> +
>> +    if (!add_pos) {
>> +        pr_warn("Couldn't add cpu %d in %s domain\n", cpu, r->name);
> 
> Errm, how can add_pos ever be NULL if you get here? Not at all AFAICT.
> 
>> +        return;
>> +    }
>> +
>> +    d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
>> +    if (!d)
>> +        return;
>> +
>> +    d->id = id;
> 
> Please move this after the allocation. This random code ordering just makes
> reading hard as one expects that d->id is a prerequisite for the
> allocation.
> 
>> +    d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
>> +    if (!d->cbm) {
>> +        pr_warn("Failed to alloc CBM array for cpu %d\n", cpu);
>> +        kfree(d);
>> +        return;
>> +    }
> 
> New line please. Visually seperating logical code blocks enhances
> readability.
> 
>> +    for (i = 0; i < r->num_closid; i++) {
> 
> Thanks,
> 
>    tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread
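
The ordering described above is what the parse_line() loop in patch 16
relies on: domains are consumed from the sorted list while the input is
split on ';' and '='. A user-space sketch of that split (illustrative; the
hard-coded ids stand in for the kernel's walk of r->domains):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[] = "0=fffff;1=c0;2=3c";
	unsigned long expect[] = { 0, 1, 2 };
	char *p = line, *dom;
	unsigned int i = 0;

	while ((dom = strsep(&p, ";")) != NULL) {
		char *id = strsep(&dom, "=");
		unsigned long v, mask;

		/* Reject missing '=', wrong order, or a bad hex mask */
		if (!dom || i >= 3 || sscanf(id, "%lu", &v) != 1 ||
		    v != expect[i] || sscanf(dom, "%lx", &mask) != 1) {
			puts("invalid");
			return 1;
		}
		printf("domain %lu -> mask %lx\n", v, mask);
		i++;
	}
	return 0;
}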

* Re: [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-26 16:06     ` Luck, Tony
@ 2016-10-26 17:31       ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 17:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Yu, Fenghua, Anvin, H Peter, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Hansen, Dave, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Shankar, Ravi V, Prakhya,
	Sai Praneeth, Vikas Shivappa, linux-kernel, x86

On Wed, 26 Oct 2016, Luck, Tony wrote:

> Order is visible to users when we print entries in the schemata file, and
> validate input that they write (we require that they provide all masks in
> the same order as this list).

> If we hot remove a socket, it disappears from the list, and from the
> schemata file. When we put in a replacement it reappears. I didn't want
> the user to see:
> L3:0=fffff;2=fffff;3=fffff;1=fffff
> after hot replace socket 1, hence the sort.

Fair enough. Spelling this out in a comment might prevent questions about
this in the future.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-26 15:48     ` Luck, Tony
@ 2016-10-26 17:33       ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 17:33 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Yu, Fenghua, Anvin, H Peter, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Hansen, Dave, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Shankar, Ravi V, Prakhya,
	Sai Praneeth, Vikas Shivappa, linux-kernel, x86

On Wed, 26 Oct 2016, Luck, Tony wrote:
> >> +        .mode        = 0444,
> >> +        .kf_ops        = &rdtgroup_kf_single_ops,
> >> +        .seq_show    = rdt_num_closid_show,
> >> +    },
> >> +    {
> >> +        .name        = "cbm_val",
> > 
> > cbm_val? Is that a value? No, it's the valid bitmask which you can set. So
> > cmb_mask or something else which is unambiguous is what you want here.
> > 
> > This is a user space interface and we really must make it as intuitive as
> > possible.
> 
> It's a bit mask for cache resources. But this interface will also control resources that are not caches, so I'd like to avoid "cbm" in the name. Perhaps "max_val"?
> 

max_val is misleading if it's always a bit mask. 'bitmask' might not be the
worst choice then.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 14/18] x86/intel_rdt: Add cpus file
  2016-10-22 13:20 ` [PATCH v5 14/18] x86/intel_rdt: Add cpus file Fenghua Yu
@ 2016-10-26 17:57   ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 17:57 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> @@ -323,8 +325,11 @@ static int intel_rdt_online_cpu(unsigned int cpu)
>  	struct rdt_resource *r;
>  
>  	mutex_lock(&rdtgroup_mutex);
> +	per_cpu(cpu_closid, cpu) = 0;
>  	for_each_capable_rdt_resource(r)
>  		domain_add_cpu(cpu, r);
> +	/* The cpu is set in default rdtgroup after online. */
> +	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
>  	state->closid = 0;
>  	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
>  	mutex_unlock(&rdtgroup_mutex);
> @@ -334,11 +339,16 @@ static int intel_rdt_online_cpu(unsigned int cpu)
>  
>  static int intel_rdt_offline_cpu(unsigned int cpu)
>  {
> +	struct rdtgroup *rdtgrp;
>  	struct rdt_resource *r;
>  
>  	mutex_lock(&rdtgroup_mutex);
>  	for_each_capable_rdt_resource(r)
>  		domain_remove_cpu(cpu, r);
> +	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
> +		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask))
> +			break;
> +	}

I think we should set cpu_closid to 0 here and clean up the per-cpu state /
MSR as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
  2016-10-22 13:19 ` [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization Fenghua Yu
@ 2016-10-26 20:43   ` Thomas Gleixner
  2016-10-26 21:27   ` [tip:x86/cache] " tip-bot for Fenghua Yu
  1 sibling, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 20:43 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> +config INTEL_RDT_A
> +	bool "Intel Resource Director Technology Allocation support"
> +	default n
> +	depends on X86 && CPU_SUP_INTEL

'depends on X86' is more than pointless for a config switch which sits in
the arch/x86/Kconfig file...

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-26 13:02   ` Thomas Gleixner
  2016-10-26 16:06     ` Luck, Tony
@ 2016-10-26 21:14     ` Fenghua Yu
  2016-10-26 21:18       ` Thomas Gleixner
  1 sibling, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-26 21:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Borislav Petkov, Dave Hansen,
	Nilay Vaish, Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar,
	Sai Prakhya, Vikas Shivappa, linux-kernel, x86

On Wed, Oct 26, 2016 at 03:02:56PM +0200, Thomas Gleixner wrote:
> On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > +void rdt_cbm_update(void *arg)
> > +{
> > +	struct msr_param *m = (struct msr_param *)arg;
> > +	struct rdt_resource *r = m->res;
> > +	int i, cpu = smp_processor_id();
> > +	struct rdt_domain *d;
> > +
> > +	list_for_each_entry(d, &r->domains, list) {
> 
> > +static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> > +					  struct list_head **pos)
> > +{
> > +	struct rdt_domain *d;
> > +	struct list_head *l;
> > +
> > +	if (id < 0)
> > +		return ERR_PTR(id);
> > +
> > +	list_for_each(l, &r->domains) {
> > +		d = list_entry(l, struct rdt_domain, list);
> 
> So above you converted to list_for_each_entry(). Is there a sensible
> reason, aside of being sloppy, why is this still using list_for_each()?

We use list_for_each() because we want to get the list_head "l". The l
is used to find the position at which the new domain will be inserted.

The same function rdt_find_domain() takes care of two similar tasks (find a
matched domain, or find a position in the domain list to insert a new domain).
Maybe it's not good for two tasks to share the same function?

> 
> > +		/* When id is found, return its domain. */
> > +		if (id == d->id)
> > +			return d;
> > +		/* Stop searching when finding id's position in sorted list. */
> 
> What is the reason that this needs to be in a sorted list?
> 
> I haven't found one so far. And if there is none, then this can use a hlist.
> 
> > +		if (id < d->id)
> > +			break;
> > +	}
> > +	/*
> > +	 * No id is found in resource domains. Record the position
> > +	 * that the new domain will be added. The posistion is not used
> > +	 * when removing a domain.
> 
> This comment makes no sense. If you want to document that a caller does not
> require the @pos argument, then you really should make it optional and do
> 
> 	if (pos)
> 		*pos = l;
> 
> But before doing that blindly, you want to explain why sorting is required
> at all.
> 
> > +	 */
> > +	*pos = l;
> > +
> > +	return NULL;
> > +}
> > +
> > +static void domain_add_cpu(int cpu, struct rdt_resource *r)
> > +{
> > +	int i, id = get_cache_id(cpu, r->cache_level);
> > +	struct list_head *add_pos = NULL;
> > +	struct rdt_domain *d;
> > +
> > +	d = rdt_find_domain(r, id, &add_pos);
> > +	if (IS_ERR(d)) {
> > +		pr_warn("Couldn't find cache id for cpu %d\n", cpu);
> > +		return;
> > +	}
> > +
> > +	if (d) {
> > +		cpumask_set_cpu(cpu, &d->cpu_mask);
> > +		return;
> > +	}
> > +
> > +	if (!add_pos) {
> > +		pr_warn("Couldn't add cpu %d in %s domain\n", cpu, r->name);
> 
> Errm, how can add_pos ever be NULL if you get here? Not at all AFAICT.
> 
> > +		return;
> > +	}
> > +
> > +	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
> > +	if (!d)
> > +		return;
> > +
> > +	d->id = id;
> 
> Please move this after the allocation. This random code ordering just makes
> reading hard as one expects that d->id is a prerequisite for the
> allocation.
> 
> > +	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
> > +	if (!d->cbm) {
> > +		pr_warn("Failed to alloc CBM array for cpu %d\n", cpu);
> > +		kfree(d);
> > +		return;
> > +	}
> 
> New line please. Visually seperating logical code blocks enhances
> readability.
> 
> > +	for (i = 0; i < r->num_closid; i++) {
> 
> Thanks,
> 
> 	tglx

The following patch #10 is supposed to fix the issues you pointed out above.

Is it good now?

---
 arch/x86/include/asm/intel_rdt.h |  35 ++++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 189 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 224 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9780409..c0d0a6e 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -39,6 +39,34 @@ struct rdt_resource {
 	int			cbm_idx_offset;
 };
 
+/**
+ * struct rdt_domain - group of cpus sharing an RDT resource
+ * @list:	all instances of this resource
+ * @id:		unique id for this instance
+ * @cpu_mask:	which cpus share this resource
+ * @cbm:	array of cache bit masks (indexed by CLOSID)
+ */
+struct rdt_domain {
+	struct list_head	list;
+	int			id;
+	struct cpumask		cpu_mask;
+	u32			*cbm;
+};
+
+/**
+ * struct msr_param - set a range of MSRs from a domain
+ * @res:       The resource to use
+ * @low:       Beginning index from base MSR
+ * @high:      End index
+ */
+struct msr_param {
+	struct rdt_resource	*res;
+	int			low;
+	int			high;
+};
+
+extern struct mutex rdtgroup_mutex;
+
 extern struct rdt_resource rdt_resources_all[];
 
 enum {
@@ -56,6 +84,11 @@ enum {
 	     r++) 							      \
 		if (r->capable)
 
+#define for_each_enabled_rdt_resource(r)				      \
+	for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+	     r++)							      \
+		if (r->enabled)
+
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
 	struct {
@@ -71,4 +104,6 @@ union cpuid_0x10_1_edx {
 	} split;
 	unsigned int full;
 };
+
+void rdt_cbm_update(void *arg);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 29308e1..23f1740 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -26,11 +26,16 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpuhotplug.h>
 
 #include <asm/intel_rdt_common.h>
 #include <asm/intel-family.h>
 #include <asm/intel_rdt.h>
 
+/* Mutex to protect rdtgroup access. */
+DEFINE_MUTEX(rdtgroup_mutex);
+
 #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
 
 struct rdt_resource rdt_resources_all[] = {
@@ -72,6 +77,11 @@ struct rdt_resource rdt_resources_all[] = {
 	},
 };
 
+static int cbm_idx(struct rdt_resource *r, int closid)
+{
+	return closid * r->cbm_idx_multi + r->cbm_idx_offset;
+}
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -176,14 +186,193 @@ static inline bool get_rdt_resources(void)
 	return ret;
 }
 
+static int get_cache_id(int cpu, int level)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+	int i;
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == level)
+			return ci->info_list[i].id;
+	}
+
+	return -1;
+}
+
+void rdt_cbm_update(void *arg)
+{
+	struct msr_param *m = (struct msr_param *)arg;
+	struct rdt_resource *r = m->res;
+	int i, cpu = smp_processor_id();
+	struct rdt_domain *d;
+
+	list_for_each_entry(d, &r->domains, list) {
+		/* Find the domain that contains this CPU */
+		if (cpumask_test_cpu(cpu, &d->cpu_mask))
+			goto found;
+	}
+	pr_info_once("cpu %d not found in any domain for resource %s\n",
+		     cpu, r->name);
+
+	return;
+
+found:
+	for (i = m->low; i < m->high; i++) {
+		int idx = cbm_idx(r, i);
+
+		wrmsrl(r->msr_base + idx, d->cbm[i]);
+	}
+}
+
+/*
+ * rdt_find_domain - Find a domain in a resource that matches the input
+ * resource id
+ *
+ * Search resource r's domain list to find a matching resource id. If the
+ * resource id is found in a domain, return that domain. Otherwise, if
+ * requested by the caller, store in @pos the list position at which a new
+ * domain with this id should be inserted. The domain list is sorted by id
+ * in ascending order.
+ */
+static struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+					  struct list_head **pos)
+{
+	struct rdt_domain *d;
+	struct list_head *l;
+
+	if (id < 0)
+		return ERR_PTR(id);
+
+	list_for_each(l, &r->domains) {
+		d = list_entry(l, struct rdt_domain, list);
+		/* When id is found, return its domain. */
+		if (id == d->id)
+			return d;
+		/* Stop searching when finding id's position in sorted list. */
+		if (id < d->id)
+			break;
+	}
+
+	if (pos)
+		*pos = l;
+
+	return NULL;
+}
+
+/*
+ * domain_add_cpu - Add a cpu to a resource's domain list.
+ *
+ * If an existing domain in the resource r's domain list matches the cpu's
+ * resource id, add the cpu in the domain.
+ *
+ * Otherwise, a new domain is allocated and inserted at the right position
+ * in the domain list sorted by id in ascending order.
+ *
+ * The order in the domain list is visible to users when we print entries
+ * in the schemata file and schemata input is validated to have the same order
+ * as this list.
+ */
+static void domain_add_cpu(int cpu, struct rdt_resource *r)
+{
+	int i, id = get_cache_id(cpu, r->cache_level);
+	struct list_head *add_pos = NULL;
+	struct rdt_domain *d;
+
+	d = rdt_find_domain(r, id, &add_pos);
+	if (IS_ERR(d)) {
+		pr_warn("Could't find cache id for cpu %d\n", cpu);
+		return;
+	}
+
+	if (d) {
+		cpumask_set_cpu(cpu, &d->cpu_mask);
+		return;
+	}
+
+	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+	if (!d)
+		return;
+
+	d->cbm = kmalloc_array(r->num_closid, sizeof(*d->cbm), GFP_KERNEL);
+	if (!d->cbm) {
+		pr_warn("Failed to alloc CBM array for cpu %d\n", cpu);
+		kfree(d);
+		return;
+	}
+
+	for (i = 0; i < r->num_closid; i++) {
+		int idx = cbm_idx(r, i);
+
+		d->cbm[i] = r->max_cbm;
+		wrmsrl(r->msr_base + idx, d->cbm[i]);
+	}
+
+	d->id = id;
+	cpumask_set_cpu(cpu, &d->cpu_mask);
+	list_add_tail(&d->list, add_pos);
+	r->num_domains++;
+}
+
+static void domain_remove_cpu(int cpu, struct rdt_resource *r)
+{
+	int id = get_cache_id(cpu, r->cache_level);
+	struct rdt_domain *d;
+
+	d = rdt_find_domain(r, id, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		pr_warn("Could't find cache id for cpu %d\n", cpu);
+		return;
+	}
+
+	cpumask_clear_cpu(cpu, &d->cpu_mask);
+	if (cpumask_empty(&d->cpu_mask)) {
+		r->num_domains--;
+		kfree(d->cbm);
+		list_del(&d->list);
+		kfree(d);
+	}
+}
+
+static int intel_rdt_online_cpu(unsigned int cpu)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	struct rdt_resource *r;
+
+	mutex_lock(&rdtgroup_mutex);
+	for_each_capable_rdt_resource(r)
+		domain_add_cpu(cpu, r);
+	state->closid = 0;
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int intel_rdt_offline_cpu(unsigned int cpu)
+{
+	struct rdt_resource *r;
+
+	mutex_lock(&rdtgroup_mutex);
+	for_each_capable_rdt_resource(r)
+		domain_remove_cpu(cpu, r);
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
 static int __init intel_rdt_late_init(void)
 {
 	bool first_resource = true;
 	struct rdt_resource *r;
+	int state;
 
 	if (!get_rdt_resources())
 		return -ENODEV;
 
+	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
+				  "x86/rdt/cat:online:",
+				  intel_rdt_online_cpu, intel_rdt_offline_cpu);
+	if (state < 0)
+		return state;
+
 	pr_info("Intel RDT allocation detected: ");
 	for_each_capable_rdt_resource(r) {
 		if (!first_resource)
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 49+ messages in thread
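
One detail worth spelling out in the repost: under CDP the L3DATA and
L3CODE halves interleave in the same IA32_L3_CBM MSR array, which is what
cbm_idx() computes. A worked example (the multi/offset values are the ones
this series assigns to the three L3 resources; the resource table itself is
truncated in the diff above, so treat them as an assumption):

#include <stdio.h>

static int cbm_idx(int multi, int offset, int closid)
{
	return closid * multi + offset;
}

int main(void)
{
	/* CLOSID 3: plain L3 vs. the two CDP halves */
	printf("L3     -> IA32_L3_CBM base + %d\n", cbm_idx(1, 0, 3)); /* 3 */
	printf("L3DATA -> IA32_L3_CBM base + %d\n", cbm_idx(2, 0, 3)); /* 6 */
	printf("L3CODE -> IA32_L3_CBM base + %d\n", cbm_idx(2, 1, 3)); /* 7 */
	return 0;
}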

* Re: [PATCH v5 10/18] x86/intel_rdt: Build structures for each resource based on cache topology
  2016-10-26 21:14     ` Fenghua Yu
@ 2016-10-26 21:18       ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 21:18 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Wed, 26 Oct 2016, Fenghua Yu wrote:
> On Wed, Oct 26, 2016 at 03:02:56PM +0200, Thomas Gleixner wrote:
> > > +	list_for_each(l, &r->domains) {
> > > +		d = list_entry(l, struct rdt_domain, list);
> > 
> > So above you converted to list_for_each_entry(). Is there a sensible
> > reason, aside of being sloppy, why is this still using list_for_each()?
> 
> We use list_for_each() because we want to get the list_head "l". The l
> is used to find the position that the new domain will be inserted.

Sorry. I completely missed that position information thingy. No change
required then.

> The following patch #10 is supposed to fix issues you pointed out above.
> 
> Is it good now?

Looks fine. 

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [tip:x86/cache] Documentation, ABI: Document the new sysfs files for cpu cache ids
  2016-10-22 13:19 ` [PATCH v5 01/18] Documentation, ABI: Add a document entry for cache id Fenghua Yu
@ 2016-10-26 21:25   ` tip-bot for Tony Luck
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Tony Luck @ 2016-10-26 21:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, shli, tony.luck, sai.praneeth.prakhya, mingo,
	fenghua.yu, dave.hansen, bp, davidcc, h.peter.anvin, eranian,
	ravi.v.shankar, peterz, hpa, vikas.shivappa, nilayvaish,
	linux-kernel

Commit-ID:  1d78dc59f5ab6f467e49882518453adc7e4caa44
Gitweb:     http://git.kernel.org/tip/1d78dc59f5ab6f467e49882518453adc7e4caa44
Author:     Tony Luck <tony.luck@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:48 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

Documentation, ABI: Document the new sysfs files for cpu cache ids

Add an ABI document entry for /sys/devices/system/cpu/cpu*/cache/index*/id.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-2-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 4987417..2a4a423 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -272,6 +272,22 @@ Description:	Parameters for the CPU cache attributes
 				     the modified cache line is written to main
 				     memory only when it is replaced
 
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/id
+Date:		September 2016
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Cache id
+
+		The id provides a unique number for a specific instance of
+		a cache of a particular type. E.g. there may be a level
+		3 unified cache on each socket in a server and we may
+		assign them ids 0, 1, 2, ...
+
+		Note that id values can be non-contiguous. E.g. level 1
+		caches typically exist per core, but there may not be a
+		power of two cores on a socket, so these caches may be
+		numbered 0, 1, 2, 3, 4, 5, 8, 9, 10, ...
+
 What:		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
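
For a quick check from userspace, a minimal sketch (not part of the patch;
the path assumes cpu0's L3 leaf sits at index3, which varies by system)
that reads the new file:

	#include <stdio.h>

	int main(void)
	{
		/* index3 is typically the L3 leaf on Intel servers */
		FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index3/id", "r");
		unsigned int id;

		if (!f)
			return 1;
		if (fscanf(f, "%u", &id) == 1)
			printf("cpu0 L3 cache id: %u\n", id);
		fclose(f);
		return 0;
	}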

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] cacheinfo: Introduce cache id
  2016-10-22 13:19 ` [PATCH v5 02/18] cacheinfo: Introduce cache id Fenghua Yu
@ 2016-10-26 21:25   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: davidcc, mingo, peterz, ravi.v.shankar, linux-kernel, nilayvaish,
	h.peter.anvin, mingo, dave.hansen, shli, sai.praneeth.prakhya,
	hpa, fenghua.yu, bp, vikas.shivappa, tglx, tony.luck, eranian

Commit-ID:  e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2
Gitweb:     http://git.kernel.org/tip/e9a2ea5a1ba09c35258f3663842fb8d8cf2e00c2
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:49 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

cacheinfo: Introduce cache id

Cache management software needs an id for each instance of a cache of
a particular type.

The current cacheinfo structure does not provide any information about
the underlying hardware so there is no way to expose it.

Hardware with cache management features provides means (cpuid, enumeration
etc.) to retrieve the hardware id of a particular cache instance. Cache
instances which share hardware have the same hardware id.

Add an 'id' field to struct cacheinfo to store this information. Expose
this information under the /sys/devices/system/cpu/cpu*/cache/index*/
directory as well.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-3-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/base/cacheinfo.c  | 5 +++++
 include/linux/cacheinfo.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index e9fd32e..00a9688 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev,		\
 	return sprintf(buf, "%u\n", this_leaf->object);		\
 }
 
+show_one(id, id);
 show_one(level, level);
 show_one(coherency_line_size, coherency_line_size);
 show_one(number_of_sets, number_of_sets);
@@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
 	return n;
 }
 
+static DEVICE_ATTR_RO(id);
 static DEVICE_ATTR_RO(level);
 static DEVICE_ATTR_RO(type);
 static DEVICE_ATTR_RO(coherency_line_size);
@@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
 static DEVICE_ATTR_RO(physical_line_partition);
 
 static struct attribute *cache_default_attrs[] = {
+	&dev_attr_id.attr,
 	&dev_attr_type.attr,
 	&dev_attr_level.attr,
 	&dev_attr_shared_cpu_map.attr,
@@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
 	const struct cpumask *mask = &this_leaf->shared_cpu_map;
 	umode_t mode = attr->mode;
 
+	if ((attr == &dev_attr_id.attr) && (this_leaf->attributes & CACHE_ID))
+		return mode;
 	if ((attr == &dev_attr_type.attr) && this_leaf->type)
 		return mode;
 	if ((attr == &dev_attr_level.attr) && this_leaf->level)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2189935..0bcbb67 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -18,6 +18,7 @@ enum cache_type {
 
 /**
  * struct cacheinfo - represent a cache leaf node
+ * @id: This cache's id. It is unique among caches with the same (type, level).
  * @type: type of the cache - data, inst or unified
  * @level: represents the hierarchy in the multi-level cache
  * @coherency_line_size: size of each cache line usually representing
@@ -44,6 +45,7 @@ enum cache_type {
  * keeping, the remaining members form the core properties of the cache
  */
 struct cacheinfo {
+	unsigned int id;
 	enum cache_type type;
 	unsigned int level;
 	unsigned int coherency_line_size;
@@ -61,6 +63,7 @@ struct cacheinfo {
 #define CACHE_WRITE_ALLOCATE	BIT(3)
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
+#define CACHE_ID		BIT(4)
 
 	struct device_node *of_node;
 	bool disable_sysfs;

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/intel_cacheinfo: Enable cache id in cache info
  2016-10-22 13:19 ` [PATCH v5 03/18] x86/intel_cacheinfo: Enable cache id in cache info Fenghua Yu
@ 2016-10-26 21:26   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ravi.v.shankar, eranian, mingo, tony.luck, sai.praneeth.prakhya,
	tglx, h.peter.anvin, mingo, davidcc, linux-kernel, hpa,
	vikas.shivappa, nilayvaish, fenghua.yu, peterz, dave.hansen, bp,
	shli

Commit-ID:  d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Gitweb:     http://git.kernel.org/tip/d57e3ab7e34c51a8badeea1b500bfb738d0af66e
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:50 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:37 +0200

x86/intel_cacheinfo: Enable cache id in cache info

Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.

For more details please see the section on "Cache ID Extraction
Parameters" in "Intel 64 Architecture Processor Topology Enumeration".

Also see the documentation of the CPUID instruction in the "Intel 64 and
IA-32 Architectures Software Developer's Manual".

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
 	union _cpuid4_leaf_ecx ecx;
+	unsigned int id;
 	unsigned long size;
 	struct amd_northbridge *nb;
 };
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_leaf_init(struct cacheinfo *this_leaf,
 			 struct _cpuid4_info_regs *base)
 {
+	this_leaf->id = base->id;
+	this_leaf->attributes = CACHE_ID;
 	this_leaf->level = base->eax.split.level;
 	this_leaf->type = cache_type_map[base->eax.split.type];
 	this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The max shared threads number comes from CPUID.4:EAX[25-14] with input
+ * ECX as cache index. Then right shift apicid by the number's order to get
+ * cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	unsigned long num_threads_sharing;
+	int index_msb;
+
+	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+	index_msb = get_count_order(num_threads_sharing);
+	id4_regs->id = c->apicid >> index_msb;
+}
+
 static int __populate_cache_leaves(unsigned int cpu)
 {
 	unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
 		ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
 		if (ret)
 			return ret;
+		get_cache_id(cpu, &id4_regs);
 		ci_leaf_init(this_leaf++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
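
As a rough illustration with hypothetical numbers (not part of the patch):
if CPUID.4:EAX[25:14] reports 15, then 16 threads share the cache,
get_count_order(16) = 4, and an APIC ID of 0x23 yields cache id 0x23 >> 4 = 2:

	#include <stdio.h>

	/* same rounding as the kernel's get_count_order() */
	static int count_order(unsigned long n)
	{
		int order = 0;

		while ((1UL << order) < n)
			order++;
		return order;
	}

	int main(void)
	{
		unsigned long num_threads_sharing = 1 + 15; /* CPUID.4:EAX[25:14] = 15 */
		unsigned int apicid = 0x23;                 /* hypothetical */

		printf("cache id = %u\n",
		       apicid >> count_order(num_threads_sharing));
		return 0;
	}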

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/cpufeature: Add RDT CPUID feature bits
  2016-10-22 13:19 ` [PATCH v5 04/18] x86/intel_rdt: Feature discovery Fenghua Yu
  2016-10-26 14:15   ` Borislav Petkov
@ 2016-10-26 21:26   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vikas.shivappa, h.peter.anvin, linux-kernel, davidcc,
	sai.praneeth.prakhya, mingo, dave.hansen, peterz, tony.luck,
	mingo, fenghua.yu, tglx, ravi.v.shankar, eranian, shli,
	nilayvaish, bp, hpa

Commit-ID:  4ab1586488cb56ed8728e54c4157cc38646874d9
Gitweb:     http://git.kernel.org/tip/4ab1586488cb56ed8728e54c4157cc38646874d9
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:51 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/cpufeature: Add RDT CPUID feature bits

Check CPUID leaves for all the Resource Director Technology (RDT)
Cache Allocation Technology (CAT) bits.

Presence of allocation features:
  CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A

L2 and L3 caches are each separately enabled:
  CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
  CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2

L3 cache may support independent control of allocation for
code and data (CDP = Code/Data Prioritization):
  CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3

[ tglx: Fixed up Borislavs comments and moved the feature bits into a gap ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: "Borislav Petkov" <bp@suse.de>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/cpufeatures.h | 4 ++++
 arch/x86/kernel/cpu/scattered.c    | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a396292..90b8c0b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -189,6 +189,9 @@
 
 #define X86_FEATURE_CPB		( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB		( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_CAT_L3	( 7*32+ 4) /* Cache Allocation Technology L3 */
+#define X86_FEATURE_CAT_L2	( 7*32+ 5) /* Cache Allocation Technology L2 */
+#define X86_FEATURE_CDP_L3	( 7*32+ 6) /* Code and Data Prioritization L3 */
 
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
@@ -221,6 +224,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT_A	( 9*32+15) /* Resource Director Technology Allocation */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..49fb680 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,9 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 		{ X86_FEATURE_AVX512_4FMAPS,	CR_EDX, 3, 0x00000007, 0 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
 		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
+		{ X86_FEATURE_CAT_L3,		CR_EBX, 1, 0x00000010, 0 },
+		{ X86_FEATURE_CAT_L2,		CR_EBX, 2, 0x00000010, 0 },
+		{ X86_FEATURE_CDP_L3,		CR_ECX, 2, 0x00000010, 1 },
 		{ X86_FEATURE_HW_PSTATE,	CR_EDX, 7, 0x80000007, 0 },
 		{ X86_FEATURE_CPB,		CR_EDX, 9, 0x80000007, 0 },
 		{ X86_FEATURE_PROC_FEEDBACK,	CR_EDX,11, 0x80000007, 0 },
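
The same enumeration can be sanity-checked from userspace; here is a small
sketch using the compiler's cpuid.h helpers (assumes GCC/clang on x86;
__get_cpuid_count is the compiler intrinsic wrapper, not a kernel API):

	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		/* CPUID.(EAX=7H, ECX=0):EBX[bit 15] - RDT allocation present? */
		if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) ||
		    !(ebx & (1U << 15))) {
			printf("RDT_A not enumerated\n");
			return 0;
		}

		/* CPUID.(EAX=10H, ECX=0):EBX bits 1 and 2 - L3/L2 CAT */
		__get_cpuid_count(0x10, 0, &eax, &ebx, &ecx, &edx);
		printf("CAT_L3: %d CAT_L2: %d\n",
		       !!(ebx & (1U << 1)), !!(ebx & (1U << 2)));
		return 0;
	}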

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization
  2016-10-22 13:19 ` [PATCH v5 06/18] x86/intel_rdt: Add CONFIG, Makefile, and basic initialization Fenghua Yu
  2016-10-26 20:43   ` Thomas Gleixner
@ 2016-10-26 21:27   ` tip-bot for Fenghua Yu
  1 sibling, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, tony.luck, mingo, linux-kernel, ravi.v.shankar, fenghua.yu,
	dave.hansen, eranian, nilayvaish, tglx, hpa, peterz, mingo, shli,
	sai.praneeth.prakhya, davidcc, h.peter.anvin, vikas.shivappa

Commit-ID:  78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Gitweb:     http://git.kernel.org/tip/78e99b4a2b9afb1c304259fcd4a1c71ca97e3acd
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:53 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add CONFIG, Makefile, and basic initialization

Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
control inclusion of Resource Director Technology in the build.

Simple init() routine just checks which features are present. If they are,
pr_info() a one-line summary for each feature for now.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/Kconfig                | 12 +++++++++
 arch/x86/kernel/cpu/Makefile    |  2 ++
 arch/x86/kernel/cpu/intel_rdt.c | 54 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..770fb5f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -407,6 +407,18 @@ config GOLDFISH
        def_bool y
        depends on X86_GOLDFISH
 
+config INTEL_RDT_A
+	bool "Intel Resource Director Technology Allocation support"
+	default n
+	depends on X86 && CPU_SUP_INTEL
+	help
+	  Select to enable resource allocation which is a sub-feature of
+	  Intel Resource Director Technology(RDT). More information about
+	  RDT can be found in the Intel x86 Architecture Software
+	  Developer Manual.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
 	bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..cf4bfd0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
+obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o
+
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
 obj-$(CONFIG_MICROCODE)			+= microcode/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..7d7aebe
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,54 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * Authors:
+ *    Fenghua Yu <fenghua.yu@intel.com>
+ *    Tony Luck <tony.luck@intel.com>
+ *    Vikas Shivappa <vikas.shivappa@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2016, volume 3, section 17.17.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+
+static inline bool get_rdt_resources(void)
+{
+	bool ret = false;
+
+	if (!boot_cpu_has(X86_FEATURE_RDT_A))
+		return false;
+	if (boot_cpu_has(X86_FEATURE_CAT_L3))
+		ret = true;
+
+	return ret;
+}
+
+static int __init intel_rdt_late_init(void)
+{
+	if (!get_rdt_resources())
+		return -ENODEV;
+
+	pr_info("Intel RDT cache allocation detected\n");
+	if (boot_cpu_has(X86_FEATURE_CDP_L3))
+		pr_info("Intel RDT code data prioritization detected\n");
+
+	return 0;
+}
+
+late_initcall(intel_rdt_late_init);

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/intel_rdt: Add Haswell feature discovery
  2016-10-22 13:19 ` [PATCH v5 07/18] x86/intel_rdt: Add Haswell feature discovery Fenghua Yu
@ 2016-10-26 21:27   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, linux-kernel, davidcc, eranian, h.peter.anvin,
	sai.praneeth.prakhya, hpa, tony.luck, peterz, bp, ravi.v.shankar,
	vikas.shivappa, nilayvaish, tglx, mingo, shli, mingo,
	dave.hansen

Commit-ID:  113c60970cf41723891e3a1b303517eaf8510bb5
Gitweb:     http://git.kernel.org/tip/113c60970cf41723891e3a1b303517eaf8510bb5
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:54 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:38 +0200

x86/intel_rdt: Add Haswell feature discovery

Some Haswell generation CPUs support RDT, but they don't enumerate this via
CPUID.  Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63
(INTEL_FAM6_HASWELL_X).

Move the relevant defines into a common header file which is shared between
RDT/CQM and RDT/Allocation to avoid duplication.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/events/intel/cqm.c             |  2 +-
 arch/x86/include/asm/intel_rdt.h        |  6 ++++
 arch/x86/include/asm/intel_rdt_common.h |  6 ++++
 arch/x86/kernel/cpu/intel_rdt.c         | 49 ++++++++++++++++++++++++++++++---
 4 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8f82b02..df86874 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
+#include <asm/intel_rdt_common.h>
 #include "../perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..3aca86d
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_H
+#define _ASM_X86_INTEL_RDT_H
+
+#define IA32_L3_CBM_BASE	0xc90
+
+#endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
new file mode 100644
index 0000000..e6e15cf
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_X86_INTEL_RDT_COMMON_H
+#define _ASM_X86_INTEL_RDT_COMMON_H
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+#endif /* _ASM_X86_INTEL_RDT_COMMON_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7d7aebe..f8e35cf 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -27,16 +27,57 @@
 #include <linux/slab.h>
 #include <linux/err.h>
 
+#include <asm/intel_rdt_common.h>
+#include <asm/intel-family.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as they do not have CPUID enumeration support for Cache allocation.
+ * The check for Vendor/Family/Model is not enough to guarantee that
+ * the MSRs won't #GP fault because only the following SKUs support
+ * CAT:
+ *	Intel(R) Xeon(R)  CPU E5-2658  v3  @  2.20GHz
+ *	Intel(R) Xeon(R)  CPU E5-2648L v3  @  1.80GHz
+ *	Intel(R) Xeon(R)  CPU E5-2628L v3  @  2.00GHz
+ *	Intel(R) Xeon(R)  CPU E5-2618L v3  @  2.30GHz
+ *	Intel(R) Xeon(R)  CPU E5-2608L v3  @  2.00GHz
+ *	Intel(R) Xeon(R)  CPU E5-2658A v3  @  2.20GHz
+ *
+ * Probe by trying to write the first of the L3 cache mask registers
+ * and checking that the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86 == 6 &&
+	    boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+		u32 l, h, max_cbm = BIT_MASK(20) - 1;
+
+		if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
+			return false;
+		rdmsr(IA32_L3_CBM_BASE, l, h);
+
+		/* If all the bits were set in MSR, return success */
+		return l == max_cbm;
+	}
+
+	return false;
+}
+
 static inline bool get_rdt_resources(void)
 {
-	bool ret = false;
+	if (cache_alloc_hsw_probe())
+		return true;
 
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
 		return false;
-	if (boot_cpu_has(X86_FEATURE_CAT_L3))
-		ret = true;
+	if (!boot_cpu_has(X86_FEATURE_CAT_L3))
+		return false;
 
-	return ret;
+	return true;
 }
 
 static int __init intel_rdt_late_init(void)

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID
  2016-10-22 13:19 ` [PATCH v5 08/18] x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID Fenghua Yu
@ 2016-10-26 21:28   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fenghua.yu, bp, peterz, vikas.shivappa, tglx, nilayvaish,
	sai.praneeth.prakhya, shli, linux-kernel, eranian, tony.luck,
	h.peter.anvin, mingo, mingo, ravi.v.shankar, dave.hansen, hpa,
	davidcc

Commit-ID:  c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Gitweb:     http://git.kernel.org/tip/c1c7c3f9d6bb6999a45f66ea4c6bfbcab87ff34b
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:55 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID

Define struct rdt_resource to hold all the parameterized values for an RDT
resource and fill in the CPUID enumerated values from leaf 0x10 if
available. Hard code them for the MSR detected Haswells.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/intel_rdt.h |  68 ++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 111 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 3aca86d..9780409 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -2,5 +2,73 @@
 #define _ASM_X86_INTEL_RDT_H
 
 #define IA32_L3_CBM_BASE	0xc90
+#define IA32_L2_CBM_BASE	0xd10
 
+/**
+ * struct rdt_resource - attributes of an RDT resource
+ * @enabled:			Is this feature enabled on this machine
+ * @capable:			Is this feature available on this machine
+ * @name:			Name to use in "schemata" file
+ * @num_closid:			Number of CLOSIDs available
+ * @max_cbm:			Largest Cache Bit Mask allowed
+ * @min_cbm_bits:		Minimum number of consecutive bits to be set
+ *				in a cache bit mask
+ * @domains:			All domains for this resource
+ * @num_domains:		Number of domains active
+ * @msr_base:			Base MSR address for CBMs
+ * @tmp_cbms:			Scratch space when updating schemata
+ * @cache_level:		Which cache level defines scope of this domain
+ * @cbm_idx_multi:		Multiplier of CBM index
+ * @cbm_idx_offset:		Offset of CBM index. CBM index is computed by:
+ *				closid * cbm_idx_multi + cbm_idx_offset
+ */
+struct rdt_resource {
+	bool			enabled;
+	bool			capable;
+	char			*name;
+	int			num_closid;
+	int			cbm_len;
+	int			min_cbm_bits;
+	u32			max_cbm;
+	struct list_head	domains;
+	int			num_domains;
+	int			msr_base;
+	u32			*tmp_cbms;
+	int			cache_level;
+	int			cbm_idx_multi;
+	int			cbm_idx_offset;
+};
+
+extern struct rdt_resource rdt_resources_all[];
+
+enum {
+	RDT_RESOURCE_L3,
+	RDT_RESOURCE_L3DATA,
+	RDT_RESOURCE_L3CODE,
+	RDT_RESOURCE_L2,
+
+	/* Must be the last */
+	RDT_NUM_RESOURCES,
+};
+
+#define for_each_capable_rdt_resource(r)				      \
+	for (r = rdt_resources_all; r < rdt_resources_all + RDT_NUM_RESOURCES;\
+	     r++) 							      \
+		if (r->capable)
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
+union cpuid_0x10_1_eax {
+	struct {
+		unsigned int cbm_len:5;
+	} split;
+	unsigned int full;
+};
+
+/* CPUID.(EAX=10H, ECX=ResID=1).EDX */
+union cpuid_0x10_1_edx {
+	struct {
+		unsigned int cos_max:16;
+	} split;
+	unsigned int full;
+};
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f8e35cf..157dc8d0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -31,6 +31,47 @@
 #include <asm/intel-family.h>
 #include <asm/intel_rdt.h>
 
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+
+struct rdt_resource rdt_resources_all[] = {
+	{
+		.name		= "L3",
+		.domains	= domain_init(RDT_RESOURCE_L3),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset	= 0
+	},
+	{
+		.name		= "L3DATA",
+		.domains	= domain_init(RDT_RESOURCE_L3DATA),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 2,
+		.cbm_idx_offset	= 0
+	},
+	{
+		.name		= "L3CODE",
+		.domains	= domain_init(RDT_RESOURCE_L3CODE),
+		.msr_base	= IA32_L3_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 3,
+		.cbm_idx_multi	= 2,
+		.cbm_idx_offset	= 1
+	},
+	{
+		.name		= "L2",
+		.domains	= domain_init(RDT_RESOURCE_L2),
+		.msr_base	= IA32_L2_CBM_BASE,
+		.min_cbm_bits	= 1,
+		.cache_level	= 2,
+		.cbm_idx_multi	= 1,
+		.cbm_idx_offset	= 0
+	},
+};
+
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -54,6 +95,7 @@ static inline bool cache_alloc_hsw_probe(void)
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
 	    boot_cpu_data.x86 == 6 &&
 	    boot_cpu_data.x86_model == INTEL_FAM6_HASWELL_X) {
+		struct rdt_resource *r  = &rdt_resources_all[RDT_RESOURCE_L3];
 		u32 l, h, max_cbm = BIT_MASK(20) - 1;
 
 		if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
@@ -61,33 +103,88 @@ static inline bool cache_alloc_hsw_probe(void)
 		rdmsr(IA32_L3_CBM_BASE, l, h);
 
 		/* If all the bits were set in MSR, return success */
-		return l == max_cbm;
+		if (l != max_cbm)
+			return false;
+
+		r->num_closid = 4;
+		r->cbm_len = 20;
+		r->max_cbm = max_cbm;
+		r->min_cbm_bits = 2;
+		r->capable = true;
+		r->enabled = true;
+
+		return true;
 	}
 
 	return false;
 }
 
+static void rdt_get_config(int idx, struct rdt_resource *r)
+{
+	union cpuid_0x10_1_eax eax;
+	union cpuid_0x10_1_edx edx;
+	u32 ebx, ecx;
+
+	cpuid_count(0x00000010, idx, &eax.full, &ebx, &ecx, &edx.full);
+	r->num_closid = edx.split.cos_max + 1;
+	r->cbm_len = eax.split.cbm_len + 1;
+	r->max_cbm = BIT_MASK(eax.split.cbm_len + 1) - 1;
+	r->capable = true;
+	r->enabled = true;
+}
+
+static void rdt_get_cdp_l3_config(int type)
+{
+	struct rdt_resource *r_l3 = &rdt_resources_all[RDT_RESOURCE_L3];
+	struct rdt_resource *r = &rdt_resources_all[type];
+
+	r->num_closid = r_l3->num_closid / 2;
+	r->cbm_len = r_l3->cbm_len;
+	r->max_cbm = r_l3->max_cbm;
+	r->capable = true;
+	/*
+	 * By default, CDP is disabled. CDP can be enabled by mount parameter
+	 * "cdp" during resctrl file system mount time.
+	 */
+	r->enabled = false;
+}
+
 static inline bool get_rdt_resources(void)
 {
+	bool ret = false;
+
 	if (cache_alloc_hsw_probe())
 		return true;
 
 	if (!boot_cpu_has(X86_FEATURE_RDT_A))
 		return false;
-	if (!boot_cpu_has(X86_FEATURE_CAT_L3))
-		return false;
 
-	return true;
+	if (boot_cpu_has(X86_FEATURE_CAT_L3)) {
+		rdt_get_config(1, &rdt_resources_all[RDT_RESOURCE_L3]);
+		if (boot_cpu_has(X86_FEATURE_CDP_L3)) {
+			rdt_get_cdp_l3_config(RDT_RESOURCE_L3DATA);
+			rdt_get_cdp_l3_config(RDT_RESOURCE_L3CODE);
+		}
+		ret = true;
+	}
+	if (boot_cpu_has(X86_FEATURE_CAT_L2)) {
+		/* CPUID 0x10.2 fields are the same format as 0x10.1 */
+		rdt_get_config(2, &rdt_resources_all[RDT_RESOURCE_L2]);
+		ret = true;
+	}
+
+	return ret;
 }
 
 static int __init intel_rdt_late_init(void)
 {
+	struct rdt_resource *r;
+
 	if (!get_rdt_resources())
 		return -ENODEV;
 
-	pr_info("Intel RDT cache allocation detected\n");
-	if (boot_cpu_has(X86_FEATURE_CDP_L3))
-		pr_info("Intel RDT code data prioritization detected\n");
+	for_each_capable_rdt_resource(r)
+		pr_info("Intel RDT %s allocation detected\n", r->name);
 
 	return 0;
 }
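
As a worked example of the cbm_idx_multi/cbm_idx_offset scheme (a sketch;
the helper name is made up): with CDP enabled the L3 CBM MSRs interleave
data and code masks, so for closid 1 L3DATA hits index 2 (MSR 0xc92) and
L3CODE hits index 3 (MSR 0xc93), while plain L3 hits index 1 (MSR 0xc91):

	/* CBM index = closid * cbm_idx_multi + cbm_idx_offset */
	static int cbm_idx(struct rdt_resource *r, int closid)
	{
		return closid * r->cbm_idx_multi + r->cbm_idx_offset;
	}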

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [tip:x86/cache] x86/cqm: Share PQR_ASSOC related data between CQM and CAT
  2016-10-22 13:19 ` [PATCH v5 09/18] x86/cqm: Move PQR_ASSOC management code into generic code used by both CQM and CAT Fenghua Yu
@ 2016-10-26 21:29   ` tip-bot for Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: tip-bot for Fenghua Yu @ 2016-10-26 21:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, ravi.v.shankar, davidcc, nilayvaish, sai.praneeth.prakhya,
	tony.luck, peterz, mingo, h.peter.anvin, mingo, linux-kernel,
	dave.hansen, eranian, bp, vikas.shivappa, shli, fenghua.yu, hpa

Commit-ID:  6b281569df649ed76145c527028fbbe8a32493aa
Gitweb:     http://git.kernel.org/tip/6b281569df649ed76145c527028fbbe8a32493aa
Author:     Fenghua Yu <fenghua.yu@intel.com>
AuthorDate: Sat, 22 Oct 2016 06:19:56 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Wed, 26 Oct 2016 23:12:39 +0200

x86/cqm: Share PQR_ASSOC related data between CQM and CAT

PQR_ASSOC MSR contains the RMID used for performance monitoring of cache
occupancy and memory bandwidth. The upper 32 bits of this MSR contain the
CLOSID for cache allocation. So we need to share the information between
the two facilities.

Move the rdt data structure declaration into the shared header file and
make the per cpu data structure containing the MSR values global.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "David Carrillo-Cisneros" <davidcc@google.com>
Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: "Stephane Eranian" <eranian@google.com>
Cc: "Dave Hansen" <dave.hansen@intel.com>
Cc: "Shaohua Li" <shli@fb.com>
Cc: "Nilay Vaish" <nilayvaish@gmail.com>
Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
Cc: "Ingo Molnar" <mingo@elte.hu>
Cc: "Borislav Petkov" <bp@suse.de>
Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
Link: http://lkml.kernel.org/r/1477142405-32078-10-git-send-email-fenghua.yu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/events/intel/cqm.c             | 21 +--------------------
 arch/x86/include/asm/intel_rdt_common.h | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index df86874..0c45cc8 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
 
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:		The cached Resource Monitoring ID
- * @closid:		The cached Class Of Service ID
- * @rmid_usecnt:	The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-	u32			rmid;
-	u32			closid;
-	int			rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e6e15cf..b31081b 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -3,4 +3,25 @@
 
 #define MSR_IA32_PQR_ASSOC	0x0c8f
 
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:		The cached Resource Monitoring ID
+ * @closid:		The cached Class Of Service ID
+ * @rmid_usecnt:	The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+	u32			rmid;
+	u32			closid;
+	int			rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
 #endif /* _ASM_X86_INTEL_RDT_COMMON_H */
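
For illustration only (hypothetical helper, not in the patch), the shared
state lets both users compose the MSR write the same way - RMID in the low
word, CLOSID in the high word - and skip it when nothing changed:

	static void pqr_write(struct intel_pqr_state *state, u32 rmid, u32 closid)
	{
		/* avoid a pointless MSR write if the value is unchanged */
		if (state->rmid == rmid && state->closid == closid)
			return;
		state->rmid = rmid;
		state->closid = closid;
		/* low 32 bits: RMID, high 32 bits: CLOSID */
		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
	}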

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 00/18] Intel Cache Allocation Technology
  2016-10-22 13:19 [PATCH v5 00/18] Intel Cache Allocation Technology Fenghua Yu
                   ` (17 preceding siblings ...)
  2016-10-22 13:20 ` [PATCH v5 18/18] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
@ 2016-10-26 21:39 ` Thomas Gleixner
  2016-10-26 21:54   ` Fenghua Yu
  18 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-26 21:39 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Sat, 22 Oct 2016, Fenghua Yu wrote:
> This version should cover all comments from Thomas.

Emphasis on should :)

But this series is a major step forward and I decided to merge the first
lot:
 
> 0001-Documentation-ABI-Add-a-document-entry-for-cache-id.patch
> 0002-cacheinfo-Introduce-cache-id.patch
> 0003-x86-intel_cacheinfo-Enable-cache-id-in-x86.patch
> 0004-x86-intel_rdt-Feature-discovery.patch

> 0006-x86-intel_rdt-Add-CONFIG-Makefile-and-basic-initiali.patch
> 0007-x86-intel_rdt-Add-Haswell-feature-discovery.patch
> 0008-x86-intel_rdt-Pick-up-L3-L2-RDT-parameters-from-CPUID.patch
> 0009-x86-cqm-Move-PQR_ASSOC-management-code-into-generic-.patch

I fixed up #0004 (including Borislavs comments) and polished some of the
changelogs a bit.

This reduces the size of the series and prevents these parts from getting
[un]intentionally fat fingered once again.

I intentionally left out the documentation patch as that one needs to be
updated when you address the max closid issues.

Please work against tip x86/cache from now on.

The remaining issues are not that big, so I think we are really close.

Thanks

       tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 00/18] Intel Cache Allocation Technology
  2016-10-26 21:39 ` [PATCH v5 00/18] Intel Cache Allocation Technology Thomas Gleixner
@ 2016-10-26 21:54   ` Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-26 21:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Borislav Petkov, Dave Hansen,
	Nilay Vaish, Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar,
	Sai Prakhya, Vikas Shivappa, linux-kernel, x86

On Wed, Oct 26, 2016 at 11:39:47PM +0200, Thomas Gleixner wrote:
> On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > This version should cover all comments from Thomas.
> 
> Emphasis on should :)
> 
> But this series is a major step forward and I decided to merge the first
> lot:
>  
> > 0001-Documentation-ABI-Add-a-document-entry-for-cache-id.patch
> > 0002-cacheinfo-Introduce-cache-id.patch
> > 0003-x86-intel_cacheinfo-Enable-cache-id-in-x86.patch
> > 0004-x86-intel_rdt-Feature-discovery.patch
> 
> > 0006-x86-intel_rdt-Add-CONFIG-Makefile-and-basic-initiali.patch
> > 0007-x86-intel_rdt-Add-Haswell-feature-discovery.patch
> > 0008-x86-intel_rdt-Pick-up-L3-L2-RDT-parameters-from-CPUID.patch
> > 0009-x86-cqm-Move-PQR_ASSOC-management-code-into-generic-.patch
> 
> I fixed up #0004 (including Borislavs comments) and polished some of the
> changelogs a bit.
> 
> This reduces the size of the series and prevents these parts from getting
> [un]intentionally fat fingered once again.
> 
> I intentionally left out the documentation patch as that one needs to be
> updated when you address the max closid issues.
> 
> Please work against tip x86/cache from now on.
> 
> The remaining issues are not that big, so I think we are really close.
> 
> Thanks
> 
>        tglx

This is really great!

I'm working on releasing updated remaining patches based on the tip x86/cache.

Thank you, Thomas!

-Fenghua

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-26 14:45   ` Thomas Gleixner
  2016-10-26 15:48     ` Luck, Tony
@ 2016-10-27 18:17     ` Fenghua Yu
  2016-10-27 18:25       ` Thomas Gleixner
  1 sibling, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-27 18:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Borislav Petkov, Dave Hansen,
	Nilay Vaish, Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar,
	Sai Prakhya, Vikas Shivappa, linux-kernel, x86

On Wed, Oct 26, 2016 at 04:45:50PM +0200, Thomas Gleixner wrote:
> On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > +/*
> > + * Forcibly remove all of subdirectories under root.
> > + */
> > +static void rmdir_all_sub(void)
> > +{
> > +	kernfs_remove(kn_info);
> 
> What clears kn_info?

Is the question "Why clears kn_info?"

kn_info is created during mount time and has different contents
under "info" directory with CDP enabled or disabled by mount parameter "cdp".

umount needs to remove kn_info so that it's ready to be created next time
during mount time. So user can do CAT mount, check "info", umount, and then
CDP mount, check "info". The user will see different contents in two "info"
checks.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-27 18:17     ` Fenghua Yu
@ 2016-10-27 18:25       ` Thomas Gleixner
  2016-10-27 18:35         ` Fenghua Yu
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-27 18:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86



On Thu, 27 Oct 2016, Fenghua Yu wrote:

> On Wed, Oct 26, 2016 at 04:45:50PM +0200, Thomas Gleixner wrote:
> > On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > > +/*
> > > + * Forcibly remove all of subdirectories under root.
> > > + */
> > > +static void rmdir_all_sub(void)
> > > +{
> > > +	kernfs_remove(kn_info);
> > 
> > What clears kn_info?
> 
> Is the question "Why clears kn_info?"
> 
> kn_info is created during mount time and has different contents
> under "info" directory with CDP enabled or disabled by mount parameter "cdp".
> 
> umount needs to remove kn_info so that it's ready to be created next time
> during mount time. So user can do CAT mount, check "info", umount, and then
> CDP mount, check "info". The user will see different contents in two "info"
> checks.

Lemme rephrase. What does: kn_info = NULL; ?

We should clear static variables for correctness sake. The current code has
no problem with that, but 5 months down the road something is going to trip
over the stale reference.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 12/18] x86/intel_rdt: Add "info" files to resctrl file system
  2016-10-27 18:25       ` Thomas Gleixner
@ 2016-10-27 18:35         ` Fenghua Yu
  0 siblings, 0 replies; 49+ messages in thread
From: Fenghua Yu @ 2016-10-27 18:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Borislav Petkov, Dave Hansen,
	Nilay Vaish, Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar,
	Sai Prakhya, Vikas Shivappa, linux-kernel, x86

On Thu, Oct 27, 2016 at 08:25:57PM +0200, Thomas Gleixner wrote:
> 
> 
> On Thu, 27 Oct 2016, Fenghua Yu wrote:
> 
> > On Wed, Oct 26, 2016 at 04:45:50PM +0200, Thomas Gleixner wrote:
> > > On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > > > +/*
> > > > + * Forcibly remove all of subdirectories under root.
> > > > + */
> > > > +static void rmdir_all_sub(void)
> > > > +{
> > > > +	kernfs_remove(kn_info);
> > > 
> > > What clears kn_info?
> > 
> > Is the question "Why clears kn_info?"
> > 
> > kn_info is created during mount time and has different contents
> > under "info" directory with CDP enabled or disabled by mount parameter "cdp".
> > 
> > umount needs to remove kn_info so that it's ready to be created next time
> > during mount time. So user can do CAT mount, check "info", umount, and then
> > CDP mount, check "info". The user will see different contents in two "info"
> > checks.
> 
> Lemme rephrase. What does: kn_info = NULL; ?
> 
> We should clear static variables for correctness sake. The current code has
> no problem with that, but 5 months down the road something is going to trip
> over the stale reference.
> 
> Thanks,
> 
> 	tglx

Ok. I see. I'll add kn_info = NULL after kernfs_remove().
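
Something like this (untested sketch):

	static void rmdir_all_sub(void)
	{
		kernfs_remove(kn_info);
		kn_info = NULL;
	}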

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v5 13/18] x86/intel_rdt: Add mkdir to resctrl file system
  2016-10-26 15:01   ` Thomas Gleixner
@ 2016-10-28 17:51     ` Fenghua Yu
  2016-10-28 18:41       ` Thomas Gleixner
  0 siblings, 1 reply; 49+ messages in thread
From: Fenghua Yu @ 2016-10-28 17:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, H. Peter Anvin, Ingo Molnar, Tony Luck,
	Peter Zijlstra, Stephane Eranian, Borislav Petkov, Dave Hansen,
	Nilay Vaish, Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar,
	Sai Prakhya, Vikas Shivappa, linux-kernel, x86

On Wed, Oct 26, 2016 at 05:01:32PM +0200, Thomas Gleixner wrote:
> On Sat, 22 Oct 2016, Fenghua Yu wrote:
> > +/*
> > + * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
> > + * we can keep a bitmap of free CLOSIDs in a single integer.
> > + *
> > + * Using a global CLOSID across all resources has some advantages and
> > + * some drawbacks:
> > + * + We can simply set "current->closid" to assign a task to a resource
> > + *   group.
> > + * + Context switch code can avoid extra memory references deciding which
> > + *   CLOSID to load into the PQR_ASSOC MSR
> > + * - We give up some options in configuring resource groups across multi-socket
> > + *   systems.
> > + * - Our choices on how to configure each resource become progressively more
> > + *   limited as the number of resources grows.
> > + */
> > +static int closid_free_map;
> > +
> > +static void closid_init(void)
> > +{
> > +	struct rdt_resource *r;
> > +	int rdt_max_closid;
> > +
> > +	/* Compute rdt_max_closid across all resources */
> > +	rdt_max_closid = 0;
> > +	for_each_enabled_rdt_resource(r)
> > +		rdt_max_closid = max(rdt_max_closid, r->num_closid);
> 
> So you decided to silently ignore my objections against this approach. Fine
> with me, but that does not solve the problem at all.
> 
> Once more:
> 
> On a system with L2 and L3 CAT it does not make any sense at all to expose
> the closids which exceed the L2 space. Simply because using them wreckages
> any L2 partitioning done in the valid L2 space.
> 
> If you really want to allow that, then:
> 
>    1) It must be a opt-in at mount time
> 
>    2) It must be documented clearly along with the mount option
> 
> > +	/*
> > +	 * CDP is "special". Because we share the L3 CBM MSR array
> > +	 * between L3DATA and L3CODE, we must not use a CLOSID larger
> > +	 * than they support. Just check against L3DATA because it
> > +	 * is the same as L3CODE.
> > +	 */
> > +	r = &rdt_resources_all[RDT_RESOURCE_L3DATA];
> > +	if (r->enabled)
> > +		rdt_max_closid = min(rdt_max_closid, r->num_closid);
> 
> This explicit special casing is crap, really.
> 
> 	for_each_enabled_rdt_resource(r)
> 		rdt_max_closid = max(rdt_max_closid, r->num_closid);
> 
> 	for_each_enabled_rdt_resource(r) {
> 		if (!relaxed_max_closid || r->force_min_closid)
> 			rdt_max_closid = min(rdt_max_closid, r->num_closid);
> 	}
> 
> Handles all cases without 'CDP is special' and whatever nonsense intel will
> come up with in future. All you need to do is to add that force_min_closid
> field into the resource struct and set it for l3data and l3code.
> 
> relaxed_max_closid is set at mount time by an appropriate mount option.

Can we just do a simple implementation that finds the minimal closid? We
can implement the maximum closid and mount parameter later?

AFAIK, the minimal closid works in all current situations (L3, L3DATA,
L3CODE, and L2) and there is no platform that needs to use max closid yet.

This is the updated patch; the only change is to use the minimal closid in
closid_init(). Does it look good?

---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 237 +++++++++++++++++++++++++++++++
 2 files changed, 246 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 39ed561..a6c7d94 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -12,13 +12,20 @@
  * @kn:				kernfs node
  * @rdtgroup_list:		linked list for all rdtgroups
  * @closid:			closid for this rdtgroup
+ * @flags:			status bits
+ * @waitcount:			how many cpus expect to find this group
  */
 struct rdtgroup {
 	struct kernfs_node	*kn;
 	struct list_head	rdtgroup_list;
 	int			closid;
+	int			flags;
+	atomic_t		waitcount;
 };
 
+/* rdtgroup.flags */
+#define	RDT_DELETED		1
+
 /* List of all resource groups */
 extern struct list_head rdt_all_groups;
 
@@ -154,4 +161,6 @@ union cpuid_0x10_1_edx {
 };
 
 void rdt_cbm_update(void *arg);
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+void rdtgroup_kn_unlock(struct kernfs_node *kn);
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index ebab170..296ee23 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -26,10 +26,12 @@
 #include <linux/seq_file.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
+#include <linux/cpu.h>
 
 #include <uapi/linux/magic.h>
 
 #include <asm/intel_rdt.h>
+#include <asm/intel_rdt_common.h>
 
 DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
 struct kernfs_root *rdt_root;
@@ -39,6 +41,61 @@ LIST_HEAD(rdt_all_groups);
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
+/*
+ * Trivial allocator for CLOSIDs. Since h/w only supports a small number,
+ * we can keep a bitmap of free CLOSIDs in a single integer.
+ *
+ * Using a global CLOSID across all resources has some advantages and
+ * some drawbacks:
+ * + We can simply set "current->closid" to assign a task to a resource
+ *   group.
+ * + Context switch code can avoid extra memory references deciding which
+ *   CLOSID to load into the PQR_ASSOC MSR
+ * - We give up some options in configuring resource groups across multi-socket
+ *   systems.
+ * - Our choices on how to configure each resource become progressively more
+ *   limited as the number of resources grows.
+ */
+static int closid_free_map;
+
+static void closid_init(void)
+{
+	struct rdt_resource *r;
+	int rdt_min_closid;
+
+	/* Compute rdt_min_closid across all resources */
+	rdt_min_closid = 0;
+	for_each_enabled_rdt_resource(r)
+		rdt_min_closid = min(rdt_min_closid, r->num_closid);
+
+	if (rdt_min_closid > 32) {
+		pr_warn("Only using 32 of %d CLOSIDs\n", rdt_min_closid);
+		rdt_min_closid = 32;
+	}
+
+	closid_free_map = BIT_MASK(rdt_min_closid) - 1;
+
+	/* CLOSID 0 is always reserved for the default group */
+	closid_free_map &= ~1;
+}
+
+int closid_alloc(void)
+{
+	int closid = ffs(closid_free_map);
+
+	if (closid == 0)
+		return -ENOSPC;
+	closid--;
+	closid_free_map &= ~(1 << closid);
+
+	return closid;
+}
+
+static void closid_free(int closid)
+{
+	closid_free_map |= 1 << closid;
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
@@ -271,6 +328,54 @@ static int parse_rdtgroupfs_options(char *data)
 	return ret;
 }
 
+/*
+ * We don't allow rdtgroup directories to be created anywhere
+ * except the root directory. Thus when looking for the rdtgroup
+ * structure for a kernfs node we are either looking at a directory,
+ * in which case the rdtgroup structure is pointed at by the "priv"
+ * field, otherwise we have a file, and need only look to the parent
+ * to find the rdtgroup.
+ */
+static struct rdtgroup *kernfs_to_rdtgroup(struct kernfs_node *kn)
+{
+	if (kernfs_type(kn) == KERNFS_DIR)
+		return kn->priv;
+	else
+		return kn->parent->priv;
+}
+
+struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+	atomic_inc(&rdtgrp->waitcount);
+	kernfs_break_active_protection(kn);
+
+	mutex_lock(&rdtgroup_mutex);
+
+	/* Was this group deleted while we waited? */
+	if (rdtgrp->flags & RDT_DELETED)
+		return NULL;
+
+	return rdtgrp;
+}
+
+void rdtgroup_kn_unlock(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp = kernfs_to_rdtgroup(kn);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
+	    (rdtgrp->flags & RDT_DELETED)) {
+		kernfs_unbreak_active_protection(kn);
+		kernfs_put(kn);
+		kfree(rdtgrp);
+	} else {
+		kernfs_unbreak_active_protection(kn);
+	}
+}
+
 static struct dentry *rdt_mount(struct file_system_type *fs_type,
 				int flags, const char *unused_dev_name,
 				void *data)
@@ -302,6 +407,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 		}
 	}
 
+	closid_init();
+
 	ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
 	if (ret)
 		goto out;
@@ -358,10 +465,39 @@ static int reset_all_cbms(struct rdt_resource *r)
 }
 
 /*
+ * MSR_IA32_PQR_ASSOC is scoped per logical CPU, so all updates
+ * are always in thread context.
+ */
+static void rdt_reset_pqr_assoc_closid(void *v)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+
+	state->closid = 0;
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+}
+
+/*
  * Forcibly remove all of subdirectories under root.
  */
 static void rmdir_all_sub(void)
 {
+	struct rdtgroup *rdtgrp, *tmp;
+
+	get_cpu();
+	/* Reset PQR_ASSOC MSR on this cpu. */
+	rdt_reset_pqr_assoc_closid(NULL);
+	/* Reset PQR_ASSOC MSR on the rest of cpus. */
+	smp_call_function_many(cpu_online_mask, rdt_reset_pqr_assoc_closid,
+			       NULL, 1);
+	put_cpu();
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp == &rdtgroup_default)
+			continue;
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+		kfree(rdtgrp);
+	}
 	kernfs_remove(kn_info);
 }
 
@@ -394,7 +530,108 @@ static struct file_system_type rdt_fs_type = {
 	.kill_sb = rdt_kill_sb,
 };
 
+static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+			  umode_t mode)
+{
+	struct rdtgroup *parent, *rdtgrp;
+	struct kernfs_node *kn;
+	int ret, closid;
+
+	/* Only allow mkdir in the root directory */
+	if (parent_kn != rdtgroup_default.kn)
+		return -EPERM;
+
+	/* Do not accept '\n' to avoid unparsable situation. */
+	if (strchr(name, '\n'))
+		return -EINVAL;
+
+	parent = rdtgroup_kn_lock_live(parent_kn);
+	if (!parent) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	ret = closid_alloc();
+	if (ret < 0)
+		goto out_unlock;
+	closid = ret;
+
+	/* allocate the rdtgroup. */
+	rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
+	if (!rdtgrp) {
+		ret = -ENOSPC;
+		goto out_closid_free;
+	}
+	rdtgrp->closid = closid;
+	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
+
+	/* kernfs creates the directory for rdtgrp */
+	kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out_cancel_ref;
+	}
+	rdtgrp->kn = kn;
+
+	/*
+	 * kernfs_remove() will drop the reference count on "kn" which
+	 * will free it. But we still need it to stick around for the
+	 * rdtgroup_kn_unlock(kn) call below. Take one extra reference
+	 * here, which will be dropped inside rdtgroup_kn_unlock().
+	 */
+	kernfs_get(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn);
+
+	ret = 0;
+	goto out_unlock;
+
+out_destroy:
+	kernfs_remove(rdtgrp->kn);
+out_cancel_ref:
+	list_del(&rdtgrp->rdtgroup_list);
+	kfree(rdtgrp);
+out_closid_free:
+	closid_free(closid);
+out_unlock:
+	rdtgroup_kn_unlock(parent_kn);
+	return ret;
+}
+
+static int rdtgroup_rmdir(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(kn);
+		return -ENOENT;
+	}
+
+	rdtgrp->flags = RDT_DELETED;
+	closid_free(rdtgrp->closid);
+	list_del(&rdtgrp->rdtgroup_list);
+
+	/*
+	 * Take one extra reference on the kernfs node; it is dropped in
+	 * rdtgroup_kn_unlock(), where the final kfree(rdtgrp) also happens.
+	 */
+	kernfs_get(kn);
+	kernfs_remove(rdtgrp->kn);
+
+	rdtgroup_kn_unlock(kn);
+
+	return ret;
+}
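
From user space these two handlers surface as ordinary mkdir(2) and
rmdir(2) calls on the mounted resctrl filesystem. A sketch of the
expected usage; the mount point and group name are illustrative
assumptions, and resctrl must already be mounted:

	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(void)
	{
		/* Invokes rdtgroup_mkdir(), which allocates a CLOSID */
		if (mkdir("/sys/fs/resctrl/grp1", 0755))
			perror("mkdir");

		/* ... write the group's cpus/tasks/schemata files ... */

		/*
		 * Invokes rdtgroup_rmdir(), which marks the group
		 * RDT_DELETED, frees its CLOSID and defers the kfree()
		 * to the last holder of a waitcount reference.
		 */
		if (rmdir("/sys/fs/resctrl/grp1"))
			perror("rmdir");
		return 0;
	}

Per the checks at the top of rdtgroup_mkdir(), a mkdir anywhere below
the root fails with EPERM, and a name containing '\n' with EINVAL.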
+
 static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+	.mkdir	= rdtgroup_mkdir,
+	.rmdir	= rdtgroup_rmdir,
 };
 
 static int __init rdtgroup_setup_root(void)
-- 
2.5.0

* Re: [PATCH v5 13/18] x86/intel_rdt: Add mkdir to resctrl file system
  2016-10-28 17:51     ` Fenghua Yu
@ 2016-10-28 18:41       ` Thomas Gleixner
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Gleixner @ 2016-10-28 18:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: H. Peter Anvin, Ingo Molnar, Tony Luck, Peter Zijlstra,
	Stephane Eranian, Borislav Petkov, Dave Hansen, Nilay Vaish,
	Shaohua Li, David Carrillo-Cisneros, Ravi V Shankar, Sai Prakhya,
	Vikas Shivappa, linux-kernel, x86

On Fri, 28 Oct 2016, Fenghua Yu wrote:
> On Wed, Oct 26, 2016 at 05:01:32PM +0200, Thomas Gleixner wrote:
> > Handles all cases without 'CDP is special' and whatever nonsense intel will
> > come up with in future. All you need to do is to add that force_min_closid
> > field into the resource struct and set it for l3data and l3code.
> > 
> > relaxed_max_closid is set at mount time by an appropriate mount option.
> 
> Can we just do a simple implementation that finds the minimal closid? We
> can implement the maximum closid and mount parameter later.

Sure. I'm a great fan of: Keep It Simple
 
> AFAIK, the minimal closid works in all current situations (L3, L3DATA,
> L3CODE, and L2) and there is no platform that needs to use max closid yet.

Good.
 
> This is the updated patch; the only change is to use the minimal closid
> in closid_init(). Does it look good?
> 
> +static void closid_init(void)
> +{
> +	struct rdt_resource *r;
> +	int rdt_min_closid;
> +
> +	/* Compute rdt_min_closid across all resources */
> +	rdt_min_closid = 0;
> +	for_each_enabled_rdt_resource(r)
> +		rdt_min_closid = min(rdt_min_closid, r->num_closid);

The result will be 0 :) You want to init rdt_min_closid to 32 or INT_MAX.

Other than that it's perfect !

Thanks,

	tglx
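
For reference, here is the loop with the initialization fixed as
suggested. This is a sketch only; the rest of closid_init(), which
seeds the CLOSID allocator, is omitted:

	static void closid_init(void)
	{
		struct rdt_resource *r;
		int rdt_min_closid = 32;	/* or INT_MAX; min() only shrinks this */

		/* Compute rdt_min_closid across all enabled resources */
		for_each_enabled_rdt_resource(r)
			rdt_min_closid = min(rdt_min_closid, r->num_closid);

		/* ... mark CLOSIDs 0 .. rdt_min_closid - 1 as available ... */
	}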
