* [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology
@ 2016-07-13  1:02 Fenghua Yu
  2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
                   ` (31 more replies)
  0 siblings, 32 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

L3 cache allocation allows per task control over which areas of the last
level cache are available for allocation. It is the first resource that
can be controlled as part of Intel Resource Director Technology (RDT).
This patch series creates a framework that will make it easy to add
additional resources (like L2 cache).

See the Intel Software Developer's Manual, Volume 3, Chapter 17 for
architectural details. Also see Documentation/x86/intel_rdt.txt and
Documentation/x86/intel_rdt_ui.txt (added in parts 0001 & 0013 of this
patch series).

A previous implementation used "cgroups" as the user interface. This was
rejected.

The new interface:
1) Aligns better with the h/w capabilities provided
2) Gives finer-grained control (per thread instead of per process)
3) Gives control over kernel threads as well as user threads
4) Allows resource allocation policies to be tied to certain cpus across
all contexts (tglx request)

Note that parts 1-12 are largely unchanged from what was posted last year
except for the removal of cgroup pieces and dynamic CAT/CDP switch.

Fenghua Yu (20):
  Documentation, x86: Documentation for Intel resource allocation user
    interface
  x86/cpufeatures: Get max closid and max cbm len and clean feature
    comments and code
  cacheinfo: Introduce cache id
  Documentation, ABI: Add a document entry for cache id
  x86, intel_cacheinfo: Enable cache id in x86
  drivers/base/cacheinfo.c: Export some cacheinfo functions for others
    to use
  sched.h: Add rg_list and rdtgroup in task_struct
  magic number for rscctrl file system
  x86/intel_rdt.h: Header for inter_rdt.c
  x86/intel_rdt_rdtgroup.h: Header for user interface
  x86/intel_rdt.c: Extend RDT to per cache and per resources
  Task fork and exit for rdtgroup
  x86/intel_rdt_rdtgroup.c: User interface for RDT
  x86/intel_rdt_rdtgroup.c: Create info directory
  x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands
  x86/intel_rdt_rdtgroup.c: Read and write cpus
  x86/intel_rdt_rdtgroup.c: Tasks iterator and write
  x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  MAINTAINERS: Add maintainer for Intel RDT resource allocation
  x86/Makefile: Build intel_rdt_rdtgroup.c

Vikas Shivappa (12):
  x86/intel_rdt: Cache Allocation documentation
  x86/intel_rdt: Add support for Cache Allocation detection
  x86/intel_rdt: Add Class of service management
  x86/intel_rdt: Add L3 cache capacity bitmask management
  x86/intel_rdt: Implement scheduling support for Intel RDT
  x86/intel_rdt: Hot cpu support for Cache Allocation
  x86/intel_rdt: Intel haswell Cache Allocation enumeration
  Define CONFIG_INTEL_RDT
  x86/intel_rdt: Intel Code Data Prioritization detection
  x86/intel_rdt: Adds support to enable Code Data Prioritization
  x86/intel_rdt: Class of service and capacity bitmask management for
    CDP
  x86/intel_rdt: Hot cpu update for code data prioritization

 Documentation/ABI/testing/sysfs-devices-system-cpu |   17 +
 Documentation/x86/intel_rdt.txt                    |  109 +
 Documentation/x86/intel_rdt_ui.txt                 |  268 +++
 MAINTAINERS                                        |    8 +
 arch/x86/Kconfig                                   |   12 +
 arch/x86/events/intel/cqm.c                        |   26 +-
 arch/x86/include/asm/cpufeature.h                  |    2 +
 arch/x86/include/asm/cpufeatures.h                 |   13 +-
 arch/x86/include/asm/intel_rdt.h                   |  139 ++
 arch/x86/include/asm/intel_rdt_rdtgroup.h          |  229 ++
 arch/x86/include/asm/pqr_common.h                  |   27 +
 arch/x86/include/asm/processor.h                   |    3 +
 arch/x86/kernel/cpu/Makefile                       |    2 +
 arch/x86/kernel/cpu/common.c                       |   19 +
 arch/x86/kernel/cpu/intel_cacheinfo.c              |   20 +
 arch/x86/kernel/cpu/intel_rdt.c                    |  838 ++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c           | 2230 ++++++++++++++++++++
 arch/x86/kernel/process_64.c                       |    6 +
 drivers/base/cacheinfo.c                           |    7 +-
 include/linux/cacheinfo.h                          |    5 +
 include/linux/sched.h                              |    4 +
 include/uapi/linux/magic.h                         |    2 +
 kernel/exit.c                                      |    2 +
 kernel/fork.c                                      |    4 +
 24 files changed, 3965 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/x86/intel_rdt.txt
 create mode 100644 Documentation/x86/intel_rdt_ui.txt
 create mode 100644 arch/x86/include/asm/intel_rdt.h
 create mode 100644 arch/x86/include/asm/intel_rdt_rdtgroup.h
 create mode 100644 arch/x86/include/asm/pqr_common.h
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c

-- 
2.5.0


* [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Adds a description of Cache Allocation Technology and an overview of
the kernel framework implementation. The framework provides APIs to
manage classes of service, capacity bitmasks (CBM), scheduling support
and other architecture-specific pieces. The APIs are used to build the
rscctrl interface in later patches.

Cache allocation is a sub-feature of Resource Director Technology
(RDT), or platform shared resource control, which provides support for
controlling platform-shared resources such as the L3 cache.

Cache Allocation Technology gives software (OS/VMM) a way to restrict
cache allocation to a defined 'subset' of the cache, which may overlap
with other 'subsets'. The restriction applies when allocating a line in
the cache, i.e. when pulling new data into the cache. Tasks are grouped
into classes of service (CLOS). The OS uses MSR writes to indicate the
CLOSid of a thread when it is scheduled in and to indicate the cache
capacity associated with each CLOSid. Currently cache allocation is
supported only for the L3 cache.

More information can be found in the Intel SDM June 2015, Volume 3,
section 17.16.
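
For illustration only (not part of this series): a tiny userspace
program showing how a capacity bit mask translates into a share of the
L3. The 20-bit CBM width and the mask value are made-up example
numbers; each set bit grants the class of service one slice of the
cache.

#include <stdio.h>

int main(void)
{
	unsigned int max_cbm_len = 20;	/* CBM width reported by the CPU */
	unsigned long cbm = 0x0000f;	/* 4 contiguous low bits */
	unsigned int bits = __builtin_popcountl(cbm);

	printf("CBM 0x%05lx -> %u of %u slices (%.0f%% of L3)\n",
	       cbm, bits, max_cbm_len, 100.0 * bits / max_cbm_len);
	return 0;
}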

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/x86/intel_rdt.txt | 109 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 Documentation/x86/intel_rdt.txt

diff --git a/Documentation/x86/intel_rdt.txt b/Documentation/x86/intel_rdt.txt
new file mode 100644
index 0000000..05ec819
--- /dev/null
+++ b/Documentation/x86/intel_rdt.txt
@@ -0,0 +1,109 @@
+        Intel RDT
+        ---------
+
+Copyright (C) 2014 Intel Corporation
+Written by vikas.shivappa@linux.intel.com
+
+CONTENTS:
+=========
+
+1. Cache Allocation Technology
+  1.1 What is RDT and Cache allocation ?
+  1.2 Why is Cache allocation needed ?
+  1.3 Cache allocation implementation overview
+  1.4 Assignment of CBM and CLOS
+  1.5 Scheduling and Context Switch
+
+1. Cache Allocation Technology
+===================================
+
+1.1 What is RDT and Cache allocation
+------------------------------------
+
+Cache allocation is a sub-feature of Resource Director Technology (RDT)
+Allocation, or platform shared resource control, which provides support
+to control platform-shared resources such as the L3 cache. Currently the
+L3 cache is the only resource supported in RDT. More information can be
+found in the Intel SDM June 2015, Volume 3, section 17.16.
+
+Cache Allocation Technology provides a way for software (OS/VMM) to
+restrict cache allocation to a defined 'subset' of the cache, which may
+overlap with other 'subsets'. The restriction applies when allocating a
+line in the cache, i.e. when pulling new data into the cache. The
+hardware is programmed via MSRs.
+
+The different cache subsets are identified by CLOS identifier (class of
+service) and each CLOS has a CBM (cache bit mask). The CBM is a
+contiguous set of bits which defines the amount of cache resource that
+is available for each 'subset'.
+
+1.2 Why is Cache allocation needed
+----------------------------------
+
+In today's processors the number of cores keeps increasing, especially
+in large scale usage models such as webservers and datacenters where
+VMs are used. A larger number of cores increases the number of threads
+or workloads that can run simultaneously. When multi-threaded
+applications, VMs and other workloads run concurrently they compete
+for shared resources, including the L3 cache.
+
+The architecture also allows dynamically changing these subsets during
+runtime to further optimize the performance of the higher priority
+application with minimal degradation to the low priority app.
+Additionally, resources can be rebalanced for system throughput benefit.
+
+This technique may be useful in managing large server systems with a
+large L3 cache, in cloud and container contexts. Examples are large
+servers running many instances of webservers or database servers. In
+such complex systems, these subsets allow a centralized, root-accessible
+interface to place the available cache resources more carefully among
+the competing workloads.
+
+A specific use case is the noisy neighbour problem, where an app that
+constantly copies data, such as a streaming app, uses a large amount of
+cache which could otherwise have been used by a high priority computing
+application. Using the cache allocation feature, the streaming
+application can be confined to a smaller portion of the cache while the
+high priority application is awarded a larger amount of cache space.
+
+1.3 Cache allocation implementation Overview
+--------------------------------------------
+
+The kernel has a new field in the task_struct called 'closid' which
+represents the Class of Service ID of the task.
+
+There is a 1:1 CLOSid <-> CBM (capacity bit mask) mapping. A CLOS (Class
+of Service) is represented by a CLOSid. Each CLOSid has one CBM and
+represents exactly one cache 'subset'. Tasks are only allowed to fill
+the portion of the L3 cache represented by that capacity bit mask (CBM).
+
+The APIs to manage the closid and CBM can be used to develop user
+interfaces.
+
+1.4 Assignment of CBM, CLOS
+---------------------------
+
+The framework provides APIs to manage the closid and CBM which can be
+used to develop user/kernel mode interfaces.
+
+1.5 Scheduling and Context Switch
+---------------------------------
+
+During a context switch the kernel writes the CLOSid of the incoming
+task to the CPU's IA32_PQR_ASSOC MSR. The MSR is only written when the
+CLOSid for the CPU actually changes, in order to minimize the latency
+incurred during context switch.
+
+The following considerations are made for the PQR MSR write so that it
+has minimal impact on the scheduling hot path:
+ - This path doesn't exist on any non-Intel platforms.
+ - On Intel platforms, it does not exist by default unless INTEL_RDT
+ is enabled.
+ - It remains a no-op when INTEL_RDT is enabled but the Intel hardware
+ does not support the feature.
+ - When the feature is available, no MSR write is done until the user
+ starts using the feature *and* assigns a new cache capacity mask.
+ - Per-CPU PQR values are cached and the MSR write is only done when a
+ task with a different PQR value is scheduled on the CPU. Typically,
+ if the task groups are bound to be scheduled on a set of CPUs, the
+ number of MSR writes is greatly reduced.
-- 
2.5.0


* [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
  2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-26 19:00   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 03/32] x86/intel_rdt: Add Class of service management Fenghua Yu
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

This patch adds CPUID enumeration routines for Cache Allocation and new
fields in the cpuinfo_x86 structure to track the resource limits.

Cache allocation provides a way for software (OS/VMM) to restrict cache
allocation to a defined 'subset' of the cache, which may overlap with
other 'subsets'. The restriction applies when allocating a line in the
cache, i.e. when pulling new data into the cache. The hardware is
programmed via MSRs (model specific registers).
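
For reference, a minimal userspace sketch (not part of the patch; the
CPUID max-leaf check and error handling are omitted) of the same
enumeration the common.c hunk performs, using GCC/Clang's <cpuid.h>:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Sub-leaf 0: EBX bit 1 set means L3 cache allocation is enumerated. */
	__cpuid_count(0x10, 0, eax, ebx, ecx, edx);
	if (!(ebx & (1 << 1))) {
		printf("L3 cache allocation not enumerated\n");
		return 1;
	}

	/* Sub-leaf 1: EAX[4:0] = CBM length - 1, EDX[15:0] = max CLOSid. */
	__cpuid_count(0x10, 1, eax, ebx, ecx, edx);
	printf("max CBM length:    %u bits\n", (eax & 0x1f) + 1);
	printf("number of CLOSids: %u\n", (edx & 0xffff) + 1);
	return 0;
}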

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 10 +++++++---
 arch/x86/include/asm/processor.h   |  3 +++
 arch/x86/kernel/cpu/Makefile       |  2 ++
 arch/x86/kernel/cpu/common.c       | 15 ++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c    | 40 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 67 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 4a41348..667acf3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -12,7 +12,7 @@
 /*
  * Defines x86 CPU feature bits
  */
-#define NCAPINTS	18	/* N 32-bit words worth of info */
+#define NCAPINTS	19	/* N 32-bit words worth of info */
 #define NBUGINTS	1	/* N 32-bit bug flags */
 
 /*
@@ -220,6 +220,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT		( 9*32+15) /* Resource Allocation */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
@@ -284,8 +285,11 @@
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */
-#define X86_FEATURE_SUCCOR	(17*32+1) /* Uncorrectable error containment and recovery */
-#define X86_FEATURE_SMCA	(17*32+3) /* Scalable MCA */
+#define X86_FEATURE_SUCCOR	(17*32+ 1) /* Uncorrectable error containment and recovery */
+#define X86_FEATURE_SMCA	(17*32+ 3) /* Scalable MCA */
+
+/* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 18 */
+#define X86_FEATURE_CAT_L3      (18*32+ 1) /* Cache Allocation L3 */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 62c6cc3..598c9bc 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -119,6 +119,9 @@ struct cpuinfo_x86 {
 	int			x86_cache_occ_scale;	/* scale to bytes */
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
+	/* Cache Allocation values: */
+	u16			x86_cache_max_cbm_len;
+	u16			x86_cache_max_closid;
 	/* cpuid returned max cores value: */
 	u16			 x86_max_cores;
 	u16			apicid;
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 4a8697f..39b8e6f 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -42,6 +42,8 @@ obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr-watchdog.o
 
 obj-$(CONFIG_HYPERVISOR_GUEST)		+= vmware.o hypervisor.o mshyperv.o
 
+obj-$(CONFIG_INTEL_RDT)			+= intel_rdt.o
+
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0fe6953..42c90cb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -711,6 +711,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		}
 	}
 
+	/* Additional Intel-defined flags: level 0x00000010 */
+	if (c->cpuid_level >= 0x00000010) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[14] = ebx;
+
+		if (cpu_has(c, X86_FEATURE_CAT_L3)) {
+
+			cpuid_count(0x00000010, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_cache_max_closid = edx + 1;
+			c->x86_cache_max_cbm_len = eax + 1;
+		}
+	}
+
 	/* AMD-defined flags: level 0x80000001 */
 	eax = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = eax;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..f49e970
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,40 @@
+/*
+ * Resource Director Technology(RDT)
+ * - Cache Allocation code.
+ *
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * 2015-05-25 Written by
+ *    Vikas Shivappa <vikas.shivappa@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual June 2015, volume 3, section 17.15.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+
+static int __init intel_rdt_late_init(void)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	if (!cpu_has(c, X86_FEATURE_CAT_L3))
+		return -ENODEV;
+
+	pr_info("Intel cache allocation detected\n");
+
+	return 0;
+}
+
+late_initcall(intel_rdt_late_init);
-- 
2.5.0


* [PATCH 03/32] x86/intel_rdt: Add Class of service management
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
  2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
  2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Adds data structures and APIs to support class of service (CLOSid)
management. A new clos_cbm table keeps a 1:1 mapping between a CLOSid
and its capacity bit mask (CBM), along with a usage count for the
CLOSid. Each task is associated with one CLOSid at a time, and this
patch adds a new 'closid' field to task_struct to keep track of it.
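
As a rough userspace model (not the kernel code itself) of the CLOSid
lifecycle this patch introduces: a bitmap marks CLOSids in use and a
per-CLOSid refcount releases the id once the last user drops it. The
kernel version additionally has closid_get() so several users can share
one CLOSid, and protects everything with rdt_group_mutex.

#include <stdio.h>

#define MAX_CLOSID 16

struct clos_cbm_table {
	unsigned long l3_cbm;
	unsigned int clos_refcnt;
};

static struct clos_cbm_table cctable[MAX_CLOSID];
static unsigned long closmap;		/* bit n set => CLOSid n is in use */

static int closid_alloc(unsigned int *closid)
{
	for (unsigned int id = 0; id < MAX_CLOSID; id++) {
		if (!(closmap & (1UL << id))) {
			closmap |= 1UL << id;
			cctable[id].clos_refcnt = 1;
			*closid = id;
			return 0;
		}
	}
	return -1;			/* no free CLOSid (-ENOSPC) */
}

static void closid_put(unsigned int closid)
{
	if (--cctable[closid].clos_refcnt == 0) {
		closmap &= ~(1UL << closid);	/* closid_free() */
		cctable[closid].l3_cbm = 0;
	}
}

int main(void)
{
	unsigned int id;

	if (!closid_alloc(&id)) {
		cctable[id].l3_cbm = 0xff;	/* give this CLOS 8 ways */
		printf("allocated CLOSid %u\n", id);
		closid_put(id);			/* last ref: id is reusable */
	}
	return 0;
}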

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h | 12 ++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 82 +++++++++++++++++++++++++++++++++++++++-
 include/linux/sched.h            |  3 ++
 3 files changed, 95 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_rdt.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..88b7643
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,12 @@
+#ifndef _RDT_H_
+#define _RDT_H_
+
+#ifdef CONFIG_INTEL_RDT
+
+struct clos_cbm_table {
+	unsigned long l3_cbm;
+	unsigned int clos_refcnt;
+};
+
+#endif
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index f49e970..d79213a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,17 +24,95 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * cctable maintains 1:1 mapping between CLOSid and cache bitmask.
+ */
+static struct clos_cbm_table *cctable;
+/*
+ * closid availability bit map.
+ */
+unsigned long *closmap;
+static DEFINE_MUTEX(rdt_group_mutex);
+
+static inline void closid_get(u32 closid)
+{
+	struct clos_cbm_table *cct = &cctable[closid];
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	cct->clos_refcnt++;
+}
+
+static int closid_alloc(u32 *closid)
+{
+	u32 maxid;
+	u32 id;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	maxid = boot_cpu_data.x86_cache_max_closid;
+	id = find_first_zero_bit(closmap, maxid);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, closmap);
+	closid_get(id);
+	*closid = id;
+
+	return 0;
+}
+
+static inline void closid_free(u32 closid)
+{
+	clear_bit(closid, closmap);
+	cctable[closid].l3_cbm = 0;
+}
+
+static void closid_put(u32 closid)
+{
+	struct clos_cbm_table *cct = &cctable[closid];
+
+	lockdep_assert_held(&rdt_group_mutex);
+	if (WARN_ON(!cct->clos_refcnt))
+		return;
+
+	if (!--cct->clos_refcnt)
+		closid_free(closid);
+}
 
 static int __init intel_rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 maxid, max_cbm_len;
+	int err = 0, size;
 
 	if (!cpu_has(c, X86_FEATURE_CAT_L3))
 		return -ENODEV;
 
-	pr_info("Intel cache allocation detected\n");
+	maxid = c->x86_cache_max_closid;
+	max_cbm_len = c->x86_cache_max_cbm_len;
 
-	return 0;
+	size = maxid * sizeof(struct clos_cbm_table);
+	cctable = kzalloc(size, GFP_KERNEL);
+	if (!cctable) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	size = BITS_TO_LONGS(maxid) * sizeof(long);
+	closmap = kzalloc(size, GFP_KERNEL);
+	if (!closmap) {
+		kfree(cctable);
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	pr_info("Intel cache allocation enabled\n");
+out_err:
+
+	return err;
 }
 
 late_initcall(intel_rdt_late_init);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..6005bba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1761,6 +1761,9 @@ struct task_struct {
 	/* cg_list protected by css_set_lock and tsk->alloc_lock */
 	struct list_head cg_list;
 #endif
+#ifdef CONFIG_INTEL_RDT
+	u32 closid;
+#endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
-- 
2.5.0


* [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (2 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 03/32] x86/intel_rdt: Add Class of service management Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-22  7:12   ` Marcelo Tosatti
  2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
                   ` (27 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

This patch adds APIs to manage the L3 cache capacity bitmask. The
capacity bit mask (CBM) must have only contiguous bits set. The current
implementation has one global CBM per class of service id. APIs are
added to update the CBM via an MSR write to IA32_L3_MASK_n on all
packages, and to read and write entries in the clos_cbm_table.
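
A standalone sketch of the contiguity rule mentioned above (the patch's
cbm_validate() implements the same check with the kernel bitmap
helpers); the mask width and test values are arbitrary:

#include <stdbool.h>
#include <stdio.h>

static bool cbm_is_contiguous(unsigned long cbm, unsigned int max_cbm_len)
{
	unsigned long max_cbm = (1UL << max_cbm_len) - 1;

	if (cbm == 0 || (cbm & ~max_cbm))
		return false;

	/* Drop trailing zeros; a contiguous run then looks like 2^n - 1. */
	while (!(cbm & 1))
		cbm >>= 1;

	return (cbm & (cbm + 1)) == 0;
}

int main(void)
{
	printf("0x000f0: %s\n", cbm_is_contiguous(0x000f0, 20) ? "valid" : "invalid");
	printf("0x000f1: %s\n", cbm_is_contiguous(0x000f1, 20) ? "valid" : "invalid");
	return 0;
}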

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |   4 ++
 arch/x86/kernel/cpu/intel_rdt.c  | 133 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 136 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 88b7643..4f45dc8 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,6 +3,10 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#define MAX_CBM_LENGTH			32
+#define IA32_L3_CBM_BASE		0xc90
+#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
+
 struct clos_cbm_table {
 	unsigned long l3_cbm;
 	unsigned int clos_refcnt;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index d79213a..6ad5b48 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,8 +34,22 @@ static struct clos_cbm_table *cctable;
  * closid availability bit map.
  */
 unsigned long *closmap;
+/*
+ * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
+ */
+static cpumask_t rdt_cpumask;
+/*
+ * Temporary cpumask used during hot cpu notification handling. The usage
+ * is serialized by hot cpu locks.
+ */
+static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
 
+struct rdt_remote_data {
+	int msr;
+	u64 val;
+};
+
 static inline void closid_get(u32 closid)
 {
 	struct clos_cbm_table *cct = &cctable[closid];
@@ -82,11 +96,126 @@ static void closid_put(u32 closid)
 		closid_free(closid);
 }
 
+static bool cbm_validate(unsigned long var)
+{
+	u32 max_cbm_len = boot_cpu_data.x86_cache_max_cbm_len;
+	unsigned long first_bit, zero_bit;
+	u64 max_cbm;
+
+	if (bitmap_weight(&var, max_cbm_len) < 1)
+		return false;
+
+	max_cbm = (1ULL << max_cbm_len) - 1;
+	if (var & ~max_cbm)
+		return false;
+
+	first_bit = find_first_bit(&var, max_cbm_len);
+	zero_bit = find_next_zero_bit(&var, max_cbm_len, first_bit);
+
+	if (find_next_bit(&var, max_cbm_len, zero_bit) < max_cbm_len)
+		return false;
+
+	return true;
+}
+
+static int clos_cbm_table_read(u32 closid, unsigned long *l3_cbm)
+{
+	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	if (closid >= maxid)
+		return -EINVAL;
+
+	*l3_cbm = cctable[closid].l3_cbm;
+
+	return 0;
+}
+
+/*
+ * clos_cbm_table_update() - Update a clos cbm table entry.
+ * @closid: the closid whose cbm needs to be updated
+ * @cbm: the new cbm value that has to be updated
+ *
+ * This assumes the cbm is validated as per the interface requirements
+ * and the cache allocation requirements(through the cbm_validate).
+ */
+static int clos_cbm_table_update(u32 closid, unsigned long cbm)
+{
+	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	if (closid >= maxid)
+		return -EINVAL;
+
+	cctable[closid].l3_cbm = cbm;
+
+	return 0;
+}
+
+static bool cbm_search(unsigned long cbm, u32 *closid)
+{
+	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+	u32 i;
+
+	for (i = 0; i < maxid; i++) {
+		if (cctable[i].clos_refcnt &&
+		    bitmap_equal(&cbm, &cctable[i].l3_cbm, MAX_CBM_LENGTH)) {
+			*closid = i;
+			return true;
+		}
+	}
+
+	return false;
+}
+
+static void closcbm_map_dump(void)
+{
+	u32 i;
+
+	pr_debug("CBMMAP\n");
+	for (i = 0; i < boot_cpu_data.x86_cache_max_closid; i++) {
+		pr_debug("l3_cbm: 0x%x,clos_refcnt: %u\n",
+		 (unsigned int)cctable[i].l3_cbm, cctable[i].clos_refcnt);
+	}
+}
+
+static void msr_cpu_update(void *arg)
+{
+	struct rdt_remote_data *info = arg;
+
+	wrmsrl(info->msr, info->val);
+}
+
+/*
+ * msr_update_all() - Update the msr for all packages.
+ */
+static inline void msr_update_all(int msr, u64 val)
+{
+	struct rdt_remote_data info;
+
+	info.msr = msr;
+	info.val = val;
+	on_each_cpu_mask(&rdt_cpumask, msr_cpu_update, &info, 1);
+}
+
+static inline bool rdt_cpumask_update(int cpu)
+{
+	cpumask_and(&tmp_cpumask, &rdt_cpumask, topology_core_cpumask(cpu));
+	if (cpumask_empty(&tmp_cpumask)) {
+		cpumask_set_cpu(cpu, &rdt_cpumask);
+		return true;
+	}
+
+	return false;
+}
+
 static int __init intel_rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 maxid, max_cbm_len;
-	int err = 0, size;
+	int err = 0, size, i;
 
 	if (!cpu_has(c, X86_FEATURE_CAT_L3))
 		return -ENODEV;
@@ -109,6 +238,8 @@ static int __init intel_rdt_late_init(void)
 		goto out_err;
 	}
 
+	for_each_online_cpu(i)
+		rdt_cpumask_update(i);
 	pr_info("Intel cache allocation enabled\n");
 out_err:
 
-- 
2.5.0


* [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (3 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-25 16:25   ` Nilay Vaish
  2016-07-25 16:31   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
                   ` (26 subsequent siblings)
  31 siblings, 2 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For
Cache Allocation, the MSR write lets the task fill only the cache
'subset' represented by the task's capacity bit mask.

The high 32 bits of the per-processor MSR IA32_PQR_ASSOC hold the
CLOSid. During a context switch the kernel implements this by writing
the CLOSid of the incoming task to the CPU's IA32_PQR_ASSOC MSR.

This patch also implements a common software cache for IA32_PQR_ASSOC
(RMID in bits 0:9, CLOSid in bits 32:63) to be used by both Cache
Monitoring (CMT) and Cache Allocation. CMT updates the RMID whereas
cache allocation updates the CLOSid in the software cache. During
scheduling, IA32_PQR_ASSOC is only written when the new RMID/CLOSid
value differs from the cached values. Since the measured rdmsr latency
for IA32_PQR_ASSOC is very high (~250 cycles), this software cache is
necessary to avoid reading the MSR just to compare the current CLOSid
value.

The following considerations are made for the PQR MSR write so that it
minimally impacts the scheduler hot path (a conceptual sketch of the
cached write follows this list):
 - This path does not exist on any non-Intel platforms.
 - On Intel platforms, it does not exist by default unless INTEL_RDT
 is enabled.
 - It remains a no-op when INTEL_RDT is enabled and the Intel SKU does
 not support the feature.
 - When the feature is available and enabled, no MSR write is done until
 the user manually starts using one of the capacity bit masks.
 - The MSR write is only done when a task with a different CLOSid is
 scheduled on the CPU. Typically, if the task groups are bound to be
 scheduled on a set of CPUs, the number of MSR writes is greatly
 reduced.
 - A per-CPU cache of CLOSids is maintained to do the check, so that we
 don't have to do a rdmsr, which actually costs a lot of cycles.
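
A conceptual userspace sketch of the cached PQR write described above
(names and the fake wrmsr are illustrative; in the kernel the state is
per-CPU and the real wrmsr targets MSR_IA32_PQR_ASSOC):

#include <stdint.h>
#include <stdio.h>

struct intel_pqr_state {
	uint32_t rmid;		/* low 10 bits of IA32_PQR_ASSOC */
	uint32_t closid;	/* high 32 bits of IA32_PQR_ASSOC */
};

static struct intel_pqr_state pqr_cache;	/* per-CPU in the kernel */
static unsigned int msr_writes;

static void fake_wrmsr_pqr(uint64_t val)	/* stands in for wrmsr() */
{
	msr_writes++;
	printf("wrmsr(IA32_PQR_ASSOC, 0x%016llx)\n", (unsigned long long)val);
}

static void sched_in(uint32_t task_closid)
{
	if (task_closid == pqr_cache.closid)
		return;				/* cached: skip the MSR write */

	fake_wrmsr_pqr(((uint64_t)task_closid << 32) | pqr_cache.rmid);
	pqr_cache.closid = task_closid;
}

int main(void)
{
	sched_in(1);	/* CLOSid change: MSR written */
	sched_in(1);	/* same CLOSid: write skipped */
	sched_in(0);	/* back to CLOSid 0: MSR written */
	printf("MSR writes: %u\n", msr_writes);
	return 0;
}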

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/events/intel/cqm.c       | 26 +++-----------------------
 arch/x86/include/asm/intel_rdt.h  | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/pqr_common.h | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c   | 25 +++++++++++++++++++++++++
 arch/x86/kernel/process_64.c      |  6 ++++++
 5 files changed, 89 insertions(+), 23 deletions(-)
 create mode 100644 arch/x86/include/asm/pqr_common.h

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 7b5fd81..ac76e43 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
+#include <asm/pqr_common.h>
 #include "../perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
 
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:		The cached Resource Monitoring ID
- * @closid:		The cached Class Of Service ID
- * @rmid_usecnt:	The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-	u32			rmid;
-	u32			closid;
-	int			rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
@@ -476,9 +457,9 @@ static void cqm_mask_call(struct rmid_read *rr)
  */
 static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
 {
-	struct perf_event *event;
 	struct list_head *head = &group->hw.cqm_group_entry;
 	u32 old_rmid = group->hw.cqm_rmid;
+	struct perf_event *event;
 
 	lockdep_assert_held(&cache_mutex);
 
@@ -1583,7 +1564,6 @@ static void intel_cqm_cpu_starting(unsigned int cpu)
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 	state->rmid = 0;
-	state->closid = 0;
 	state->rmid_usecnt = 0;
 
 	WARN_ON(c->x86_cache_max_rmid != cqm_max_rmid);
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 4f45dc8..afb6da3 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,14 +3,42 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#include <linux/jump_label.h>
+
 #define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
 #define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
 
+extern struct static_key rdt_enable_key;
+void __intel_rdt_sched_in(void *dummy);
+
 struct clos_cbm_table {
 	unsigned long l3_cbm;
 	unsigned int clos_refcnt;
 };
 
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ * which supports L3 cache allocation.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ * when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+	/*
+	 * Call the schedule in code only when RDT is enabled.
+	 */
+	if (static_key_false(&rdt_enable_key))
+		__intel_rdt_sched_in(NULL);
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/pqr_common.h b/arch/x86/include/asm/pqr_common.h
new file mode 100644
index 0000000..11e985c
--- /dev/null
+++ b/arch/x86/include/asm/pqr_common.h
@@ -0,0 +1,27 @@
+#ifndef _X86_RDT_H_
+#define _X86_RDT_H_
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:		The cached Resource Monitoring ID
+ * @closid:		The cached Class Of Service ID
+ * @rmid_usecnt:	The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+	u32			rmid;
+	u32			closid;
+	int			rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6ad5b48..8379df8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,6 +24,8 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/sched.h>
+#include <asm/pqr_common.h>
 #include <asm/intel_rdt.h>
 
 /*
@@ -44,12 +46,33 @@ static cpumask_t rdt_cpumask;
  */
 static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
 
 struct rdt_remote_data {
 	int msr;
 	u64 val;
 };
 
+void __intel_rdt_sched_in(void *dummy)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	u32 closid = current->closid;
+
+	if (closid == state->closid)
+		return;
+
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
+	state->closid = closid;
+}
+
+/*
+ * Synchronize the IA32_PQR_ASSOC MSR of all currently running tasks.
+ */
+static inline void closid_tasks_sync(void)
+{
+	on_each_cpu_mask(cpu_online_mask, __intel_rdt_sched_in, NULL, 1);
+}
+
 static inline void closid_get(u32 closid)
 {
 	struct clos_cbm_table *cct = &cctable[closid];
@@ -240,6 +263,8 @@ static int __init intel_rdt_late_init(void)
 
 	for_each_online_cpu(i)
 		rdt_cpumask_update(i);
+
+	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Intel cache allocation enabled\n");
 out_err:
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6e789ca..71fc41a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -48,6 +48,7 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
+#include <asm/intel_rdt.h>
 #include <asm/xen/hypervisor.h>
 
 asmlinkage extern void ret_from_fork(void);
@@ -472,6 +473,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 			loadsegment(ss, __KERNEL_DS);
 	}
 
+	/*
+	 * Load the Intel cache allocation PQR MSR.
+	 */
+	intel_rdt_sched_in();
+
 	return prev_p;
 }
 
-- 
2.5.0


* [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (4 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  9:19   ` Thomas Gleixner
  2016-07-14  0:40   ` David Carrillo-Cisneros
  2016-07-13  1:02 ` [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration Fenghua Yu
                   ` (25 subsequent siblings)
  31 siblings, 2 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

This patch adds hot plug cpu support for Intel Cache Allocation. The
support includes updating the cache bitmask MSRs IA32_L3_QOS_n when a
new CPU package comes online or goes offline. The IA32_L3_QOS_n MSRs
exist once per class of service on each CPU package; a new package's
MSRs are synchronized with the values of the existing MSRs. The
software cache for the IA32_PQR_ASSOC MSR is also reset during hot cpu
notifications.
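
A userspace simulation (illustrative only; the CPU-to-package mapping
and counts are made up) of the bookkeeping described above: keep
exactly one 'owner' CPU per package in rdt_cpumask and hand ownership
to another online CPU in the same package when the owner goes offline.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8
static const int cpu_package[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };
static bool cpu_online[NR_CPUS];
static bool rdt_mask[NR_CPUS];		/* stands in for rdt_cpumask */

/* Make @cpu the package owner if its package has none yet. */
static bool rdt_mask_update(int cpu)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (rdt_mask[i] && cpu_package[i] == cpu_package[cpu])
			return false;	/* package already has an owner */
	rdt_mask[cpu] = true;
	return true;			/* caller must sync the package MSRs */
}

/* On offline, pass ownership to another online CPU in the same package. */
static void rdt_mask_exit(int cpu)
{
	if (!rdt_mask[cpu])
		return;
	rdt_mask[cpu] = false;
	for (int i = 0; i < NR_CPUS; i++)
		if (i != cpu && cpu_online[i] &&
		    cpu_package[i] == cpu_package[cpu]) {
			rdt_mask[i] = true;
			return;
		}
}

int main(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		cpu_online[i] = true;
		rdt_mask_update(i);
	}
	cpu_online[0] = false;
	rdt_mask_exit(0);		/* CPU 1 takes over package 0 */
	for (int i = 0; i < NR_CPUS; i++)
		if (rdt_mask[i])
			printf("owner: cpu %d (package %d)\n", i, cpu_package[i]);
	return 0;
}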

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 76 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 8379df8..31f8588 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,6 +24,7 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/cpu.h>
 #include <linux/sched.h>
 #include <asm/pqr_common.h>
 #include <asm/intel_rdt.h>
@@ -234,6 +235,75 @@ static inline bool rdt_cpumask_update(int cpu)
 	return false;
 }
 
+/*
+ * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
+ * which are one per CLOSid on the current package.
+ */
+static void cbm_update_msrs(void *dummy)
+{
+	int maxid = boot_cpu_data.x86_cache_max_closid;
+	struct rdt_remote_data info;
+	unsigned int i;
+
+	for (i = 0; i < maxid; i++) {
+		if (cctable[i].clos_refcnt) {
+			info.msr = CBM_FROM_INDEX(i);
+			info.val = cctable[i].l3_cbm;
+			msr_cpu_update(&info);
+		}
+	}
+}
+
+static inline void intel_rdt_cpu_start(int cpu)
+{
+	struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
+
+	state->closid = 0;
+	mutex_lock(&rdt_group_mutex);
+	if (rdt_cpumask_update(cpu))
+		smp_call_function_single(cpu, cbm_update_msrs, NULL, 1);
+	mutex_unlock(&rdt_group_mutex);
+}
+
+static void intel_rdt_cpu_exit(unsigned int cpu)
+{
+	int i;
+
+	mutex_lock(&rdt_group_mutex);
+	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
+		mutex_unlock(&rdt_group_mutex);
+		return;
+	}
+
+	cpumask_and(&tmp_cpumask, topology_core_cpumask(cpu), cpu_online_mask);
+	cpumask_clear_cpu(cpu, &tmp_cpumask);
+	i = cpumask_any(&tmp_cpumask);
+
+	if (i < nr_cpu_ids)
+		cpumask_set_cpu(i, &rdt_cpumask);
+	mutex_unlock(&rdt_group_mutex);
+}
+
+static int intel_rdt_cpu_notifier(struct notifier_block *nb,
+				  unsigned long action, void *hcpu)
+{
+	unsigned int cpu  = (unsigned long)hcpu;
+
+	switch (action) {
+	case CPU_DOWN_FAILED:
+	case CPU_ONLINE:
+		intel_rdt_cpu_start(cpu);
+		break;
+	case CPU_DOWN_PREPARE:
+		intel_rdt_cpu_exit(cpu);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
 static int __init intel_rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
@@ -261,9 +331,15 @@ static int __init intel_rdt_late_init(void)
 		goto out_err;
 	}
 
+	cpu_notifier_register_begin();
+
 	for_each_online_cpu(i)
 		rdt_cpumask_update(i);
 
+	__hotcpu_notifier(intel_rdt_cpu_notifier, 0);
+
+	cpu_notifier_register_done();
+
 	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Intel cache allocation enabled\n");
 out_err:
-- 
2.5.0


* [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (5 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

This patch is specific to Intel Haswell (HSW) server SKUs. Cache
Allocation on HSW servers needs to be enumerated separately because HSW
does not support CPUID enumeration for Cache Allocation. This patch
probes by writing a CLOSid (class of service id) into the high 32 bits
of IA32_PQR_ASSOC and checking whether the bits stick. The probe is only
done after confirming that the CPU is an HSW server. The other hardcoded
values are:

 - The L3 cache bit mask must be at least two bits.
 - The maximum number of CLOSids supported is always 4.
 - The maximum number of bits supported in the cache bit mask is always 20.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 59 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 31f8588..ecaf8e6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -38,6 +38,10 @@ static struct clos_cbm_table *cctable;
  */
 unsigned long *closmap;
 /*
+ * Minimum bits required in Cache bitmask.
+ */
+static unsigned int min_bitmask_len = 1;
+/*
  * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
  */
 static cpumask_t rdt_cpumask;
@@ -54,6 +58,57 @@ struct rdt_remote_data {
 	u64 val;
 };
 
+/*
+ * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
+ * as it does not have CPUID enumeration support for Cache allocation.
+ *
+ * Probes by writing to the high 32 bits(CLOSid) of the IA32_PQR_MSR and
+ * testing if the bits stick. Max CLOSids is always 4 and max cbm length
+ * is always 20 on hsw server parts. The minimum cache bitmask length
+ * allowed for HSW server is always 2 bits. Hardcode all of them.
+ */
+static inline bool cache_alloc_hsw_probe(void)
+{
+	u32 l, h_old, h_new, h_tmp;
+
+	if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
+		return false;
+
+	/*
+	 * Default value is always 0 if feature is present.
+	 */
+	h_tmp = h_old ^ 0x1U;
+	if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp) ||
+	    rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_new))
+		return false;
+
+	if (h_tmp != h_new)
+		return false;
+
+	wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old);
+
+	boot_cpu_data.x86_cache_max_closid = 4;
+	boot_cpu_data.x86_cache_max_cbm_len = 20;
+	min_bitmask_len = 2;
+
+	return true;
+}
+
+static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_CAT_L3))
+		return true;
+
+	/*
+	 * Probe for Haswell server CPUs.
+	 */
+	if (c->x86 == 0x6 && c->x86_model == 0x3f)
+		return cache_alloc_hsw_probe();
+
+	return false;
+}
+
+
 void __intel_rdt_sched_in(void *dummy)
 {
 	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
@@ -126,7 +181,7 @@ static bool cbm_validate(unsigned long var)
 	unsigned long first_bit, zero_bit;
 	u64 max_cbm;
 
-	if (bitmap_weight(&var, max_cbm_len) < 1)
+	if (bitmap_weight(&var, max_cbm_len) < min_bitmask_len)
 		return false;
 
 	max_cbm = (1ULL << max_cbm_len) - 1;
@@ -310,7 +365,7 @@ static int __init intel_rdt_late_init(void)
 	u32 maxid, max_cbm_len;
 	int err = 0, size, i;
 
-	if (!cpu_has(c, X86_FEATURE_CAT_L3))
+	if (!cache_alloc_supported(c))
 		return -ENODEV;
 
 	maxid = c->x86_cache_max_closid;
-- 
2.5.0


* [PATCH 08/32] Define CONFIG_INTEL_RDT
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (6 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13 10:25   ` Thomas Gleixner
  2016-07-13  1:02 ` [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection Fenghua Yu
                   ` (23 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Add the CONFIG_INTEL_RDT Kconfig option. The option provides support
for resource allocation, which is a sub-feature of Intel Resource
Director Technology (RDT).

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/Kconfig                 | 12 ++++++++++++
 arch/x86/include/asm/intel_rdt.h | 10 +++++++++-
 arch/x86/kernel/cpu/intel_rdt.c  | 12 +++++++-----
 include/linux/sched.h            |  3 ---
 4 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d9a94da..231d44c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -408,6 +408,18 @@ config GOLDFISH
        def_bool y
        depends on X86_GOLDFISH
 
+config INTEL_RDT
+	bool "Intel Resource Director Technology support"
+	depends on X86_64 && CPU_SUP_INTEL
+	help
+	  This option provides support for resource allocation, which is a
+	  sub-feature of Intel Resource Director Technology (RDT). The
+	  current implementation supports L3 cache allocation.
+	  Using this feature a user can specify the amount of L3 cache space
+	  that an application is allowed to fill.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
 	bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index afb6da3..6e20314 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -24,8 +24,16 @@ struct clos_cbm_table {
  * on scheduler hot path:
  * - This will stay as no-op unless we are running on an Intel SKU
  * which supports L3 cache allocation.
+ * - When support is present and enabled, does not do any
+ * IA32_PQR_MSR writes until the user starts really using the feature
+ * ie creates a rdtgroup directory and assigns a cache_mask thats
+ * different from the root rdtgroup's cache_mask.
  * - Caches the per cpu CLOSid values and does the MSR write only
- * when a task with a different CLOSid is scheduled in.
+ * when a task with a different CLOSid is scheduled in. That
+ * means the task belongs to a different rdtgroup.
+ * - Closids are allocated so that different rdtgroup directories
+ * with same cache_mask gets the same CLOSid. This minimizes CLOSids
+ * used and reduces MSR write frequency.
  */
 static inline void intel_rdt_sched_in(void)
 {
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index ecaf8e6..525390a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -108,17 +108,19 @@ static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
 	return false;
 }
 
-
 void __intel_rdt_sched_in(void *dummy)
 {
 	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-	u32 closid = current->closid;
 
-	if (closid == state->closid)
+	/*
+	 * Currently closid is always 0. When the user interface is added,
+	 * closid will come from user interface.
+	 */
+	if (state->closid == 0)
 		return;
 
-	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
-	state->closid = closid;
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
+	state->closid = 0;
 }
 
 /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6005bba..253538f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1761,9 +1761,6 @@ struct task_struct {
 	/* cg_list protected by css_set_lock and tsk->alloc_lock */
 	struct list_head cg_list;
 #endif
-#ifdef CONFIG_INTEL_RDT
-	u32 closid;
-#endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
-- 
2.5.0


* [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (7 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

This patch adds enumeration support for the Code Data Prioritization
(CDP) feature found in future Intel Xeon processors. It includes CPUID
enumeration routines for CDP.

CDP is an extension to Cache Allocation and lets threads allocate
subsets of the L3 cache for code and data separately. The allocation is
represented by the code or data cache capacity bit mask (CBM) MSRs
IA32_L3_QOS_MASK_n. Each class of service is associated with one
dcache_cbm and one icache_cbm MSR, hence the number of available
CLOSids is halved with CDP. The association for a CLOSid 'n' is shown
below:

  data_cbm_address(n) = base + (n << 1)
  code_cbm_address(n) = base + (n << 1) + 1

During scheduling the kernel writes the CLOSid of the thread to the
IA32_PQR_ASSOC MSR.
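
A small worked example (illustrative only) of that pairing, using the
IA32_L3_CBM_BASE value (0xc90) defined earlier in the series:

#include <stdio.h>

#define IA32_L3_CBM_BASE 0xc90

int main(void)
{
	for (unsigned int n = 0; n < 4; n++)
		printf("CLOS %u: data CBM MSR 0x%x, code CBM MSR 0x%x\n",
		       n, IA32_L3_CBM_BASE + (n << 1),
		       IA32_L3_CBM_BASE + (n << 1) + 1);
	return 0;
}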

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 5 ++++-
 arch/x86/kernel/cpu/common.c       | 3 ++-
 arch/x86/kernel/cpu/intel_rdt.c    | 2 ++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 667acf3..16489b3 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -12,7 +12,7 @@
 /*
  * Defines x86 CPU feature bits
  */
-#define NCAPINTS	19	/* N 32-bit words worth of info */
+#define NCAPINTS	20	/* N 32-bit words worth of info */
 #define NBUGINTS	1	/* N 32-bit bug flags */
 
 /*
@@ -291,6 +291,9 @@
 /* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 18 */
 #define X86_FEATURE_CAT_L3      (18*32+ 1) /* Cache Allocation L3 */
 
+/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x00000010:1 (ecx), word 19 */
+#define X86_FEATURE_CDP_L3	(19*32+ 2) /* Code data prioritization L3 */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 42c90cb..a695e58 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -716,13 +716,14 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		u32 eax, ebx, ecx, edx;
 
 		cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[14] = ebx;
+		c->x86_capability[17] = ebx;
 
 		if (cpu_has(c, X86_FEATURE_CAT_L3)) {
 
 			cpuid_count(0x00000010, 1, &eax, &ebx, &ecx, &edx);
 			c->x86_cache_max_closid = edx + 1;
 			c->x86_cache_max_cbm_len = eax + 1;
+			c->x86_capability[18] = ecx;
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 525390a..5fa0f9d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -399,6 +399,8 @@ static int __init intel_rdt_late_init(void)
 
 	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Intel cache allocation enabled\n");
+	if (cpu_has(c, X86_FEATURE_CDP_L3))
+		pr_info("Intel code data prioritization detected\n");
 out_err:
 
 	return err;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (8 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-26 19:23   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP Fenghua Yu
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

On Intel SKUs that support Code Data Prioritization (CDP), intel_rdt
operates in two modes: the default/legacy cache allocation mode or CDP
mode.

When CDP is enabled, the number of available CLOSids is halved. Hence
CDP can only be enabled while fewer than half of the available CLOSids
are in use. With CDP enabled, each CLOSid maps to a data cache mask and
an instruction cache mask. The enabling itself is done by writing to
the IA32_PQOS_CFG MSR, and CDP can be enabled or disabled dynamically.

CDP is disabled when, for each (dcache_cbm, icache_cbm) pair,
dcache_cbm == icache_cbm.
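
A rough sketch of the enable check described above, reusing the
cconfig.closids_used/cconfig.max_closid bookkeeping introduced in the
previous patch (the helper name is hypothetical; the disable condition
corresponds to the code_data_mask_equal() check added below):

/*
 * CDP halves the CLOSid space, so only allow switching it on while
 * fewer than half of the CLOSids are in use.
 */
static bool can_enable_cdp(void)
{
        return cconfig.closids_used < (cconfig.max_closid >> 1);
}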

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |  7 +++++
 arch/x86/kernel/cpu/intel_rdt.c  | 66 ++++++++++++++++++++++++++--------------
 2 files changed, 51 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 6e20314..f2cb91d 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -8,6 +8,7 @@
 #define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
 #define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
+#define MSR_IA32_PQOS_CFG		0xc81
 
 extern struct static_key rdt_enable_key;
 void __intel_rdt_sched_in(void *dummy);
@@ -17,6 +18,12 @@ struct clos_cbm_table {
 	unsigned int clos_refcnt;
 };
 
+struct clos_config {
+	unsigned long *closmap;
+	u32 max_closid;
+	u32 closids_used;
+};
+
 /*
  * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
  *
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 5fa0f9d..14b9011 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,10 +34,6 @@
  */
 static struct clos_cbm_table *cctable;
 /*
- * closid availability bit map.
- */
-unsigned long *closmap;
-/*
  * Minimum bits required in Cache bitmask.
  */
 static unsigned int min_bitmask_len = 1;
@@ -52,6 +48,11 @@ static cpumask_t rdt_cpumask;
 static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+static struct clos_config cconfig;
+static bool cdp_enabled;
+
+#define __DCBM_TABLE_INDEX(x)	(x << 1)
+#define __ICBM_TABLE_INDEX(x)	((x << 1) + 1)
 
 struct rdt_remote_data {
 	int msr;
@@ -147,22 +148,28 @@ static int closid_alloc(u32 *closid)
 
 	lockdep_assert_held(&rdt_group_mutex);
 
-	maxid = boot_cpu_data.x86_cache_max_closid;
-	id = find_first_zero_bit(closmap, maxid);
+	maxid = cconfig.max_closid;
+	id = find_first_zero_bit(cconfig.closmap, maxid);
 	if (id == maxid)
 		return -ENOSPC;
 
-	set_bit(id, closmap);
+	set_bit(id, cconfig.closmap);
 	closid_get(id);
 	*closid = id;
+	cconfig.closids_used++;
 
 	return 0;
 }
 
 static inline void closid_free(u32 closid)
 {
-	clear_bit(closid, closmap);
+	clear_bit(closid, cconfig.closmap);
 	cctable[closid].l3_cbm = 0;
+
+	if (WARN_ON(!cconfig.closids_used))
+		return;
+
+	cconfig.closids_used--;
 }
 
 static void closid_put(u32 closid)
@@ -199,45 +206,45 @@ static bool cbm_validate(unsigned long var)
 	return true;
 }
 
-static int clos_cbm_table_read(u32 closid, unsigned long *l3_cbm)
+static int clos_cbm_table_read(u32 index, unsigned long *l3_cbm)
 {
-	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+	u32 orig_maxid = boot_cpu_data.x86_cache_max_closid;
 
 	lockdep_assert_held(&rdt_group_mutex);
 
-	if (closid >= maxid)
+	if (index >= orig_maxid)
 		return -EINVAL;
 
-	*l3_cbm = cctable[closid].l3_cbm;
+	*l3_cbm = cctable[index].l3_cbm;
 
 	return 0;
 }
 
 /*
  * clos_cbm_table_update() - Update a clos cbm table entry.
- * @closid: the closid whose cbm needs to be updated
+ * @index: index of the table entry whose cbm needs to be updated
  * @cbm: the new cbm value that has to be updated
  *
  * This assumes the cbm is validated as per the interface requirements
  * and the cache allocation requirements(through the cbm_validate).
  */
-static int clos_cbm_table_update(u32 closid, unsigned long cbm)
+static int clos_cbm_table_update(u32 index, unsigned long cbm)
 {
-	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+	u32 orig_maxid = boot_cpu_data.x86_cache_max_closid;
 
 	lockdep_assert_held(&rdt_group_mutex);
 
-	if (closid >= maxid)
+	if (index >= orig_maxid)
 		return -EINVAL;
 
-	cctable[closid].l3_cbm = cbm;
+	cctable[index].l3_cbm = cbm;
 
 	return 0;
 }
 
 static bool cbm_search(unsigned long cbm, u32 *closid)
 {
-	u32 maxid = boot_cpu_data.x86_cache_max_closid;
+	u32 maxid = cconfig.max_closid;
 	u32 i;
 
 	for (i = 0; i < maxid; i++) {
@@ -281,6 +288,21 @@ static inline void msr_update_all(int msr, u64 val)
 	on_each_cpu_mask(&rdt_cpumask, msr_cpu_update, &info, 1);
 }
 
+static bool code_data_mask_equal(void)
+{
+	int i, dindex, iindex;
+
+	for (i = 0; i < cconfig.max_closid; i++) {
+		dindex = __DCBM_TABLE_INDEX(i);
+		iindex = __ICBM_TABLE_INDEX(i);
+		if (cctable[dindex].clos_refcnt &&
+		     (cctable[dindex].l3_cbm != cctable[iindex].l3_cbm))
+			return false;
+	}
+
+	return true;
+}
+
 static inline bool rdt_cpumask_update(int cpu)
 {
 	cpumask_and(&tmp_cpumask, &rdt_cpumask, topology_core_cpumask(cpu));
@@ -298,7 +320,7 @@ static inline bool rdt_cpumask_update(int cpu)
  */
 static void cbm_update_msrs(void *dummy)
 {
-	int maxid = boot_cpu_data.x86_cache_max_closid;
+	int maxid = cconfig.max_closid;
 	struct rdt_remote_data info;
 	unsigned int i;
 
@@ -306,7 +328,7 @@ static void cbm_update_msrs(void *dummy)
 		if (cctable[i].clos_refcnt) {
 			info.msr = CBM_FROM_INDEX(i);
 			info.val = cctable[i].l3_cbm;
-			msr_cpu_update(&info);
+			msr_cpu_update((void *) &info);
 		}
 	}
 }
@@ -381,8 +403,8 @@ static int __init intel_rdt_late_init(void)
 	}
 
 	size = BITS_TO_LONGS(maxid) * sizeof(long);
-	closmap = kzalloc(size, GFP_KERNEL);
-	if (!closmap) {
+	cconfig.closmap = kzalloc(size, GFP_KERNEL);
+	if (!cconfig.closmap) {
 		kfree(cctable);
 		err = -ENOMEM;
 		goto out_err;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (9 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization Fenghua Yu
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Add support to manage the CLOSid (Class of Service id) and capacity
bitmask (cbm) for Code Data Prioritization (CDP).

CLOSid management includes changes to closid allocation and freeing, to
closid_get and closid_put, and to the closid availability map during
CDP setup. CDP has a separate cbm for code and for data.

Each closid is mapped to a (dcache_cbm, icache_cbm) pair when CDP mode
is enabled.
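
The closid-to-table-index mapping can be summarised as below; this is a
sketch mirroring the DCBM_TABLE_INDEX()/ICBM_TABLE_INDEX() macros in
this patch, with cdp_enabled being 0 or 1:

/* dcache entry: index == closid in CAT mode, closid * 2 in CDP mode */
static inline int dcbm_table_index(unsigned int closid, int cdp_enabled)
{
        return closid << cdp_enabled;
}

/* icache entry: same as dcache in CAT mode, closid * 2 + 1 in CDP mode */
static inline int icbm_table_index(unsigned int closid, int cdp_enabled)
{
        return (closid << cdp_enabled) + cdp_enabled;
}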

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 14b9011..7a03671 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -30,7 +30,13 @@
 #include <asm/intel_rdt.h>
 
 /*
- * cctable maintains 1:1 mapping between CLOSid and cache bitmask.
+ * During cache alloc mode cctable maintains 1:1 mapping between
+ * CLOSid and l3_cbm.
+ *
+ * During CDP mode, the cctable maintains a 1:2 mapping between the closid
+ * and (dcache_cbm, icache_cbm) pair.
+ * index of a dcache_cbm for CLOSid 'n' = n << 1.
+ * index of a icache_cbm for CLOSid 'n' = n << 1 + 1
  */
 static struct clos_cbm_table *cctable;
 /*
@@ -53,6 +59,13 @@ static bool cdp_enabled;
 
 #define __DCBM_TABLE_INDEX(x)	(x << 1)
 #define __ICBM_TABLE_INDEX(x)	((x << 1) + 1)
+#define __DCBM_MSR_INDEX(x)			\
+	CBM_FROM_INDEX(__DCBM_TABLE_INDEX(x))
+#define __ICBM_MSR_INDEX(x)			\
+	CBM_FROM_INDEX(__ICBM_TABLE_INDEX(x))
+
+#define DCBM_TABLE_INDEX(x)	(x << cdp_enabled)
+#define ICBM_TABLE_INDEX(x)	((x << cdp_enabled) + cdp_enabled)
 
 struct rdt_remote_data {
 	int msr;
@@ -132,9 +145,12 @@ static inline void closid_tasks_sync(void)
 	on_each_cpu_mask(cpu_online_mask, __intel_rdt_sched_in, NULL, 1);
 }
 
+/*
+ * When cdp mode is enabled, refcnt is maintained in the dcache_cbm entry.
+ */
 static inline void closid_get(u32 closid)
 {
-	struct clos_cbm_table *cct = &cctable[closid];
+	struct clos_cbm_table *cct = &cctable[DCBM_TABLE_INDEX(closid)];
 
 	lockdep_assert_held(&rdt_group_mutex);
 
@@ -164,7 +180,7 @@ static int closid_alloc(u32 *closid)
 static inline void closid_free(u32 closid)
 {
 	clear_bit(closid, cconfig.closmap);
-	cctable[closid].l3_cbm = 0;
+	cctable[DCBM_TABLE_INDEX(closid)].l3_cbm = 0;
 
 	if (WARN_ON(!cconfig.closids_used))
 		return;
@@ -174,7 +190,7 @@ static inline void closid_free(u32 closid)
 
 static void closid_put(u32 closid)
 {
-	struct clos_cbm_table *cct = &cctable[closid];
+	struct clos_cbm_table *cct = &cctable[DCBM_TABLE_INDEX(closid)];
 
 	lockdep_assert_held(&rdt_group_mutex);
 	if (WARN_ON(!cct->clos_refcnt))
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (10 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Update hot cpu notification handling for Code Data Prioritization
(CDP). The capacity bitmask (cbm) configuration is global for both data
and instruction, so a newly onlined package must be updated with all
the cbms by writing to the IA32_L3_QOS_n MSRs.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 7a03671..057aef1 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -330,6 +330,26 @@ static inline bool rdt_cpumask_update(int cpu)
 	return false;
 }
 
+static void cbm_update_msr(u32 index)
+{
+	struct rdt_remote_data info;
+	int dindex;
+
+	dindex = DCBM_TABLE_INDEX(index);
+	if (cctable[dindex].clos_refcnt) {
+
+		info.msr = CBM_FROM_INDEX(dindex);
+		info.val = cctable[dindex].l3_cbm;
+		msr_cpu_update((void *) &info);
+
+		if (cdp_enabled) {
+			info.msr = __ICBM_MSR_INDEX(index);
+			info.val = cctable[dindex + 1].l3_cbm;
+			msr_cpu_update((void *) &info);
+		}
+	}
+}
+
 /*
  * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
  * which are one per CLOSid on the current package.
@@ -337,15 +357,10 @@ static inline bool rdt_cpumask_update(int cpu)
 static void cbm_update_msrs(void *dummy)
 {
 	int maxid = cconfig.max_closid;
-	struct rdt_remote_data info;
 	unsigned int i;
 
 	for (i = 0; i < maxid; i++) {
-		if (cctable[i].clos_refcnt) {
-			info.msr = CBM_FROM_INDEX(i);
-			info.val = cctable[i].l3_cbm;
-			msr_cpu_update((void *) &info);
-		}
+		cbm_update_msr(i);
 	}
 }
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (11 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13 12:47   ` Thomas Gleixner
                     ` (2 more replies)
  2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
                   ` (18 subsequent siblings)
  31 siblings, 3 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The documentation describes the user interface for allocating resources
in Intel RDT.

Please note that the documentation covers the generic user interface.
The current patch set only implements CAT L3; CAT L2 code will be sent
later.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 268 +++++++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)
 create mode 100644 Documentation/x86/intel_rdt_ui.txt

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
new file mode 100644
index 0000000..c52baf5
--- /dev/null
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -0,0 +1,268 @@
+User Interface for Resource Allocation in Intel Resource Director Technology
+
+Copyright (C) 2016 Intel Corporation
+
+Fenghua Yu <fenghua.yu@intel.com>
+
+We create a new file system, rscctrl, in /sys/fs as the user interface for
+Cache Allocation Technology (CAT) and future resource allocations in Intel
+Resource Director Technology (RDT). Users can allocate cache or other
+resources to tasks or cpus through this interface.
+
+CONTENTS
+========
+
+	1. Terms
+	2. Mount rscctrl file system
+	3. Hierarchy in rscctrl
+	4. Create and remove sub-directory
+	5. Add/remove a task in a partition
+	6. Add/remove a CPU in a partition
+	7. Some usage examples
+
+
+1. Terms
+========
+
+We use the following terms and concepts in this documentation.
+
+RDT: Intel Resource Director Technology
+
+CAT: Cache Allocation Technology
+
+CDP: Code and Data Prioritization
+
+CBM: Cache Bit Mask
+
+Cache ID: A cache identifier. It is unique within one cache index on the
+platform. Users can find the cache ID in the cache sysfs interface:
+/sys/devices/system/cpu/cpu*/cache/index*/id
+
+Shared resource domain: A few different resources can share the same QoS
+mask MSR array. For example, an L2 cache can share QoS MSRs with its next
+level L3 cache. A domain number represents the L2 cache, the L3 cache, the
+L2 cache's shared cpumask, and the L3 cache's shared cpumask.
+
+2. Mount rscctrl file system
+============================
+
+Like other file systems, the rscctrl file system needs to be mounted before
+it can be used.
+
+mount -t rscctrl rscctrl <-o cdp,verbose> /sys/fs/rscctrl
+
+This command mounts the rscctrl file system under /sys/fs/rscctrl.
+
+The mount options are optional:
+
+cdp: Enable Code and Data Prioritization (CDP). Without the option, CDP
+is disabled.
+
+verbose: Output more info in the "info" file under the info directory and
+in dmesg. This is mainly for debugging.
+
+
+3. Hierarchy in rscctrl
+=======================
+
+The initial hierarchy of the rscctrl file system is as follows after mount:
+
+/sys/fs/rscctrl/info/info
+		    /<resource0>/<resource0 specific info files>
+		    /<resource1>/<resource1 specific info files>
+			....
+	       /tasks
+	       /cpus
+	       /schemas
+
+There are a few files and sub-directories in the hierarchy.
+
+3.1. info
+---------
+
+The read-only sub-directory "info" in the root directory has RDT-related
+system info.
+
+The "info" file under the info sub-directory shows general info of the system.
+It shows shared domain and the resources within this domain.
+
+Each resource has its own info sub-directory. User can read the information
+for allocation. For example, l3 directory has max_closid, max_cbm_len,
+domain_to_cache_id.
+
+3.2. tasks
+----------
+
+The file "tasks" has all task ids in the root directory initially. The
+thread ids in the file will be added or removed among sub-directories or
+partitions. A task id only stays in one directory at the same time.
+
+3.3. cpus
+---------
+The file "cpus" has a cpu mask that specifies the CPUs that are bound to the
+schemas. Any tasks scheduled on the cpus will use the schemas. User can set
+both "cpus" and "tasks" to share the same schema in one directory. But when
+a CPU is bound to a schema, a task running on the CPU uses this schema and
+kernel will ignore scheam set up for the task in "tasks".
+
+Initial value is all zeros which means there is no CPU bound to the schemas
+in the root directory and tasks use the schemas.
+
+3.4. schemas
+------------
+
+The file "schemas" has default allocation masks/values for all resources on
+each socket/cpu. Format of the file "schemas" is in multiple lines and each
+line represents masks or values for one resource.
+
+Format of one resource schema line is as follows:
+
+<resource name>:<resource id0>=<schema>;<resource id1>=<schema>;...
+
+As one example, CAT L3's schema format is:
+
+L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+On a two socket machine, L3's schema line could be:
+
+L3:0=ff;1=c0
+
+which means this line in "schemas" file is for CAT L3, L3 cache id 0's CBM
+is 0xff, and L3 cache id 1's CBM is 0xc0.
+
+If a resource is disabled, its line is not shown in the schemas file.
+
+The schema line can be extended for other situations. The L3 cbm format can
+be extended to the CDP-enabled L3 cbm format:
+
+L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
+
+Initial value is all ones, which means all tasks use all resources initially.
+
+4. Create and remove sub-directory
+===================================
+
+Users can create a sub-directory under the root directory with the "mkdir"
+command and remove a sub-directory with the "rmdir" command.
+
+Each sub-directory represents a resource allocation policy through which the
+user can allocate resources to tasks or cpus.
+
+Each directory has three files, "tasks", "cpus", and "schemas". The meaning
+of each file is the same as for the files in the root directory.
+
+When a directory is created, initial contents of the files are:
+
+tasks: Empty. This means no task currently uses this allocation's schemas.
+cpus: All zeros. This means no CPU uses this allocation's schemas.
+schemas: All ones. This means all resources can be used in this allocation.
+
+5. Add/remove a task in a partition
+===================================
+
+Users can add/remove a task by writing its PID to the "tasks" file in a
+partition, and can read the PIDs stored in a "tasks" file.
+
+A task PID only exists in one partition/directory at a time. If the PID is
+written into a new directory, it's automatically removed from its last
+directory.
+
+6. Add/remove a CPU in a partition
+==================================
+
+Users can add/remove a CPU by writing its bit to the "cpus" file in a
+partition, and can read the CPUs stored in a "cpus" file.
+
+A CPU exists in at most one partition/directory if the user wants it bound
+to any "schemas". The kernel guarantees that the CPU is unique across all
+directories to make sure it uses only one set of schemas. If a CPU is
+written into a new directory, it's automatically removed from its original
+directory if it exists there.
+
+A CPU doesn't appear in any directory if the user doesn't bind it to any
+"schemas".
+
+7. Some usage examples
+======================
+
+7.1 Example 1 for sharing CLOSID on socket 0 between two partitions
+
+Only L3 cbm is enabled. Assume the machine is 2-socket and dual-core without
+hyperthreading.
+
+#mount -t rscctrl rscctrl /sys/fs/rscctrl
+#cd /sys/fs/rscctrl
+#mkdir p0 p1
+#echo "L3:0=3;1=c" > /sys/fs/rscctrl/p0/schemas
+#echo "L3:0=3;1=3" > /sys/fs/rscctrl/p1/schemas
+
+In partition p0, the kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0
+and CLOSID 0 for cbm=0xc on socket 1.
+
+In partition p1, the kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0
+and CLOSID 1 for cbm=0x3 on socket 1.
+
+When p1/schemas is updated for socket 0, the kernel searches the existing
+IA32_L3_QOS_MASK_n MSRs and finds that 0x3 is already in the
+IA32_L3_QOS_MASK_0 register. Therefore CLOSID 0 is shared between partition 0
+and partition 1 on socket 0.
+
+When p1/schemas is updated for socket 1, the kernel searches the existing
+IA32_L3_QOS_MASK_n registers and doesn't find a matching cbm. Therefore
+CLOSID 1 is created and IA32_L3_QOS_MASK_1=0x3.
+
+7.2 Example 2 for allocating L3 cache for real-time apps
+
+Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running on
+processor 1, both on socket 0 of a 2-socket, dual-core machine. To avoid
+noisy neighbors, each of the two real-time tasks exclusively occupies one
+quarter of the L3 cache on socket 0. Assume the L3 cbm max width is 20 bits.
+
+#mount -t rscctrl rscctrl /sys/fs/rscctrl
+#cd /sys/fs/rscctrl
+#mkdir p0 p1
+#taskset 0x1 1234
+#taskset 0x2 5678
+#cd /sys/fs/rscctrl/
+#edit schemas to have the following allocation:
+L3:0=3ff;1=fffff
+
+which means that all tasks use the whole L3 cache 1 and half of L3 cache 0.
+
+#cd ..
+#mkdir p1 p2
+#cd p1
+#echo 1234 >tasks
+#edit schemas to have the following two lines:
+L3:0=f8000;1=fffff
+
+which means task 1234 uses L3 cbm=0xf8000, i.e. one quarter of L3 cache 0
+and whole L3 cache 1.
+
+Since 1234 is tied to processor 0, it actually uses the quarter of L3
+on socket 0 only.
+
+#cd ../p2
+#echo 5678 >tasks
+#edit schemas to have the following two lines:
+L3:0=7c00;1=fffff
+
+which means that task 5678 uses L3 cbm=0x7c00, another quarter of L3 cache 0
+and whole L3 cache 1.
+
+Since 5678 is tied to processor 1, it actually only uses the quarter of L3
+on socket 0.
+
+Internally three CLOSIDs are allocated on L3 cache 0:
+IA32_L3_QOS_MASK_0 = 0x3ff
+IA32_L3_QOS_MASK_1 = 0xf8000
+IA32_L3_QOS_MASK_2 = 0x7c00.
+
+Each CLOSID's reference count=1 on L3 cache 0. There are no shared cbms on
+cache 0.
+
+Only one CLOSID is allocated on L3 cache 1:
+
+IA32_L3_QOS_MASK_0=0xfffff. It's shared by root, p1 and p2.
+
+Therefore CLOSID 0's reference count=3 on L3 cache 1.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (12 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-27 16:49   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Define two new cpuid leaves for CAT and CDP. The leaves are used in
x86_capability to avoid hard-coded indices.

Clean up the comments for the RDT, CAT_L3, and CDP_L3 cpufeatures.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpufeature.h  |  2 ++
 arch/x86/include/asm/cpufeatures.h |  6 +++---
 arch/x86/include/asm/processor.h   |  6 +++---
 arch/x86/kernel/cpu/common.c       | 11 +++++++----
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 483fb54..cd3b0bd 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -28,6 +28,8 @@ enum cpuid_leafs
 	CPUID_8000_000A_EDX,
 	CPUID_7_ECX,
 	CPUID_8000_0007_EBX,
+	CPUID_10_0_EBX,
+	CPUID_10_1_ECX,
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 16489b3..588932a 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,7 +220,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
-#define X86_FEATURE_RDT		( 9*32+15) /* Resource Allocation */
+#define X86_FEATURE_RDT		( 9*32+15) /* Resource Director Technology */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_AVX512DQ	( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
@@ -289,10 +289,10 @@
 #define X86_FEATURE_SMCA	(17*32+ 3) /* Scalable MCA */
 
 /* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 18 */
-#define X86_FEATURE_CAT_L3      (18*32+ 1) /* Cache Allocation L3 */
+#define X86_FEATURE_CAT_L3	(18*32+ 1) /* Cache Allocation L3 */
 
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x00000010:1 (ecx), word 19 */
-#define X86_FEATURE_CDP_L3	(19*32+ 2) /* Code data prioritization L3 */
+#define X86_FEATURE_CDP_L3	(19*32+ 2) /* Code Data Prioritization L3 */
 
 /*
  * BUG word(s)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 598c9bc..308aa03 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -119,9 +119,9 @@ struct cpuinfo_x86 {
 	int			x86_cache_occ_scale;	/* scale to bytes */
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
-	/* Cache Allocation values: */
-	u16			x86_cache_max_cbm_len;
-	u16			x86_cache_max_closid;
+	/* Cache Allocation l3 values: */
+	u16			x86_l3_max_cbm_len;
+	u16			x86_l3_max_closid;
 	/* cpuid returned max cores value: */
 	u16			 x86_max_cores;
 	u16			apicid;
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a695e58..e945e70 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -716,14 +716,17 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		u32 eax, ebx, ecx, edx;
 
 		cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
-		c->x86_capability[17] = ebx;
+		c->x86_capability[CPUID_10_0_EBX] = ebx;
 
 		if (cpu_has(c, X86_FEATURE_CAT_L3)) {
 
 			cpuid_count(0x00000010, 1, &eax, &ebx, &ecx, &edx);
-			c->x86_cache_max_closid = edx + 1;
-			c->x86_cache_max_cbm_len = eax + 1;
-			c->x86_capability[18] = ecx;
+			c->x86_l3_max_closid = edx + 1;
+			c->x86_l3_max_cbm_len = eax + 1;
+			c->x86_capability[CPUID_10_1_ECX] = ecx;
+		} else {
+			c->x86_l3_max_closid = -1;
+			c->x86_l3_max_cbm_len = -1;
 		}
 	}
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 15/32] cacheinfo: Introduce cache id
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (13 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-27 17:04   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 16/32] Documentation, ABI: Add a document entry for " Fenghua Yu
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Each cache is described by cacheinfo and is unique within its cache
index across the platform, but there is no id to identify a cache. We
introduce a cache ID to do that.

Intel Cache Allocation Technology (CAT) allows some control over the
allocation policy within each cache that it controls. We need a unique
cache ID for each cache level to allow the user to specify which
controls are applied to which cache. The cache id is a concise way to
specify a cache.

The cache id is first enabled on x86; it can be enabled on other
platforms as well. Cache ids are not necessarily contiguous.

Add an "id" entry to /sys/devices/system/cpu/cpu*/cache/index*/

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.com>
---
 drivers/base/cacheinfo.c  | 5 +++++
 include/linux/cacheinfo.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index e9fd32e..2a21c15 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -233,6 +233,7 @@ static ssize_t file_name##_show(struct device *dev,		\
 	return sprintf(buf, "%u\n", this_leaf->object);		\
 }
 
+show_one(id, id);
 show_one(level, level);
 show_one(coherency_line_size, coherency_line_size);
 show_one(number_of_sets, number_of_sets);
@@ -314,6 +315,7 @@ static ssize_t write_policy_show(struct device *dev,
 	return n;
 }
 
+static DEVICE_ATTR_RO(id);
 static DEVICE_ATTR_RO(level);
 static DEVICE_ATTR_RO(type);
 static DEVICE_ATTR_RO(coherency_line_size);
@@ -327,6 +329,7 @@ static DEVICE_ATTR_RO(shared_cpu_list);
 static DEVICE_ATTR_RO(physical_line_partition);
 
 static struct attribute *cache_default_attrs[] = {
+	&dev_attr_id.attr,
 	&dev_attr_type.attr,
 	&dev_attr_level.attr,
 	&dev_attr_shared_cpu_map.attr,
@@ -350,6 +353,8 @@ cache_default_attrs_is_visible(struct kobject *kobj,
 	const struct cpumask *mask = &this_leaf->shared_cpu_map;
 	umode_t mode = attr->mode;
 
+	if ((attr == &dev_attr_id.attr) && this_leaf->attributes & CACHE_ID)
+		return mode;
 	if ((attr == &dev_attr_type.attr) && this_leaf->type)
 		return mode;
 	if ((attr == &dev_attr_level.attr) && this_leaf->level)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2189935..cf6984d 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -18,6 +18,7 @@ enum cache_type {
 
 /**
  * struct cacheinfo - represent a cache leaf node
+ * @id: This cache's id. ID is unique in the same index on the platform.
  * @type: type of the cache - data, inst or unified
  * @level: represents the hierarchy in the multi-level cache
  * @coherency_line_size: size of each cache line usually representing
@@ -44,6 +45,7 @@ enum cache_type {
  * keeping, the remaining members form the core properties of the cache
  */
 struct cacheinfo {
+	unsigned int id;
 	enum cache_type type;
 	unsigned int level;
 	unsigned int coherency_line_size;
@@ -61,6 +63,7 @@ struct cacheinfo {
 #define CACHE_WRITE_ALLOCATE	BIT(3)
 #define CACHE_ALLOCATE_POLICY_MASK	\
 	(CACHE_READ_ALLOCATE | CACHE_WRITE_ALLOCATE)
+#define CACHE_ID		BIT(4)
 
 	struct device_node *of_node;
 	bool disable_sysfs;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 16/32] Documentation, ABI: Add a document entry for cache id
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (14 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Add an ABI document entry for /sys/devices/system/cpu/cpu*/cache/index*/id.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 1650133..ecd82e4 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -272,6 +272,23 @@ Description:	Parameters for the CPU cache attributes
 				     the modified cache line is written to main
 				     memory only when it is replaced
 
+
+What:		/sys/devices/system/cpu/cpu*/cache/index*/id
+Date:		July 2016
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Cache id
+
+		The id identifies a hardware cache of the system within a given
+		cache index in a set of cache indices. The "index" name is
+		simply a nomenclature from CPUID's leaf 4 which enumerates all
+		caches on the system by referring to each one as a cache index.
+		The (cache index, cache id) pair is unique for the whole
+		system.
+
+		Currently id is implemented on x86. On other platforms, id is
+		not enabled yet.
+
+
 What:		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/turbo_stat
 		/sys/devices/system/cpu/cpuX/cpufreq/throttle_stats/sub_turbo_stat
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (15 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 16/32] Documentation, ABI: Add a document entry for " Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-28  5:41   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use Fenghua Yu
                   ` (14 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Enable cache id on x86. The cache id is derived from the APIC ID and
CPUID leaf 4.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <bp@suse.com>
---
 arch/x86/kernel/cpu/intel_cacheinfo.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
index de6626c..8dc5720 100644
--- a/arch/x86/kernel/cpu/intel_cacheinfo.c
+++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
@@ -153,6 +153,7 @@ struct _cpuid4_info_regs {
 	union _cpuid4_leaf_eax eax;
 	union _cpuid4_leaf_ebx ebx;
 	union _cpuid4_leaf_ecx ecx;
+	unsigned int id;
 	unsigned long size;
 	struct amd_northbridge *nb;
 };
@@ -894,6 +895,8 @@ static void __cache_cpumap_setup(unsigned int cpu, int index,
 static void ci_leaf_init(struct cacheinfo *this_leaf,
 			 struct _cpuid4_info_regs *base)
 {
+	this_leaf->id = base->id;
+	this_leaf->attributes = CACHE_ID;
 	this_leaf->level = base->eax.split.level;
 	this_leaf->type = cache_type_map[base->eax.split.type];
 	this_leaf->coherency_line_size =
@@ -920,6 +923,22 @@ static int __init_cache_level(unsigned int cpu)
 	return 0;
 }
 
+/*
+ * The max shared threads number comes from CPUID.4:EAX[25-14] with input
+ * ECX as cache index. Then right shift apicid by the number's order to get
+ * cache id for this cache node.
+ */
+static void get_cache_id(int cpu, struct _cpuid4_info_regs *id4_regs)
+{
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+	unsigned long num_threads_sharing;
+	int index_msb;
+
+	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
+	index_msb = get_count_order(num_threads_sharing);
+	id4_regs->id = c->apicid >> index_msb;
+}
+
 static int __populate_cache_leaves(unsigned int cpu)
 {
 	unsigned int idx, ret;
@@ -931,6 +950,7 @@ static int __populate_cache_leaves(unsigned int cpu)
 		ret = cpuid4_cache_lookup_regs(idx, &id4_regs);
 		if (ret)
 			return ret;
+		get_cache_id(cpu, &id4_regs);
 		ci_leaf_init(this_leaf++, &id4_regs);
 		__cache_cpumap_setup(cpu, idx, &id4_regs);
 	}
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (16 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

We use ci_cpu_cacheinfo in CAT. Export it (along with the ci_cacheinfo()
helper in the header) so that the CAT code can reuse it.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 drivers/base/cacheinfo.c  | 2 +-
 include/linux/cacheinfo.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 2a21c15..f6e269a 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -29,7 +29,7 @@
 #include <linux/sysfs.h>
 
 /* pointer to per cpu cacheinfo */
-static DEFINE_PER_CPU(struct cpu_cacheinfo, ci_cpu_cacheinfo);
+DEFINE_PER_CPU(struct cpu_cacheinfo, ci_cpu_cacheinfo);
 #define ci_cacheinfo(cpu)	(&per_cpu(ci_cpu_cacheinfo, cpu))
 #define cache_leaves(cpu)	(ci_cacheinfo(cpu)->num_leaves)
 #define per_cpu_cacheinfo(cpu)	(ci_cacheinfo(cpu)->info_list)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index cf6984d..fa5e829 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -94,6 +94,8 @@ int func(unsigned int cpu)					\
 	return ret;						\
 }
 
+#define ci_cacheinfo(cpu)       (&per_cpu(ci_cpu_cacheinfo, cpu))
+
 struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
 int init_cache_level(unsigned int cpu);
 int populate_cache_leaves(unsigned int cpu);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (17 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13 12:56   ` Thomas Gleixner
  2016-07-28  5:53   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
                   ` (12 subsequent siblings)
  31 siblings, 2 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

rg_list is a linked list that connects a task to the other tasks in its
rdtgroup.

The rdtgroup pointer allows the task to access its own rdtgroup directly.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/sched.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..55adf17 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1761,6 +1761,10 @@ struct task_struct {
 	/* cg_list protected by css_set_lock and tsk->alloc_lock */
 	struct list_head cg_list;
 #endif
+#ifdef CONFIG_INTEL_RDT
+	struct list_head rg_list;
+	struct rdtgroup *rdtgroup;
+#endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 20/32] magic number for rscctrl file system
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (18 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-28  5:57   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
                   ` (11 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 include/uapi/linux/magic.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 546b388..655036a 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -57,6 +57,8 @@
 #define CGROUP_SUPER_MAGIC	0x27e0eb
 #define CGROUP2_SUPER_MAGIC	0x63677270
 
+#define RDTGROUP_SUPER_MAGIC	0x7655821
+
 
 #define STACK_END_MAGIC		0x57AC6E9D
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (19 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-28 14:07   ` Nilay Vaish
  2016-07-13  1:02 ` [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface Fenghua Yu
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The header mainly provides declarations for functions that are called
from the user interface file intel_rdt_rdtgroup.c.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h | 87 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 81 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index f2cb91d..4c5e0ac 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,27 +3,99 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#include <linux/seq_file.h>
 #include <linux/jump_label.h>
 
-#define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
-#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
-#define MSR_IA32_PQOS_CFG		0xc81
+#define L3_CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
+
+#define MSR_IA32_L3_QOS_CFG		0xc81
+
+enum resource_type {
+	RESOURCE_L3  = 0,
+	RESOURCE_NUM = 1,
+};
+
+#define MAX_CACHE_LEAVES        4
+#define MAX_CACHE_DOMAINS       64
+
+DECLARE_PER_CPU_READ_MOSTLY(int, cpu_l3_domain);
+DECLARE_PER_CPU_READ_MOSTLY(struct rdtgroup *, cpu_rdtgroup);
 
 extern struct static_key rdt_enable_key;
 void __intel_rdt_sched_in(void *dummy);
+extern bool use_rdtgroup_tasks;
+
+extern bool cdp_enabled;
+
+struct rdt_opts {
+	bool cdp_enabled;
+	bool verbose;
+	bool simulate_cat_l3;
+};
+
+struct cache_domain {
+	cpumask_t shared_cpu_map[MAX_CACHE_DOMAINS];
+	unsigned int max_cache_domains_num;
+	unsigned int level;
+	unsigned int shared_cache_id[MAX_CACHE_DOMAINS];
+};
+
+extern struct rdt_opts rdt_opts;
 
 struct clos_cbm_table {
-	unsigned long l3_cbm;
+	unsigned long cbm;
 	unsigned int clos_refcnt;
 };
 
 struct clos_config {
-	unsigned long *closmap;
+	unsigned long **closmap;
 	u32 max_closid;
-	u32 closids_used;
 };
 
+struct shared_domain {
+	struct cpumask cpumask;
+	int l3_domain;
+};
+
+#define for_each_cache_domain(domain, start_domain, max_domain)	\
+	for (domain = start_domain; domain < max_domain; domain++)
+
+extern struct clos_config cconfig;
+extern struct shared_domain *shared_domain;
+extern int shared_domain_num;
+
+extern struct rdtgroup *root_rdtgrp;
+extern void rdtgroup_fork(struct task_struct *child);
+extern void rdtgroup_post_fork(struct task_struct *child);
+
+extern struct clos_cbm_table **l3_cctable;
+
+extern unsigned int min_bitmask_len;
+extern void msr_cpu_update(void *arg);
+extern inline void closid_get(u32 closid, int domain);
+extern void closid_put(u32 closid, int domain);
+extern void closid_free(u32 closid, int domain, int level);
+extern int closid_alloc(u32 *closid, int domain);
+extern bool cat_l3_enabled;
+extern unsigned int get_domain_num(int level);
+extern struct shared_domain *shared_domain;
+extern int shared_domain_num;
+extern inline int get_dcbm_table_index(int x);
+extern inline int get_icbm_table_index(int x);
+
+extern int get_cache_leaf(int level, int cpu);
+
+extern void cbm_update_l3_msr(void *pindex);
+extern int level_to_leaf(int level);
+
+extern void init_msrs(bool cdpenabled);
+extern bool cat_enabled(int level);
+extern u64 max_cbm(int level);
+extern u32 max_cbm_len(int level);
+
+extern void rdtgroup_exit(struct task_struct *tsk);
+
 /*
  * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
  *
@@ -54,6 +126,9 @@ static inline void intel_rdt_sched_in(void)
 #else
 
 static inline void intel_rdt_sched_in(void) {}
+static inline void rdtgroup_fork(struct task_struct *child) {}
+static inline void rdtgroup_post_fork(struct task_struct *child) {}
+static inline void rdtgroup_exit(struct task_struct *tsk) {}
 
 #endif
 #endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (20 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

This is the header file for the user interface file intel_rdt_rdtgroup.c.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt_rdtgroup.h | 226 ++++++++++++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 arch/x86/include/asm/intel_rdt_rdtgroup.h

diff --git a/arch/x86/include/asm/intel_rdt_rdtgroup.h b/arch/x86/include/asm/intel_rdt_rdtgroup.h
new file mode 100644
index 0000000..797fed3
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt_rdtgroup.h
@@ -0,0 +1,226 @@
+#ifndef _RDT_PGROUP_H
+#define _RDT_PGROUP_H
+#define MAX_RDTGROUP_TYPE_NAMELEN	32
+#define MAX_RDTGROUP_ROOT_NAMELEN	64
+#define MAX_RFTYPE_NAME			64
+
+#include <asm/intel_rdt.h>
+
+extern void rdtgroup_exit(struct task_struct *tsk);
+
+struct pss_set {
+	void  *self;
+	/*
+	 * Lists running through all tasks using this rdtgroup.
+	 */
+	struct list_head tasks;
+
+	/* all css_task_iters currently walking this cset */
+	struct list_head task_iters;
+
+	/* Reference count */
+	atomic_t refcount;
+};
+
+/* cftype->flags */
+enum {
+	RFTYPE_WORLD_WRITABLE = (1 << 4),/* (DON'T USE FOR NEW FILES) S_IWUGO */
+
+	/* internal flags, do not use outside rdtgroup core proper */
+	__RFTYPE_ONLY_ON_DFL  = (1 << 16),/* only on default hierarchy */
+	__RFTYPE_NOT_ON_DFL   = (1 << 17),/* not on default hierarchy */
+};
+
+/* bits in struct rdtgroup flags field */
+enum {
+	/* rdtroup requires release notifications to userspace */
+	RDTGRP_NOTIFY_ON_RELEASE,
+	/*
+	 * Clone the parent's configuration when creating a new child
+	 * cpuset rdtgroup.  For historical reasons, this option can be
+	 * specified at mount time and thus is implemented here.
+	 */
+	RDTGRP_CPUSET_CLONE_CHILDREN,
+};
+
+#define CACHE_LEVEL3		3
+
+struct cache_resource {
+	u64 *cbm;
+	u64 *cbm2;
+	int *closid;
+	int *refcnt;
+};
+
+struct rdt_resource {
+	bool valid;
+	int closid[MAX_CACHE_DOMAINS];
+	/* Add more resources here. */
+};
+
+struct rdtgroup {
+	unsigned long flags;		/* "unsigned long" so bitops work */
+
+	/*
+	 * idr allocated in-hierarchy ID.
+	 *
+	 * ID 0 is not used, the ID of the root rdtgroup is always 1, and a
+	 * new rdtgroup will be assigned with a smallest available ID.
+	 *
+	 * Allocating/Removing ID must be protected by rdtgroup_mutex.
+	 */
+	int id;
+
+	/*
+	 * The depth this rdtgroup is at.  The root is at depth zero and each
+	 * step down the hierarchy increments the level.  This along with
+	 * ancestor_ids[] can determine whether a given rdtgroup is a
+	 * descendant of another without traversing the hierarchy.
+	 */
+	int level;
+
+	/*
+	 * Each non-empty css_set associated with this rdtgroup contributes
+	 * one to populated_cnt.  All children with non-zero popuplated_cnt
+	 * of their own contribute one.  The count is zero iff there's no
+	 * task in this rdtgroup or its subtree.
+	 */
+	int populated_cnt;
+
+	struct kernfs_node *kn;		/* rdtgroup kernfs entry */
+
+	struct rdtgroup_root *root;
+
+	struct list_head rdtgroup_list;
+
+	struct pss_set pset;
+	struct cpumask cpu_mask;
+	char schema[1024];
+
+	/* used to wait for offlining of csses */
+	wait_queue_head_t offline_waitq;
+
+	struct rdt_resource resource;
+
+	/* ids of the ancestors at each level including self */
+	int ancestor_ids[];
+};
+
+struct rftype {
+	/*
+	 * By convention, the name should begin with the name of the
+	 * subsystem, followed by a period.  Zero length string indicates
+	 * end of cftype array.
+	 */
+	char name[MAX_CFTYPE_NAME];
+	unsigned long private;
+
+	/*
+	 * The maximum length of string, excluding trailing nul, that can
+	 * be passed to write.  If < PAGE_SIZE-1, PAGE_SIZE-1 is assumed.
+	 */
+	size_t max_write_len;
+
+	/* CFTYPE_* flags */
+	unsigned int flags;
+
+	/*
+	 * If non-zero, should contain the offset from the start of css to
+	 * a struct rdtgroup_file field.  rdtgroup will record the handle of
+	 * the created file into it.  The recorded handle can be used as
+	 * long as the containing css remains accessible.
+	 */
+	unsigned int file_offset;
+
+	/*
+	 * Fields used for internal bookkeeping.  Initialized automatically
+	 * during registration.
+	 */
+	struct kernfs_ops *kf_ops;
+
+	/*
+	 * read_u64() is a shortcut for the common case of returning a
+	 * single integer. Use it in place of read()
+	 */
+	u64 (*read_u64)(struct rftype *rft);
+	/*
+	 * read_s64() is a signed version of read_u64()
+	 */
+	s64 (*read_s64)(struct rftype *rft);
+
+	/* generic seq_file read interface */
+	int (*seq_show)(struct seq_file *sf, void *v);
+
+	/* optional ops, implement all or none */
+	void *(*seq_start)(struct seq_file *sf, loff_t *ppos);
+	void *(*seq_next)(struct seq_file *sf, void *v, loff_t *ppos);
+	void (*seq_stop)(struct seq_file *sf, void *v);
+
+	/*
+	 * write_u64() is a shortcut for the common case of accepting
+	 * a single integer (as parsed by simple_strtoull) from
+	 * userspace. Use in place of write(); return 0 or error.
+	 */
+	int (*write_u64)(struct rftype *rft, u64 val);
+	/*
+	 * write_s64() is a signed version of write_u64()
+	 */
+	int (*write_s64)(struct rftype *rft, s64 val);
+
+	/*
+	 * write() is the generic write callback which maps directly to
+	 * kernfs write operation and overrides all other operations.
+	 * Maximum write size is determined by ->max_write_len.  Use
+	 * of_css/cft() to access the associated css and cft.
+	 */
+	ssize_t (*write)(struct kernfs_open_file *of,
+			 char *buf, size_t nbytes, loff_t off);
+};
+
+struct rdtgroup_root {
+	struct kernfs_root *kf_root;
+
+	/* Unique id for this hierarchy. */
+	int hierarchy_id;
+
+	/* The root rdtgroup.  Root is destroyed on its release. */
+	struct rdtgroup rdtgrp;
+
+	/* for rdtgrp->ancestor_ids[0] */
+	int rdtgrp_ancestor_id_storage;
+
+	/* Number of rdtgroups in the hierarchy */
+	atomic_t nr_rdtgrps;
+
+	/* A list running through the active hierarchies */
+	struct list_head root_list;
+
+	/* Hierarchy-specific flags */
+	unsigned int flags;
+
+	/* IDs for rdtgroups in this hierarchy */
+	struct idr rdtgroup_idr;
+
+	/* The name for this hierarchy - may be empty */
+	char name[MAX_RDTGROUP_ROOT_NAMELEN];
+};
+
+/* no synchronization, the result can only be used as a hint */
+static inline bool rdtgroup_is_populated(struct rdtgroup *rdtgrp)
+{
+	return rdtgrp->populated_cnt;
+}
+
+/* cft/css accessors for cftype->write() operation */
+static inline struct rftype *of_rft(struct kernfs_open_file *of)
+{
+	return of->kn->priv;
+}
+
+/* cft/css accessors for cftype->seq_*() operations */
+static inline struct rftype *seq_rft(struct seq_file *seq)
+{
+	return of_rft(seq->private);
+}
+
+#endif
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (21 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13 13:07   ` Thomas Gleixner
  2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

The QoS mask MSR array is per cache. We need to allocate CLOSIDs per cache
instead of from a single global CLOSID space.

A few different resources can share the same QoS mask MSR array. For
example, an L2 cache can share QoS MSRs with its next level
L3 cache. A domain number represents the L2 cache, the L3 cache, the L2
cache's shared cpumask, and the L3 cache's shared cpumask.

cctable is extended to be indexed by domain number so that each cache
has its own control table.

shared_domain is introduced to cover multiple resources sharing a
CLOSID.
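
As a rough illustration of the layout, here is a userspace model only --
the array sizes, the toy cpu_l3_domain() helper and the example bitmasks
below are made up and are not the kernel's data structures:

#include <stdio.h>

#define NUM_DOMAINS	2	/* e.g. two L3 caches (two sockets) */
#define NUM_CLOSIDS	4	/* CLOSIDs available per domain */

struct clos_entry {
	unsigned long cbm;	/* capacity bitmask for this CLOSID */
	unsigned int refcnt;	/* how many groups use this CLOSID */
};

/* one table per cache domain: CLOSID 1 can mean different CBMs per socket */
static struct clos_entry cctable[NUM_DOMAINS][NUM_CLOSIDS];

/* toy cpu -> L3 domain map: cpus 0-3 on socket 0, cpus 4-7 on socket 1 */
static int cpu_l3_domain(int cpu)
{
	return cpu < 4 ? 0 : 1;
}

int main(void)
{
	int cpu;

	/* same CLOSID, different bitmask on each cache domain */
	cctable[0][1].cbm = 0x0f;	/* low 4 ways on socket 0 */
	cctable[1][1].cbm = 0xf0;	/* high 4 ways on socket 1 */

	for (cpu = 0; cpu < 8; cpu += 4) {
		int dom = cpu_l3_domain(cpu);

		printf("cpu%d: domain %d, closid 1 -> cbm 0x%lx\n",
		       cpu, dom, cctable[dom][1].cbm);
	}
	return 0;
}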

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |   1 +
 arch/x86/kernel/cpu/intel_rdt.c  | 738 +++++++++++++++++++++++++++++----------
 2 files changed, 546 insertions(+), 193 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 4c5e0ac..5aacc4a 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -77,6 +77,7 @@ extern inline void closid_get(u32 closid, int domain);
 extern void closid_put(u32 closid, int domain);
 extern void closid_free(u32 closid, int domain, int level);
 extern int closid_alloc(u32 *closid, int domain);
+extern struct mutex rdtgroup_mutex;
 extern bool cat_l3_enabled;
 extern unsigned int get_domain_num(int level);
 extern struct shared_domain *shared_domain;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 057aef1..017c833 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -17,7 +17,7 @@
  * more details.
  *
  * More information about RDT be found in the Intel (R) x86 Architecture
- * Software Developer Manual June 2015, volume 3, section 17.15.
+ * Software Developer Manual.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -26,46 +26,49 @@
 #include <linux/err.h>
 #include <linux/cpu.h>
 #include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cacheinfo.h>
 #include <asm/pqr_common.h>
 #include <asm/intel_rdt.h>
+#include <asm/intel_rdt_rdtgroup.h>
 
 /*
- * During cache alloc mode cctable maintains 1:1 mapping between
- * CLOSid and l3_cbm.
- *
- * During CDP mode, the cctable maintains a 1:2 mapping between the closid
- * and (dcache_cbm, icache_cbm) pair.
- * index of a dcache_cbm for CLOSid 'n' = n << 1.
- * index of a icache_cbm for CLOSid 'n' = n << 1 + 1
+ * cctable maintains 1:1 mapping between CLOSid and cache bitmask.
  */
-static struct clos_cbm_table *cctable;
+struct clos_cbm_table **l3_cctable;
+
 /*
  * Minimum bits required in Cache bitmask.
  */
-static unsigned int min_bitmask_len = 1;
+unsigned int min_bitmask_len = 1;
+
 /*
  * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
  */
-static cpumask_t rdt_cpumask;
-/*
- * Temporary cpumask used during hot cpu notificaiton handling. The usage
- * is serialized by hot cpu locks.
- */
-static cpumask_t tmp_cpumask;
-static DEFINE_MUTEX(rdt_group_mutex);
+cpumask_t rdt_l3_cpumask;
+
+bool cat_l3_enabled;
+
 struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
-static struct clos_config cconfig;
-static bool cdp_enabled;
+struct clos_config cconfig;
+bool cdp_enabled;
 
-#define __DCBM_TABLE_INDEX(x)	(x << 1)
-#define __ICBM_TABLE_INDEX(x)	((x << 1) + 1)
-#define __DCBM_MSR_INDEX(x)			\
-	CBM_FROM_INDEX(__DCBM_TABLE_INDEX(x))
-#define __ICBM_MSR_INDEX(x)			\
-	CBM_FROM_INDEX(__ICBM_TABLE_INDEX(x))
+#define __DCBM_TABLE_INDEX(x) (x << 1)
+#define __ICBM_TABLE_INDEX(x) ((x << 1) + 1)
+#define __ICBM_MSR_INDEX(x)                    \
+	L3_CBM_FROM_INDEX(__ICBM_TABLE_INDEX(x))
 
-#define DCBM_TABLE_INDEX(x)	(x << cdp_enabled)
-#define ICBM_TABLE_INDEX(x)	((x << cdp_enabled) + cdp_enabled)
+#define DCBM_TABLE_INDEX(x)    (x << cdp_enabled)
+#define ICBM_TABLE_INDEX(x)    ((x << cdp_enabled) + cdp_enabled)
+
+inline int get_dcbm_table_index(int x)
+{
+	return DCBM_TABLE_INDEX(x);
+}
+inline int get_icbm_table_index(int x)
+{
+	return ICBM_TABLE_INDEX(x);
+}
 
 struct rdt_remote_data {
 	int msr;
@@ -101,14 +104,76 @@ static inline bool cache_alloc_hsw_probe(void)
 
 	wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old);
 
-	boot_cpu_data.x86_cache_max_closid = 4;
-	boot_cpu_data.x86_cache_max_cbm_len = 20;
+	boot_cpu_data.x86_l3_max_closid = 4;
+	boot_cpu_data.x86_l3_max_cbm_len = 20;
 	min_bitmask_len = 2;
 
 	return true;
 }
 
-static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
+u32 max_cbm_len(int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return boot_cpu_data.x86_l3_max_cbm_len;
+	default:
+		break;
+	}
+
+	return (u32)~0;
+}
+
+u64 max_cbm(int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return (1ULL << boot_cpu_data.x86_l3_max_cbm_len) - 1;
+	default:
+		break;
+	}
+
+	return (u64)~0;
+}
+
+static u32 hw_max_closid(int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return  boot_cpu_data.x86_l3_max_closid;
+	default:
+		break;
+	}
+
+	WARN(1, "invalid level\n");
+	return 0;
+}
+
+static int cbm_from_index(u32 i, int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return L3_CBM_FROM_INDEX(i);
+	default:
+		break;
+	}
+
+	WARN(1, "invalid level\n");
+	return 0;
+}
+
+bool cat_enabled(int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return cat_l3_enabled;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static inline bool cat_l3_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
 		return true;
@@ -122,230 +187,286 @@ static inline bool cache_alloc_supported(struct cpuinfo_x86 *c)
 	return false;
 }
 
+DEFINE_MUTEX(rdtgroup_mutex);
+
+DEFINE_PER_CPU_READ_MOSTLY(int, cpu_l3_domain) = -1;
+DEFINE_PER_CPU_READ_MOSTLY(int, cpu_shared_domain) = -1;
+DEFINE_PER_CPU_READ_MOSTLY(struct rdtgroup *, cpu_rdtgroup) = 0;
+
 void __intel_rdt_sched_in(void *dummy)
 {
 	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	struct rdtgroup *rdtgrp;
+	int closid;
+	int cpu = smp_processor_id();
+	int domain;
+
+	/* Don't write PQR register if rscctrl is not mounted. */
+	if (!rdtgroup_mounted)
+		return;
 
 	/*
-	 * Currently closid is always 0. When  user interface is added,
-	 * closid will come from user interface.
+	 * First find rdtgroup for this cpu.
+	 * If no rdtgroup is found for this cpu, find the task's rdtgroup.
 	 */
-	if (state->closid == 0)
+	rdtgrp = per_cpu(cpu_rdtgroup, cpu);
+	if (!rdtgrp) {
+		rdtgrp = current->rdtgroup;
+
+		if (!rdtgrp)
+			return;
+	}
+
+	domain = per_cpu(cpu_shared_domain, cpu);
+	closid = rdtgrp->resource.closid[domain];
+
+	if (closid == state->closid)
 		return;
 
-	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, 0);
-	state->closid = 0;
-}
+	state->closid = closid;
+	/* Don't really write PQR register in simulation mode. */
+	if (unlikely(rdt_opts.simulate_cat_l3))
+		return;
 
-/*
- * Synchronize the IA32_PQR_ASSOC MSR of all currently running tasks.
- */
-static inline void closid_tasks_sync(void)
-{
-	on_each_cpu_mask(cpu_online_mask, __intel_rdt_sched_in, NULL, 1);
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
 }
 
 /*
  * When cdp mode is enabled, refcnt is maintained in the dcache_cbm entry.
  */
-static inline void closid_get(u32 closid)
+inline void closid_get(u32 closid, int domain)
 {
-	struct clos_cbm_table *cct = &cctable[DCBM_TABLE_INDEX(closid)];
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (cat_l3_enabled) {
+		int l3_domain;
+		int dindex;
 
-	lockdep_assert_held(&rdt_group_mutex);
+		l3_domain = shared_domain[domain].l3_domain;
+		dindex = DCBM_TABLE_INDEX(closid);
+		l3_cctable[l3_domain][dindex].clos_refcnt++;
+		if (cdp_enabled) {
+			int iindex = ICBM_TABLE_INDEX(closid);
 
-	cct->clos_refcnt++;
+			l3_cctable[l3_domain][iindex].clos_refcnt++;
+		}
+	}
 }
 
-static int closid_alloc(u32 *closid)
+int closid_alloc(u32 *closid, int domain)
 {
 	u32 maxid;
 	u32 id;
 
-	lockdep_assert_held(&rdt_group_mutex);
+	lockdep_assert_held(&rdtgroup_mutex);
 
 	maxid = cconfig.max_closid;
-	id = find_first_zero_bit(cconfig.closmap, maxid);
+	id = find_first_zero_bit((unsigned long *)cconfig.closmap[domain],
+				 maxid);
+
 	if (id == maxid)
 		return -ENOSPC;
 
-	set_bit(id, cconfig.closmap);
-	closid_get(id);
+	set_bit(id, (unsigned long *)cconfig.closmap[domain]);
+	closid_get(id, domain);
 	*closid = id;
-	cconfig.closids_used++;
 
 	return 0;
 }
 
-static inline void closid_free(u32 closid)
+unsigned int get_domain_num(int level)
 {
-	clear_bit(closid, cconfig.closmap);
-	cctable[DCBM_TABLE_INDEX(closid)].l3_cbm = 0;
-
-	if (WARN_ON(!cconfig.closids_used))
-		return;
+	if (level == CACHE_LEVEL3)
+		return cpumask_weight(&rdt_l3_cpumask);
+	else
+		return -EINVAL;
+}
 
-	cconfig.closids_used--;
+int level_to_leaf(int level)
+{
+	switch (level) {
+	case CACHE_LEVEL3:
+		return 3;
+	default:
+		return -EINVAL;
+	}
 }
 
-static void closid_put(u32 closid)
+void closid_free(u32 closid, int domain, int level)
 {
-	struct clos_cbm_table *cct = &cctable[DCBM_TABLE_INDEX(closid)];
+	struct clos_cbm_table **cctable;
+	int leaf;
+	struct cpumask *mask;
+	int cpu;
+
+	if (level == CACHE_LEVEL3)
+		cctable = l3_cctable;
+
+	clear_bit(closid, (unsigned long *)cconfig.closmap[domain]);
+
+	if (level == CACHE_LEVEL3) {
+		cctable[domain][closid].cbm = max_cbm(level);
+		leaf = level_to_leaf(level);
+		mask = &cache_domains[leaf].shared_cpu_map[domain];
+		cpu = cpumask_first(mask);
+		smp_call_function_single(cpu, cbm_update_l3_msr, &closid, 1);
+	}
+}
 
-	lockdep_assert_held(&rdt_group_mutex);
+static void _closid_put(u32 closid, struct clos_cbm_table *cct,
+			int domain, int level)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
 	if (WARN_ON(!cct->clos_refcnt))
 		return;
 
 	if (!--cct->clos_refcnt)
-		closid_free(closid);
+		closid_free(closid, domain, level);
 }
 
-static bool cbm_validate(unsigned long var)
+void closid_put(u32 closid, int domain)
 {
-	u32 max_cbm_len = boot_cpu_data.x86_cache_max_cbm_len;
-	unsigned long first_bit, zero_bit;
-	u64 max_cbm;
-
-	if (bitmap_weight(&var, max_cbm_len) < min_bitmask_len)
-		return false;
+	struct clos_cbm_table *cct;
 
-	max_cbm = (1ULL << max_cbm_len) - 1;
-	if (var & ~max_cbm)
-		return false;
-
-	first_bit = find_first_bit(&var, max_cbm_len);
-	zero_bit = find_next_zero_bit(&var, max_cbm_len, first_bit);
-
-	if (find_next_bit(&var, max_cbm_len, zero_bit) < max_cbm_len)
-		return false;
+	if (cat_l3_enabled) {
+		int l3_domain = shared_domain[domain].l3_domain;
 
-	return true;
+		cct = &l3_cctable[l3_domain][DCBM_TABLE_INDEX(closid)];
+		_closid_put(closid, cct, l3_domain, CACHE_LEVEL3);
+		if (cdp_enabled) {
+			cct = &l3_cctable[l3_domain][ICBM_TABLE_INDEX(closid)];
+			_closid_put(closid, cct, l3_domain, CACHE_LEVEL3);
+		}
+	}
 }
 
-static int clos_cbm_table_read(u32 index, unsigned long *l3_cbm)
+void msr_cpu_update(void *arg)
 {
-	u32 orig_maxid = boot_cpu_data.x86_cache_max_closid;
+	struct rdt_remote_data *info = arg;
 
-	lockdep_assert_held(&rdt_group_mutex);
+	if (unlikely(rdt_opts.verbose))
+		pr_info("Write %lx to msr %x on cpu%d\n",
+			(unsigned long)info->val, info->msr,
+			smp_processor_id());
 
-	if (index >= orig_maxid)
-		return -EINVAL;
+	if (unlikely(rdt_opts.simulate_cat_l3))
+		return;
 
-	*l3_cbm = cctable[index].l3_cbm;
+	wrmsrl(info->msr, info->val);
+}
 
-	return 0;
+static struct cpumask *rdt_cache_cpumask(int level)
+{
+	return &rdt_l3_cpumask;
 }
 
 /*
- * clos_cbm_table_update() - Update a clos cbm table entry.
- * @index: index of the table entry whose cbm needs to be updated
- * @cbm: the new cbm value that has to be updated
- *
- * This assumes the cbm is validated as per the interface requirements
- * and the cache allocation requirements(through the cbm_validate).
+ * msr_update_all() - Update the msr for all packages.
  */
-static int clos_cbm_table_update(u32 index, unsigned long cbm)
+static inline void msr_update_all(int msr, u64 val, int level)
 {
-	u32 orig_maxid = boot_cpu_data.x86_cache_max_closid;
-
-	lockdep_assert_held(&rdt_group_mutex);
-
-	if (index >= orig_maxid)
-		return -EINVAL;
-
-	cctable[index].l3_cbm = cbm;
+	struct rdt_remote_data info;
 
-	return 0;
+	info.msr = msr;
+	info.val = val;
+	on_each_cpu_mask(rdt_cache_cpumask(level), msr_cpu_update, &info, 1);
 }
 
-static bool cbm_search(unsigned long cbm, u32 *closid)
+static void init_qos_msrs(int level)
 {
-	u32 maxid = cconfig.max_closid;
-	u32 i;
+	if (cat_enabled(level)) {
+		u32 maxcbm;
+		u32 i;
 
-	for (i = 0; i < maxid; i++) {
-		if (cctable[i].clos_refcnt &&
-		    bitmap_equal(&cbm, &cctable[i].l3_cbm, MAX_CBM_LENGTH)) {
-			*closid = i;
-			return true;
-		}
+		maxcbm = max_cbm(level);
+		for (i = 0; i < hw_max_closid(level); i++)
+			msr_update_all(cbm_from_index(i, level), maxcbm, level);
 	}
-
-	return false;
 }
 
-static void closcbm_map_dump(void)
+/*
+ * Initialize QOS_MASK_n registers to all 1's.
+ *
+ * Initialize L3_QOS_CFG register to enable or disable CDP.
+ */
+void init_msrs(bool cdpenabled)
 {
-	u32 i;
-
-	pr_debug("CBMMAP\n");
-	for (i = 0; i < boot_cpu_data.x86_cache_max_closid; i++) {
-		pr_debug("l3_cbm: 0x%x,clos_refcnt: %u\n",
-		 (unsigned int)cctable[i].l3_cbm, cctable[i].clos_refcnt);
+	if (cat_enabled(CACHE_LEVEL3)) {
+		init_qos_msrs(CACHE_LEVEL3);
+		msr_update_all(MSR_IA32_L3_QOS_CFG, cdpenabled, CACHE_LEVEL3);
 	}
-}
-
-static void msr_cpu_update(void *arg)
-{
-	struct rdt_remote_data *info = arg;
 
-	wrmsrl(info->msr, info->val);
 }
 
-/*
- * msr_update_all() - Update the msr for all packages.
- */
-static inline void msr_update_all(int msr, u64 val)
+int get_cache_leaf(int level, int cpu)
 {
-	struct rdt_remote_data info;
+	int index;
+	struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+	struct cacheinfo *this_leaf;
+	int num_leaves = this_cpu_ci->num_leaves;
+
+	for (index = 0; index < num_leaves; index++) {
+		this_leaf = this_cpu_ci->info_list + index;
+		if (this_leaf->level == level)
+			return index;
+	}
 
-	info.msr = msr;
-	info.val = val;
-	on_each_cpu_mask(&rdt_cpumask, msr_cpu_update, &info, 1);
+	return -EINVAL;
 }
 
-static bool code_data_mask_equal(void)
+static struct cpumask *get_shared_cpu_map(int cpu, int level)
 {
-	int i, dindex, iindex;
+	int index;
+	struct cacheinfo *leaf;
+	struct cpu_cacheinfo *cpu_ci = get_cpu_cacheinfo(cpu);
 
-	for (i = 0; i < cconfig.max_closid; i++) {
-		dindex = __DCBM_TABLE_INDEX(i);
-		iindex = __ICBM_TABLE_INDEX(i);
-		if (cctable[dindex].clos_refcnt &&
-		     (cctable[dindex].l3_cbm != cctable[iindex].l3_cbm))
-			return false;
-	}
+	index = get_cache_leaf(level, cpu);
+	if (index < 0)
+		return 0;
 
-	return true;
+	leaf = cpu_ci->info_list + index;
+
+	return &leaf->shared_cpu_map;
 }
 
-static inline bool rdt_cpumask_update(int cpu)
+inline bool rdt_cpumask_update(struct cpumask *cpumask, int cpu, int level)
 {
-	cpumask_and(&tmp_cpumask, &rdt_cpumask, topology_core_cpumask(cpu));
+	struct cpumask *shared_cpu_map;
+	cpumask_t tmp_cpumask;
+
+	shared_cpu_map = get_shared_cpu_map(cpu, level);
+	if (!shared_cpu_map)
+		return false;
+
+	cpumask_and(&tmp_cpumask, cpumask, shared_cpu_map);
 	if (cpumask_empty(&tmp_cpumask)) {
-		cpumask_set_cpu(cpu, &rdt_cpumask);
+		cpumask_set_cpu(cpu, cpumask);
 		return true;
 	}
 
 	return false;
 }
 
-static void cbm_update_msr(u32 index)
+void cbm_update_l3_msr(void *pindex)
 {
-	struct rdt_remote_data info;
+	int index;
 	int dindex;
+	int l3_domain;
+	struct clos_cbm_table *pl3_cctable;
+	struct rdt_remote_data info;
 
+	index = *(int *)pindex;
 	dindex = DCBM_TABLE_INDEX(index);
-	if (cctable[dindex].clos_refcnt) {
-
-		info.msr = CBM_FROM_INDEX(dindex);
-		info.val = cctable[dindex].l3_cbm;
-		msr_cpu_update((void *) &info);
-
+	l3_domain =  per_cpu(cpu_l3_domain, smp_processor_id());
+	pl3_cctable = &l3_cctable[l3_domain][dindex];
+	if (pl3_cctable->clos_refcnt) {
+		info.msr = L3_CBM_FROM_INDEX(dindex);
+		info.val = pl3_cctable->cbm;
+		msr_cpu_update(&info);
 		if (cdp_enabled) {
 			info.msr = __ICBM_MSR_INDEX(index);
-			info.val = cctable[dindex + 1].l3_cbm;
-			msr_cpu_update((void *) &info);
+			info.val = l3_cctable[l3_domain][dindex+1].cbm;
+			msr_cpu_update(&info);
 		}
 	}
 }
@@ -356,11 +477,13 @@ static void cbm_update_msr(u32 index)
  */
 static void cbm_update_msrs(void *dummy)
 {
-	int maxid = cconfig.max_closid;
-	unsigned int i;
+	int maxid;
+	int index;
 
-	for (i = 0; i < maxid; i++) {
-		cbm_update_msr(i);
+	maxid = cconfig.max_closid;
+	if (cat_l3_enabled) {
+		for (index = 0; index < maxid; index++)
+			cbm_update_l3_msr(&index);
 	}
 }
 
@@ -369,19 +492,44 @@ static inline void intel_rdt_cpu_start(int cpu)
 	struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
 
 	state->closid = 0;
-	mutex_lock(&rdt_group_mutex);
-	if (rdt_cpumask_update(cpu))
+	mutex_lock(&rdtgroup_mutex);
+	if (rdt_cpumask_update(&rdt_l3_cpumask, cpu, CACHE_LEVEL3))
 		smp_call_function_single(cpu, cbm_update_msrs, NULL, 1);
-	mutex_unlock(&rdt_group_mutex);
+	mutex_unlock(&rdtgroup_mutex);
 }
 
 static void intel_rdt_cpu_exit(unsigned int cpu)
 {
+	cpumask_t tmp_cpumask;
+	struct cpumask *shared_cpu_map;
+	int new_cpu;
 	int i;
+	int l3_domain;
+	int level;
+	int leaf;
+
+	mutex_lock(&rdtgroup_mutex);
 
-	mutex_lock(&rdt_group_mutex);
-	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
-		mutex_unlock(&rdt_group_mutex);
+	level = CACHE_LEVEL3;
+
+	l3_domain = per_cpu(cpu_l3_domain, cpu);
+	leaf = level_to_leaf(level);
+	shared_cpu_map = &cache_domains[leaf].shared_cpu_map[l3_domain];
+
+	cpumask_clear_cpu(cpu, &rdt_l3_cpumask);
+	cpumask_clear_cpu(cpu, shared_cpu_map);
+	if (cpumask_empty(shared_cpu_map))
+		goto out;
+
+	new_cpu = cpumask_first(shared_cpu_map);
+	rdt_cpumask_update(&rdt_l3_cpumask, new_cpu, level);
+
+out:
+	mutex_unlock(&rdtgroup_mutex);
+	return;
+
+	if (cpumask_test_and_clear_cpu(cpu, &rdt_l3_cpumask)) {
+		mutex_unlock(&rdtgroup_mutex);
 		return;
 	}
 
@@ -390,8 +538,8 @@ static void intel_rdt_cpu_exit(unsigned int cpu)
 	i = cpumask_any(&tmp_cpumask);
 
 	if (i < nr_cpu_ids)
-		cpumask_set_cpu(i, &rdt_cpumask);
-	mutex_unlock(&rdt_group_mutex);
+		cpumask_set_cpu(i, &rdt_l3_cpumask);
+	mutex_unlock(&rdtgroup_mutex);
 }
 
 static int intel_rdt_cpu_notifier(struct notifier_block *nb,
@@ -414,37 +562,242 @@ static int intel_rdt_cpu_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+/*
+ * Initialize per-cpu cpu_l3_domain.
+ *
+ * cpu_l3_domain numbers are consecutive integers starting from 0.
+ * Sets up 1:1 mapping of cpu id and cpu_l3_domain.
+ */
+static int __init cpu_cache_domain_init(int level)
+{
+	int i, j;
+	int max_cpu_cache_domain = 0;
+	int index;
+	struct cacheinfo *leaf;
+	int *domain;
+	struct cpu_cacheinfo *cpu_ci;
+
+	for_each_online_cpu(i) {
+		domain = &per_cpu(cpu_l3_domain, i);
+		if (*domain == -1) {
+			index = get_cache_leaf(level, i);
+			if (index < 0)
+				return -EINVAL;
+
+			cpu_ci = get_cpu_cacheinfo(i);
+			leaf = cpu_ci->info_list + index;
+			if (cpumask_empty(&leaf->shared_cpu_map)) {
+				WARN(1, "no shared cpu for L2\n");
+				return -EINVAL;
+			}
+
+			for_each_cpu(j, &leaf->shared_cpu_map) {
+				domain = &per_cpu(cpu_l3_domain, j);
+				*domain = max_cpu_cache_domain;
+			}
+			max_cpu_cache_domain++;
+		}
+	}
+
+	return 0;
+}
+
+struct rdt_opts rdt_opts = {
+	.cdp_enabled = false,
+	.verbose = false,
+	.simulate_cat_l3 = false,
+};
+
+static bool disable_cat_l3 __initdata;
+
+static int __init rdt_setup(char *str)
+{
+	char *tok;
+
+	while ((tok = strsep(&str, ",")) != NULL) {
+		if (!*tok)
+			return -EINVAL;
+
+		if (strcmp(tok, "simulate_cat_l3") == 0) {
+			pr_info("Simulate CAT L3\n");
+			rdt_opts.simulate_cat_l3 = true;
+		} else if (strcmp(tok, "disable_cat_l3") == 0) {
+			pr_info("CAT L3 is disabled\n");
+			disable_cat_l3 = true;
+		} else {
+			pr_info("Invalid rdt option\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+__setup("rscctrl=", rdt_setup);
+
+static inline bool resource_alloc_enabled(void)
+{
+	return cat_l3_enabled;
+}
+
+struct shared_domain *shared_domain;
+int shared_domain_num;
+
+static int shared_domain_init(void)
+{
+	int l3_domain_num = get_domain_num(CACHE_LEVEL3);
+	int size;
+	int domain;
+	struct cpumask *cpumask;
+	struct cpumask *shared_cpu_map;
+	int cpu;
+
+	if (cat_l3_enabled) {
+		shared_domain_num = l3_domain_num;
+		cpumask = &rdt_l3_cpumask;
+	} else
+		return -EINVAL;
+
+	size = shared_domain_num * sizeof(struct shared_domain);
+	shared_domain = kzalloc(size, GFP_KERNEL);
+	if (!shared_domain)
+		return -EINVAL;
+
+	domain = 0;
+	for_each_cpu(cpu, cpumask) {
+		if (cat_l3_enabled)
+			shared_domain[domain].l3_domain =
+					per_cpu(cpu_l3_domain, cpu);
+		else
+			shared_domain[domain].l3_domain = -1;
+
+		shared_cpu_map = get_shared_cpu_map(cpu, CACHE_LEVEL3);
+
+		cpumask_copy(&shared_domain[domain].cpumask, shared_cpu_map);
+
+		domain++;
+	}
+	for_each_online_cpu(cpu) {
+		if (cat_l3_enabled)
+			per_cpu(cpu_shared_domain, cpu) =
+					per_cpu(cpu_l3_domain, cpu);
+	}
+
+	return 0;
+}
+
+static int cconfig_init(int maxid)
+{
+	int num;
+	int domain;
+	unsigned long *closmap_block;
+	int maxid_size;
+
+	maxid_size = BITS_TO_LONGS(maxid);
+	num = maxid_size * shared_domain_num;
+	cconfig.closmap = kcalloc(maxid, sizeof(unsigned long *), GFP_KERNEL);
+	if (!cconfig.closmap)
+		goto out_free;
+
+	closmap_block = kcalloc(num, sizeof(unsigned long), GFP_KERNEL);
+	if (!closmap_block)
+		goto out_free;
+
+	for (domain = 0; domain < shared_domain_num; domain++)
+		cconfig.closmap[domain] = (unsigned long *)closmap_block +
+					  domain * maxid_size;
+
+	cconfig.max_closid = maxid;
+
+	return 0;
+out_free:
+	kfree(cconfig.closmap);
+	kfree(closmap_block);
+	return -ENOMEM;
+}
+
+static int __init cat_cache_init(int level, int maxid,
+				 struct clos_cbm_table ***cctable)
+{
+	int domain_num;
+	int domain;
+	int size;
+	int ret = 0;
+	struct clos_cbm_table *p;
+
+	domain_num = get_domain_num(level);
+	size = domain_num * sizeof(struct clos_cbm_table *);
+	*cctable = kzalloc(size, GFP_KERNEL);
+	if (!*cctable) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	size = maxid * domain_num * sizeof(struct clos_cbm_table);
+	p = kzalloc(size, GFP_KERNEL);
+	if (!p) {
+		kfree(*cctable);
+		ret = -ENOMEM;
+		goto out;
+	}
+	for (domain = 0; domain < domain_num; domain++)
+		(*cctable)[domain] = p + domain * maxid;
+
+	ret = cpu_cache_domain_init(level);
+	if (ret) {
+		kfree(*cctable);
+		kfree(p);
+	}
+out:
+	return ret;
+}
 static int __init intel_rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
-	u32 maxid, max_cbm_len;
-	int err = 0, size, i;
-
-	if (!cache_alloc_supported(c))
+	u32 maxid;
+	int i;
+	int ret;
+
+	if (unlikely(disable_cat_l3))
+		cat_l3_enabled = false;
+	else if (cat_l3_supported(c))
+		cat_l3_enabled = true;
+	else if (rdt_opts.simulate_cat_l3 &&
+		 get_cache_leaf(CACHE_LEVEL3, 0) >= 0)
+		cat_l3_enabled = true;
+	else
+		cat_l3_enabled = false;
+
+	if (!resource_alloc_enabled())
 		return -ENODEV;
 
-	maxid = c->x86_cache_max_closid;
-	max_cbm_len = c->x86_cache_max_cbm_len;
-
-	size = maxid * sizeof(struct clos_cbm_table);
-	cctable = kzalloc(size, GFP_KERNEL);
-	if (!cctable) {
-		err = -ENOMEM;
-		goto out_err;
+	if (rdt_opts.simulate_cat_l3) {
+		boot_cpu_data.x86_l3_max_closid = 16;
+		boot_cpu_data.x86_l3_max_cbm_len = 20;
+	}
+	for_each_online_cpu(i) {
+		rdt_cpumask_update(&rdt_l3_cpumask, i, CACHE_LEVEL3);
 	}
 
-	size = BITS_TO_LONGS(maxid) * sizeof(long);
-	cconfig.closmap = kzalloc(size, GFP_KERNEL);
-	if (!cconfig.closmap) {
-		kfree(cctable);
-		err = -ENOMEM;
-		goto out_err;
+	maxid = 0;
+	if (cat_l3_enabled) {
+		maxid = boot_cpu_data.x86_l3_max_closid;
+		ret = cat_cache_init(CACHE_LEVEL3, maxid, &l3_cctable);
+		if (ret)
+			cat_l3_enabled = false;
 	}
 
-	cpu_notifier_register_begin();
+	if (!cat_l3_enabled)
+		return -ENOSPC;
+
+	ret = shared_domain_init();
+	if (ret)
+		return -ENODEV;
 
-	for_each_online_cpu(i)
-		rdt_cpumask_update(i);
+	ret = cconfig_init(maxid);
+	if (ret)
+		return ret;
+
+	cpu_notifier_register_begin();
 
 	__hotcpu_notifier(intel_rdt_cpu_notifier, 0);
 
@@ -454,9 +807,8 @@ static int __init intel_rdt_late_init(void)
 	pr_info("Intel cache allocation enabled\n");
 	if (cpu_has(c, X86_FEATURE_CDP_L3))
 		pr_info("Intel code data prioritization detected\n");
-out_err:
 
-	return err;
+	return 0;
 }
 
 late_initcall(intel_rdt_late_init);
-- 
2.5.0


* [PATCH 24/32] Task fork and exit for rdtgroup
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (22 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13 13:14   ` Thomas Gleixner
  2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

When a task is forked, it inherits its parent's rdtgroup. The task
can be moved to another rdtgroup during its run time.

When the task exits, it is deleted from its current rdtgroup's task
list.
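
The rule can be summarised with a small userspace model (illustrative
only; struct toy_task, struct toy_group and the helpers are made-up
names, not kernel types):

#include <stdio.h>
#include <stddef.h>

struct toy_group {
	const char *name;
	int nr_tasks;		/* stands in for the group's task list */
};

struct toy_task {
	struct toy_group *grp;	/* which group the task belongs to */
};

/* fork: the child starts in the parent's group and joins its list */
static void toy_fork(struct toy_task *child, const struct toy_task *parent)
{
	child->grp = parent->grp;
	child->grp->nr_tasks++;
}

/* exit: drop the task from whatever group it currently belongs to */
static void toy_exit(struct toy_task *tsk)
{
	if (tsk->grp) {
		tsk->grp->nr_tasks--;
		tsk->grp = NULL;
	}
}

int main(void)
{
	struct toy_group grp = { .name = "part1", .nr_tasks = 1 };
	struct toy_task parent = { .grp = &grp }, child;

	toy_fork(&child, &parent);
	printf("after fork: %s has %d tasks\n", grp.name, grp.nr_tasks);
	toy_exit(&child);
	printf("after exit: %s has %d tasks\n", grp.name, grp.nr_tasks);
	return 0;
}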

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h |  1 +
 arch/x86/kernel/cpu/intel_rdt.c  | 22 ++++++++++++++++++++++
 kernel/exit.c                    |  2 ++
 kernel/fork.c                    |  4 ++++
 4 files changed, 29 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 5aacc4a..f2298f3 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -22,6 +22,7 @@ enum resource_type {
 DECLARE_PER_CPU_READ_MOSTLY(int, cpu_l3_domain);
 DECLARE_PER_CPU_READ_MOSTLY(struct rdtgroup *, cpu_rdtgroup);
 
+extern spinlock_t rdtgroup_task_lock;
 extern struct static_key rdt_enable_key;
 void __intel_rdt_sched_in(void *dummy);
 extern bool use_rdtgroup_tasks;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 017c833..901156d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -812,3 +812,25 @@ static int __init intel_rdt_late_init(void)
 }
 
 late_initcall(intel_rdt_late_init);
+
+void rdtgroup_fork(struct task_struct *child)
+{
+	INIT_LIST_HEAD(&child->rg_list);
+	child->rdtgroup = NULL;
+}
+
+void rdtgroup_post_fork(struct task_struct *child)
+{
+	if (!use_rdtgroup_tasks)
+		return;
+
+	spin_lock_irq(&rdtgroup_task_lock);
+	if (list_empty(&child->rg_list)) {
+		struct rdtgroup *rdtgrp = current->rdtgroup;
+
+		list_add_tail(&child->rg_list, &rdtgrp->pset.tasks);
+		child->rdtgroup = rdtgrp;
+		atomic_inc(&rdtgrp->pset.refcount);
+	}
+	spin_unlock_irq(&rdtgroup_task_lock);
+}
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e135..04346b6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -59,6 +59,7 @@
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
+#include <asm/intel_rdt.h>
 
 static void __unhash_process(struct task_struct *p, bool group_dead)
 {
@@ -757,6 +758,7 @@ void do_exit(long code)
 	perf_event_exit_task(tsk);
 
 	cgroup_exit(tsk);
+	rdtgroup_exit(tsk);
 
 	/*
 	 * FIXME: do that only when needed, using sched_exit tracepoint
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c..d9bb373 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -84,6 +84,8 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 
+#include <asm/intel_rdt.h>
+
 #include <trace/events/sched.h>
 
 #define CREATE_TRACE_POINTS
@@ -1408,6 +1410,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->audit_context = NULL;
 	threadgroup_change_begin(current);
 	cgroup_fork(p);
+	rdtgroup_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
 	if (IS_ERR(p->mempolicy)) {
@@ -1647,6 +1650,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
+	rdtgroup_post_fork(p);
 	threadgroup_change_end(current);
 	perf_event_fork(p);
 
-- 
2.5.0


* [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (23 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-14 12:30   ` Thomas Gleixner
  2016-07-13  1:02 ` [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory Fenghua Yu
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

We introduce a new rscctrl file system mounted under /sys/fs/rscctrl.
Users use this file system to control resource allocation.

The hierarchy of the file system is as follows:
/sys/fs/rscctrl/info/info
		    /<resource0>/<resource0 specific info files>
		    /<resource1>/<resource1 specific info files>
			....
		/tasks
		/cpus
		/schemas
		/sub-dir1
		/sub-dir2
		....

Users can specify which tasks use which schemas for resource allocation.

More details can be found in Documentation/x86/intel_rdt_ui.txt
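
For illustration, a userspace program could move itself into one of the
sub-directories like this (a hedged sketch: the sub-directory name is an
example, and writing one decimal pid per write is an assumption borrowed
from the cgroup-style "tasks" convention; the authoritative format is in
the UI documentation):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* "sub-dir1" is an example partition created by the admin */
	const char *path = "/sys/fs/rscctrl/sub-dir1/tasks";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* assumed format: one decimal pid per write */
	fprintf(f, "%d\n", (int)getpid());
	fclose(f);
	return 0;
}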

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/intel_rdt.h          |   3 +
 arch/x86/include/asm/intel_rdt_rdtgroup.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c           |   2 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c  | 881 ++++++++++++++++++++++++++++++
 4 files changed, 889 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index f2298f3..90b6047 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -42,6 +42,9 @@ struct cache_domain {
 	unsigned int shared_cache_id[MAX_CACHE_DOMAINS];
 };
 
+extern struct cache_domain cache_domains[MAX_CACHE_LEAVES];
+
+
 extern struct rdt_opts rdt_opts;
 
 struct clos_cbm_table {
diff --git a/arch/x86/include/asm/intel_rdt_rdtgroup.h b/arch/x86/include/asm/intel_rdt_rdtgroup.h
index 797fed3..b0bcf72 100644
--- a/arch/x86/include/asm/intel_rdt_rdtgroup.h
+++ b/arch/x86/include/asm/intel_rdt_rdtgroup.h
@@ -205,6 +205,9 @@ struct rdtgroup_root {
 	char name[MAX_RDTGROUP_ROOT_NAMELEN];
 };
 
+extern int __init rdtgroup_init(void);
+extern bool rdtgroup_mounted;
+
 /* no synchronization, the result can only be used as a hint */
 static inline bool rdtgroup_is_populated(struct rdtgroup *rdtgrp)
 {
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 901156d..e483a1d 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -803,6 +803,8 @@ static int __init intel_rdt_late_init(void)
 
 	cpu_notifier_register_done();
 
+	rdtgroup_init();
+
 	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Intel cache allocation enabled\n");
 	if (cpu_has(c, X86_FEATURE_CDP_L3))
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
new file mode 100644
index 0000000..e1936d2
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -0,0 +1,881 @@
+/*
+ * Resource Director Technology(RDT)
+ * - User interface for Resource Allocation in RDT.
+ *
+ * Copyright (C) 2016 Intel Corporation
+ *
+ * 2016 Written by
+ *    Fenghua Yu <fenghua.yu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT can be found in the Intel (R) x86 Architecture
+ * Software Developer Manual.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/cred.h>
+#include <linux/ctype.h>
+#include <linux/errno.h>
+#include <linux/init_task.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/magic.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/mount.h>
+#include <linux/pagemap.h>
+#include <linux/proc_fs.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/percpu-rwsem.h>
+#include <linux/string.h>
+#include <linux/sort.h>
+#include <linux/pid_namespace.h>
+#include <linux/idr.h>
+#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */
+#include <linux/kthread.h>
+#include <linux/delay.h>
+#include <linux/atomic.h>
+#include <linux/cpumask.h>
+#include <linux/cacheinfo.h>
+#include <linux/cacheinfo.h>
+#include <net/sock.h>
+#include <asm/intel_rdt_rdtgroup.h>
+#include <asm/intel_rdt.h>
+
+/**
+ * kernfs_root - find out the kernfs_root a kernfs_node belongs to
+ * @kn: kernfs_node of interest
+ *
+ * Return the kernfs_root @kn belongs to.
+ */
+static inline struct kernfs_root *get_kernfs_root(struct kernfs_node *kn)
+{
+	/* if parent exists, it's always a dir; otherwise, @sd is a dir */
+	if (kn->parent)
+		kn = kn->parent;
+	return kn->dir.root;
+}
+
+/*
+ * Protects rdtgroup_idr so that IDs can be released without grabbing
+ * rdtgroup_mutex.
+ */
+static DEFINE_SPINLOCK(rdtgroup_idr_lock);
+
+struct percpu_rw_semaphore rdtgroup_threadgroup_rwsem;
+
+#define MAX_CPUMASK_CHAR_IN_HEX	(NR_CPUS/4)
+
+static struct rftype rdtgroup_root_base_files[];
+
+#define RDTGROUP_FILE_NAME_MAX		(MAX_RDTGROUP_TYPE_NAMELEN +	\
+					 MAX_RFTYPE_NAME + 2)
+static char *rdtgroup_file_name(const struct rftype *rft, char *buf)
+{
+	strncpy(buf, rft->name, RDTGROUP_FILE_NAME_MAX);
+	return buf;
+}
+
+/**
+ * rdtgroup_file_mode - deduce file mode of a control file
+ * @cft: the control file in question
+ *
+ * S_IRUGO for read, S_IWUSR for write.
+ */
+static umode_t rdtgroup_file_mode(const struct rftype *rft)
+{
+	umode_t mode = 0;
+
+	if (rft->read_u64 || rft->read_s64 || rft->seq_show)
+		mode |= S_IRUGO;
+
+	if (rft->write_u64 || rft->write_s64 || rft->write) {
+		if (rft->flags & RFTYPE_WORLD_WRITABLE)
+			mode |= S_IWUGO;
+		else
+			mode |= S_IWUSR;
+	}
+
+	return mode;
+}
+
+/* set uid and gid of rdtgroup dirs and files to that of the creator */
+static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+{
+	struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
+			       .ia_uid = current_fsuid(),
+			       .ia_gid = current_fsgid(), };
+
+	if (uid_eq(iattr.ia_uid, GLOBAL_ROOT_UID) &&
+	    gid_eq(iattr.ia_gid, GLOBAL_ROOT_GID))
+		return 0;
+
+	return kernfs_setattr(kn, &iattr);
+}
+
+struct rdtgroup *root_rdtgrp;
+static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
+{
+	char name[RDTGROUP_FILE_NAME_MAX];
+	struct kernfs_node *kn;
+	struct lock_class_key *key = NULL;
+	int ret;
+
+	kn = __kernfs_create_file(parent_kn, rdtgroup_file_name(rft, name),
+				  rdtgroup_file_mode(rft), 0, rft->kf_ops, rft,
+				  NULL, key);
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret) {
+		kernfs_remove(kn);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void rdtgroup_rm_file(struct kernfs_node *kn, const struct rftype *rft)
+{
+	char name[RDTGROUP_FILE_NAME_MAX];
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	kernfs_remove_by_name(kn, rdtgroup_file_name(rft, name));
+}
+
+static int rdtgroup_addrm_files(struct kernfs_node *kn, struct rftype rfts[],
+			      bool is_add)
+{
+	struct rftype *rft, *rft_end = NULL;
+	int ret;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+restart:
+	for (rft = rfts; rft != rft_end && rft->name[0] != '\0'; rft++) {
+		if (is_add) {
+			ret = rdtgroup_add_file(kn, rft);
+			if (ret) {
+				pr_warn("%s: failed to add %s, err=%d\n",
+					__func__, rft->name, ret);
+				rft_end = rft;
+				is_add = false;
+				goto restart;
+			}
+		} else {
+			rdtgroup_rm_file(kn, rft);
+		}
+	}
+	return 0;
+}
+
+static enum resource_type get_kn_res_type(struct kernfs_node *kn)
+{
+	return RESOURCE_L3;
+}
+
+static int rdt_max_closid_show(struct seq_file *seq, void *v)
+{
+	struct kernfs_open_file *of = seq->private;
+	enum resource_type res_type;
+
+	res_type = get_kn_res_type(of->kn);
+
+	switch (res_type) {
+	case RESOURCE_L3:
+		seq_printf(seq, "%d\n",
+			boot_cpu_data.x86_l3_max_closid);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int rdt_max_cbm_len_show(struct seq_file *seq, void *v)
+{
+	struct kernfs_open_file *of = seq->private;
+	enum resource_type res_type;
+
+	res_type = get_kn_res_type(of->kn);
+	switch (res_type) {
+	case RESOURCE_L3:
+		seq_printf(seq, "%d\n",
+			boot_cpu_data.x86_l3_max_cbm_len);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+static int get_shared_domain(int domain, int level)
+{
+	int sd;
+
+	for_each_cache_domain(sd, 0, shared_domain_num) {
+		if (cat_l3_enabled && level == CACHE_LEVEL3) {
+			if (shared_domain[sd].l3_domain == domain)
+				return sd;
+		}
+	}
+
+	return -1;
+}
+
+static int rdtgroup_populate_dir(struct kernfs_node *kn)
+{
+	struct rftype *rfts;
+
+	rfts = rdtgroup_root_base_files;
+	return rdtgroup_addrm_files(kn, rfts, true);
+}
+
+static struct rftype rdtgroup_partition_base_files[];
+static int rdtgroup_partition_populate_dir(struct kernfs_node *kn)
+{
+	struct rftype *rfts;
+
+	rfts = rdtgroup_partition_base_files;
+
+	return rdtgroup_addrm_files(kn, rfts, true);
+}
+
+static int rdtgroup_procs_write_permission(struct task_struct *task,
+					   struct kernfs_open_file *of)
+{
+	const struct cred *cred = current_cred();
+	const struct cred *tcred = get_task_cred(task);
+	int ret = 0;
+
+	/*
+	 * even if we're attaching all tasks in the thread group, we only
+	 * need to check permissions on one of them.
+	 */
+	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
+	    !uid_eq(cred->euid, tcred->uid) &&
+	    !uid_eq(cred->euid, tcred->suid))
+		ret = -EACCES;
+
+	put_cred(tcred);
+	return ret;
+}
+
+bool use_rdtgroup_tasks;
+
+static void init_rdtgroup_housekeeping(struct rdtgroup *rdtgrp)
+{
+	init_waitqueue_head(&rdtgrp->offline_waitq);
+	rdtgrp->pset.self = rdtgrp;
+	INIT_LIST_HEAD(&rdtgrp->pset.task_iters);
+}
+
+static LIST_HEAD(rdtgroup_lists);
+static void init_rdtgroup_root(struct rdtgroup_root *root)
+{
+	struct rdtgroup *rdtgrp = &root->rdtgrp;
+
+	INIT_LIST_HEAD(&root->root_list);
+	INIT_LIST_HEAD(&rdtgrp->rdtgroup_list);
+	list_add_tail(&rdtgrp->rdtgroup_list, &rdtgroup_lists);
+	atomic_set(&root->nr_rdtgrps, 1);
+	rdtgrp->root = root;
+	init_rdtgroup_housekeeping(rdtgrp);
+	idr_init(&root->rdtgroup_idr);
+}
+
+static DEFINE_IDR(rdtgroup_hierarchy_idr);
+static int rdtgroup_init_root_id(struct rdtgroup_root *root)
+{
+	int id;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	id = idr_alloc_cyclic(&rdtgroup_hierarchy_idr, root, 0, 0, GFP_KERNEL);
+	if (id < 0)
+		return id;
+
+	root->hierarchy_id = id;
+	return 0;
+}
+
+static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops;
+/* IDR wrappers which synchronize using rdtgroup_idr_lock */
+static int rdtgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
+			    gfp_t gfp_mask)
+{
+	int ret;
+
+	idr_preload(gfp_mask);
+	spin_lock_bh(&rdtgroup_idr_lock);
+	ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_DIRECT_RECLAIM);
+	spin_unlock_bh(&rdtgroup_idr_lock);
+	idr_preload_end();
+	return ret;
+}
+
+/* hierarchy ID allocation and mapping, protected by rdtgroup_mutex */
+static void rdtgroup_exit_root_id(struct rdtgroup_root *root)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (root->hierarchy_id) {
+		idr_remove(&rdtgroup_hierarchy_idr, root->hierarchy_id);
+		root->hierarchy_id = 0;
+	}
+}
+
+static struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp;
+
+	if (kernfs_type(kn) == KERNFS_DIR)
+		rdtgrp = kn->priv;
+	else
+		rdtgrp = kn->parent->priv;
+
+	kernfs_break_active_protection(kn);
+
+	mutex_lock(&rdtgroup_mutex);
+
+	return rdtgrp;
+}
+
+static void rdtgroup_kn_unlock(struct kernfs_node *kn)
+{
+	mutex_unlock(&rdtgroup_mutex);
+
+	kernfs_unbreak_active_protection(kn);
+}
+
+static char *res_info_dir_name(enum resource_type res_type, char *name)
+{
+	switch (res_type) {
+	case RESOURCE_L3:
+		strncpy(name, "l3", RDTGROUP_FILE_NAME_MAX);
+		break;
+	default:
+		break;
+	}
+
+	return name;
+}
+
+static int rdtgroup_setup_root(struct rdtgroup_root *root,
+			       unsigned long ss_mask)
+{
+	int ret;
+
+	root_rdtgrp = &root->rdtgrp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	ret = rdtgroup_idr_alloc(&root->rdtgroup_idr, root_rdtgrp,
+				 1, 2, GFP_KERNEL);
+	if (ret < 0)
+		goto out;
+
+	root_rdtgrp->id = ret;
+	root_rdtgrp->ancestor_ids[0] = ret;
+
+	ret = rdtgroup_init_root_id(root);
+	if (ret)
+		goto cancel_ref;
+
+	root->kf_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
+					   KERNFS_ROOT_CREATE_DEACTIVATED,
+					   root_rdtgrp);
+	if (IS_ERR(root->kf_root)) {
+		ret = PTR_ERR(root->kf_root);
+		goto exit_root_id;
+	}
+	root_rdtgrp->kn = root->kf_root->kn;
+
+	ret = rdtgroup_populate_dir(root->kf_root->kn);
+	if (ret)
+		goto destroy_root;
+
+	/*
+	 * Link the root rdtgroup in this hierarchy into all the css_set
+	 * objects.
+	 */
+	WARN_ON(atomic_read(&root->nr_rdtgrps) != 1);
+
+	kernfs_activate(root_rdtgrp->kn);
+	ret = 0;
+	goto out;
+
+destroy_root:
+	kernfs_destroy_root(root->kf_root);
+	root->kf_root = NULL;
+exit_root_id:
+	rdtgroup_exit_root_id(root);
+cancel_ref:
+out:
+	return ret;
+}
+
+#define cache_leaves(cpu)       (get_cpu_cacheinfo(cpu)->num_leaves)
+
+struct cache_domain cache_domains[MAX_CACHE_LEAVES];
+
+static int get_shared_cache_id(int cpu, int level)
+{
+	struct cpu_cacheinfo *this_cpu_ci;
+	struct cacheinfo *this_leaf;
+
+	this_cpu_ci = get_cpu_cacheinfo(cpu);
+
+	this_leaf = this_cpu_ci->info_list + level_to_leaf(level);
+	return this_leaf->id;
+}
+
+static __init void init_cache_domains(void)
+{
+	int cpu, domain;
+	struct cpu_cacheinfo *this_cpu_ci;
+	struct cacheinfo *this_leaf;
+	int leaves;
+	char buf[MAX_CPUMASK_CHAR_IN_HEX + 1];
+	unsigned int level;
+
+	for (leaves = 0; leaves < cache_leaves(0); leaves++) {
+		for_each_online_cpu(cpu) {
+			struct cpumask *mask;
+
+			this_cpu_ci = get_cpu_cacheinfo(cpu);
+			this_leaf = this_cpu_ci->info_list + leaves;
+			cache_domains[leaves].level = this_leaf->level;
+			mask = &this_leaf->shared_cpu_map;
+			cpumap_print_to_pagebuf(false, buf, mask);
+			for (domain = 0; domain < MAX_CACHE_DOMAINS; domain++) {
+				if (cpumask_test_cpu(cpu,
+				&cache_domains[leaves].shared_cpu_map[domain]))
+					break;
+			}
+			if (domain == MAX_CACHE_DOMAINS) {
+				domain =
+				  cache_domains[leaves].max_cache_domains_num++;
+
+				cache_domains[leaves].shared_cpu_map[domain] =
+					*mask;
+
+				level = cache_domains[leaves].level;
+				cache_domains[leaves].shared_cache_id[domain] =
+					get_shared_cache_id(cpu, level);
+			}
+		}
+	}
+}
+
+static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
+				  char *buf, size_t nbytes, loff_t off);
+
+DEFINE_SPINLOCK(rdtgroup_task_lock);
+
+void rdtgroup_exit(struct task_struct *tsk)
+{
+
+	spin_lock_irq(&rdtgroup_task_lock);
+	if (!list_empty(&tsk->rg_list)) {
+		struct rdtgroup *rdtgrp = tsk->rdtgroup;
+
+		list_del_init(&tsk->rg_list);
+		tsk->rdtgroup = NULL;
+		atomic_dec(&rdtgrp->pset.refcount);
+	}
+	spin_unlock_irq(&rdtgroup_task_lock);
+}
+
+static struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
+static void rdtgroup_kn_unlock(struct kernfs_node *kn);
+
+static struct rftype rdtgroup_partition_base_files[] = {
+	{
+		.name = "tasks",
+		.seq_show = rdtgroup_pidlist_show,
+		.write = rdtgroup_tasks_write,
+	},
+	{
+		.name = "cpus",
+		.write = rdtgroup_cpus_write,
+		.seq_show = rdtgroup_cpus_show,
+	},
+	{
+		.name = "schemas",
+		.write = rdtgroup_schemas_write,
+		.seq_show = rdtgroup_schemas_show,
+	},
+	{ }	/* terminate */
+};
+
+/* rdtgroup core interface files */
+static struct rftype rdtgroup_root_base_files[] = {
+	{
+		.name = "tasks",
+		.seq_show = rdtgroup_pidlist_show,
+		.write = rdtgroup_tasks_write,
+	},
+	{
+		.name = "cpus",
+		.write = rdtgroup_cpus_write,
+		.seq_show = rdtgroup_cpus_show,
+	},
+	{
+		.name = "schemas",
+		.write = rdtgroup_schemas_write,
+		.seq_show = rdtgroup_schemas_show,
+	},
+	{ }	/* terminate */
+};
+
+static void *rdtgroup_idr_replace(struct idr *idr, void *ptr, int id)
+{
+	void *ret;
+
+	spin_lock_bh(&rdtgroup_idr_lock);
+	ret = idr_replace(idr, ptr, id);
+	spin_unlock_bh(&rdtgroup_idr_lock);
+	return ret;
+}
+
+static int rdtgroup_destroy_locked(struct rdtgroup *rdtgrp)
+	__releases(&rdtgroup_mutex) __acquires(&rdtgroup_mutex)
+{
+	int shared_domain;
+	int closid;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/*
+	 * Only migration can raise populated from zero and we're already
+	 * holding rdtgroup_mutex.
+	 */
+	if (rdtgroup_is_populated(rdtgrp))
+		return -EBUSY;
+
+	/* free closid occupied by this rdtgroup. */
+	for_each_cache_domain(shared_domain, 0, shared_domain_num) {
+		closid = rdtgrp->resource.closid[shared_domain];
+		closid_put(closid, shared_domain);
+	}
+
+	list_del_init(&rdtgrp->rdtgroup_list);
+
+	/*
+	 * Remove @rdtgrp directory along with the base files.  @rdtgrp has an
+	 * extra ref on its kn.
+	 */
+	kernfs_remove(rdtgrp->kn);
+
+	return 0;
+}
+
+static void rdtgroup_idr_remove(struct idr *idr, int id)
+{
+	spin_lock_bh(&rdtgroup_idr_lock);
+	idr_remove(idr, id);
+	spin_unlock_bh(&rdtgroup_idr_lock);
+}
+
+static int
+rdtgroup_move_task_all(struct rdtgroup *src_rdtgrp, struct rdtgroup *dst_rdtgrp)
+{
+	struct list_head *tasks;
+
+	tasks = &src_rdtgrp->pset.tasks;
+	while (!list_empty(tasks)) {
+		struct task_struct *tsk;
+		struct list_head *pos;
+		pid_t pid;
+		int ret;
+
+		pos = tasks->next;
+		tsk = list_entry(pos, struct task_struct, rg_list);
+		pid = tsk->pid;
+		ret = rdtgroup_move_task(pid, dst_rdtgrp, false, NULL);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * Remove all of subdirectories under root.
+ */
+static int rmdir_all_sub(void)
+{
+	struct rdtgroup *rdtgrp;
+	int cpu;
+	int ret = 0;
+	struct list_head *l;
+
+	while (!list_is_last(&root_rdtgrp->rdtgroup_list, &rdtgroup_lists)) {
+		l = rdtgroup_lists.next;
+		if (l == &root_rdtgrp->rdtgroup_list)
+			l = l->next;
+
+		rdtgrp = list_entry(l, struct rdtgroup, rdtgroup_list);
+		if (rdtgrp == root_rdtgrp)
+			continue;
+
+		rdtgroup_move_task_all(rdtgrp, root_rdtgrp);
+
+		for_each_cpu(cpu, &rdtgrp->cpu_mask)
+			per_cpu(cpu_rdtgroup, cpu) = 0;
+
+		ret = rdtgroup_destroy_locked(rdtgrp);
+		if (ret)
+			goto out;
+	}
+
+out:
+	return ret;
+}
+
+/*
+ * The default hierarchy.
+ */
+struct rdtgroup_root rdtgrp_dfl_root;
+EXPORT_SYMBOL_GPL(rdtgrp_dfl_root);
+
+static int parse_rdtgroupfs_options(char *data)
+{
+	char *token, *o = data;
+	int nr_opts = 0;
+
+	while ((token = strsep(&o, ",")) != NULL) {
+		nr_opts++;
+
+		if (!*token)
+			return -EINVAL;
+		if (!strcmp(token, "cdp")) {
+			/* Enable CDP */
+			rdt_opts.cdp_enabled = true;
+			continue;
+		}
+		if (!strcmp(token, "verbose")) {
+			rdt_opts.verbose = true;
+			continue;
+		}
+	}
+
+	return 0;
+}
+
+static void release_root_closid(void)
+{
+	int domain;
+	int closid;
+
+	if (!root_rdtgrp->resource.valid)
+		return;
+
+	for_each_cache_domain(domain, 0, shared_domain_num) {
+		/* Put closid in root rdtgrp's domain if valid. */
+		closid = root_rdtgrp->resource.closid[domain];
+		closid_put(closid, domain);
+	}
+}
+
+static void setup_task_rg_lists(struct rdtgroup *rdtgrp, bool enable)
+{
+	struct task_struct *p, *g;
+
+	spin_lock_irq(&rdtgroup_task_lock);
+	if (enable)
+		INIT_LIST_HEAD(&rdtgrp->pset.tasks);
+	use_rdtgroup_tasks = enable;
+
+	/*
+	 * We need tasklist_lock because RCU is not safe against
+	 * while_each_thread(). Besides, a forking task that has passed
+	 * rdtgroup_post_fork() without seeing use_task_css_set_links = 1
+	 * is not guaranteed to have its child immediately visible in the
+	 * tasklist if we walk through it with RCU.
+	 */
+	read_lock(&tasklist_lock);
+	do_each_thread(g, p) {
+		WARN_ON_ONCE(enable ? !list_empty(&p->rg_list) :
+			     list_empty(&p->rg_list));
+
+		/*
+		 * We should check if the process is exiting, otherwise
+		 * it will race with rdtgroup_exit() in that the list
+		 * entry won't be deleted though the process has exited.
+		 * Do it while holding siglock so that we don't end up
+		 * racing against rdtgroup_exit().
+		 */
+		spin_lock_irq(&p->sighand->siglock);
+		if (!(p->flags & PF_EXITING)) {
+			if (enable) {
+				list_add_tail(&p->rg_list, &rdtgrp->pset.tasks);
+				p->rdtgroup = rdtgrp;
+				atomic_inc(&rdtgrp->pset.refcount);
+			} else {
+				list_del_init(&p->rg_list);
+				p->rdtgroup = NULL;
+				atomic_dec(&rdtgrp->pset.refcount);
+			}
+		}
+		spin_unlock_irq(&p->sighand->siglock);
+	} while_each_thread(g, p);
+	read_unlock(&tasklist_lock);
+	spin_unlock_irq(&rdtgroup_task_lock);
+}
+
+/*
+ * The default hierarchy always exists but is hidden until mounted for the
+ * first time.  This is for backward compatibility.
+ */
+static bool rdtgrp_dfl_root_visible;
+
+static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
+				 size_t nbytes, loff_t off)
+{
+	struct rftype *rft = of->kn->priv;
+
+	if (rft->write)
+		return rft->write(of, buf, nbytes, off);
+
+	return -EINVAL;
+}
+
+static void *rdtgroup_seqfile_start(struct seq_file *seq, loff_t *ppos)
+{
+	return seq_rft(seq)->seq_start(seq, ppos);
+}
+
+static void *rdtgroup_seqfile_next(struct seq_file *seq, void *v, loff_t *ppos)
+{
+	return seq_rft(seq)->seq_next(seq, v, ppos);
+}
+
+static void rdtgroup_seqfile_stop(struct seq_file *seq, void *v)
+{
+	seq_rft(seq)->seq_stop(seq, v);
+}
+
+static int rdtgroup_seqfile_show(struct seq_file *m, void *arg)
+{
+	struct rftype *rft = seq_rft(m);
+
+	if (rft->seq_show)
+		return rft->seq_show(m, arg);
+	return 0;
+}
+
+static struct kernfs_ops rdtgroup_kf_ops = {
+	.atomic_write_len	= PAGE_SIZE,
+	.write			= rdtgroup_file_write,
+	.seq_start		= rdtgroup_seqfile_start,
+	.seq_next		= rdtgroup_seqfile_next,
+	.seq_stop		= rdtgroup_seqfile_stop,
+	.seq_show		= rdtgroup_seqfile_show,
+};
+
+static struct kernfs_ops rdtgroup_kf_single_ops = {
+	.atomic_write_len	= PAGE_SIZE,
+	.write			= rdtgroup_file_write,
+	.seq_show		= rdtgroup_seqfile_show,
+};
+
+static void rdtgroup_exit_rftypes(struct rftype *rfts)
+{
+	struct rftype *rft;
+
+	for (rft = rfts; rft->name[0] != '\0'; rft++) {
+		/* free copy for custom atomic_write_len, see init_cftypes() */
+		if (rft->max_write_len && rft->max_write_len != PAGE_SIZE)
+			kfree(rft->kf_ops);
+		rft->kf_ops = NULL;
+
+		/* revert flags set by rdtgroup core while adding @cfts */
+		rft->flags &= ~(__RFTYPE_ONLY_ON_DFL | __RFTYPE_NOT_ON_DFL);
+	}
+}
+
+static int rdtgroup_init_rftypes(struct rftype *rfts)
+{
+	struct rftype *rft;
+
+	for (rft = rfts; rft->name[0] != '\0'; rft++) {
+		struct kernfs_ops *kf_ops;
+
+		if (rft->seq_start)
+			kf_ops = &rdtgroup_kf_ops;
+		else
+			kf_ops = &rdtgroup_kf_single_ops;
+
+		/*
+		 * Ugh... if @cft wants a custom max_write_len, we need to
+		 * make a copy of kf_ops to set its atomic_write_len.
+		 */
+		if (rft->max_write_len && rft->max_write_len != PAGE_SIZE) {
+			kf_ops = kmemdup(kf_ops, sizeof(*kf_ops), GFP_KERNEL);
+			if (!kf_ops) {
+				rdtgroup_exit_rftypes(rfts);
+				return -ENOMEM;
+			}
+			kf_ops->atomic_write_len = rft->max_write_len;
+		}
+
+		rft->kf_ops = kf_ops;
+	}
+
+	return 0;
+}
+
+static struct list_head rdtgroups;
+
+struct rdtgroup_root rdtgrp_dfl_root;
+/*
+ * rdtgroup_init - rdtgroup initialization
+ *
+ * Register rdtgroup filesystem, and initialize any subsystems that didn't
+ * request early init.
+ */
+int __init rdtgroup_init(void)
+{
+	WARN_ON(percpu_init_rwsem(&rdtgroup_threadgroup_rwsem));
+	WARN_ON(rdtgroup_init_rftypes(rdtgroup_root_base_files));
+
+	WARN_ON(rdtgroup_init_rftypes(res_info_files));
+	WARN_ON(rdtgroup_init_rftypes(info_files));
+
+	WARN_ON(rdtgroup_init_rftypes(rdtgroup_partition_base_files));
+	mutex_lock(&rdtgroup_mutex);
+
+	init_rdtgroup_root(&rdtgrp_dfl_root);
+	WARN_ON(rdtgroup_setup_root(&rdtgrp_dfl_root, 0));
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	WARN_ON(sysfs_create_mount_point(fs_kobj, "rscctrl"));
+	WARN_ON(register_filesystem(&rdt_fs_type));
+	init_cache_domains();
+
+	INIT_LIST_HEAD(&rdtgroups);
+
+	return 0;
+}
-- 
2.5.0


* [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (24 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
@ 2016-07-13  1:02 ` Fenghua Yu
  2016-07-13  1:03 ` [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands Fenghua Yu
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:02 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

During boot time, the "info" directory is set up under the rscctrl root.
It contains one "info" file and one resource-specific directory
for each enabled resource.

If L3 is enabled, an "l3" sub-directory is created under the "info"
directory. There are three L3-specific info files under it:
max_closid, max_cbm_len, and domain_to_cache_id.

The "info" directory is exposed to user after rscctrl is mounted.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 259 +++++++++++++++++++++++++++++++
 1 file changed, 259 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e1936d2..b2140a8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -238,6 +238,179 @@ static int get_shared_domain(int domain, int level)
 
 	return -1;
 }
+static void rdt_info_show_cat(struct seq_file *seq, int level)
+{
+	int domain;
+	int domain_num = get_domain_num(level);
+	int closid;
+	u64 cbm;
+	struct clos_cbm_table **cctable;
+	int maxid;
+	int shared_domain;
+	int cnt;
+
+	if (level == CACHE_LEVEL3)
+		cctable = l3_cctable;
+	else
+		return;
+
+	maxid = cconfig.max_closid;
+	for (domain = 0; domain < domain_num; domain++) {
+		seq_printf(seq, "domain %d:\n", domain);
+		shared_domain = get_shared_domain(domain, level);
+		for (closid = 0; closid < maxid; closid++) {
+			int dindex, iindex;
+
+			if (test_bit(closid,
+			(unsigned long *)cconfig.closmap[shared_domain])) {
+				dindex = get_dcbm_table_index(closid);
+				cbm = cctable[domain][dindex].cbm;
+				cnt = cctable[domain][dindex].clos_refcnt;
+				seq_printf(seq, "cbm[%d]=%lx, refcnt=%d\n",
+					 dindex, (unsigned long)cbm, cnt);
+				if (cdp_enabled) {
+					iindex = get_icbm_table_index(closid);
+					cbm = cctable[domain][iindex].cbm;
+					cnt =
+					   cctable[domain][iindex].clos_refcnt;
+					seq_printf(seq,
+						   "cbm[%d]=%lx, refcnt=%d\n",
+						   iindex, (unsigned long)cbm,
+						   cnt);
+				}
+			} else {
+				cbm = max_cbm(level);
+				cnt = 0;
+				dindex = get_dcbm_table_index(closid);
+				seq_printf(seq, "cbm[%d]=%lx, refcnt=%d\n",
+					 dindex, (unsigned long)cbm, cnt);
+				if (cdp_enabled) {
+					iindex = get_icbm_table_index(closid);
+					seq_printf(seq,
+						 "cbm[%d]=%lx, refcnt=%d\n",
+						 iindex, (unsigned long)cbm,
+						 cnt);
+				}
+			}
+		}
+	}
+}
+
+static void show_shared_domain(struct seq_file *seq)
+{
+	int domain;
+
+	seq_puts(seq, "Shared domains:\n");
+
+	for_each_cache_domain(domain, 0, shared_domain_num) {
+		struct shared_domain *sd;
+
+		sd = &shared_domain[domain];
+		seq_printf(seq, "domain[%d]:", domain);
+		if (cat_enabled(CACHE_LEVEL3))
+			seq_printf(seq, "l3_domain=%d ", sd->l3_domain);
+		seq_printf(seq, "cpumask=%*pb\n",
+			   cpumask_pr_args(&sd->cpumask));
+	}
+}
+
+static int rdt_info_show(struct seq_file *seq, void *v)
+{
+	show_shared_domain(seq);
+
+	if (cat_l3_enabled) {
+		if (rdt_opts.verbose)
+			rdt_info_show_cat(seq, CACHE_LEVEL3);
+	}
+
+	seq_puts(seq, "\n");
+
+	return 0;
+}
+
+static int res_type_to_level(enum resource_type res_type, int *level)
+{
+	int ret = 0;
+
+	switch (res_type) {
+	case RESOURCE_L3:
+		*level = CACHE_LEVEL3;
+		break;
+	case RESOURCE_NUM:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static int domain_to_cache_id_show(struct seq_file *seq, void *v)
+{
+	struct kernfs_open_file *of = seq->private;
+	enum resource_type res_type;
+	int domain;
+	int leaf;
+	int level = 0;
+	int ret;
+
+	res_type = (enum resource_type)of->kn->parent->priv;
+
+	ret = res_type_to_level(res_type, &level);
+	if (ret)
+		return 0;
+
+	leaf =	get_cache_leaf(level, 0);
+
+	for (domain = 0; domain < get_domain_num(level); domain++) {
+		unsigned int cid;
+
+		cid = cache_domains[leaf].shared_cache_id[domain];
+		seq_printf(seq, "%d:%d\n", domain, cid);
+	}
+
+	return 0;
+}
+
+static struct rftype info_files[] = {
+	{
+		.name = "info",
+		.seq_show = rdt_info_show,
+	},
+	{ }	/* terminate */
+};
+
+/* rdtgroup information files for one cache resource. */
+static struct rftype res_info_files[] = {
+	{
+		.name = "max_closid",
+		.seq_show = rdt_max_closid_show,
+	},
+	{
+		.name = "max_cbm_len",
+		.seq_show = rdt_max_cbm_len_show,
+	},
+	{
+		.name = "domain_to_cache_id",
+		.seq_show = domain_to_cache_id_show,
+	},
+	{ }	/* terminate */
+};
+
+static int info_populate_dir(struct kernfs_node *kn)
+{
+	struct rftype *rfts;
+
+	rfts = info_files;
+	return rdtgroup_addrm_files(kn, rfts, true);
+}
+
+static int res_info_populate_dir(struct kernfs_node *kn)
+{
+	struct rftype *rfts;
+
+	rfts = res_info_files;
+	return rdtgroup_addrm_files(kn, rfts, true);
+}
 
 static int rdtgroup_populate_dir(struct kernfs_node *kn)
 {
@@ -377,6 +550,90 @@ static char *res_info_dir_name(enum resource_type res_type, char *name)
 	return name;
 }
 
+static int create_res_info(enum resource_type res_type,
+			   struct kernfs_node *parent_kn)
+{
+	struct kernfs_node *kn;
+	char name[RDTGROUP_FILE_NAME_MAX];
+	int ret;
+
+	res_info_dir_name(res_type, name);
+	kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, NULL);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out;
+	}
+
+	/*
+	 * This extra ref will be put in kernfs_remove() and guarantees
+	 * that @rdtgrp->kn is always accessible.
+	 */
+	kernfs_get(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	ret = res_info_populate_dir(kn);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn);
+
+	ret = 0;
+	goto out;
+
+out_destroy:
+	kernfs_remove(kn);
+out:
+	return ret;
+
+}
+
+static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn,
+				    const char *name)
+{
+	struct kernfs_node *kn;
+	int ret;
+
+	if (parent_kn != root_rdtgrp->kn)
+		return -EPERM;
+
+	/* create the directory */
+	kn = kernfs_create_dir(parent_kn, "info", parent_kn->mode, root_rdtgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out;
+	}
+
+	ret = info_populate_dir(kn);
+	if (ret)
+		goto out_destroy;
+
+	if (cat_enabled(CACHE_LEVEL3))
+		create_res_info(RESOURCE_L3, kn);
+
+	/*
+	 * This extra ref will be put in kernfs_remove() and guarantees
+	 * that @rdtgrp->kn is always accessible.
+	 */
+	kernfs_get(kn);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn);
+
+	ret = 0;
+	goto out;
+
+out_destroy:
+	kernfs_remove(kn);
+out:
+	return ret;
+}
+
 static int rdtgroup_setup_root(struct rdtgroup_root *root,
 			       unsigned long ss_mask)
 {
@@ -411,6 +668,8 @@ static int rdtgroup_setup_root(struct rdtgroup_root *root,
 	if (ret)
 		goto destroy_root;
 
+	rdtgroup_create_info_dir(root->kf_root->kn, "info_dir");
+
 	/*
 	 * Link the root rdtgroup in this hierarchy into all the css_set
 	 * objects.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (25 preceding siblings ...)
  2016-07-13  1:02 ` [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  2016-07-13  1:03 ` [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus Fenghua Yu
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Four basic file system commands are implemented for rscctrl:
mount, umount, mkdir, and rmdir.
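
A rough usage sketch (the /sys/fs/rscctrl mount point comes from the user
interface documentation in this series; the "p1" partition name is only
illustrative):

  # mount -t rscctrl rscctrl /sys/fs/rscctrl
  # mkdir /sys/fs/rscctrl/p1     # create a new partition
  # rmdir /sys/fs/rscctrl/p1     # remove the partition
  # umount /sys/fs/rscctrl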

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 237 +++++++++++++++++++++++++++++++
 1 file changed, 237 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b2140a8..91ea3509 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -856,6 +856,139 @@ static void rdtgroup_idr_remove(struct idr *idr, int id)
 	spin_unlock_bh(&rdtgroup_idr_lock);
 }
 
+
+static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
+			umode_t mode)
+{
+	struct rdtgroup *parent, *rdtgrp;
+	struct rdtgroup_root *root;
+	struct kernfs_node *kn;
+	int level, ret;
+
+	if (parent_kn != root_rdtgrp->kn)
+		return -EPERM;
+
+	/* Do not accept '\n' to avoid unparsable situation.
+	 */
+	if (strchr(name, '\n'))
+		return -EINVAL;
+
+	parent = rdtgroup_kn_lock_live(parent_kn);
+	if (!parent)
+		return -ENODEV;
+	root = parent->root;
+	level = parent->level + 1;
+
+	/* allocate the rdtgroup and its ID, 0 is reserved for the root */
+	rdtgrp = kzalloc(sizeof(*rdtgrp) +
+			 sizeof(rdtgrp->ancestor_ids[0]) * (level + 1),
+			 GFP_KERNEL);
+	if (!rdtgrp) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	/*
+	 * Temporarily set the pointer to NULL, so idr_find() won't return
+	 * a half-baked rdtgroup.
+	 */
+	rdtgrp->id = rdtgroup_idr_alloc(&root->rdtgroup_idr, NULL, 2, 0,
+					GFP_KERNEL);
+	if (rdtgrp->id < 0) {
+		ret = -ENOMEM;
+		goto out_cancel_ref;
+	}
+
+	INIT_LIST_HEAD(&rdtgrp->pset.tasks);
+
+	init_rdtgroup_housekeeping(rdtgrp);
+	cpumask_clear(&rdtgrp->cpu_mask);
+
+	rdtgrp->root = root;
+	rdtgrp->level = level;
+
+	if (test_bit(RDTGRP_CPUSET_CLONE_CHILDREN, &parent->flags))
+		set_bit(RDTGRP_CPUSET_CLONE_CHILDREN, &rdtgrp->flags);
+
+	/* create the directory */
+	kn = kernfs_create_dir(parent->kn, name, mode, rdtgrp);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out_free_id;
+	}
+	rdtgrp->kn = kn;
+
+	/*
+	 * This extra ref will be put in kernfs_remove() and guarantees
+	 * that @rdtgrp->kn is always accessible.
+	 */
+	kernfs_get(kn);
+
+	atomic_inc(&root->nr_rdtgrps);
+
+	/*
+	 * @rdtgrp is now fully operational.  If something fails after this
+	 * point, it'll be released via the normal destruction path.
+	 */
+	rdtgroup_idr_replace(&root->rdtgroup_idr, rdtgrp, rdtgrp->id);
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
+
+	ret = rdtgroup_partition_populate_dir(kn);
+	if (ret)
+		goto out_destroy;
+
+	kernfs_activate(kn);
+
+	list_add_tail(&rdtgrp->rdtgroup_list, &rdtgroup_lists);
+	/* Generate default schema for rdtgrp. */
+	ret = get_default_resources(rdtgrp);
+	if (ret)
+		goto out_destroy;
+
+	ret = 0;
+	goto out_unlock;
+
+out_free_id:
+	rdtgroup_idr_remove(&root->rdtgroup_idr, rdtgrp->id);
+out_cancel_ref:
+	kfree(rdtgrp);
+out_unlock:
+	rdtgroup_kn_unlock(parent_kn);
+	return ret;
+
+out_destroy:
+	rdtgroup_destroy_locked(rdtgrp);
+	goto out_unlock;
+}
+
+static int rdtgroup_rmdir(struct kernfs_node *kn)
+{
+	struct rdtgroup *rdtgrp;
+	int cpu;
+	int ret = 0;
+
+	rdtgrp = rdtgroup_kn_lock_live(kn);
+	if (!rdtgrp)
+		return -ENODEV;
+
+	if (!list_empty(&rdtgrp->pset.tasks)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	for_each_cpu(cpu, &rdtgrp->cpu_mask)
+		per_cpu(cpu_rdtgroup, cpu) = 0;
+
+	ret = rdtgroup_destroy_locked(rdtgrp);
+
+out:
+	rdtgroup_kn_unlock(kn);
+	return ret;
+}
+
 static int
 rdtgroup_move_task_all(struct rdtgroup *src_rdtgrp, struct rdtgroup *dst_rdtgrp)
 {
@@ -957,6 +1090,11 @@ static void release_root_closid(void)
 	}
 }
 
+static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
+	.mkdir			= rdtgroup_mkdir,
+	.rmdir			= rdtgroup_rmdir,
+};
+
 static void setup_task_rg_lists(struct rdtgroup *rdtgrp, bool enable)
 {
 	struct task_struct *p, *g;
@@ -1009,6 +1147,105 @@ static void setup_task_rg_lists(struct rdtgroup *rdtgrp, bool enable)
  */
 static bool rdtgrp_dfl_root_visible;
 
+bool rdtgroup_mounted;
+
+static struct dentry *rdt_mount(struct file_system_type *fs_type,
+			 int flags, const char *unused_dev_name,
+			 void *data)
+{
+	struct super_block *pinned_sb = NULL;
+	struct rdtgroup_root *root;
+	struct dentry *dentry;
+	int ret;
+	bool new_sb;
+
+	/*
+	 * The first time anyone tries to mount a rdtgroup, enable the list
+	 * linking tasks and fix up all existing tasks.
+	 */
+	if (rdtgroup_mounted)
+		return ERR_PTR(-EBUSY);
+
+	rdt_opts.cdp_enabled = false;
+	rdt_opts.verbose = false;
+	cdp_enabled = false;
+
+	ret = parse_rdtgroupfs_options(data);
+	if (ret)
+		goto out_mount;
+
+	if (rdt_opts.cdp_enabled) {
+		cdp_enabled = true;
+		cconfig.max_closid >>= cdp_enabled;
+		pr_info("CDP is enabled\n");
+	}
+
+	init_msrs(cdp_enabled);
+
+	rdtgrp_dfl_root_visible = true;
+	root = &rdtgrp_dfl_root;
+
+	ret = get_default_resources(&root->rdtgrp);
+	if (ret)
+		return ERR_PTR(-ENOSPC);
+
+out_mount:
+	dentry = kernfs_mount(fs_type, flags, root->kf_root,
+			      RDTGROUP_SUPER_MAGIC,
+			      &new_sb);
+	if (IS_ERR(dentry) || !new_sb)
+		goto out_unlock;
+
+	/*
+	 * If @pinned_sb, we're reusing an existing root and holding an
+	 * extra ref on its sb.  Mount is complete.  Put the extra ref.
+	 */
+	if (pinned_sb) {
+		WARN_ON(new_sb);
+		deactivate_super(pinned_sb);
+	}
+
+	setup_task_rg_lists(&root->rdtgrp, true);
+
+	cpumask_clear(&root->rdtgrp.cpu_mask);
+	rdtgroup_mounted = true;
+
+	return dentry;
+
+out_unlock:
+	return ERR_PTR(ret);
+}
+
+static void rdt_kill_sb(struct super_block *sb)
+{
+	int ret;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	ret = rmdir_all_sub();
+	if (ret)
+		goto out_unlock;
+
+	setup_task_rg_lists(root_rdtgrp, false);
+	release_root_closid();
+	root_rdtgrp->resource.valid = false;
+
+	/* Restore max_closid to original value. */
+	cconfig.max_closid <<= cdp_enabled;
+
+	kernfs_kill_sb(sb);
+	rdtgroup_mounted = false;
+out_unlock:
+
+	mutex_unlock(&rdtgroup_mutex);
+}
+
+static struct file_system_type rdt_fs_type = {
+	.name = "rscctrl",
+	.mount = rdt_mount,
+	.kill_sb = rdt_kill_sb,
+};
+
 static ssize_t rdtgroup_file_write(struct kernfs_open_file *of, char *buf,
 				 size_t nbytes, loff_t off)
 {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (26 preceding siblings ...)
  2016-07-13  1:03 ` [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  2016-07-13  1:03 ` [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write Fenghua Yu
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Normally each task is associated with one rdtgroup and we use the schema
for that rdtgroup whenever the task is running. The user can designate
some cpus to always use the same schema, regardless of which task is
running. To do that the user writes a cpumask bit string to the "cpus"
file.

A cpu can only be listed in one rdtgroup. If the user specifies a cpu
that is currently assigned to a different rdtgroup, it is removed
from that rdtgroup.

See Documentation/x86/intel_rdt_ui.txt
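
A usage sketch (the "p1" partition name and the cpu mask are only
illustrative; the mask is written as a hex cpumask string):

  # echo 3 > /sys/fs/rscctrl/p1/cpus    # CPUs 0 and 1 always use p1's schema
  # cat /sys/fs/rscctrl/p1/cpus         # read back the cpumask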

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 54 ++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 91ea3509..b5f42f5 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -767,6 +767,60 @@ void rdtgroup_exit(struct task_struct *tsk)
 
 static struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 static void rdtgroup_kn_unlock(struct kernfs_node *kn);
+static int rdtgroup_cpus_show(struct seq_file *s, void *v)
+{
+	struct kernfs_open_file *of = s->private;
+	struct rdtgroup *rdtgrp;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	seq_printf(s, "%*pb\n", cpumask_pr_args(&rdtgrp->cpu_mask));
+	rdtgroup_kn_unlock(of->kn);
+
+	return 0;
+}
+
+static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
+			char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp;
+	unsigned long bitmap[BITS_TO_LONGS(NR_CPUS)];
+	struct cpumask *cpumask;
+	int cpu;
+	struct list_head *l;
+	struct rdtgroup *r;
+
+	if (!buf)
+		return -EINVAL;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp)
+		return -ENODEV;
+
+	if (list_empty(&rdtgroup_lists))
+		goto end;
+
+	__bitmap_parse(buf, strlen(buf), 0, bitmap, nr_cpu_ids);
+
+	cpumask = to_cpumask(bitmap);
+
+	list_for_each(l, &rdtgroup_lists) {
+		r = list_entry(l, struct rdtgroup, rdtgroup_list);
+		if (r == rdtgrp)
+			continue;
+
+		for_each_cpu_and(cpu, &r->cpu_mask, cpumask)
+			cpumask_clear_cpu(cpu, &r->cpu_mask);
+	}
+
+	cpumask_copy(&rdtgrp->cpu_mask, cpumask);
+	for_each_cpu(cpu, cpumask)
+		per_cpu(cpu_rdtgroup, cpu) = rdtgrp;
+
+end:
+	rdtgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
 
 static struct rftype rdtgroup_partition_base_files[] = {
 	{
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (27 preceding siblings ...)
  2016-07-13  1:03 ` [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

"tasks" file in rdtgroup contains task pids. User can move a task pid
to one directory. A task can only stay in one directory at any given
time.

Each rdtgroup contains an rg_list. When a pid is written to this
rdtgroup's "tasks" file, the task's rg_list is added to the rdtgroup's
linked list and deleted from its previous rdtgroup's linked list.

When the user reads the "tasks" file, all pids are shown in ascending
order.
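
For example (the pid and the "p1" partition name are only illustrative):

  # echo 1234 > /sys/fs/rscctrl/p1/tasks   # move pid 1234 into p1
  # cat /sys/fs/rscctrl/p1/tasks           # pids are listed in ascending order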

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 138 +++++++++++++++++++++++++++++--
 1 file changed, 132 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b5f42f5..e6e8757 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -76,8 +76,6 @@ static DEFINE_SPINLOCK(rdtgroup_idr_lock);
 
 struct percpu_rw_semaphore rdtgroup_threadgroup_rwsem;
 
-#define MAX_CPUMASK_CHAR_IN_HEX	(NR_CPUS/4)
-
 static struct rftype rdtgroup_root_base_files[];
 
 #define RDTGROUP_FILE_NAME_MAX		(MAX_RDTGROUP_TYPE_NAMELEN +	\
@@ -714,7 +712,6 @@ static __init void init_cache_domains(void)
 	struct cpu_cacheinfo *this_cpu_ci;
 	struct cacheinfo *this_leaf;
 	int leaves;
-	char buf[MAX_CPUMASK_CHAR_IN_HEX + 1];
 	unsigned int level;
 
 	for (leaves = 0; leaves < cache_leaves(0); leaves++) {
@@ -725,7 +722,6 @@ static __init void init_cache_domains(void)
 			this_leaf = this_cpu_ci->info_list + leaves;
 			cache_domains[leaves].level = this_leaf->level;
 			mask = &this_leaf->shared_cpu_map;
-			cpumap_print_to_pagebuf(false, buf, mask);
 			for (domain = 0; domain < MAX_CACHE_DOMAINS; domain++) {
 				if (cpumask_test_cpu(cpu,
 				&cache_domains[leaves].shared_cpu_map[domain]))
@@ -753,7 +749,6 @@ DEFINE_SPINLOCK(rdtgroup_task_lock);
 
 void rdtgroup_exit(struct task_struct *tsk)
 {
-
 	spin_lock_irq(&rdtgroup_task_lock);
 	if (!list_empty(&tsk->rg_list)) {
 		struct rdtgroup *rdtgrp = tsk->rdtgroup;
@@ -822,6 +817,29 @@ end:
 	return nbytes;
 }
 
+static void show_rdt_tasks(struct list_head *tasks, struct seq_file *s)
+{
+	struct list_head *pos;
+
+	list_for_each(pos, tasks) {
+		struct task_struct *tsk;
+
+		tsk = list_entry(pos, struct task_struct, rg_list);
+		seq_printf(s, "%d\n", tsk->pid);
+	}
+}
+
+static int rdtgroup_pidlist_show(struct seq_file *s, void *v)
+{
+	struct kernfs_open_file *of = s->private;
+	struct rdtgroup *rdtgrp;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	show_rdt_tasks(&rdtgrp->pset.tasks, s);
+	rdtgroup_kn_unlock(of->kn);
+	return 0;
+}
+
 static struct rftype rdtgroup_partition_base_files[] = {
 	{
 		.name = "tasks",
@@ -861,6 +879,114 @@ static struct rftype rdtgroup_root_base_files[] = {
 	{ }	/* terminate */
 };
 
+/*
+ * Insert task into rdtgrp's tasks with pid in order.
+ */
+static int add_task_in_order(struct task_struct *tsk, struct rdtgroup *rdtgrp)
+{
+	struct list_head *l;
+	struct list_head *start_task = &rdtgrp->pset.tasks;
+
+	if (list_empty(start_task)) {
+		list_add_tail(&tsk->rg_list, start_task);
+		return 0;
+	}
+
+	for (l = start_task->next; l != start_task; l = l->next) {
+		struct task_struct *t;
+
+		t = list_entry(l, struct task_struct, rg_list);
+		WARN_ON(t->pid == tsk->pid);
+		if (t->pid > tsk->pid)
+			break;
+	}
+
+	list_add_tail(&tsk->rg_list, l);
+
+	return 0;
+}
+
+int _rdtgroup_move_task(struct task_struct *tsk, struct rdtgroup *rdtgrp)
+{
+	spin_lock_irq(&rdtgroup_task_lock);
+	list_del_init(&tsk->rg_list);
+
+	add_task_in_order(tsk, rdtgrp);
+
+	tsk->rdtgroup = rdtgrp;
+	spin_unlock_irq(&rdtgroup_task_lock);
+	return 0;
+}
+
+static int rdtgroup_move_task(pid_t pid, struct rdtgroup *rdtgrp,
+			      bool threadgroup, struct kernfs_open_file *of)
+{
+	struct task_struct *tsk;
+	int ret;
+
+	percpu_down_write(&rdtgroup_threadgroup_rwsem);
+	rcu_read_lock();
+	if (pid) {
+		tsk = find_task_by_vpid(pid);
+		if (!tsk) {
+			ret = -ESRCH;
+			goto out_unlock_rcu;
+		}
+	} else {
+		tsk = current;
+	}
+
+	if (threadgroup)
+		tsk = tsk->group_leader;
+
+	get_task_struct(tsk);
+	rcu_read_unlock();
+
+	ret = rdtgroup_procs_write_permission(tsk, of);
+	if (!ret)
+		_rdtgroup_move_task(tsk, rdtgrp);
+
+	put_task_struct(tsk);
+	goto out_unlock_threadgroup;
+
+out_unlock_rcu:
+	rcu_read_unlock();
+out_unlock_threadgroup:
+	percpu_up_write(&rdtgroup_threadgroup_rwsem);
+	return ret;
+}
+
+ssize_t _rdtgroup_procs_write(struct rdtgroup *rdtgrp,
+			   struct kernfs_open_file *of, char *buf,
+			   size_t nbytes, loff_t off, bool threadgroup)
+{
+	pid_t pid;
+	int ret;
+
+	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
+		return -EINVAL;
+
+	ret = rdtgroup_move_task(pid, rdtgrp, threadgroup, of);
+
+	return ret ?: nbytes;
+}
+
+static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
+				  char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp;
+	int ret;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp)
+		return -ENODEV;
+
+	ret = _rdtgroup_procs_write(rdtgrp, of, buf, nbytes, off, false);
+
+	rdtgroup_kn_unlock(of->kn);
+	return ret;
+}
+
 static void *rdtgroup_idr_replace(struct idr *idr, void *ptr, int id)
 {
 	void *ret;
@@ -1180,7 +1306,7 @@ static void setup_task_rg_lists(struct rdtgroup *rdtgrp, bool enable)
 		spin_lock_irq(&p->sighand->siglock);
 		if (!(p->flags & PF_EXITING)) {
 			if (enable) {
-				list_add_tail(&p->rg_list, &rdtgrp->pset.tasks);
+				add_task_in_order(p, rdtgrp);
 				p->rdtgroup = rdtgrp;
 				atomic_inc(&rdtgrp->pset.refcount);
 			} else {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (28 preceding siblings ...)
  2016-07-13  1:03 ` [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  2016-07-14  0:41   ` David Carrillo-Cisneros
  2016-07-13  1:03 ` [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
  2016-07-13  1:03 ` [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c Fenghua Yu
  31 siblings, 1 reply; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

There is one "schemas" file in each rdtgroup directory. The user can
write schemas into this file to control how resources are allocated.

The input schemas must first pass validation. If there is no syntax
issue, the kernel digests the input schemas and finds a CLOSID for each
domain of each resource.

A shared domain covers a few different resource domains which share
the same CLOSID. The kernel finds a CLOSID in each shared domain. If
an existing CLOSID and its CBMs match the input schemas, that CLOSID is
shared by this rdtgroup. Otherwise, the kernel tries to allocate a new
CLOSID for this rdtgroup. If a new CLOSID is available, the QoS mask
MSRs are updated. If no more CLOSIDs are available, the kernel reports
ENODEV to the user.

A shared domain is in preparation for multiple resources (like L2)
that will be added very soon.

The user can read back the schemas saved in the file.
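
A usage sketch (the "p1" partition name, cache ids and bitmasks are only
illustrative; the exact syntax is described in
Documentation/x86/intel_rdt_ui.txt from this series):

  # echo "L3:0=ff;1=f0" > /sys/fs/rscctrl/p1/schemas       # CDP disabled
  # echo "L3:0=ff,ff;1=f0,f" > /sys/fs/rscctrl/p1/schemas  # CDP enabled: <data cbm>,<code cbm>
  # cat /sys/fs/rscctrl/p1/schemas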

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 673 +++++++++++++++++++++++++++++++
 1 file changed, 673 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e6e8757..bb85995 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -817,6 +817,679 @@ end:
 	return nbytes;
 }
 
+static int get_res_type(char **res, enum resource_type *res_type)
+{
+	char *tok;
+
+	tok = strsep(res, ":");
+	if (tok == NULL)
+		return -EINVAL;
+
+	if (!strcmp(tok, "L3")) {
+		*res_type = RESOURCE_L3;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int divide_resources(char *buf, char *resources[RESOURCE_NUM])
+{
+	char *tok;
+	unsigned int resource_num = 0;
+	int ret = 0;
+	char *res;
+	char *res_block;
+	size_t size;
+	enum resource_type res_type;
+
+	size = strlen(buf) + 1;
+	res = kzalloc(size, GFP_KERNEL);
+	if (!res) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	while ((tok = strsep(&buf, "\n")) != NULL) {
+		if (strlen(tok) == 0)
+			break;
+		if (resource_num++ >= 1) {
+			pr_info("More than one line of resource input!\n");
+			ret = -EINVAL;
+			goto out;
+		}
+		strcpy(res, tok);
+	}
+
+	res_block = res;
+	ret = get_res_type(&res_block, &res_type);
+	if (ret) {
+		pr_info("Unknown resource type!");
+		goto out;
+	}
+
+	if (res_type == RESOURCE_L3 && cat_enabled(CACHE_LEVEL3)) {
+		strcpy(resources[RESOURCE_L3], res_block);
+	} else {
+		pr_info("Invalid resource type!");
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	kfree(res);
+	return ret;
+}
+
+static bool cbm_validate(unsigned long var, int level)
+{
+	u32 maxcbmlen = max_cbm_len(level);
+	unsigned long first_bit, zero_bit;
+
+	if (bitmap_weight(&var, maxcbmlen) < min_bitmask_len)
+		return false;
+
+	if (var & ~max_cbm(level))
+		return false;
+
+	first_bit = find_first_bit(&var, maxcbmlen);
+	zero_bit = find_next_zero_bit(&var, maxcbmlen, first_bit);
+
+	if (find_next_bit(&var, maxcbmlen, zero_bit) < maxcbmlen)
+		return false;
+
+	return true;
+}
+
+static int get_input_cbm(char *tok, struct cache_resource *l,
+			 int input_domain_num, int level)
+{
+	int ret;
+
+	if (!cdp_enabled) {
+		if (tok == NULL)
+			return -EINVAL;
+
+		ret = kstrtoul(tok, 16,
+			       (unsigned long *)&l->cbm[input_domain_num]);
+		if (ret)
+			return ret;
+
+		if (!cbm_validate(l->cbm[input_domain_num], level))
+			return -EINVAL;
+	} else  {
+		char *input_cbm1_str;
+
+		input_cbm1_str = strsep(&tok, ",");
+		if (input_cbm1_str == NULL || tok == NULL)
+			return -EINVAL;
+
+		ret = kstrtoul(input_cbm1_str, 16,
+			       (unsigned long *)&l->cbm[input_domain_num]);
+		if (ret)
+			return ret;
+
+		if (!cbm_validate(l->cbm[input_domain_num], level))
+			return -EINVAL;
+
+		ret = kstrtoul(tok, 16,
+			       (unsigned long *)&l->cbm2[input_domain_num]);
+		if (ret)
+			return ret;
+
+		if (!cbm_validate(l->cbm2[input_domain_num], level))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+
+static int get_cache_schema(char *buf, struct cache_resource *l, int level,
+			 struct rdtgroup *rdtgrp)
+{
+	char *tok, *tok_cache_id;
+	int ret;
+	int domain_num;
+	int input_domain_num;
+	int len;
+	unsigned int input_cache_id;
+	unsigned int cid;
+	unsigned int leaf;
+
+	if (!cat_enabled(level) && strcmp(buf, ";")) {
+		pr_info("Disabled resource should have empty schema\n");
+		return -EINVAL;
+	}
+
+	len = strlen(buf);
+	/*
+	 * Translate cache id based cbm from one line string with format
+	 * "<cache prefix>:<cache id0>=xxxx;<cache id1>=xxxx;..." for
+	 * disabled cdp.
+	 * Or
+	 * "<cache prefix>:<cache id0>=xxxxx,xxxxx;<cache id1>=xxxxx,xxxxx;..."
+	 * for enabled cdp.
+	 */
+	input_domain_num = 0;
+	while ((tok = strsep(&buf, ";")) != NULL) {
+		tok_cache_id = strsep(&tok, "=");
+		if (tok_cache_id == NULL)
+			goto cache_id_err;
+
+		ret = kstrtouint(tok_cache_id, 16, &input_cache_id);
+		if (ret)
+			goto cache_id_err;
+
+		leaf = level_to_leaf(level);
+		cid = cache_domains[leaf].shared_cache_id[input_domain_num];
+		if (input_cache_id != cid)
+			goto cache_id_err;
+
+		ret = get_input_cbm(tok, l, input_domain_num, level);
+		if (ret)
+			goto cbm_err;
+
+		input_domain_num++;
+		if (input_domain_num > get_domain_num(level)) {
+			pr_info("domain number is more than max %d\n",
+				MAX_CACHE_DOMAINS);
+			return -EINVAL;
+		}
+	}
+
+	domain_num = get_domain_num(level);
+	if (domain_num != input_domain_num) {
+		pr_info("%s input domain number %d doesn't match domain number %d\n",
+			"l3",
+			input_domain_num, domain_num);
+
+		return -EINVAL;
+	}
+
+	return 0;
+
+cache_id_err:
+	pr_info("Invalid cache id in field %d for L%1d\n", input_domain_num,
+		level);
+	return -EINVAL;
+
+cbm_err:
+	pr_info("Invalid cbm in field %d for cache L%d\n",
+		input_domain_num, level);
+	return -EINVAL;
+}
+
+struct resources {
+	struct cache_resource *l3;
+};
+
+static bool cbm_found(struct cache_resource *l, struct rdtgroup *r,
+		      int domain, int level)
+{
+	int closid;
+	int l3_domain;
+	u64 cctable_cbm;
+	u64 cbm;
+	int dindex;
+
+	closid = r->resource.closid[domain];
+
+	if (level == CACHE_LEVEL3) {
+		l3_domain = shared_domain[domain].l3_domain;
+		cbm = l->cbm[l3_domain];
+		dindex = get_dcbm_table_index(closid);
+		cctable_cbm = l3_cctable[l3_domain][dindex].cbm;
+		if (cdp_enabled) {
+			u64 icbm;
+			u64 cctable_icbm;
+			int iindex;
+
+			icbm = l->cbm2[l3_domain];
+			iindex = get_icbm_table_index(closid);
+			cctable_icbm = l3_cctable[l3_domain][iindex].cbm;
+
+			return cbm == cctable_cbm && icbm == cctable_icbm;
+		}
+
+		return cbm == cctable_cbm;
+	}
+
+	return false;
+}
+
+enum {
+	CURRENT_CLOSID,
+	REUSED_OWN_CLOSID,
+	REUSED_OTHER_CLOSID,
+	NEW_CLOSID,
+};
+
+/*
+ * Check if the reference counts are all ones in rdtgrp's domain.
+ */
+static bool one_refcnt(struct rdtgroup *rdtgrp, int domain)
+{
+	int refcnt;
+	int closid;
+
+	closid = rdtgrp->resource.closid[domain];
+	if (cat_l3_enabled) {
+		int l3_domain;
+		int dindex;
+
+		l3_domain = shared_domain[domain].l3_domain;
+		dindex = get_dcbm_table_index(closid);
+		refcnt = l3_cctable[l3_domain][dindex].clos_refcnt;
+		if (refcnt != 1)
+			return false;
+
+		if (cdp_enabled) {
+			int iindex;
+
+			iindex = get_icbm_table_index(closid);
+			refcnt = l3_cctable[l3_domain][iindex].clos_refcnt;
+
+			if (refcnt != 1)
+				return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Go through all shared domains. Check if there is an existing closid
+ * in all rdtgroups that matches l3 cbms in the shared
+ * domain. If find one, reuse the closid. Otherwise, allocate a new one.
+ */
+static int get_rdtgroup_resources(struct resources *resources_set,
+				  struct rdtgroup *rdtgrp)
+{
+	struct cache_resource *l3;
+	bool l3_cbm_found;
+	struct list_head *l;
+	struct rdtgroup *r;
+	u64 cbm;
+	int rdt_closid[MAX_CACHE_DOMAINS];
+	int rdt_closid_type[MAX_CACHE_DOMAINS];
+	int domain;
+	int closid;
+	int ret;
+
+	l3 = resources_set->l3;
+	memcpy(rdt_closid, rdtgrp->resource.closid,
+	       shared_domain_num * sizeof(int));
+	for (domain = 0; domain < shared_domain_num; domain++) {
+		if (rdtgrp->resource.valid) {
+			/*
+			 * If current rdtgrp is the only user of cbms in
+			 * this domain, will replace the cbms with the input
+			 * cbms and reuse its own closid.
+			 */
+			if (one_refcnt(rdtgrp, domain)) {
+				closid = rdtgrp->resource.closid[domain];
+				rdt_closid[domain] = closid;
+				rdt_closid_type[domain] = REUSED_OWN_CLOSID;
+				continue;
+			}
+
+			l3_cbm_found = true;
+
+			if (cat_l3_enabled)
+				l3_cbm_found = cbm_found(l3, rdtgrp, domain,
+							 CACHE_LEVEL3);
+
+			/*
+			 * If the cbms in this shared domain are already
+			 * existing in current rdtgrp, record the closid
+			 * and its type.
+			 */
+			if (l3_cbm_found) {
+				closid = rdtgrp->resource.closid[domain];
+				rdt_closid[domain] = closid;
+				rdt_closid_type[domain] = CURRENT_CLOSID;
+				continue;
+			}
+		}
+
+		/*
+		 * If the cbms are not found in this rdtgrp, search other
+		 * rdtgroups and see if there are matched cbms.
+		 */
+		l3_cbm_found = cat_l3_enabled ? false : true;
+		list_for_each(l, &rdtgroup_lists) {
+			r = list_entry(l, struct rdtgroup, rdtgroup_list);
+			if (r == rdtgrp || !r->resource.valid)
+				continue;
+
+			if (cat_l3_enabled)
+				l3_cbm_found = cbm_found(l3, r, domain,
+							 CACHE_LEVEL3);
+
+			if (l3_cbm_found) {
+				/* Get the closid that matches l3 cbms.*/
+				closid = r->resource.closid[domain];
+				rdt_closid[domain] = closid;
+				rdt_closid_type[domain] = REUSED_OTHER_CLOSID;
+				break;
+			}
+		}
+
+		if (!l3_cbm_found) {
+			/*
+			 * If no existing closid is found, allocate
+			 * a new one.
+			 */
+			ret = closid_alloc(&closid, domain);
+			if (ret)
+				goto err;
+			rdt_closid[domain] = closid;
+			rdt_closid_type[domain] = NEW_CLOSID;
+		}
+	}
+
+	/*
+	 * Now all closid are ready in rdt_closid. Update rdtgrp's closid.
+	 */
+	for_each_cache_domain(domain, 0, shared_domain_num) {
+		/*
+		 * Nothing is changed if the same closid and same cbms were
+		 * found in this rdtgrp's domain.
+		 */
+		if (rdt_closid_type[domain] == CURRENT_CLOSID)
+			continue;
+
+		/*
+		 * Put rdtgroup closid. No need to put the closid if we
+		 * just change cbms and keep the closid (REUSED_OWN_CLOSID).
+		 */
+		if (rdtgrp->resource.valid &&
+		    rdt_closid_type[domain] != REUSED_OWN_CLOSID) {
+			/* Put old closid in this rdtgrp's domain if valid. */
+			closid = rdtgrp->resource.closid[domain];
+			closid_put(closid, domain);
+		}
+
+		/*
+		 * Replace the closid in this rdtgrp's domain with saved
+		 * closid that was newly allocted (NEW_CLOSID), or found in
+		 * another rdtgroup's domains (REUSED_CLOSID), or found in
+		 * this rdtgrp (REUSED_OWN_CLOSID).
+		 */
+		closid = rdt_closid[domain];
+		rdtgrp->resource.closid[domain] = closid;
+
+		/*
+		 * Get the reused other rdtgroup's closid. No need to get the
+		 * closid newly allocated (NEW_CLOSID) because it's been
+		 * already got in closid_alloc(). And no need to get the closid
+		 * for resued own closid (REUSED_OWN_CLOSID).
+		 */
+		if (rdt_closid_type[domain] == REUSED_OTHER_CLOSID)
+			closid_get(closid, domain);
+
+		/*
+		 * If the closid comes from a newly allocated closid
+		 * (NEW_CLOSID), or found in this rdtgrp (REUSED_OWN_CLOSID),
+		 * cbms for this closid will be updated in MSRs.
+		 */
+		if (rdt_closid_type[domain] == NEW_CLOSID ||
+		    rdt_closid_type[domain] == REUSED_OWN_CLOSID) {
+			/*
+			 * Update cbm in cctable with the newly allocated
+			 * closid.
+			 */
+			if (cat_l3_enabled) {
+				int cpu;
+				struct cpumask *mask;
+				int dindex;
+				int l3_domain = shared_domain[domain].l3_domain;
+				int leaf = level_to_leaf(CACHE_LEVEL3);
+
+				cbm = l3->cbm[l3_domain];
+				dindex = get_dcbm_table_index(closid);
+				l3_cctable[l3_domain][dindex].cbm = cbm;
+				if (cdp_enabled) {
+					int iindex;
+
+					cbm = l3->cbm2[l3_domain];
+					iindex = get_icbm_table_index(closid);
+					l3_cctable[l3_domain][iindex].cbm = cbm;
+				}
+
+				mask =
+				&cache_domains[leaf].shared_cpu_map[l3_domain];
+
+				cpu = cpumask_first(mask);
+				smp_call_function_single(cpu, cbm_update_l3_msr,
+							 &closid, 1);
+			}
+		}
+	}
+
+	rdtgrp->resource.valid = true;
+
+	return 0;
+err:
+	/* Free previously allocated closid. */
+	for_each_cache_domain(domain, 0, shared_domain_num) {
+		if (rdt_closid_type[domain] != NEW_CLOSID)
+			continue;
+
+		closid_put(rdt_closid[domain], domain);
+
+	}
+
+	return ret;
+}
+
+static void init_cache_resource(struct cache_resource *l)
+{
+	l->cbm = NULL;
+	l->cbm2 = NULL;
+	l->closid = NULL;
+	l->refcnt = NULL;
+}
+
+static void free_cache_resource(struct cache_resource *l)
+{
+	kfree(l->cbm);
+	kfree(l->cbm2);
+	kfree(l->closid);
+	kfree(l->refcnt);
+}
+
+static int alloc_cache_resource(struct cache_resource *l, int level)
+{
+	int domain_num = get_domain_num(level);
+
+	l->cbm = kcalloc(domain_num, sizeof(*l->cbm), GFP_KERNEL);
+	l->cbm2 = kcalloc(domain_num, sizeof(*l->cbm2), GFP_KERNEL);
+	l->closid = kcalloc(domain_num, sizeof(*l->closid), GFP_KERNEL);
+	l->refcnt = kcalloc(domain_num, sizeof(*l->refcnt), GFP_KERNEL);
+	if (l->cbm && l->cbm2 && l->closid && l->refcnt)
+		return 0;
+
+	return -ENOMEM;
+}
+
+/*
+ * This function digests schemas given in text buf. If the schemas are in
+ * right format and there is enough closid, input the schemas in rdtgrp
+ * and update resource cctables.
+ *
+ * Inputs:
+ *	buf: string buffer containing schemas
+ *	rdtgrp: current rdtgroup holding schemas.
+ *
+ * Return:
+ *	0 on success or error code.
+ */
+static int get_resources(char *buf, struct rdtgroup *rdtgrp)
+{
+	char *resources[RESOURCE_NUM];
+	struct cache_resource l3;
+	struct resources resources_set;
+	int ret;
+	char *resources_block;
+	int i;
+	int size = strlen(buf) + 1;
+
+	resources_block = kcalloc(RESOURCE_NUM, size, GFP_KERNEL);
+	if (!resources_block)
+		return -ENOMEM;
+
+	for (i = 0; i < RESOURCE_NUM; i++)
+		resources[i] = (char *)(resources_block + i * size);
+
+	ret = divide_resources(buf, resources);
+	if (ret) {
+		kfree(resources_block);
+		return -EINVAL;
+	}
+
+	init_cache_resource(&l3);
+
+	if (cat_l3_enabled) {
+		ret = alloc_cache_resource(&l3, CACHE_LEVEL3);
+		if (ret)
+			goto out;
+
+		ret = get_cache_schema(resources[RESOURCE_L3], &l3,
+				       CACHE_LEVEL3, rdtgrp);
+		if (ret)
+			goto out;
+
+		resources_set.l3 = &l3;
+	} else
+		resources_set.l3 = NULL;
+
+	ret = get_rdtgroup_resources(&resources_set, rdtgrp);
+
+out:
+	kfree(resources_block);
+	free_cache_resource(&l3);
+
+	return ret;
+}
+
+static void gen_cache_prefix(char *buf, int level)
+{
+	sprintf(buf, "L%1d:", level == CACHE_LEVEL3 ? 3 : 2);
+}
+
+static int get_cache_id(int domain, int level)
+{
+	return cache_domains[level_to_leaf(level)].shared_cache_id[domain];
+}
+
+static void gen_cache_buf(char *buf, int level)
+{
+	int domain;
+	char buf1[1024];
+	int domain_num;
+	u64 val;
+
+	gen_cache_prefix(buf, level);
+
+	domain_num = get_domain_num(level);
+
+	val = max_cbm(level);
+
+	for (domain = 0; domain < domain_num; domain++) {
+		sprintf(buf1, "%d=%lx", get_cache_id(domain, level),
+			(unsigned long)val);
+		strcat(buf, buf1);
+		if (cdp_enabled) {
+			sprintf(buf1, ",%lx", (unsigned long)val);
+			strcat(buf, buf1);
+		}
+		if (domain < domain_num - 1)
+			sprintf(buf1, ";");
+		else
+			sprintf(buf1, "\n");
+		strcat(buf, buf1);
+	}
+}
+
+/*
+ * Set up schemas in root rdtgroup. All schemas in all resources are default
+ * values (all 1's) for all domains.
+ *
+ * Input: root rdtgroup.
+ * Return: 0: successful
+ *	   non-0: error code
+ */
+static int get_default_resources(struct rdtgroup *rdtgrp)
+{
+	char schema[1024];
+	int ret = 0;
+
+	strcpy(rdtgrp->schema, "");
+
+	if (cat_enabled(CACHE_LEVEL3)) {
+		gen_cache_buf(schema, CACHE_LEVEL3);
+
+		if (strlen(schema)) {
+			char buf[1024];
+
+			strcpy(buf, schema);
+			ret = get_resources(buf, rdtgrp);
+			if (ret)
+				return ret;
+		}
+		strcat(rdtgrp->schema, schema);
+	}
+
+	return ret;
+}
+
+static ssize_t rdtgroup_schemas_write(struct kernfs_open_file *of,
+			char *buf, size_t nbytes, loff_t off)
+{
+	int ret = 0;
+	struct rdtgroup *rdtgrp;
+	char *schema;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp)
+		return -ENODEV;
+
+	schema = kzalloc(sizeof(char) * strlen(buf) + 1, GFP_KERNEL);
+	if (!schema) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(schema, buf, strlen(buf) + 1);
+
+	ret = get_resources(buf, rdtgrp);
+	if (ret)
+		goto out;
+
+	memcpy(rdtgrp->schema, schema, strlen(schema) + 1);
+
+out:
+	kfree(schema);
+
+out_unlock:
+	rdtgroup_kn_unlock(of->kn);
+	return nbytes;
+}
+
+static int rdtgroup_schemas_show(struct seq_file *s, void *v)
+{
+	struct kernfs_open_file *of = s->private;
+	struct rdtgroup *rdtgrp;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	seq_printf(s, "%s", rdtgrp->schema);
+	rdtgroup_kn_unlock(of->kn);
+	return 0;
+}
+
 static void show_rdt_tasks(struct list_head *tasks, struct seq_file *s)
 {
 	struct list_head *pos;
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (29 preceding siblings ...)
  2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  2016-07-13  1:03 ` [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c Fenghua Yu
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

We create six new files for Intel RDT resource allocation:
arch/x86/kernel/cpu/intel_rdt.c
arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
arch/x86/include/asm/intel_rdt.h
arch/x86/include/asm/intel_rdt_rdtgroup.h
Documentation/x86/intel_rdt.txt
Documentation/x86/intel_rdt_ui.txt

Add a maintainer entry in MAINTAINERS for these files.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 MAINTAINERS | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1209323..6fbdad6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9554,6 +9554,14 @@ L:	linux-rdma@vger.kernel.org
 S:	Supported
 F:	drivers/infiniband/sw/rdmavt
 
+RDT - RESOURCE ALLOCATION
+M:	Fenghua Yu <fenghua.yu@intel.com>
+L:	linux-kernel@vger.kernel.org
+S:	Supported
+F:	arch/x86/kernel/cpu/intel_rdt*
+F:	arch/x86/include/asm/intel_rdt*
+F:	Documentation/x86/intel_rdt*
+
 READ-COPY UPDATE (RCU)
 M:	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
 M:	Josh Triplett <josh@joshtriplett.org>
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c
  2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
                   ` (30 preceding siblings ...)
  2016-07-13  1:03 ` [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
@ 2016-07-13  1:03 ` Fenghua Yu
  31 siblings, 0 replies; 81+ messages in thread
From: Fenghua Yu @ 2016-07-13  1:03 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya
  Cc: linux-kernel, x86, Fenghua Yu

From: Fenghua Yu <fenghua.yu@intel.com>

Build the user interface file intel_rdt_rdtgroup.c.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 39b8e6f..963c54a 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,6 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
+obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o
+
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
 obj-$(CONFIG_MICROCODE)			+= microcode/
@@ -42,8 +44,6 @@ obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr-watchdog.o
 
 obj-$(CONFIG_HYPERVISOR_GUEST)		+= vmware.o hypervisor.o mshyperv.o
 
-obj-$(CONFIG_INTEL_RDT)			+= intel_rdt.o
-
 ifdef CONFIG_X86_FEATURE_NAMES
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation
  2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
@ 2016-07-13  9:19   ` Thomas Gleixner
  2016-07-21 19:46     ` Shivappa Vikas
  2016-07-14  0:40   ` David Carrillo-Cisneros
  1 sibling, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13  9:19 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:
>  static int __init intel_rdt_late_init(void)
>  {
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
> @@ -261,9 +331,15 @@ static int __init intel_rdt_late_init(void)
>  		goto out_err;
>  	}
>  
> +	cpu_notifier_register_begin();
> +
>  	for_each_online_cpu(i)
>  		rdt_cpumask_update(i);
>  
> +	__hotcpu_notifier(intel_rdt_cpu_notifier, 0);

CPU hotplug notifiers are phased out. Please use the new state machine
interfaces.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/32] Define CONFIG_INTEL_RDT
  2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
@ 2016-07-13 10:25   ` Thomas Gleixner
  2016-07-13 18:05     ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 10:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:

Subject: Define CONFIG_INTEL_RDT

That does not qualify as a proper patch subject

> From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> 
> CONFIG_INTEL_RDT is defined. 

That tells us what?

> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -24,8 +24,16 @@ struct clos_cbm_table {
>   * on scheduler hot path:
>   * - This will stay as no-op unless we are running on an Intel SKU
>   * which supports L3 cache allocation.
> + * - When support is present and enabled, does not do any
> + * IA32_PQR_MSR writes until the user starts really using the feature
> + * ie creates a rdtgroup directory and assigns a cache_mask thats
> + * different from the root rdtgroup's cache_mask.
>   * - Caches the per cpu CLOSid values and does the MSR write only
> - * when a task with a different CLOSid is scheduled in.
> + * when a task with a different CLOSid is scheduled in. That
> + * means the task belongs to a different rdtgroup.
> + * - Closids are allocated so that different rdtgroup directories
> + * with same cache_mask gets the same CLOSid. This minimizes CLOSids
> + * used and reduces MSR write frequency.

How is this and the following changes related to $subject ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
@ 2016-07-13 12:47   ` Thomas Gleixner
  2016-07-13 17:13     ` Luck, Tony
  2016-07-27 16:20   ` Nilay Vaish
  2016-08-03 22:15   ` Marcelo Tosatti
  2 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 12:47 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:
> +3. Hierarchy in rscctrl
> +=======================

What means rscctrl?

You were not able to find a more cryptic acronym?

> +
> +The initial hierarchy of the rscctrl file system is as follows after mount:
> +
> +/sys/fs/rscctrl/info/info
> +		    /<resource0>/<resource0 specific info files>
> +		    /<resource1>/<resource1 specific info files>
> +			....
> +	       /tasks
> +	       /cpus
> +	       /schemas
> +
> +There are a few files and sub-directories in the hierarchy.

Shouldn't that read:

The following files and sub-directories are available:

> +3.1. info
> +---------

Those sub points want to be indented so it's clear where they belong to.

> +
> +The read-only sub-directory "info" in root directory has RDT related
> +system info.
> +
> +The "info" file under the info sub-directory shows general info of the system.
> +It shows shared domain and the resources within this domain.
> +
> +Each resource has its own info sub-directory. User can read the information
> +for allocation. For example, l3 directory has max_closid, max_cbm_len,
> +domain_to_cache_id.

Can you please restructure this so it's more obvious what you want to explain.

    The "info" directory contains read-only system information:
  
    3.1.1 info

    The read-only file 'info' contains general information of the resource
    control facility:

    - Shared domains and the resources associated to those domains

    3.1.2 resources

    Each resource has its separate sub-directory, which contains resource
    specific information.

    3.1.2.1 L3 specific files

    - max_closid:		The maximum number of available closids
      				(explain closid ....)
    - max_cbm_len:     	       	....
    - domain_to_cache_id:      	....

So when you add L2 then you can add a proper description of the L2 related
files.

> +3.2. tasks
> +----------
> +
> +The file "tasks" has all task ids in the root directory initially.

This does not make sense.

   The tasks file contains all thread ids which are associated to the root
   resource partition. Initially all threads are associated to this.

   Threads can be moved to other tasks files in resource partitions. A thread
   can only be associated with a single resource partition.

> +thread ids in the file will be added or removed among sub-directories or
> +partitions. A task id only stays in one directory at the same time.

Is a task required to be associated to at least one 'tasks' file?

> +3.3. cpus
> +
> +The file "cpus" has a cpu mask that specifies the CPUs that are bound to the
> +schemas.

Please explain the concept of schemata (I prefer schemata as plural of schema,
but that's just my preference) before explaining what the cpumask in this file
means.

> +Any tasks scheduled on the cpus will use the schemas. User can set
> +both "cpus" and "tasks" to share the same schema in one directory. But when
> +a CPU is bound to a schema, a task running on the CPU uses this schema and
> +kernel will ignore scheam set up for the task in "tasks".

This does not make any sense. 

When a task is bound to a schema then this should have preference over the
schema which is associated to the CPU. The CPU association is meant for tasks
which are not bound to a particular partition/schema.

So the initial setup should be:

   - All CPUs are associated to the root resource partition

   - No thread is associated to a particular resource partition

When a thread is added to a 'tasks' file of a partition then this partition
takes preference. If it's removed, i.e. the association to a partition is
undone, then the CPU association is used.

I have no idea why you think that all threads should be in a tasks file by
default. Associating CPUs in the first place makes a lot more sense as it
represents the topology of the system nicely.

> +Initial value is all zeros which means there is no CPU bound to the schemas
> +in the root directory and tasks use the schemas.

As I said above this is backwards.

> +3.4. schemas
> +------------
> +
> +The file "schemas" has default allocation masks/values for all resources on
> +each socket/cpu. Format of the file "schemas" is in multiple lines and each
> +line represents masks or values for one resource.

You really want to explain that the 'tasks', 'cpus' and 'schemata' files are
available on all levels of the resource hierarchy. The special case of the
files in the root partition is, that there default values are set when the
facility is initialized.

> +Format of one resource schema line is as follows:
> +
> +<resource name>:<resource id0>=<schema>;<resource id1>=<schema>;...

> +As one example, CAT L3's schema format is:

That's crap. You want a proper sub point explaining the L3 schema format and
not 'one example'.

  3.4.1 L3 schema

  L3 resource ids are the L3 domains, which are currently per socket.

  The format for CBM only partitioning is:

      L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

      <cbm> is the cache allocation bitmask in hex    

      Example:

	L3:0=ff;1=c0;

      	Explanation of example ....
 
  For CBM and CDP partitioning the format is:

      L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...

      Example:
         ....

> +If one resource is disabled, its line is not shown in schemas file.

That means:	  

     Resources which are not described in a schemata file are disabled for
     that particular partition.

Right?

Now that raises the question how this is supposed to work. Let's assume that
we have a partition 'foo' and thread X is in the tasks file of that
partition. The schema of that partition contains only an L2 entry. What's the
L3 association for thread X? Nothing at all?

> +The schema line can be expended for situations. L3 cbms format can be

You probably wanted to say extended, right?

> +4. Create and remove sub-directory
> +===================================

What is the meaning of a 'sub-directory'. I assume it's a resource
partition. So this chapter should be named so. The fact that the partition is
based on a directory is just an implementation detail.

> +User can create a sub-directory under the root directory by "mkdir" command.
> +User can remove the sub-directory by "rmdir" command.

User? Any user?

> +
> +Each sub-directory represents a resource allocation policy that user can
> +allocate resources for tasks or cpus.
> +
> +Each directory has three files "tasks", "cpus", and "schemas". The meaning
> +of each file is same as the files in the root directory.
> +
> +When a directory is created, initial contents of the files are:
> +
> +tasks: Empty. This means no task currently uses this allocation schemas.
> +cpus: All zeros. This means no CPU uses this allocation schemas.
> +schemas: All ones. This means all resources can be used in this allocation.

> +5. Add/remove a task in a partition
> +===================================
> +
> +User can add/remove a task by writing its PID in "tasks" in a partition.
> +User can read PIDs stored in one "tasks" file.
> +
> +One task PID only exists in one partition/directory at the same time. If PID
> +is written in a new directory, it's removed automatically from its last
> +directory.

Please use partition consistently. Aside of that this belongs to the
description of the 'tasks' file.

> +
> +6. Add/remove a CPU in a partition
> +==================================
> +
> +User can add/remove a CPU by writing its bit in "cpus" in a partition.
> +User can read CPUs stored in one "cpus" file.

Any (l)user ?

> +One CPU only exists in one partition/directory if user wants it to be bound
> +to any "schemas". Kernel guarantees uniqueness of the CPU in the whole
> +directory to make sure it only uses one schemas. If a CPU is written in one

   ^^^^^^^^^
You mean hierarchy here, right?

> +new directory, it's automatically removed from its original directory if it
> +exists in the original directory.

Please use partition not directory.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct
  2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
@ 2016-07-13 12:56   ` Thomas Gleixner
  2016-07-13 17:50     ` Yu, Fenghua
  2016-07-28  5:53   ` Nilay Vaish
  1 sibling, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 12:56 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> rg_list is linked list to connect to other tasks in a rdtgroup.

Can you please prefix that member proper, e.g. rdt_group_list
 
> The point of rdtgroup allows the task to access its own rdtgroup directly.

'pointer' perhaps? Also please use a proper prefix: rdt_group

A proper description might be:

  rdt_group points to the rdt group to which the task belongs.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources
  2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
@ 2016-07-13 13:07   ` Thomas Gleixner
  2016-07-13 17:40     ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 13:07 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:

> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> QoS mask MSRs array is per cache. We need to allocate CLOSID per cache
> instead global CLOSID.
> 
> A few different resources can share same QoS mask MSRs array. For
> example, one L2 cache can share QoS MSRs with its next level
> L3 cache. A domain number represents the L2 cache, the L3 cache, the L2
> cache's shared cpumask, and the L3 cache's shared cpumask.
> 
> cctable is extended to be index by domain number so that each cache
> has its own control table.
> 
> shared_domain is introduced to cover multiple resources sharing
> CLOSID.

This patch does a dozen different things at once. Can you please split the
cleanup parts, the parts where statics are removed and the actual changes in
the representation model apart so this can be reviewed?

And while at this, please fold back these cleanups into the original
patches. There is no point in having these changes separate.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 24/32] Task fork and exit for rdtgroup
  2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
@ 2016-07-13 13:14   ` Thomas Gleixner
  2016-07-13 17:32     ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 13:14 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:
> +void rdtgroup_fork(struct task_struct *child)
> +{
> +	INIT_LIST_HEAD(&child->rg_list);
> +	child->rdtgroup = NULL;
> +}
> +
> +void rdtgroup_post_fork(struct task_struct *child)
> +{
> +	if (!use_rdtgroup_tasks)
> +		return;
> +
> +	spin_lock_irq(&rdtgroup_task_lock);
> +	if (list_empty(&child->rg_list)) {

Why would the list be non empty after a fork?

> +		struct rdtgroup *rdtgrp = current->rdtgroup;
> +
> +		list_add_tail(&child->rg_list, &rdtgrp->pset.tasks);
> +		child->rdtgroup = rdtgrp;
> +		atomic_inc(&rdtgrp->pset.refcount);
> +	}
> +	spin_unlock_irq(&rdtgroup_task_lock);
> +}
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 9e6e135..04346b6 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -59,6 +59,7 @@
>  #include <asm/unistd.h>
>  #include <asm/pgtable.h>
>  #include <asm/mmu_context.h>
> +#include <asm/intel_rdt.h>
>  
>  static void __unhash_process(struct task_struct *p, bool group_dead)
>  {
> @@ -757,6 +758,7 @@ void do_exit(long code)
>  	perf_event_exit_task(tsk);
>  
>  	cgroup_exit(tsk);
> +	rdtgroup_exit(tsk);

-ENOSUCHFUNCTION

Please provide the implementations first and then hook it up not the other way
round.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13 12:47   ` Thomas Gleixner
@ 2016-07-13 17:13     ` Luck, Tony
  2016-07-14  6:53       ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Luck, Tony @ 2016-07-13 17:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Wed, Jul 13, 2016 at 02:47:30PM +0200, Thomas Gleixner wrote:
> On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > +3. Hierarchy in rscctrl
> > +=======================
> 
> What means rscctrl?
> 
> You were not able to find a more cryptic acronym?

rscctrl == resource control

Intel marketing would (probably) like us to use:

   /sys/fs/Intel(R) Resource Director Technology(TM)/

Happy to take suggestions for something in between those
extremes :-)

> > +Any tasks scheduled on the cpus will use the schemas. User can set
> > +both "cpus" and "tasks" to share the same schema in one directory. But when
> > +a CPU is bound to a schema, a task running on the CPU uses this schema and
> > +kernel will ignore scheam set up for the task in "tasks".
> 
> This does not make any sense. 
> 
> When a task is bound to a schema then this should have preference over the
> schema which is associated to the CPU. The CPU association is meant for tasks
> which are not bound to a particular partition/schema.
> 
> So the initial setup should be:
> 
>    - All CPUs are associated to the root resource partition
> 
>    - No thread is associated to a particular resource partition
> 
> When a thread is added to a 'tasks' file of a partition then this partition
> takes preference. If it's removed, i.e. the association to a partition is
> undone, then the CPU association is used.
> 
> I have no idea why you think that all threads should be in a tasks file by
> default. Associating CPUs in the first place makes a lot more sense as it
> represents the topology of the system nicely.

If we did it that way, it would be harder to change the default
resources.  E.g. now we start with all processes in the root
rdtgroup.  We can change the schema for the root group and restrict
them to, say, 60% of L3 cache on one (or all) sockets - giving us
40% of cache to give out to one or more groups.
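
For illustration only (the 20-bit CBM width and the bitmask values here are
made-up numbers, and the "L3:<cache id>=<cbm>;..." line format is the one
quoted later in this thread), such a 60/40 split on a two-socket system
could end up looking like:

    root schemas:       L3:0=fff;1=fff          (12 of 20 ways, ~60%)
    new group schemas:  L3:0=ff000;1=ff000      (the remaining 8 ways, ~40%)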

So what we've implemented (and perhaps need to explain better here)
is that every thread always belongs to one (and only one) rdtgroup.
It will use the resources described in that group wherever it runs,
except in the case where we have designated some cpus as special snowflakes.
When a cpu is assigned to an rdtgroup the schema for the cpu has
precedence (i.e. we write the MSR with a CLOSID once, and then it
never changes).

Some of this is confusing because people will very likely also use
cpu affinity to control where their processes run. But affinity is
orthogonal to rdtgroup membership.

I think what we have allows you to do all the things we talked about.
But if we are missing a case, or if things can be simplified while
still retaining the same functionality, then let's discuss that.
Otherwise we can revise the documentation to explain all this better.

> 
> > +Initial value is all zeros which means there is no CPU bound to the schemas
> > +in the root directory and tasks use the schemas.
> 
> As I said above this is backwards.

> > +If one resource is disabled, its line is not shown in schemas file.
> 
> That means:	  
> 
>      Resources which are not described in a schemata file are disabled for
>      that particular partition.
> 
> Right?
> 
> Now that raises the question how this is supposed to work. Let's assume that
> we have a partition 'foo' and thread X is in the tasks file of that
> partition. The schema of that partition contains only an L2 entry. What's the
> L3 association for thread X? Nothing at all?

Resources are either enabled or disabled globally. Each schema file
must provide details for every enabled resource. So if we are on a
processor that supports both L2 and L3, we will normally have schema
files that specify both.  We could boot with the "disable_cat_l2"
kernel command line option and then every schema file would just
specify L3 (and the MSRs for L2 would all be set to all-ones so that
everyone had full access to the L2 on each core).
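
As a sketch of that (the "L2" prefix and the bitmask widths are assumptions;
only the "L3:<cache id>=<cbm>;..." form appears in this series), a schemas
file with both resources enabled might read:

    L3:0=fffff;1=fffff
    L2:0=ff;1=ff

and after booting with "disable_cat_l2" only the L3 line would remain.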

> > +User can create a sub-directory under the root directory by "mkdir" command.
> > +User can remove the sub-directory by "rmdir" command.
> 
> User? Any user?

Well if someone did:
 # chmod 777 /sys/fs/rscctrl
then any user could make directories.  That would be inadvisable.
You could use 775 and let a trusted group have control so that you
didn't require root access to modify things.

Should we say "system administrator" rather than "user"?

-Tony

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 24/32] Task fork and exit for rdtgroup
  2016-07-13 13:14   ` Thomas Gleixner
@ 2016-07-13 17:32     ` Yu, Fenghua
  2016-07-13 21:02       ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 17:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

On Wed, July 2016, Thomas Gleixner wrote
> On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > +void rdtgroup_post_fork(struct task_struct *child) {
> > +	if (!use_rdtgroup_tasks)
> > +		return;
> > +
> > +	spin_lock_irq(&rdtgroup_task_lock);
> > +	if (list_empty(&child->rg_list)) {
> 
> Why would the list be non empty after a fork?

In this situation for a pid:
1.rdtgroup_fork(): rg_list=null.
2.setup_task_rg_lists(): rg_list is setup
3.rdtgroup_fork(): rg_list is not empty

This situation happens only during rscctrl mount time. Before mount, post_fork()
returns from !use_rdtgroup_tasks and doesn't set up rg_list. After mount, rg_list()
is always empty in post_fork(). But we need to check rg_list for above situation.

Does that make sense? Any suggestion for a better solution?

> 
> > +		struct rdtgroup *rdtgrp = current->rdtgroup;
> > +
> > +		list_add_tail(&child->rg_list, &rdtgrp->pset.tasks);
> > +		child->rdtgroup = rdtgrp;
> > +		atomic_inc(&rdtgrp->pset.refcount);
> > +	}
> > +	spin_unlock_irq(&rdtgroup_task_lock);
> > +}
> > diff --git a/kernel/exit.c b/kernel/exit.c index 9e6e135..04346b6
> > 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -59,6 +59,7 @@
> >  #include <asm/unistd.h>
> >  #include <asm/pgtable.h>
> >  #include <asm/mmu_context.h>
> > +#include <asm/intel_rdt.h>
> >
> >  static void __unhash_process(struct task_struct *p, bool group_dead)
> > { @@ -757,6 +758,7 @@ void do_exit(long code)
> >  	perf_event_exit_task(tsk);
> >
> >  	cgroup_exit(tsk);
> > +	rdtgroup_exit(tsk);
> 
> -ENOSUCHFUNCTION
> 
> Please provide the implementations first and then hook it up not the other
> way round.

Will do.
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources
  2016-07-13 13:07   ` Thomas Gleixner
@ 2016-07-13 17:40     ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 17:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 6:07 AM
> To: Yu, Fenghua <fenghua.yu@intel.com>
> Cc: Ingo Molnar <mingo@elte.hu>; Anvin, H Peter
> <h.peter.anvin@intel.com>; Luck, Tony <tony.luck@intel.com>; Tejun Heo
> <tj@kernel.org>; Borislav Petkov <bp@suse.de>; Stephane Eranian
> <eranian@google.com>; Peter Zijlstra <peterz@infradead.org>; Marcelo
> Tosatti <mtosatti@redhat.com>; David Carrillo-Cisneros
> <davidcc@google.com>; Shankar, Ravi V <ravi.v.shankar@intel.com>; Vikas
> Shivappa <vikas.shivappa@linux.intel.com>; Prakhya, Sai Praneeth
> <sai.praneeth.prakhya@intel.com>; linux-kernel <linux-
> kernel@vger.kernel.org>; x86 <x86@kernel.org>
> Subject: Re: [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per
> resources
> 
> On Tue, 12 Jul 2016, Fenghua Yu wrote:
> 
> > From: Fenghua Yu <fenghua.yu@intel.com>
> >
> > QoS mask MSRs array is per cache. We need to allocate CLOSID per cache
> > instead global CLOSID.
> >
> > A few different resources can share same QoS mask MSRs array. For
> > example, one L2 cache can share QoS MSRs with its next level
> > L3 cache. A domain number represents the L2 cache, the L3 cache, the
> > L2 cache's shared cpumask, and the L3 cache's shared cpumask.
> >
> > cctable is extended to be index by domain number so that each cache
> > has its own control table.
> >
> > shared_domain is introduced to cover multiple resources sharing
> > CLOSID.
> 
> This patch does a dozen different things at once. Can you please split the
> cleanup parts, the parts where statics are removed and the actual changes in
> the representation model apart so this can be reviewed?
> 
> And while at this, please fold back these cleanups into the original patches.
> There is no point in having these changes separate.

Sure. Will do this.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct
  2016-07-13 12:56   ` Thomas Gleixner
@ 2016-07-13 17:50     ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 17:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 5:56 AM
> On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> >
> > rg_list is linked list to connect to other tasks in a rdtgroup.
> 
> Can you please prefix that member proper, e.g. rdt_group_list

There is another similar name in task_struct, which is cg_list for cgroup list.
I just follow that name to have a rg_list for rdtgroup list.

If you think rdt_group_list is a better name, I sure will change rg_list
to rdt_group_list.

> 
> > The point of rdtgroup allows the task to access its own rdtgroup directly.
> 
> 'pointer' perhaps? Also please use a proper prefix: rdt_group
> 
> A proper description might be:
> 
>   rdt_group points to the rdt group to which the task belongs.

In the patch set, I use the name "rdtgroup" to represent an rdt group.

Should I change the name "rdtgroup" to "rdt_group" in all patches,
including descriptions and code?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 08/32] Define CONFIG_INTEL_RDT
  2016-07-13 10:25   ` Thomas Gleixner
@ 2016-07-13 18:05     ` Yu, Fenghua
  2016-07-13 21:09       ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 18:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 3:26 AM
> On Tue, 12 Jul 2016, Fenghua Yu wrote:
> 
> Subject: Define CONFIG_INTEL_RDT
> 
> That does not qualify as a proper patch subject
> 
> > From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> >
> > CONFIG_INTEL_RDT is defined.
> 
> That tells us what?
> 
> > --- a/arch/x86/include/asm/intel_rdt.h
> > +++ b/arch/x86/include/asm/intel_rdt.h
> > @@ -24,8 +24,16 @@ struct clos_cbm_table {
> >   * on scheduler hot path:
> >   * - This will stay as no-op unless we are running on an Intel SKU
> >   * which supports L3 cache allocation.
> > + * - When support is present and enabled, does not do any
> > + * IA32_PQR_MSR writes until the user starts really using the feature
> > + * ie creates a rdtgroup directory and assigns a cache_mask thats
> > + * different from the root rdtgroup's cache_mask.
> >   * - Caches the per cpu CLOSid values and does the MSR write only
> > - * when a task with a different CLOSid is scheduled in.
> > + * when a task with a different CLOSid is scheduled in. That
> > + * means the task belongs to a different rdtgroup.
> > + * - Closids are allocated so that different rdtgroup directories
> > + * with same cache_mask gets the same CLOSid. This minimizes CLOSids
> > + * used and reduces MSR write frequency.
> 
> How is this and the following changes related to $subject ?

No, this piece of code is not related to $subject.

Here is why this patch behaves like this:

This patch and actually the first 12 patches are directly from last year's
cgroup-based CAT patch series. Last year's patch series had already gone
through 16 versions. Because the first 12 patches have been reviewed many
times, we keep them untouched (except removing cgroup code in patch 8 and
some unused cdp code in patch 11) and release other patches on top of the
first 12 patches.

I fully agree this patch should be split if we want to have a good overall
patch series.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 24/32] Task fork and exit for rdtgroup
  2016-07-13 17:32     ` Yu, Fenghua
@ 2016-07-13 21:02       ` Thomas Gleixner
  2016-07-13 21:22         ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 21:02 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

On Wed, 13 Jul 2016, Yu, Fenghua wrote:
> On Wed, July 2016, Thomas Gleixner wrote
> > On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > > +void rdtgroup_post_fork(struct task_struct *child) {
> > > +	if (!use_rdtgroup_tasks)
> > > +		return;
> > > +
> > > +	spin_lock_irq(&rdtgroup_task_lock);
> > > +	if (list_empty(&child->rg_list)) {
> > 
> > Why would the list be non empty after a fork?
> 
> In this situation for a pid:
> 1.rdtgroup_fork(): rg_list=null.
> 2.setup_task_rg_lists(): rg_list is setup
> 3.rdtgroup_fork(): rg_list is not empty

Why would rdtgroup_fork() be called twice for a given thread?

> This situation happens only during rscctrl mount time. Before mount, post_fork()
> returns from !use_rdtgroup_tasks and doesn't set up rg_list. After mount, rg_list()
> is always empty in post_fork(). But we need to check rg_list for above situation.
> 
> Does that make sense?

No, that does not make any sense at all.

> Any suggestion for a better solution?

The problem you have is:

    fork
	list_init(rg_list);
	write_lock(tasklist_lock);
	
	  task becomes visible

	write_unlock(tasklist_lock);

	rdtgroup_post_fork();
	    if (!use_rdtgroup_tasks)
	       return;

	    spin_lock_irq(&rdtgroup_task_lock);
  	    list_add();
	    spin_unlock_irq(&rdtgroup_task_lock);

I have no idea why this lock must be taken with _irq, but that's another
story. Let's look at the mount side:

       spin_lock_irq(&rdtgroup_task_lock);
       read_lock(&tasklist_lock);
       
       do_each_thread(g, p) { 
       	   WARN_ON(More magic crap happening there)

	   spin_lock_irq(&p->sighand->siglock);
	   list_add();
	   spin_unlock_irq(&p->sighand->siglock);
		      ^^^^ 
Great: You undo the irq disable of (&rdtgroup_task_lock) above! Oh well....

       read_unlock(&tasklist_lock);
       spin_unlock_irq(&rdtgroup_task_lock);

So you need all this magic in rdtgroup_post_fork() and setup_task_rg_lists()
just because you blindly positioned rdtgroup_post_fork() at the point where
the cgroup_post_fork() stuff is. But you did not think a second about the
locking rules here otherwise they would be documented somewhere.

You need a read_lock(&tasklist_lock) for the mount part anyway. So why don't
you do the obvious:

    fork
	list_init(rg_list);
	write_lock(tasklist_lock);

	rdtgroup_post_fork();
	    if (use_rdtgroup_tasks)
	        spin_lock(&rdtgroup_task_lock);
	        list_add();
	        spin_unlock(&rdtgroup_task_lock);

        task becomes visible

	write_unlock(tasklist_lock);

And reorder the lock ordering in the mount path:

       read_lock(&tasklist_lock);
       spin_lock(&rdtgroup_task_lock);
	
Now using rdtgroup_task_lock to protect current->rdtgroup is horrible as
well. You need task->sighand->siglock in the mount path anyway to prevent exit
races. So you can simplify the whole magic to:

    fork
	list_init(rg_list);
	write_lock(tasklist_lock);

        spin_lock(&current->sighand->siglock);

	rdtgroup_post_fork();
	    if (use_rdtgroup_tasks)
	        list_add();

	spin_unlock(&current->sighand->siglock);
	write_unlock(tasklist_lock);

That removes an extra lock/unlock operation from the fork path because
current->sighand->siglock is taken inside of the tasklist_lock write locked
section already.
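
Putting the suggested ordering together, a minimal C sketch of the fork side
(an editorial illustration, not the actual patch code; "rdtgroup_active"
stands in for use_rdtgroup_tasks, and the rg_list/rdtgroup/pset members are
the ones from this series) could look like:

/*
 * Assumed to be called from copy_process() while tasklist_lock is
 * write-locked and current->sighand->siglock is held, as outlined above.
 */
void rdtgroup_post_fork(struct task_struct *child)
{
	struct rdtgroup *rdtgrp = current->rdtgroup;

	if (!rdtgroup_active)
		return;

	/*
	 * The mount-time walk takes tasklist_lock (read) plus siglock, so
	 * it cannot race with this insertion or see a half-set-up child.
	 */
	list_add_tail(&child->rg_list, &rdtgrp->pset.tasks);
	child->rdtgroup = rdtgrp;
	atomic_inc(&rdtgrp->pset.refcount);
}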

So you need protection for use_rdtgroup_task, which is a complete misnomer
btw. (rdtgroup_active would be too obvious, right?). That protection is simple
because you can set that flag with tasklist_lock read locked which you hold
anyway for iterating all threads in the mount path.

Aside of that you need to take tsk->sighand->siglock when you change
tsk->rdtgroup, but that's a no-brainer and it gives you the extra benefit that
you can protect such an operation against exit of the task that way by
checking PF_EXITING under the lock. I don't see any protection against exit in
your current implementation when a task is moved to a different partition.

Please sit down and describe the complete locking and protection scheme of
this stuff. I'm not going to figure this out from the obscure code another
time.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 08/32] Define CONFIG_INTEL_RDT
  2016-07-13 18:05     ` Yu, Fenghua
@ 2016-07-13 21:09       ` Thomas Gleixner
  2016-07-13 21:18         ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-13 21:09 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

On Wed, 13 Jul 2016, Yu, Fenghua wrote:
> Here is why this patch behaves like this:
> 
> This patch and actually the first 12 patches are directly from last year's
> cgroup-based CAT patch series. Last year's patch series had already gone
> through 16 versions. Because the first 12 patches have been reviewed many
> times, we keep them untouched (except removing cgroup code in patch 8 and
> some unused cdp code in patch 11) and release other patches on top of the
> first 12 patches.

Which is not making the review any simpler. In order to understand the
modifications I have to go back and page in the original stuff from last year
once again. So I have to read the original patch first to understand the
modifications and then get the overall picture of the new stuff. Please fold
stuff back to the proper places so I can start reviewing this thing under the
new design idea instead of twisting my brain around two designs.
 
> I fully agree this patch should be split if we want to have a good overall
> patch series.

Good.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 08/32] Define CONFIG_INTEL_RDT
  2016-07-13 21:09       ` Thomas Gleixner
@ 2016-07-13 21:18         ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 21:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 2:10 PM
> On Wed, 13 Jul 2016, Yu, Fenghua wrote:
> > Here is why this patch behaves like this:
> >
> > This patch and actually the first 12 patches are directly from last year's
> > cgroup-based CAT patch series. Last year's patch series had already gone
> > through 16 versions. Because the first 12 patches have been reviewed many
> > times, we keep them untouched (except removing cgroup code in patch 8
> > and some unused cdp code in patch 11) and release other patches on top
> > of the first 12 patches.
> 
> Which is not making the review any simpler. In order to understand the
> modifications I have to go back and page in the original stuff from last year
> once again. So I have to read the original patch first to understand the
> modifications and then get the overall picture of the new stuff. Please fold
> stuff back to the proper places so I can start reviewing this thing under the
> new design idea instead of twisting my brain around two designs.

Ok. I will do that.

> 
> > I fully agree this patch should be split if we want to have a good
> > overall patch series.
> 
> Good.
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 24/32] Task fork and exit for rdtgroup
  2016-07-13 21:02       ` Thomas Gleixner
@ 2016-07-13 21:22         ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-13 21:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Shankar, Ravi V,
	Vikas Shivappa, Prakhya, Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 2:03 PM
> On Wed, 13 Jul 2016, Yu, Fenghua wrote:
> > On Wed, July 2016, Thomas Gleixner wrote
> > > On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > > > +void rdtgroup_post_fork(struct task_struct *child) {
> > > > +	if (!use_rdtgroup_tasks)
> > > > +		return;
> > > > +
> > > > +	spin_lock_irq(&rdtgroup_task_lock);
> > > > +	if (list_empty(&child->rg_list)) {
> > >
> > > Why would the list be non empty after a fork?
> >
> > In this situation for a pid:
> > 1.rdtgroup_fork(): rg_list=null.
> > 2.setup_task_rg_lists(): rg_list is setup
> > 3.rdtgroup_fork(): rg_list is not empty
> 
> Why would rdtgroup_fork() be called twice for a given thread?
> 
> > This situation happens only during rscctrl mount time. Before mount,
> > post_fork() returns from !use_rdtgroup_tasks and doesn't set up
> > rg_list. After mount, rg_list() is always empty in post_fork(). But we need
> to check rg_list for above situation.
> >
> > Does that make sense?
> 
> No, that does not make any sense at all.
> 
> > Any suggestion for a better solution?
> 
> The problem you have is:
> 
>     fork
> 	list_init(rg_list);
> 	write_lock(tasklist_lock);
> 
> 	  task becomes visible
> 
> 	write_unlock(tasklist_lock);
> 
> 	rdtgroup_post_fork();
> 	    if (!use_rdtgroup_tasks)
> 	       return;
> 
> 	    spin_lock_irq(&rdtgroup_task_lock);
>   	    list_add();
> 	    spin_unlock_irq(&rdtgroup_task_lock);
> 
> I have no idea why this lock must be taken with _irq, but that's another story.
> Let's look at the mount side:
> 
>        spin_lock_irq(&rdtgroup_task_lock);
>        read_lock(&tasklist_lock);
> 
>        do_each_thread(g, p) {
>        	   WARN_ON(More magic crap happening there)
> 
> 	   spin_lock_irq(&p->sighand->siglock);
> 	   list_add();
> 	   spin_unlock_irq(&p->sighand->siglock);
> 		      ^^^^
> Great: You undo the irq disable of (&rdtgroup_task_lock) above! Oh well....
> 
>        read_unlock(&tasklist_lock);
>        spin_unlock_irq(&rdtgroup_task_lock);
> 
> So you need all this magic in rdtgroup_post_fork() and setup_task_rg_lists()
> just because you blindly positioned rdtgroup_post_fork() at the point where
> the cgroup_post_fork() stuff is. But you did not think a second about the
> locking rules here otherwise they would be documented somewhere.
> 
> You need a read_lock(&tasklist_lock) for the mount part anyway. So why
> don't you do the obvious:
> 
>     fork
> 	list_init(rg_list);
> 	write_lock(tasklist_lock);
> 
> 	rdtgroup_post_fork();
> 	    if (use_rdtgroup_tasks)
> 	        spin_lock(&rdtgroup_task_lock);
> 	        list_add();
> 	        spin_unlock(&rdtgroup_task_lock);
> 
>         task becomes visible
> 
> 	write_unlock(tasklist_lock);
> 
> And reorder the lock ordering in the mount path:
> 
>        read_lock(&tasklist_lock);
>        spin_lock(&rdtgroup_task_lock);
> 
> Now using rdtgroup_task_lock to protect current->rdtgroup is horrible as
> well. You need task->sighand->siglock in the mount path anyway to prevent
> exit races. So you can simplify the whole magic to:
> 
>     fork
> 	list_init(rg_list);
> 	write_lock(tasklist_lock);
> 
>         spin_lock(&current->sighand->siglock);
> 
> 	rdtgroup_post_fork();
> 	    if (use_rdtgroup_tasks)
> 	        list_add();
> 
> 	spin_unlock(&current->sighand->siglock);
> 	write_unlock(tasklist_lock);
> 
> That removes an extra lock/unlock operation from the fork path because
> current->sighand->siglock is taken inside of the tasklist_lock write locked
> section already.
> 
> So you need protection for use_rdtgroup_task, which is a complete
> misnomer btw. (rdtgroup_active would be too obvious, right?). That
> protection is simple because you can set that flag with tasklist_lock read
> locked which you hold anyway for iterating all threads in the mount path.
> 
> Aside of that you need to take tsk->sighand->siglock when you change
> tsk->rdtgroup, but that's a no-brainer and it gives you the extra benefit that
> you can protect such an operation against exit of the task that way by
> checking PF_EXITING under the lock. I don't see any protection against exit in
> your current implementation when a task is moved to a different partition.
> 
> Please sit down and describe the complete locking and protection scheme of
> this stuff. I'm not going to figure this out from the obscure code another time.
> 
> Thanks,
> 
> 	tglx

Sure, I'll rethink the locking and protection scheme for the tasks.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation
  2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
  2016-07-13  9:19   ` Thomas Gleixner
@ 2016-07-14  0:40   ` David Carrillo-Cisneros
  2016-07-14 22:58     ` Yu, Fenghua
  1 sibling, 1 reply; 81+ messages in thread
From: David Carrillo-Cisneros @ 2016-07-14  0:40 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Ravi V Shankar, Vikas Shivappa, Sai Prakhya,
	linux-kernel, x86

> +static inline void intel_rdt_cpu_start(int cpu)
> +{
> +       struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
> +
> +       state->closid = 0;
> +       mutex_lock(&rdt_group_mutex);
> +       if (rdt_cpumask_update(cpu))
> +               smp_call_function_single(cpu, cbm_update_msrs, NULL, 1);
> +       mutex_unlock(&rdt_group_mutex);

what happens if a cpu with a cache_id not available at boot comes online?

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
@ 2016-07-14  0:41   ` David Carrillo-Cisneros
  2016-07-14  6:11     ` Thomas Gleixner
  2016-07-14  6:32     ` Yu, Fenghua
  0 siblings, 2 replies; 81+ messages in thread
From: David Carrillo-Cisneros @ 2016-07-14  0:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Ravi V Shankar, Vikas Shivappa, Sai Prakhya,
	linux-kernel, x86

> +static int get_res_type(char **res, enum resource_type *res_type)
> +{
> +       char *tok;
> +
> +       tok = strsep(res, ":");
> +       if (tok == NULL)
> +               return -EINVAL;
> +
> +       if (!strcmp(tok, "L3")) {

Maybe use strstrip to allow a more readable input ? i.e. "L3 : <schema> "

> +               *res_type = RESOURCE_L3;
> +               return 0;
> +       }
> +
> +       return -EINVAL;
> +}
> +
> +static int divide_resources(char *buf, char *resources[RESOURCE_NUM])
> +{
> +       char *tok;
> +       unsigned int resource_num = 0;
> +       int ret = 0;
> +       char *res;
> +       char *res_block;
> +       size_t size;
> +       enum resource_type res_type;
> +
> +       size = strlen(buf) + 1;
> +       res = kzalloc(size, GFP_KERNEL);
> +       if (!res) {
> +               ret = -ENOSPC;

-ENOMEM?

> +
> +       res_block = res;
> +       ret = get_res_type(&res_block, &res_type);
> +       if (ret) {
> +               pr_info("Unknown resource type!");
> +               goto out;
> +       }

does this work if res_block doesn't have ":"? don't you need to check res_block?

> +static int get_cache_schema(char *buf, struct cache_resource *l, int level,
> +                        struct rdtgroup *rdtgrp)
> +{
> +       char *tok, *tok_cache_id;
> +       int ret;
> +       int domain_num;
> +       int input_domain_num;
> +       int len;
> +       unsigned int input_cache_id;
> +       unsigned int cid;
> +       unsigned int leaf;
> +
> +       if (!cat_enabled(level) && strcmp(buf, ";")) {
> +               pr_info("Disabled resource should have empty schema\n");
> +               return -EINVAL;
> +       }
> +
> +       len = strlen(buf);
> +       /*
> +        * Translate cache id based cbm from one line string with format
> +        * "<cache prefix>:<cache id0>=xxxx;<cache id1>=xxxx;..." for
> +        * disabled cdp.
> +        * Or
> +        * "<cache prefix>:<cache id0>=xxxxx,xxxxx;<cache id1>=xxxxx,xxxxx;..."
> +        * for enabled cdp.
> +        */
> +       input_domain_num = 0;
> +       while ((tok = strsep(&buf, ";")) != NULL) {
> +               tok_cache_id = strsep(&tok, "=");
> +               if (tok_cache_id == NULL)
> +                       goto cache_id_err;

what if there is no "="? Also, it would be nice to allow spaces around "=".

> +
> +               ret = kstrtouint(tok_cache_id, 16, &input_cache_id);
> +               if (ret)
> +                       goto cache_id_err;
> +
> +               leaf = level_to_leaf(level);

why is this in the loop?

> +               cid = cache_domains[leaf].shared_cache_id[input_domain_num];
> +               if (input_cache_id != cid)
> +                       goto cache_id_err;

so schemata must be present for all cache_id's and sorted in
increasing order of cache_id? what's the point of having the cache_id#
then?

> +
> +/*
> + * Check if the reference counts are all ones in rdtgrp's domain.
> + */
> +static bool one_refcnt(struct rdtgroup *rdtgrp, int domain)
> +{
> +       int refcnt;
> +       int closid;
> +
> +       closid = rdtgrp->resource.closid[domain];
> +       if (cat_l3_enabled) {

if cat_l3_enabled == false, then reference counts are always one?

> + * Go through all shared domains. Check if there is an existing closid
> + * in all rdtgroups that matches l3 cbms in the shared
> + * domain. If find one, reuse the closid. Otherwise, allocate a new one.
> + */
> +static int get_rdtgroup_resources(struct resources *resources_set,
> +                                 struct rdtgroup *rdtgrp)
> +{
> +       struct cache_resource *l3;
> +       bool l3_cbm_found;
> +       struct list_head *l;
> +       struct rdtgroup *r;
> +       u64 cbm;
> +       int rdt_closid[MAX_CACHE_DOMAINS];
> +       int rdt_closid_type[MAX_CACHE_DOMAINS];
> +       int domain;
> +       int closid;
> +       int ret;
> +
> +       l3 = resources_set->l3;

l3 is NULL if cat_l3_enabled == false but it seems like it may be used
later even though.

> +       memcpy(rdt_closid, rdtgrp->resource.closid,
> +              shared_domain_num * sizeof(int));
> +       for (domain = 0; domain < shared_domain_num; domain++) {
> +               if (rdtgrp->resource.valid) {
> +                       /*
> +                        * If current rdtgrp is the only user of cbms in
> +                        * this domain, will replace the cbms with the input
> +                        * cbms and reuse its own closid.
> +                        */
> +                       if (one_refcnt(rdtgrp, domain)) {
> +                               closid = rdtgrp->resource.closid[domain];
> +                               rdt_closid[domain] = closid;
> +                               rdt_closid_type[domain] = REUSED_OWN_CLOSID;
> +                               continue;
> +                       }
> +
> +                       l3_cbm_found = true;
> +
> +                       if (cat_l3_enabled)
> +                               l3_cbm_found = cbm_found(l3, rdtgrp, domain,
> +                                                        CACHE_LEVEL3);
> +
> +                       /*
> +                        * If the cbms in this shared domain are already
> +                        * existing in current rdtgrp, record the closid
> +                        * and its type.
> +                        */
> +                       if (l3_cbm_found) {
> +                               closid = rdtgrp->resource.closid[domain];
> +                               rdt_closid[domain] = closid;
> +                               rdt_closid_type[domain] = CURRENT_CLOSID;

a new l3 resource will be created if cat_l3_enabled is false.


> +static void init_cache_resource(struct cache_resource *l)
> +{
> +       l->cbm = NULL;
> +       l->cbm2 = NULL;

is cbm2 the data bitmask for when CDP is enabled? if so, a more
descriptive name may help.

> +       l->closid = NULL;
> +       l->refcnt = NULL;
> +}
> +
> +static void free_cache_resource(struct cache_resource *l)
> +{
> +       kfree(l->cbm);
> +       kfree(l->cbm2);
> +       kfree(l->closid);
> +       kfree(l->refcnt);

this function is used to clean up alloc_cache_resource in the error
path of get_resources where it's not necessarily true that all of l's
members were allocated.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  2016-07-14  0:41   ` David Carrillo-Cisneros
@ 2016-07-14  6:11     ` Thomas Gleixner
  2016-07-14  6:16       ` Yu, Fenghua
  2016-07-14  6:32     ` Yu, Fenghua
  1 sibling, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-14  6:11 UTC (permalink / raw)
  To: David Carrillo-Cisneros
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Ravi V Shankar, Vikas Shivappa, Sai Prakhya,
	linux-kernel, x86

On Wed, 13 Jul 2016, David Carrillo-Cisneros wrote:
> > +static void free_cache_resource(struct cache_resource *l)
> > +{
> > +       kfree(l->cbm);
> > +       kfree(l->cbm2);
> > +       kfree(l->closid);
> > +       kfree(l->refcnt);
> 
> this function is used to clean up alloc_cache_resource in the error
> path of get_resources where it's not necessarily true that all of l's
> members were allocated.

kfree handles kfree(NULL) nicely.....

Thanks,

	tglx
 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  2016-07-14  6:11     ` Thomas Gleixner
@ 2016-07-14  6:16       ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-14  6:16 UTC (permalink / raw)
  To: Thomas Gleixner, David Carrillo-Cisneros
  Cc: Ingo Molnar, Anvin, H Peter, Luck, Tony, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Shankar, Ravi V, Vikas Shivappa, Prakhya,
	Sai Praneeth, linux-kernel, x86

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Wednesday, July 13, 2016 11:11 PM
> On Wed, 13 Jul 2016, David Carrillo-Cisneros wrote:
> > > +static void free_cache_resource(struct cache_resource *l) {
> > > +       kfree(l->cbm);
> > > +       kfree(l->cbm2);
> > > +       kfree(l->closid);
> > > +       kfree(l->refcnt);
> >
> > this function is used to clean up alloc_cache_resource in the error
> > path of get_resources where it's not necessarily true that all of l's
> > members were allocated.
> 
> kfree handles kfree(NULL) nicely.....

Yes, that's right. If I check the pointer before kfree(), checkpatch.pl will
report a warning suggesting that kfree(NULL) is safe and the code can be
shorter.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface
  2016-07-14  0:41   ` David Carrillo-Cisneros
  2016-07-14  6:11     ` Thomas Gleixner
@ 2016-07-14  6:32     ` Yu, Fenghua
  1 sibling, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-14  6:32 UTC (permalink / raw)
  To: David Carrillo-Cisneros
  Cc: Thomas Gleixner, Ingo Molnar, Anvin, H Peter, Luck, Tony,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Shankar, Ravi V, Vikas Shivappa, Prakhya,
	Sai Praneeth, linux-kernel, x86

> From: David Carrillo-Cisneros [mailto:davidcc@google.com]
> > +static int get_res_type(char **res, enum resource_type *res_type) {
> > +       char *tok;
> > +
> > +       tok = strsep(res, ":");
> > +       if (tok == NULL)
> > +               return -EINVAL;
> > +
> > +       if (!strcmp(tok, "L3")) {
> 
> Maybe use strstrip to allow a more readable input ? i.e. "L3 : <schema> "


> 
> > +               *res_type = RESOURCE_L3;
> > +               return 0;
> > +       }
> > +
> > +       return -EINVAL;
> > +}
> > +
> > +static int divide_resources(char *buf, char *resources[RESOURCE_NUM])
> > +{
> > +       char *tok;
> > +       unsigned int resource_num = 0;
> > +       int ret = 0;
> > +       char *res;
> > +       char *res_block;
> > +       size_t size;
> > +       enum resource_type res_type;
> > +
> > +       size = strlen(buf) + 1;
> > +       res = kzalloc(size, GFP_KERNEL);
> > +       if (!res) {
> > +               ret = -ENOSPC;
> 
> -ENOMEM?

Will change to -ENOMEM.

> 
> > +
> > +       res_block = res;
> > +       ret = get_res_type(&res_block, &res_type);
> > +       if (ret) {
> > +               pr_info("Unknown resource type!");
> > +               goto out;
> > +       }
> 
> does this work if res_block doesn't have ":"? don't you need to check
> res_block?

get_res_type() checks for ":" and returns -EINVAL if there is no ":".

> 
> > +static int get_cache_schema(char *buf, struct cache_resource *l, int level,
> > +                        struct rdtgroup *rdtgrp) {
> > +       char *tok, *tok_cache_id;
> > +       int ret;
> > +       int domain_num;
> > +       int input_domain_num;
> > +       int len;
> > +       unsigned int input_cache_id;
> > +       unsigned int cid;
> > +       unsigned int leaf;
> > +
> > +       if (!cat_enabled(level) && strcmp(buf, ";")) {
> > +               pr_info("Disabled resource should have empty schema\n");
> > +               return -EINVAL;
> > +       }
> > +
> > +       len = strlen(buf);
> > +       /*
> > +        * Translate cache id based cbm from one line string with format
> > +        * "<cache prefix>:<cache id0>=xxxx;<cache id1>=xxxx;..." for
> > +        * disabled cdp.
> > +        * Or
> > +        * "<cache prefix>:<cache id0>=xxxxx,xxxxx;<cache
> id1>=xxxxx,xxxxx;..."
> > +        * for enabled cdp.
> > +        */
> > +       input_domain_num = 0;
> > +       while ((tok = strsep(&buf, ";")) != NULL) {
> > +               tok_cache_id = strsep(&tok, "=");
> > +               if (tok_cache_id == NULL)
> > +                       goto cache_id_err;
> 
> what if there is no "="? Also, it would be nice to allow spaces around "=".

Without "=-", reports id error. Sure I can strip the spaces around "=".

> 
> > +
> > +               ret = kstrtouint(tok_cache_id, 16, &input_cache_id);
> > +               if (ret)
> > +                       goto cache_id_err;
> > +
> > +               leaf = level_to_leaf(level);
> 
> why is this in the loop?

Leaf is the cache index number, which is contiguous starting from 0. We need
to save and look up cache id info by cache index.

<leaf, cache id> uniquely identifies a cache.

Architecturally, level cannot be used to identify a cache. There can be two
caches in one level, i.e. icache and dcache.

So we need to translate from level to leaf.
> 
> > +               cid = cache_domains[leaf].shared_cache_id[input_domain_num];
> > +               if (input_cache_id != cid)
> > +                       goto cache_id_err;
> 
> so schemata must be present for all cache_id's and sorted in increasing order
> of cache_id? what's the point of having the cache_id# then?

Cache_id may not be contiguous, so it cannot be used directly as an array
index. In the user interface, the user supplies a cache_id to identify a
cache. Internally the kernel uses a domain number, which is contiguous and
used as the index for the internally saved cbm. The kernel interface code
maps between cache_id and domain number.
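
For illustration, a minimal C sketch of that lookup (editorial, not the patch
code; cache_domains[] and shared_cache_id[] are from this series, while the
"num_domains" count field is an assumed name):

/*
 * Map a user-supplied cache id to the contiguous domain index used
 * internally, or return -EINVAL if no such cache exists at this leaf.
 */
static int cache_id_to_domain(unsigned int leaf, unsigned int cache_id)
{
	int domain;

	for (domain = 0; domain < cache_domains[leaf].num_domains; domain++)
		if (cache_domains[leaf].shared_cache_id[domain] == cache_id)
			return domain;

	return -EINVAL;
}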

> 
> > +
> > +/*
> > + * Check if the reference counts are all ones in rdtgrp's domain.
> > + */
> > +static bool one_refcnt(struct rdtgroup *rdtgrp, int domain) {
> > +       int refcnt;
> > +       int closid;
> > +
> > +       closid = rdtgrp->resource.closid[domain];
> > +       if (cat_l3_enabled) {
> 
> if cat_l3_enabled == false, then reference counts are always one?

I can change it to return false if cat_l3_enabled == false.

> 
> > + * Go through all shared domains. Check if there is an existing
> > +closid
> > + * in all rdtgroups that matches l3 cbms in the shared
> > + * domain. If find one, reuse the closid. Otherwise, allocate a new one.
> > + */
> > +static int get_rdtgroup_resources(struct resources *resources_set,
> > +                                 struct rdtgroup *rdtgrp) {
> > +       struct cache_resource *l3;
> > +       bool l3_cbm_found;
> > +       struct list_head *l;
> > +       struct rdtgroup *r;
> > +       u64 cbm;
> > +       int rdt_closid[MAX_CACHE_DOMAINS];
> > +       int rdt_closid_type[MAX_CACHE_DOMAINS];
> > +       int domain;
> > +       int closid;
> > +       int ret;
> > +
> > +       l3 = resources_set->l3;
> 
> l3 is NULL if cat_l3_enabled == false but it seems like it may be used later
> even though.


> 
> > +       memcpy(rdt_closid, rdtgrp->resource.closid,
> > +              shared_domain_num * sizeof(int));
> > +       for (domain = 0; domain < shared_domain_num; domain++) {
> > +               if (rdtgrp->resource.valid) {
> > +                       /*
> > +                        * If current rdtgrp is the only user of cbms in
> > +                        * this domain, will replace the cbms with the input
> > +                        * cbms and reuse its own closid.
> > +                        */
> > +                       if (one_refcnt(rdtgrp, domain)) {
> > +                               closid = rdtgrp->resource.closid[domain];
> > +                               rdt_closid[domain] = closid;
> > +                               rdt_closid_type[domain] = REUSED_OWN_CLOSID;
> > +                               continue;
> > +                       }
> > +
> > +                       l3_cbm_found = true;
> > +
> > +                       if (cat_l3_enabled)
> > +                               l3_cbm_found = cbm_found(l3, rdtgrp, domain,
> > +
> > + CACHE_LEVEL3);
> > +
> > +                       /*
> > +                        * If the cbms in this shared domain are already
> > +                        * existing in current rdtgrp, record the closid
> > +                        * and its type.
> > +                        */
> > +                       if (l3_cbm_found) {
> > +                               closid = rdtgrp->resource.closid[domain];
> > +                               rdt_closid[domain] = closid;
> > +                               rdt_closid_type[domain] =
> > + CURRENT_CLOSID;
> 
> a new l3 resource will be created if cat_l3_enabled is false.

This code actually handles both l3 and l2.

L2 patches will be sent out later on top of this patch set.

With this patch set alone, when l3 is disabled, we will not reach
this function because initialization will fail and there is no
rscctrl file system.

> 
> 
> > +static void init_cache_resource(struct cache_resource *l) {
> > +       l->cbm = NULL;
> > +       l->cbm2 = NULL;
> 
> is cbm2 the data bitmask for when CDP is enabled? if so, a more descriptive
> name may help.

Yes, that's right. It's the second CBM when CDP is enabled. Maybe I'll change
it to a nicer name.

> 
> > +       l->closid = NULL;
> > +       l->refcnt = NULL;
> > +}
> > +
> > +static void free_cache_resource(struct cache_resource *l) {
> > +       kfree(l->cbm);
> > +       kfree(l->cbm2);
> > +       kfree(l->closid);
> > +       kfree(l->refcnt);
> 
> this function is used to clean up alloc_cache_resource in the error path of
> get_resources where it's not necessarily true that all of l's members were
> allocated.

As Thomas already said, kfree(NULL) is safe. There is no need to check the
pointer before kfree(), and the code stays shorter.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13 17:13     ` Luck, Tony
@ 2016-07-14  6:53       ` Thomas Gleixner
  2016-07-14 17:16         ` Luck, Tony
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-14  6:53 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Wed, 13 Jul 2016, Luck, Tony wrote:
> On Wed, Jul 13, 2016 at 02:47:30PM +0200, Thomas Gleixner wrote:
> > On Tue, 12 Jul 2016, Fenghua Yu wrote:
> > > +3. Hierarchy in rscctrl
> > > +=======================
> > 
> > What means rscctrl?
> > 
> > You were not able to find a more cryptic acronym?
> 
> rscctrl == resource control
> 
> Intel marketing would (probably) like us to use:
> 
>    /sys/fs/Intel(R) Resource Director Technology(TM)/
> 
> Happy to take suggestions for something in between those
> extremes :-)

I'd suggest "resctrl" and the abbreviation dictionaries tell me that the most
common ones for resource are: R, RESORC, RES

> > > +Any tasks scheduled on the cpus will use the schemas. User can set
> > > +both "cpus" and "tasks" to share the same schema in one directory. But when
> > > +a CPU is bound to a schema, a task running on the CPU uses this schema and
> > > +kernel will ignore scheam set up for the task in "tasks".
> > 
> > This does not make any sense. 
> > 
> > When a task is bound to a schema then this should have preference over the
> > schema which is associated to the CPU. The CPU association is meant for tasks
> > which are not bound to a particular partition/schema.
> > 
> > So the initial setup should be:
> > 
> >    - All CPUs are associated to the root resource partition
> > 
> >    - No thread is associated to a particular resource partition
> > 
> > When a thread is added to a 'tasks' file of a partition then this partition
> > takes preference. If it's removed, i.e. the association to a partition is
> > undone, then the CPU association is used.
> > 
> > I have no idea why you think that all threads should be in a tasks file by
> > default. Associating CPUs in the first place makes a lot more sense as it
> > represents the topology of the system nicely.
> 
> If we did it that way, it would be harder to change the default
> resources.  E.g. now we start with all processes in the root
> rdtgroup.  We can change the schema for the root group and restrict
> them to, say, 60% of L3 cache on one (or all) sockets - giving us
> 40% of cache to give out to one or more groups.

I tend to disagree.

If you start up with all resources assigned to all CPUs and all tasks are set
to use the CPU default, then you still can restrict the root CPU defaults to
60% L3 which gives you 40% of cache to hand out.

What's hard about this?

Now you can start to create new partitions and either assign CPU or tasks to
them.

As a side effect that avoids the whole 'find all tasks' on mount machinery
simply because the CPU defaults do not change at all.
 
> So what we've implemented (and perhaps need to explain better here)
> is that every thread always belongs to one (and only one) rdtgroup.
> It will use the resources described in that group wherever it runs,
> except in the case where we have designated some cpus as special snowflakes.

I don't think of that case as special snowflakes. Due to the very limited
number of closids, the CPU association is going to be a very useful tool.

> When a cpu is assigned to an rdtgroup the schema for the cpu has
> precedence (i.e. we write the MSR with a CLOSID once, and then it
> never changes).
> 
> Some of this is confusing because people will very likely also use
> cpu affinity to control where their processes run. But affinity is
> orthogonal to rdtgroup membership.

Right. It's confusing and what's even more confusing is that you have no way
to figure out what a particular task is actually using. With the 'use CPU
defaults, if not assigned to a partition' scheme you can very easily figure out
what a task is using because it's either in a partition task list or not.

> I think what we have allows you to do all the things we talked about.
> But if we are missing a case, or if things can be simplified while
> still retaining the same functionality, then let's discuss that.

It covers almost everything except the case I outlined before:

   Isolated CPU	 	    Important Task runs on isolated CPU
   5% exclusive cache	    10% exclusive cache

That's impossible with your scheme, but it's something which matters. You want
to make sure that the system services on that isolated CPU stay cache hot
without hurting the cache locality of your isolated task.

> Otherwise we can revise the documentation to explain all this better.

That needs to be done in any case. The existing one does not really qualify as
proper documentation. It's closer to a fairy tale :)

I really have to ask why you did not take the time and include all the
information you gave now into that documentation file in the first place.

> > > +Initial value is all zeros which means there is no CPU bound to the schemas
> > > +in the root directory and tasks use the schemas.
> > 
> > As I said above this is backwards.
> 
> > > +If one resource is disabled, its line is not shown in schemas file.
> > 
> > That means:	  
> > 
> >      Resources which are not described in a schemata file are disabled for
> >      that particular partition.
> > 
> > Right?
> > 
> > Now that raises the question how this is supposed to work. Let's assume that
> > we have a partition 'foo' and thread X is in the tasks file of that
> > partition. The schema of that partition contains only an L2 entry. What's the
> > L3 association for thread X? Nothing at all?
> 
> Resources are either enabled or disabled globally. Each schema file
> must provide details for every enabled resource. So if we are on a
> processor that supports both L2 and L3, we will normally have schema
> files that specify both.
> We could boot with the "disable_cat_l2"
> kernel command line option and then every schema file would just
> specify L3 (and the MSRs for L2 would all be set to all-ones so that
> everyone had full access to the L2 on each core).

So the above should read:

   Each schema file must provide configuration for all resource controls which
   are enabled in the system.

Right?
 
> > > +User can create a sub-directory under the root directory by "mkdir" command.
> > > +User can remove the sub-directory by "rmdir" command.
> > 
> > User? Any user?
> 
> Well if someone did:
>  # chmod 777 /sys/fs/rscctrl
> then any user could make directories.  That would be inadvisable.
> You could use 775 and let a trusted group have control so that you
> didn't require root access to modify things.
> 
> Should we say "system administrator" rather than "user"?

Yes. Because the default should be 755, which is the obvious choice for all
root/admin controlled things. If root decides to change it to 777 then it's
not the kernel's problem. But documentation should clearly say: It's a root
controlled resource.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT
  2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
@ 2016-07-14 12:30   ` Thomas Gleixner
  0 siblings, 0 replies; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-14 12:30 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Tue, 12 Jul 2016, Fenghua Yu wrote:
>  
> +extern struct cache_domain cache_domains[MAX_CACHE_LEAVES];
> +
> +

Why only two extra new lines?

>  extern struct rdt_opts rdt_opts;
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -0,0 +1,881 @@
> +/*
> + * Resource Director Technology(RDT)
> + * - User interface for Resource Alloction in RDT.
> + *
> + * Copyright (C) 2016 Intel Corporation
> + *
> + * 2016 Written by
> + *    Fenghua Yu <fenghua.yu@intel.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * More information about RDT be found in the Intel (R) x86 Architecture
> + * Software Developer Manual.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/cred.h>
> +#include <linux/ctype.h>
> +#include <linux/errno.h>
> +#include <linux/init_task.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/magic.h>
> +#include <linux/mm.h>
> +#include <linux/mutex.h>
> +#include <linux/mount.h>
> +#include <linux/pagemap.h>
> +#include <linux/proc_fs.h>
> +#include <linux/rcupdate.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/percpu-rwsem.h>
> +#include <linux/string.h>
> +#include <linux/sort.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/idr.h>
> +#include <linux/vmalloc.h> /* TODO: replace with more sophisticated array */

You want to replace vmalloc with a more sophisticated array? Good luck with
that. Get rid of those nonsensical comments in random places and please do not
use tail comments at all.

> +#include <linux/kthread.h>
> +#include <linux/delay.h>
> +#include <linux/atomic.h>
> +#include <linux/cpumask.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cacheinfo.h>
> +#include <net/sock.h>
> +#include <asm/intel_rdt_rdtgroup.h>
> +#include <asm/intel_rdt.h>

Are you sure that you need all of these includes? 

> +/**
> + * kernfs_root - find out the kernfs_root a kernfs_node belongs to
> + * @kn: kernfs_node of interest
> + *
> + * Return the kernfs_root @kn belongs to.
> + */
> +static inline struct kernfs_root *get_kernfs_root(struct kernfs_node *kn)
> +{
> +	/* if parent exists, it's always a dir; otherwise, @sd is a dir */

Why? That's non-obvious to the casual reader ....

> +	if (kn->parent)
> +		kn = kn->parent;
> +	return kn->dir.root;
> +}
> +
> +/*
> + * Protects rdtgroup_idr so that IDs can be released without grabbing
> + * rdtgroup_mutex.
> + */
> +static DEFINE_SPINLOCK(rdtgroup_idr_lock);
> +
> +struct percpu_rw_semaphore rdtgroup_threadgroup_rwsem;
> +
> +#define MAX_CPUMASK_CHAR_IN_HEX	(NR_CPUS/4)

Oh well. This is used to size a buffer. At the usage site you do

   char buf[MAX_CPUMASK_CHAR_IN_HEX + 1];

What's wrong with giving this thing an understandable name and doing the '+ 1'
right at the macro definition?

#define CPUMASK_BUF_LEN			((NR_CPUS / 4) + 1)

> +
> +static struct rftype rdtgroup_root_base_files[];
> +
> +#define RDTGROUP_FILE_NAME_MAX		(MAX_RDTGROUP_TYPE_NAMELEN +	\
> +					 MAX_RFTYPE_NAME + 2)

So that's the maximum number of rdt group file names or what?

s/MAX/LEN/ perhaps?

> +static char *rdtgroup_file_name(const struct rftype *rft, char *buf)
> +{
> +	strncpy(buf, rft->name, RDTGROUP_FILE_NAME_MAX);

Do we really need a wrapper around strncpy() with a completely useless
function name?

> +	return buf;
> +}
> +
> +/**
> + * rdtgroup_file_mode - deduce file mode of a control file
> + * @cft: the control file in question
> + *
> + * S_IRUGO for read, S_IWUSR for write.
> + */
> +static umode_t rdtgroup_file_mode(const struct rftype *rft)
> +{
> +	umode_t mode = 0;
> +
> +	if (rft->read_u64 || rft->read_s64 || rft->seq_show)
> +		mode |= S_IRUGO;
> +
> +	if (rft->write_u64 || rft->write_s64 || rft->write) {
> +		if (rft->flags & RFTYPE_WORLD_WRITABLE)

Why would any of this be world writeable?

Can you please document the permission rules of this file system meticulously
and provide sensible arguments for your choices?

> +			mode |= S_IWUGO;
> +		else
> +			mode |= S_IWUSR;
> +	}
> +
> +	return mode;
> +}
> +
> +struct rdtgroup *root_rdtgrp;

Can you please place global variables at the top of the file and not glue them
without a newline to a static function?

> +static int rdtgroup_add_file(struct kernfs_node *parent_kn, struct rftype *rft)
> +{
> +	char name[RDTGROUP_FILE_NAME_MAX];
> +	struct kernfs_node *kn;
> +	struct lock_class_key *key = NULL;
> +	int ret;
> +
> +	kn = __kernfs_create_file(parent_kn, rdtgroup_file_name(rft, name),
> +				  rdtgroup_file_mode(rft), 0, rft->kf_ops, rft,
> +				  NULL, key);
> +	if (IS_ERR(kn))
> +		return PTR_ERR(kn);
> +
> +	ret = rdtgroup_kn_set_ugid(kn);
> +	if (ret) {
> +		kernfs_remove(kn);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void rdtgroup_rm_file(struct kernfs_node *kn, const struct rftype *rft)
> +{
> +	char name[RDTGROUP_FILE_NAME_MAX];
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	kernfs_remove_by_name(kn, rdtgroup_file_name(rft, name));
> +}
> +
> +static int rdtgroup_addrm_files(struct kernfs_node *kn, struct rftype rfts[],
> +			      bool is_add)
> +{
> +	struct rftype *rft, *rft_end = NULL;
> +	int ret;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +restart:
> +	for (rft = rfts; rft != rft_end && rft->name[0] != '\0'; rft++) {

That rft->name[0] condition is ugly.

> +		if (is_add) {
> +			ret = rdtgroup_add_file(kn, rft);
> +			if (ret) {
> +				pr_warn("%s: failed to add %s, err=%d\n",
> +					__func__, rft->name, ret);
> +				rft_end = rft;
> +				is_add = false;
> +				goto restart;
> +			}
> +		} else {
> +			rdtgroup_rm_file(kn, rft);
> +		}
> +	}
> +	return 0;
> +}

This is crack. Really. I had to read this 5 times to understand the convoluted
logic here. What are you saving by this is_add magic?

You save an extra for (....) loop at the price of readability.

What's wrong with:

static void rdtgroup_rm_files(struct kernfs_node *kn, const struct rftype *rft,
			      const struct rftype *end)
{
	for (; rft != end; rft++)
		rdtgroup_rm_file(kn, rft);
}

static int rdtgroup_add_files(struct kernfs_node *kn, const struct rftype *rft,
			      const struct rftype *end)
{
	const struct rftype *p;

	for (p = rft; p != end; p++) {
		int ret = rdtgroup_add_file(kn, p);

		if (ret) {
			rdtgroup_rm_files(kn, rft, p);
			return ret;
		}
	}
	return 0;
}

That's too easy to read and understand, right?

And the callsite simply does:

   rdtgroup_add_files(kn, rfts, rfts + ARRAY_SIZE(rfts));

> +static enum resource_type get_kn_res_type(struct kernfs_node *kn)
> +{
> +	return RESOURCE_L3;

Errm. Can we please have a way to store the resource type right away? We
already know that we need it.

> +}
> +
> +static int rdt_max_closid_show(struct seq_file *seq, void *v)
> +{
> +	struct kernfs_open_file *of = seq->private;
> +	enum resource_type res_type;
> +
> +	res_type = get_kn_res_type(of->kn);
> +
> +	switch (res_type) {

  	switch (get_kn_res_type(of->kn)) {

Perhaps?

> +	case RESOURCE_L3:
> +		seq_printf(seq, "%d\n",
> +			boot_cpu_data.x86_l3_max_closid);
> +		break;

> +static int get_shared_domain(int domain, int level)

Nothing is using this in this patch. So why is this introduced here? Darn,
adding/modifying random code without context makes review simply impossible.

> +{
> +	int sd;
> +
> +	for_each_cache_domain(sd, 0, shared_domain_num) {
> +		if (cat_l3_enabled && level == CACHE_LEVEL3) {
> +			if (shared_domain[sd].l3_domain == domain)
> +				return sd;
> +		}
> +	}
> +
> +	return -1;
> +}


> +static int rdtgroup_procs_write_permission(struct task_struct *task,
> +					   struct kernfs_open_file *of)
> +{
> +	const struct cred *cred = current_cred();
> +	const struct cred *tcred = get_task_cred(task);
> +	int ret = 0;
> +
> +	/*
> +	 * even if we're attaching all tasks in the thread group, we only
> +	 * need to check permissions on one of them.
> +	 */
> +	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
> +	    !uid_eq(cred->euid, tcred->uid) &&
> +	    !uid_eq(cred->euid, tcred->suid))
> +		ret = -EACCES;

Why EACCES? EPERM is the canonical error code here.

> +
> +	put_cred(tcred);
> +	return ret;
> +}
> +
> +bool use_rdtgroup_tasks;

Another random global variable at some random place

> +
> +static void init_rdtgroup_housekeeping(struct rdtgroup *rdtgrp)
> +{
> +	init_waitqueue_head(&rdtgrp->offline_waitq);
> +	rdtgrp->pset.self = rdtgrp;
> +	INIT_LIST_HEAD(&rdtgrp->pset.task_iters);
> +}
> +
> +static LIST_HEAD(rdtgroup_lists);
> +static void init_rdtgroup_root(struct rdtgroup_root *root)

Sigh. This coding style sucks

> +{
> +	struct rdtgroup *rdtgrp = &root->rdtgrp;
> +
> +	INIT_LIST_HEAD(&root->root_list);
> +	INIT_LIST_HEAD(&rdtgrp->rdtgroup_list);
> +	list_add_tail(&rdtgrp->rdtgroup_list, &rdtgroup_lists);
> +	atomic_set(&root->nr_rdtgrps, 1);
> +	rdtgrp->root = root;
> +	init_rdtgroup_housekeeping(rdtgrp);
> +	idr_init(&root->rdtgroup_idr);
> +}
> +
> +static DEFINE_IDR(rdtgroup_hierarchy_idr);
> +static int rdtgroup_init_root_id(struct rdtgroup_root *root)
> +{
> +	int id;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	id = idr_alloc_cyclic(&rdtgroup_hierarchy_idr, root, 0, 0, GFP_KERNEL);
> +	if (id < 0)
> +		return id;
> +
> +	root->hierarchy_id = id;
> +	return 0;
> +}
> +
> +static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops;
> +/* IDR wrappers which synchronize using rdtgroup_idr_lock */
> +static int rdtgroup_idr_alloc(struct idr *idr, void *ptr, int start, int end,
> +			    gfp_t gfp_mask)
> +{
> +	int ret;
> +
> +	idr_preload(gfp_mask);
> +	spin_lock_bh(&rdtgroup_idr_lock);

Why is this lock_bh? Against which softirq is this protecting?

> +	ret = idr_alloc(idr, ptr, start, end, gfp_mask & ~__GFP_DIRECT_RECLAIM);
> +	spin_unlock_bh(&rdtgroup_idr_lock);
> +	idr_preload_end();
> +	return ret;
> +}
> +
> +/* hierarchy ID allocation and mapping, protected by rdtgroup_mutex */
> +static void rdtgroup_exit_root_id(struct rdtgroup_root *root)
> +{
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	if (root->hierarchy_id) {
> +		idr_remove(&rdtgroup_hierarchy_idr, root->hierarchy_id);
> +		root->hierarchy_id = 0;
> +	}
> +}
> +
> +static struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
> +{
> +	struct rdtgroup *rdtgrp;
> +
> +	if (kernfs_type(kn) == KERNFS_DIR)
> +		rdtgrp = kn->priv;
> +	else
> +		rdtgrp = kn->parent->priv;
> +
> +	kernfs_break_active_protection(kn);

So you drop the kernfs protection here. What makes sure that rdtgrp is still
valid? Documentation of locking and protection rules is not optional, really.

> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	return rdtgrp;
> +}

> +static int rdtgroup_setup_root(struct rdtgroup_root *root,
> +			       unsigned long ss_mask)
> +{
> +	int ret;
> +
> +	root_rdtgrp = &root->rdtgrp;
> +
> +	lockdep_assert_held(&rdtgroup_mutex);
> +
> +	ret = rdtgroup_idr_alloc(&root->rdtgroup_idr, root_rdtgrp,
> +				 1, 2, GFP_KERNEL);
> +	if (ret < 0)
> +		goto out;
> +
> +	root_rdtgrp->id = ret;
> +	root_rdtgrp->ancestor_ids[0] = ret;
> +
> +	ret = rdtgroup_init_root_id(root);
> +	if (ret)
> +		goto cancel_ref;

cancel_ref is empty. What's the point? And what is undoing idr_alloc ???

> +
> +	root->kf_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
> +					   KERNFS_ROOT_CREATE_DEACTIVATED,
> +					   root_rdtgrp);
> +	if (IS_ERR(root->kf_root)) {
> +		ret = PTR_ERR(root->kf_root);
> +		goto exit_root_id;
> +	}
> +	root_rdtgrp->kn = root->kf_root->kn;
> +
> +	ret = rdtgroup_populate_dir(root->kf_root->kn);
> +	if (ret)
> +		goto destroy_root;
> +
> +	/*
> +	 * Link the root rdtgroup in this hierarchy into all the css_set
> +	 * objects.
> +	 */
> +	WARN_ON(atomic_read(&root->nr_rdtgrps) != 1);
> +
> +	kernfs_activate(root_rdtgrp->kn);
> +	ret = 0;
> +	goto out;
> +
> +destroy_root:
> +	kernfs_destroy_root(root->kf_root);
> +	root->kf_root = NULL;
> +exit_root_id:
> +	rdtgroup_exit_root_id(root);
> +cancel_ref:
> +out:
> +	return ret;
> +}
> +
> +#define cache_leaves(cpu)       (get_cpu_cacheinfo(cpu)->num_leaves)

Do we really need this? What's wrong with having it open coded at the only
usage site? That would make the code too obvious, right?

> +struct cache_domain cache_domains[MAX_CACHE_LEAVES];
> +
> +static int get_shared_cache_id(int cpu, int level)
> +{
> +	struct cpuinfo_x86 *c;
> +	int index_msb;
> +	struct cpu_cacheinfo *this_cpu_ci;
> +	struct cacheinfo *this_leaf;
> +
> +	this_cpu_ci = get_cpu_cacheinfo(cpu);
> +
> +	this_leaf = this_cpu_ci->info_list + level_to_leaf(level);
> +	return this_leaf->id;
> +	return c->apicid >> index_msb;
> +}
> +
> +static __init void init_cache_domains(void)
> +{
> +	int cpu, domain;
> +	struct cpu_cacheinfo *this_cpu_ci;
> +	struct cacheinfo *this_leaf;
> +	int leaves;
> +	char buf[MAX_CPUMASK_CHAR_IN_HEX + 1];

That's 2k stack for CONFIG_MAXSMP=y and if the SGI folks keep going it will be
more..... Make this a static __initdata buf ...

> +	unsigned int level;
> +
> +	for (leaves = 0; leaves < cache_leaves(0); leaves++) {

Why leaves? The iterator deals with a single leaf, right?

> +		for_each_online_cpu(cpu) {

So this relies on the fact that user space is not yet up and running so the
cpu map cannot change, right?

> +			struct cpumask *mask;
> +
> +			this_cpu_ci = get_cpu_cacheinfo(cpu);
> +			this_leaf = this_cpu_ci->info_list + leaves;
> +			cache_domains[leaves].level = this_leaf->level;
> +			mask = &this_leaf->shared_cpu_map;
> +			cpumap_print_to_pagebuf(false, buf, mask);
> +			for (domain = 0; domain < MAX_CACHE_DOMAINS; domain++) {
> +				if (cpumask_test_cpu(cpu,
> +				&cache_domains[leaves].shared_cpu_map[domain]))
> +					break;
> +			}

If you end up with completely silly line breaks it's time to think about
splitting the functions up.

	  for (leaf = 0; leaf < cache_leaves(0); leaf++) {
		for_each_online_cpu(cpu)
			init_cache_domain(cpu, leaf);
	  }

That gets rid of two indentation levels and makes the above and the below
readable. But it's more time consuming to come up with unreadable line breaks
and therefore it must be better, right?

> +			if (domain == MAX_CACHE_DOMAINS) {
> +				domain =
> +				  cache_domains[leaves].max_cache_domains_num++;
> +
> +				cache_domains[leaves].shared_cpu_map[domain] =
> +					*mask;
> +
> +				level = cache_domains[leaves].level;
> +				cache_domains[leaves].shared_cache_id[domain] =
> +					get_shared_cache_id(cpu, level);
> +			}
> +		}
> +	}
> +}
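
For completeness, such a per-(cpu, leaf) helper could look roughly like this.
This is only a sketch reusing the names from the quoted patch (and dropping the
unused cpumap_print_to_pagebuf() buffer), not a tested implementation:

static __init void init_cache_domain(int cpu, unsigned int leaf)
{
	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
	struct cacheinfo *this_leaf = ci->info_list + leaf;
	struct cache_domain *cd = &cache_domains[leaf];
	int domain;

	cd->level = this_leaf->level;

	/* Is this CPU already covered by one of the existing domains? */
	for (domain = 0; domain < cd->max_cache_domains_num; domain++) {
		if (cpumask_test_cpu(cpu, &cd->shared_cpu_map[domain]))
			return;
	}

	/* No match: start a new shared cache domain (bounds check omitted) */
	domain = cd->max_cache_domains_num++;
	cpumask_copy(&cd->shared_cpu_map[domain], &this_leaf->shared_cpu_map);
	cd->shared_cache_id[domain] = get_shared_cache_id(cpu, cd->level);
}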

I'm giving up for now.

    tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-14  6:53       ` Thomas Gleixner
@ 2016-07-14 17:16         ` Luck, Tony
  2016-07-19 12:32           ` Thomas Gleixner
  0 siblings, 1 reply; 81+ messages in thread
From: Luck, Tony @ 2016-07-14 17:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Thu, Jul 14, 2016 at 08:53:17AM +0200, Thomas Gleixner wrote:
> > Happy to take suggestions for something in between those
> > extremes :-)
> 
> I'd suggest "resctrl" and the abbreviation dictionaries tell me that the most
> common ones for resource are: R, RESORC, RES

OK. "resctrl" it is.

> As a side effect that avoids the whole 'find all tasks' on mount machinery
> simply because the CPU defaults do not change at all.

That's a very good side effect.

It just means that the "tasks" file in the root of the hierarchy will
need different read/write functions from those in sub-directories.

read: scan all tasks, print pid for ones with task->rdtgroup == NULL

write: remove task from the rdtgroup list that it was on; set task->rdtgroup = NULL;
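
Roughly like the sketch below (locking and error handling omitted; the helper
names and the list linkage field are invented for illustration, only
task->rdtgroup is from the actual patches):

	/* root "tasks" read: dump PIDs of tasks not in any rdtgroup */
	static int root_tasks_show(struct seq_file *s, void *v)
	{
		struct task_struct *p, *t;

		rcu_read_lock();
		for_each_process_thread(p, t)
			if (!t->rdtgroup)
				seq_printf(s, "%d\n", t->pid);
		rcu_read_unlock();
		return 0;
	}

	/* root "tasks" write: detach the task from its current rdtgroup */
	static void root_tasks_detach(struct task_struct *t)
	{
		if (t->rdtgroup) {
			list_del(&t->rdtgroup_entry);	/* linkage name illustrative */
			t->rdtgroup = NULL;
		}
	}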

> It covers almost everything except the case I outlined before:
> 
>    Isolated CPU	 	    Important Task runs on isolated CPU
>    5% exclusive cache	    10% exclusive cache
> 
> That's impossible with your scheme, but it's something which matters. You want
> to make sure that the system services on that isolated CPU stay cache hot
> without hurting the cache locality of your isolated task.

So the core part of __intel_rdt_sched_in() will look like:

	/*
	 * Precedence rules:
	 * Processes assigned to an rdtgroup use that group
	 * wherever they run. If they don't have an rdtgroup
	 * we see if the current cpu has one and use it.
	 * If no specific rdtgroup was provided, we use the
	 * root_rdtgroup
	 */
	rdtgrp = current->rdtgroup;
	if (!rdtgrp) {
		rdtgrp = per_cpu(cpu_rdtgroup, cpu);
		if (!rdtgrp)
			rdtgrp = root_rdtgroup;
	}

> > Otherwise we can revise the documentation to explain all this better.
> 
> That needs to be done in any case. The existing one does not really qualify as
> proper documentation. It's closer to a fairy tale :)

Yes. We will re-write.

-Tony

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation
  2016-07-14  0:40   ` David Carrillo-Cisneros
@ 2016-07-14 22:58     ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-07-14 22:58 UTC (permalink / raw)
  To: David Carrillo-Cisneros
  Cc: Thomas Gleixner, Ingo Molnar, Anvin, H Peter, Luck, Tony,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Shankar, Ravi V, Vikas Shivappa, Prakhya,
	Sai Praneeth, linux-kernel, x86

> 
> > +static inline void intel_rdt_cpu_start(int cpu) {
> > +       struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
> > +
> > +       state->closid = 0;
> > +       mutex_lock(&rdt_group_mutex);
> > +       if (rdt_cpumask_update(cpu))
> > +               smp_call_function_single(cpu, cbm_update_msrs, NULL, 1);
> > +       mutex_unlock(&rdt_group_mutex);
> 
> what happens if cpu's with a cache_id not available at boot comes online?

For L3, that case happens when a new socket is hot plugged into the platform.
We don't handle that right now because that needs platform support and I don't
have that kind of platform to test.

But maybe I can add that support in code and do a test in a simulated mode.
Basically that will create a new domain for the new cache_id.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-14 17:16         ` Luck, Tony
@ 2016-07-19 12:32           ` Thomas Gleixner
  2016-08-04 23:38             ` Yu, Fenghua
  0 siblings, 1 reply; 81+ messages in thread
From: Thomas Gleixner @ 2016-07-19 12:32 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Thu, 14 Jul 2016, Luck, Tony wrote:
> So the core part of __intel_rdt_sched_in() will look like:
> 
> 	/*
> 	 * Precedence rules:
> 	 * Processes assigned to an rdtgroup use that group
> 	 * wherever they run. If they don't have an rdtgroup
> 	 * we see if the current cpu has one and use it.
> 	 * If no specific rdtgroup was provided, we use the
> 	 * root_rdtgroup
> 	 */
> 	rdtgrp = current->rdtgroup;
> 	if (!rdtgrp) {
> 		rdtgrp = per_cpu(cpu_rdtgroup, cpu);
> 		if (!rdtgrp)
> 			rdtgrp = root_rdtgroup;
> 	}

That can be done simpler. The default cpu_rdtgroup should be root_rdtgroup. So
you spare one conditional.
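
I.e. with the per cpu cpu_rdtgroup initialized to root_rdtgroup at boot, the
hot path becomes something like (sketch):

	rdtgrp = current->rdtgroup;
	if (!rdtgrp)
		rdtgrp = per_cpu(cpu_rdtgroup, cpu);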

Thanks

	tglx

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation
  2016-07-13  9:19   ` Thomas Gleixner
@ 2016-07-21 19:46     ` Shivappa Vikas
  0 siblings, 0 replies; 81+ messages in thread
From: Shivappa Vikas @ 2016-07-21 19:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, H. Peter Anvin, Tony Luck, Tejun Heo,
	Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86



On Wed, 13 Jul 2016, Thomas Gleixner wrote:

> On Tue, 12 Jul 2016, Fenghua Yu wrote:
>>  static int __init intel_rdt_late_init(void)
>>  {
>>  	struct cpuinfo_x86 *c = &boot_cpu_data;
>> @@ -261,9 +331,15 @@ static int __init intel_rdt_late_init(void)
>>  		goto out_err;
>>  	}
>>
>> +	cpu_notifier_register_begin();
>> +
>>  	for_each_online_cpu(i)
>>  		rdt_cpumask_update(i);
>>
>> +	__hotcpu_notifier(intel_rdt_cpu_notifier, 0);
>
> CPU hotplug notifiers are phased out. Please use the new state machine
> interfaces.

Ok, I just saw the patch for cqm with the new state machine. We would also need to
remove the usage of the static tmp cpumask like we did in cqm.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
@ 2016-07-22  7:12   ` Marcelo Tosatti
  2016-07-22 21:43     ` Luck, Tony
  0 siblings, 1 reply; 81+ messages in thread
From: Marcelo Tosatti @ 2016-07-22  7:12 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	David Carrillo-Cisneros, Ravi V Shankar, Vikas Shivappa,
	Sai Prakhya, linux-kernel, x86

On Tue, Jul 12, 2016 at 06:02:37PM -0700, Fenghua Yu wrote:
> From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> 
> This patch adds different APIs to manage the L3 cache capacity bitmask.
> The capacity bit mask(CBM) needs to have only contiguous bits set. The
> current implementation has a global CBM for each class of service id.
> There are APIs added to update the CBM via MSR write to IA32_L3_MASK_n
> on all packages. Other APIs are to read and write entries to the
> clos_cbm_table.
> 
> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/intel_rdt.h |   4 ++
>  arch/x86/kernel/cpu/intel_rdt.c  | 133 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 136 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 88b7643..4f45dc8 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -3,6 +3,10 @@
>  
>  #ifdef CONFIG_INTEL_RDT
>  
> +#define MAX_CBM_LENGTH			32
> +#define IA32_L3_CBM_BASE		0xc90
> +#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
> +
>  struct clos_cbm_table {
>  	unsigned long l3_cbm;
>  	unsigned int clos_refcnt;
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index d79213a..6ad5b48 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -34,8 +34,22 @@ static struct clos_cbm_table *cctable;
>   * closid availability bit map.
>   */
>  unsigned long *closmap;
> +/*
> + * Mask of CPUs for writing CBM values. We only need one CPU per-socket.
> + */
> +static cpumask_t rdt_cpumask;
> +/*
> + * Temporary cpumask used during hot cpu notificaiton handling. The usage
> + * is serialized by hot cpu locks.
> + */
> +static cpumask_t tmp_cpumask;
>  static DEFINE_MUTEX(rdt_group_mutex);
>  
> +struct rdt_remote_data {
> +	int msr;
> +	u64 val;
> +};
> +
>  static inline void closid_get(u32 closid)
>  {
>  	struct clos_cbm_table *cct = &cctable[closid];
> @@ -82,11 +96,126 @@ static void closid_put(u32 closid)
>  		closid_free(closid);
>  }
>  
> +static bool cbm_validate(unsigned long var)
> +{
> +	u32 max_cbm_len = boot_cpu_data.x86_cache_max_cbm_len;
> +	unsigned long first_bit, zero_bit;
> +	u64 max_cbm;
> +
> +	if (bitmap_weight(&var, max_cbm_len) < 1)
> +		return false;
> +
> +	max_cbm = (1ULL << max_cbm_len) - 1;
> +	if (var & ~max_cbm)
> +		return false;
> +
> +	first_bit = find_first_bit(&var, max_cbm_len);
> +	zero_bit = find_next_zero_bit(&var, max_cbm_len, first_bit);
> +
> +	if (find_next_bit(&var, max_cbm_len, zero_bit) < max_cbm_len)
> +		return false;
> +
> +	return true;
> +}
> +
> +static int clos_cbm_table_read(u32 closid, unsigned long *l3_cbm)
> +{
> +	u32 maxid = boot_cpu_data.x86_cache_max_closid;
> +
> +	lockdep_assert_held(&rdt_group_mutex);
> +
> +	if (closid >= maxid)
> +		return -EINVAL;
> +
> +	*l3_cbm = cctable[closid].l3_cbm;
> +
> +	return 0;
> +}
> +
> +/*
> + * clos_cbm_table_update() - Update a clos cbm table entry.
> + * @closid: the closid whose cbm needs to be updated
> + * @cbm: the new cbm value that has to be updated
> + *
> + * This assumes the cbm is validated as per the interface requirements
> + * and the cache allocation requirements(through the cbm_validate).
> + */
> +static int clos_cbm_table_update(u32 closid, unsigned long cbm)
> +{
> +	u32 maxid = boot_cpu_data.x86_cache_max_closid;
> +
> +	lockdep_assert_held(&rdt_group_mutex);
> +
> +	if (closid >= maxid)
> +		return -EINVAL;
> +
> +	cctable[closid].l3_cbm = cbm;
> +
> +	return 0;
> +}
> +
> +static bool cbm_search(unsigned long cbm, u32 *closid)
> +{
> +	u32 maxid = boot_cpu_data.x86_cache_max_closid;
> +	u32 i;
> +
> +	for (i = 0; i < maxid; i++) {
> +		if (cctable[i].clos_refcnt &&
> +		    bitmap_equal(&cbm, &cctable[i].l3_cbm, MAX_CBM_LENGTH)) {
> +			*closid = i;
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +static void closcbm_map_dump(void)
> +{
> +	u32 i;
> +
> +	pr_debug("CBMMAP\n");
> +	for (i = 0; i < boot_cpu_data.x86_cache_max_closid; i++) {
> +		pr_debug("l3_cbm: 0x%x,clos_refcnt: %u\n",
> +		 (unsigned int)cctable[i].l3_cbm, cctable[i].clos_refcnt);
> +	}
> +}
> +
> +static void msr_cpu_update(void *arg)
> +{
> +	struct rdt_remote_data *info = arg;
> +
> +	wrmsrl(info->msr, info->val);
> +}
> +
> +/*
> + * msr_update_all() - Update the msr for all packages.
> + */
> +static inline void msr_update_all(int msr, u64 val)
> +{
> +	struct rdt_remote_data info;
> +
> +	info.msr = msr;
> +	info.val = val;
> +	on_each_cpu_mask(&rdt_cpumask, msr_cpu_update, &info, 1);
> +}

How does this patchset handle the following condition:

6) Create reservations in such a way that the sum is larger than
total amount of cache, and CPU pinning (example from Karen Noel):

VM-1 on socket-1 with 80% of reservation.
VM-2 on socket-2 with 80% of reservation.
VM-1 pinned to socket-1.
VM-2 pinned to socket-2.


> +
> +static inline bool rdt_cpumask_update(int cpu)
> +{
> +	cpumask_and(&tmp_cpumask, &rdt_cpumask, topology_core_cpumask(cpu));
> +	if (cpumask_empty(&tmp_cpumask)) {
> +		cpumask_set_cpu(cpu, &rdt_cpumask);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  static int __init intel_rdt_late_init(void)
>  {
>  	struct cpuinfo_x86 *c = &boot_cpu_data;
>  	u32 maxid, max_cbm_len;
> -	int err = 0, size;
> +	int err = 0, size, i;
>  
>  	if (!cpu_has(c, X86_FEATURE_CAT_L3))
>  		return -ENODEV;
> @@ -109,6 +238,8 @@ static int __init intel_rdt_late_init(void)
>  		goto out_err;
>  	}
>  
> +	for_each_online_cpu(i)
> +		rdt_cpumask_update(i);
>  	pr_info("Intel cache allocation enabled\n");
>  out_err:
>  
> -- 
> 2.5.0

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-22  7:12   ` Marcelo Tosatti
@ 2016-07-22 21:43     ` Luck, Tony
  2016-07-23  4:31       ` Marcelo Tosatti
  0 siblings, 1 reply; 81+ messages in thread
From: Luck, Tony @ 2016-07-22 21:43 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	David Carrillo-Cisneros, Ravi V Shankar, Vikas Shivappa,
	Sai Prakhya, linux-kernel, x86

On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
> How does this patchset handle the following condition:
> 
> 6) Create reservations in such a way that the sum is larger than
> total amount of cache, and CPU pinning (example from Karen Noel):
> 
> VM-1 on socket-1 with 80% of reservation.
> VM-2 on socket-2 with 80% of reservation.
> VM-1 pinned to socket-1.
> VM-2 pinned to socket-2.

That's legal, but perhaps we need a description of
overlapping cache reservations.

Hardware tells you how finely you can divide the cache (and this
information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
you from digging in CPUID leaves).  E.g. on Broadwell the value is
20, so you can control cache allocations in 5% slices.

A bitmask defines which slices you can use (and h/w has the restriction
that you must have contiguous '1' bits in any mask).  So you can pick
your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.

There is no requirement that masks be exclusive of each other. So
you might pick the two extremes: 0x0ffff and 0xffff0 for your two
VM's in this example. Each would be allowed to allocate up to 80%,
but with a big overlap in the middle. Each has 20% exclusive, but
there is a 60% range in the middle that they would compete for.
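
To put numbers on that particular pair (illustration only, with the 20-bit
Broadwell CBM):

	unsigned int a = 0x0ffff;	/* VM-1 mask: 16 of 20 bits = 80% */
	unsigned int b = 0xffff0;	/* VM-2 mask: 16 of 20 bits = 80% */
	unsigned int shared = a & b;	/* 0x0fff0: 12 bits = 60% contended  */
	unsigned int a_excl = a & ~b;	/* 0x0000f:  4 bits = 20% exclusive  */
	unsigned int b_excl = b & ~a;	/* 0xf0000:  4 bits = 20% exclusive  */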

Is this specific case useful? Possibly not.  I think the more common
overlap cases might be between processes that you know have shared
code/data. Also the case where some rdtgroup has access to allocate
in the entire cache (mask 0xfffff on Broadwell) and some other rdtgroups
have limited cache allocation with less bits in the mask.

-Tony

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-22 21:43     ` Luck, Tony
@ 2016-07-23  4:31       ` Marcelo Tosatti
  2016-07-26  3:18         ` Luck, Tony
  2016-07-26 17:10         ` Shivappa Vikas
  0 siblings, 2 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2016-07-23  4:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	David Carrillo-Cisneros, Ravi V Shankar, Vikas Shivappa,
	Sai Prakhya, linux-kernel, x86

On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
> > How does this patchset handle the following condition:
> > 
> > 6) Create reservations in such a way that the sum is larger than
> > total amount of cache, and CPU pinning (example from Karen Noel):
> > 
> > VM-1 on socket-1 with 80% of reservation.
> > VM-2 on socket-2 with 80% of reservation.
> > VM-1 pinned to socket-1.
> > VM-2 pinned to socket-2.
> 
> That's legal, but perhaps we need a description of
> overlapping cache reservations.
> 
> Hardware tells you how finely you can divide the cache (and this
> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
> you from digging in CPUID leaves).  E.g. on Broadwell the value is
> 20, so you can control cache allocations in 5% slices.
> 
> A bitmask defines which slices you can use (and h/w has the restriction
> that you must have contiguous '1' bits in any mask).  So you can pick
> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
> 
> There is no requirement that masks be exclusive of each other. So
> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
> VM's in this example. Each would be allowed to allocate up to 80%,
> but with a big overlap in the middle. Each has 20% exclusive, but
> there is a 60% range in the middle that they would compete for.

These are different sockets, so there is no competing/sharing of L3 cache
here: the question is about whether the interface allows the
user to specify that 80/80 reservation without complaining:
because the VM's are pinned, they will never actually
share the same L3 cache.

(haven't finished reading the patchset to be certain).

> Is this specific case useful? Possibly not.  I think the more common
> overlap cases might be between processes that you know have shared
> code/data. Also the case where some rdtgroup has access to allocate
> in the entire cache (mask 0xfffff on Broadwell) and some other
> rdtgroups
> have limited cache allocation with less bits in the mask.
>
> -Tony

All you have to do is to build the bitmask for a given processor
from the union of the tasks which have been scheduled on that
processor.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
  2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
@ 2016-07-25 16:25   ` Nilay Vaish
  2016-07-25 16:31   ` Nilay Vaish
  1 sibling, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-25 16:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 4f45dc8..afb6da3 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -3,14 +3,42 @@
>
>  #ifdef CONFIG_INTEL_RDT
>
> +#include <linux/jump_label.h>
> +
>  #define MAX_CBM_LENGTH                 32
>  #define IA32_L3_CBM_BASE               0xc90
>  #define CBM_FROM_INDEX(x)              (IA32_L3_CBM_BASE + x)
>
> +extern struct static_key rdt_enable_key;
> +void __intel_rdt_sched_in(void *dummy);
> +
>  struct clos_cbm_table {
>         unsigned long l3_cbm;
>         unsigned int clos_refcnt;
>  };
>
> +/*
> + * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
> + *
> + * Following considerations are made so that this has minimal impact
> + * on scheduler hot path:
> + * - This will stay as no-op unless we are running on an Intel SKU
> + * which supports L3 cache allocation.
> + * - Caches the per cpu CLOSid values and does the MSR write only
> + * when a task with a different CLOSid is scheduled in.
> + */
> +static inline void intel_rdt_sched_in(void)
> +{
> +       /*
> +        * Call the schedule in code only when RDT is enabled.
> +        */
> +       if (static_key_false(&rdt_enable_key))
> +               __intel_rdt_sched_in(NULL);

static_key_false() is deprecated.  I think this should be
static_branch_unlikely().

> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 6ad5b48..8379df8 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -44,12 +46,33 @@ static cpumask_t rdt_cpumask;
>   */
>  static cpumask_t tmp_cpumask;
>  static DEFINE_MUTEX(rdt_group_mutex);
> +struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
>

Similarly, this should be DEFINE_STATIC_KEY_FALSE(rdt_enable_key);
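
Putting the two suggestions together, roughly (an untested sketch against this
series):

	/* asm/intel_rdt.h */
	extern struct static_key_false rdt_enable_key;

	static inline void intel_rdt_sched_in(void)
	{
		if (static_branch_unlikely(&rdt_enable_key))
			__intel_rdt_sched_in(NULL);
	}

	/* intel_rdt.c */
	DEFINE_STATIC_KEY_FALSE(rdt_enable_key);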


Thanks
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
  2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
  2016-07-25 16:25   ` Nilay Vaish
@ 2016-07-25 16:31   ` Nilay Vaish
  2016-07-25 18:05     ` Luck, Tony
  1 sibling, 1 reply; 81+ messages in thread
From: Nilay Vaish @ 2016-07-25 16:31 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
>
> Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For
> Cache Allocation, MSR write would let the task fill in the cache
> 'subset' represented by the task's capacity bit mask.
>
> The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
> CLOSid. During context switch kernel implements this by writing the
> CLOSid of the task belongs to the CPU's IA32_PQR_ASSOC MSR.
>
> This patch also implements a common software cache for IA32_PQR_MSR
> (RMID 0:9, CLOSId 32:63) to be used by both Cache monitoring (CMT) and
> Cache allocation. CMT updates the RMID where as cache_alloc updates the
> CLOSid in the software cache. During scheduling when the new RMID/CLOSid
> value is different from the cached values, IA32_PQR_MSR is updated.
> Since the measured rdmsr latency for IA32_PQR_MSR is very high (~250
>  cycles) this software cache is necessary to avoid reading the MSR to
> compare the current CLOSid value.
>
> The following considerations are done for the PQR MSR write so that it
> minimally impacts scheduler hot path:
>  - This path does not exist on any non-intel platforms.
>  - On Intel platforms, this would not exist by default unless INTEL_RDT
>  is enabled.
>  - remains a no-op when INTEL_RDT is enabled and intel SKU does not
>  support the feature.
>  - When feature is available and enabled, never does MSR write till the
>  user manually starts using one of the capacity bit masks.
>  - MSR write is only done when there is a task with different Closid is
>  scheduled on the CPU. Typically if the task groups are bound to be
>  scheduled on a set of CPUs, the number of MSR writes is greatly
>  reduced.
>  - A per CPU cache of CLOSids is maintained to do the check so that we
>  don't have to do a rdmsr which actually costs a lot of cycles.
>

I was thinking more about this software caching of CLOSids.  How
likely do you think these CLOSids would be found cached?  I think the
software cache would be very infrequently accessed, so it seems you
are likely to miss these in all levels of cache hierarchy and more
likely to have to fetch these from the main memory, which itself might
cost ~250 cycles.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
  2016-07-25 16:31   ` Nilay Vaish
@ 2016-07-25 18:05     ` Luck, Tony
  2016-07-25 22:47       ` David Carrillo-Cisneros
  0 siblings, 1 reply; 81+ messages in thread
From: Luck, Tony @ 2016-07-25 18:05 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Mon, Jul 25, 2016 at 11:31:24AM -0500, Nilay Vaish wrote:
> I was thinking more about this software caching of CLOSids.  How
> likely do you think these CLOSids would be found cached?  I think the
> software cache would be very infrequently accessed, so it seems you
> are likely to miss these in all levels of cache hierarchy and more
> likely to have to fetch these from the main memory, which itself might
> cost ~250 cycles.

We need to avoid reading the PQR_ASSOC MSR (which would cost far
more than 250 cycles).  Life is complicated here because this
MSR contains the CLOSID in the upper half, and the RMID (owned
by the perf code to measure cache occupancy and memory bandwidth)
in the lower half.
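
The software cache idea is simply "compare before writing", roughly like this
sketch (the intel_pqr_state fields and MSR name are borrowed from the existing
CQM code, not from these patches):

	static inline void pqr_update(u32 closid, u32 rmid)
	{
		struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);

		/* Only touch the MSR when either half actually changes */
		if (state->closid == closid && state->rmid == rmid)
			return;

		state->closid = closid;
		state->rmid = rmid;
		wrmsrl(MSR_IA32_PQR_ASSOC, ((u64)closid << 32) | rmid);
	}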

I've wondered whether it would be possible to include both the
CLOSID and the RMID in either the task or thread struct (in some
spot that was likely to be pulled into cache by access to adjacent
members during context switch).  This would smooth the path in
context switch (since we can look at the values from the previous
process and compare them to the ones for the new process to see
whether we need to do the wrmsr()).  But it likely needs a bunch
of other changes to update the values when a process migrates
between cpus to get the closid precedence that tglx requires.

-Tony

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
  2016-07-25 18:05     ` Luck, Tony
@ 2016-07-25 22:47       ` David Carrillo-Cisneros
  0 siblings, 0 replies; 81+ messages in thread
From: David Carrillo-Cisneros @ 2016-07-25 22:47 UTC (permalink / raw)
  To: Luck, Tony, Nilay Vaish
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, Ravi V Shankar, Vikas Shivappa, Sai Prakhya,
	linux-kernel, x86

On Mon, Jul 25, 2016 at 11:05 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> On Mon, Jul 25, 2016 at 11:31:24AM -0500, Nilay Vaish wrote:
> > I was thinking more about this software caching of CLOSids.  How
> > likely do you think these CLOSids would be found cached?  I think the
> > software cache would be very infrequently accessed, so it seems you
> > are likely to miss these in all levels of cache hierarchy and more
> > likely to have to fetch these from the main memory, which itself might
> > cost ~250 cycles.
>
> We need to avoid reading the PQR_ASSOC MSR (which would cost far
> more than 250 cycles).  Life is complicated here because this
> MSR contains the CLOSID in the upper half, and the RMID (owned
> by the perf code to measure cache occupancy and memory bandwidth)
> in the lower half.


On my Haswell machine, writing PQR_ASSOC_MSR takes about 380 cycles.
As Tony said, CQM/CMT writes to the same register, and it does it
twice (once to delete the old event, once to add the new one).

So, if a CQM/CMT or MBM is used with CAT, there will be 3 writes to
PQR_ASSOC_MSR per context switch and it's quite likely that the
software cache's cache line will be there for the 2 last writes.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-23  4:31       ` Marcelo Tosatti
@ 2016-07-26  3:18         ` Luck, Tony
  2016-07-26 17:10         ` Shivappa Vikas
  1 sibling, 0 replies; 81+ messages in thread
From: Luck, Tony @ 2016-07-26  3:18 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Yu, Fenghua, Thomas Gleixner, Ingo Molnar, Anvin, H Peter,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	David Carrillo-Cisneros, Shankar, Ravi V, Vikas Shivappa,
	Prakhya, Sai Praneeth, linux-kernel, x86

You must specify a mask for each L3 cache. So you can achieve your 80/80 split either with one rdtgroup that has an 80% mask on each of the sockets and using affinity to make one VM run only on CPUs on one socket and the second VM on the other. 

Or separate rdtgroups for each VM that give them the 80% when they are on their own socket and the spare 20% if they wander off to the other socket (see the example below).
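
For example (group names made up, mask values illustrative for a 20-bit CBM,
paths using the renamed /sys/fs/resctrl mount point discussed earlier):

  # One group with 80% on both sockets; affinity keeps each VM on its socket:
  mkdir /sys/fs/resctrl/vms
  echo "L3:0=ffff0;1=ffff0" > /sys/fs/resctrl/vms/schemas

  # Or per-VM groups: 80% on the home socket, the spare 20% on the other:
  mkdir /sys/fs/resctrl/vm1 /sys/fs/resctrl/vm2
  echo "L3:0=ffff0;1=f" > /sys/fs/resctrl/vm1/schemas
  echo "L3:0=f;1=ffff0" > /sys/fs/resctrl/vm2/schemas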

Sent from my iPhone

> On Jul 25, 2016, at 19:13, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
>> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
>>> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
>>> How does this patchset handle the following condition:
>>> 
>>> 6) Create reservations in such a way that the sum is larger than
>>> total amount of cache, and CPU pinning (example from Karen Noel):
>>> 
>>> VM-1 on socket-1 with 80% of reservation.
>>> VM-2 on socket-2 with 80% of reservation.
>>> VM-1 pinned to socket-1.
>>> VM-2 pinned to socket-2.
>> 
>> That's legal, but perhaps we need a description of
>> overlapping cache reservations.
>> 
>> Hardware tells you how finely you can divide the cache (and this
>> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
>> you from digging in CPUID leaves).  E.g. on Broadwell the value is
>> 20, so you can control cache allocations in 5% slices.
>> 
>> A bitmask defines which slices you can use (and h/w has the restriction
>> that you must have contiguous '1' bits in any mask).  So you can pick
>> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
>> 
>> There is no requirement that masks be exclusive of each other. So
>> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
>> VM's in this example. Each would be allowed to allocate up to 80%,
>> but with a big overlap in the middle. Each has 20% exclusive, but
>> there is a 60% range in the middle that they would compete for.
> 
> These are different sockets, so there is no competing/sharing of L3 cache
> here: the question is about whether the interface allows the
> user to specify that 80/80 reservation without complaining:
> because the VM's are pinned, they will never actually
> share the same L3 cache.
> 
> (haven't finished reading the patchset to be certain).
> 
>> Is this specific case useful? Possibly not.  I think the more common
>> overlap cases might be between processes that you know have shared
>> code/data. Also the case where some rdtgroup has access to allocate
>> in the entire cache (mask 0xfffff on Broadwell) and some other
>> rdtgroups
>> have limited cache allocation with less bits in the mask.
>> 
>> -Tony
> 
> All you have to do is to build the bitmask for a given processor
> from the union of the tasks which have been scheduled on that
> processor.
> 
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management
  2016-07-23  4:31       ` Marcelo Tosatti
  2016-07-26  3:18         ` Luck, Tony
@ 2016-07-26 17:10         ` Shivappa Vikas
  1 sibling, 0 replies; 81+ messages in thread
From: Shivappa Vikas @ 2016-07-26 17:10 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Luck, Tony, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Tejun Heo, Borislav Petkov, Stephane Eranian,
	Peter Zijlstra, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86



On Sat, 23 Jul 2016, Marcelo Tosatti wrote:

> On Fri, Jul 22, 2016 at 02:43:23PM -0700, Luck, Tony wrote:
>> On Fri, Jul 22, 2016 at 04:12:04AM -0300, Marcelo Tosatti wrote:
>>> How does this patchset handle the following condition:
>>>
>>> 6) Create reservations in such a way that the sum is larger than
>>> total amount of cache, and CPU pinning (example from Karen Noel):
>>>
>>> VM-1 on socket-1 with 80% of reservation.
>>> VM-2 on socket-2 with 80% of reservation.
>>> VM-1 pinned to socket-1.
>>> VM-2 pinned to socket-2.
>>
>> That's legal, but perhaps we need a description of
>> overlapping cache reservations.
>>
>> Hardware tells you how finely you can divide the cache (and this
>> information is shown in /sys/fs/resctrl/info/l3/max_cbm_len to save
>> you from digging in CPUID leaves).  E.g. on Broadwell the value is
>> 20, so you can control cache allocations in 5% slices.
>>
>> A bitmask defines which slices you can use (and h/w has the restriction
>> that you must have contiguous '1' bits in any mask).  So you can pick
>> your 80% using 0x0ffff, 0x1fffe, 0x3fffc, 0x7fff8 or 0xffff0.
>>
>> There is no requirement that masks be exclusive of each other. So
>> you might pick the two extremes: 0x0ffff and 0xffff0 for your two
>> VM's in this example. Each would be allowed to allocate up to 80%,
>> but with a big overlap in the middle. Each has 20% exclusive, but
>> there is a 60% range in the middle that they would compete for.
>
> These are different sockets, so there is no competing/sharing of L3 cache
> here: the question is about whether the interface allows the
> user to specify that 80/80 reservation without complaining:
> because the VM's are pinned, they will never actually
> share the same L3 cache.
>
> (haven't finished reading the patchset to be certain).

This series adds the per-socket support (see 23/32) - which will be folded into
these patches, which should make it easier to read, like Thomas suggested.
The first 12 patches are the same as the old ones with the cgroup interface.
With the cgroup we were stuck with the
atomicity issue for per-socket support because there was no way to define an
interface that guarantees atomicity to the user when he wants an allocation of
masks across different sockets

(basically he could end up in a situation where he could get a mask on one
socket but not be able to get it on another as he ran out of closids; now
the interface does it all at once - requiring the user to specify the masks
for all sockets, as you can see in the interface in later patches..) You could still do
the case you describe above with cgroup by co-mounting cpusets
and global masks, but you end up wasting closids.

Thanks,
Vikas

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection
  2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
@ 2016-07-26 19:00   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-26 19:00 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 0fe6953..42c90cb 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -711,6 +711,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>                 }
>         }
>
> +       /* Additional Intel-defined flags: level 0x00000010 */
> +       if (c->cpuid_level >= 0x00000010) {
> +               u32 eax, ebx, ecx, edx;
> +
> +               cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
> +               c->x86_capability[14] = ebx;

Should we have a name for this capability / leaf, instead of that
constant 14?  grep informs me that everywhere else we use an enum of type
cpuid_leafs.


--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization
  2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
@ 2016-07-26 19:23   ` Nilay Vaish
  2016-07-26 20:32     ` Shivappa Vikas
  0 siblings, 1 reply; 81+ messages in thread
From: Nilay Vaish @ 2016-07-26 19:23 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
>
> On Intel SKUs that support Code Data Prioritization(CDP), intel_rdt
> operates in 2 modes - legacy cache allocation mode/default or CDP mode.
>
> When CDP is enabled, the number of available CLOSids is halved. Hence the
> enabling is done when less than half the number of CLOSids available are
> used. When CDP is enabled each CLOSid maps to a
> data cache mask and an instruction cache mask. The enabling itself is done
> by writing to the IA32_PQOS_CFG MSR and can dynamically be enabled or
> disabled.
>
> CDP is disabled when for each (dcache_cbm,icache_cbm) pair, the
> dcache_cbm = icache_cbm.
>
> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/intel_rdt.h |  7 +++++
>  arch/x86/kernel/cpu/intel_rdt.c  | 66 ++++++++++++++++++++++++++--------------
>  2 files changed, 51 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 6e20314..f2cb91d 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -8,6 +8,7 @@
>  #define MAX_CBM_LENGTH                 32
>  #define IA32_L3_CBM_BASE               0xc90
>  #define CBM_FROM_INDEX(x)              (IA32_L3_CBM_BASE + x)
> +#define MSR_IA32_PQOS_CFG              0xc81
>
>  extern struct static_key rdt_enable_key;
>  void __intel_rdt_sched_in(void *dummy);
> @@ -17,6 +18,12 @@ struct clos_cbm_table {
>         unsigned int clos_refcnt;
>  };
>
> +struct clos_config {
> +       unsigned long *closmap;
> +       u32 max_closid;
> +       u32 closids_used;
> +};
> +

I think most of this patch is not about CDP, but about moving from an
independently defined closmap to one defined as part of struct
clos_config.  I suggest we combine part of this patch with patch 03/32
and work with struct clos_config right from the beginning.


Thanks
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization
  2016-07-26 19:23   ` Nilay Vaish
@ 2016-07-26 20:32     ` Shivappa Vikas
  0 siblings, 0 replies; 81+ messages in thread
From: Shivappa Vikas @ 2016-07-26 20:32 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tony Luck, Tejun Heo, Borislav Petkov, Stephane Eranian,
	Peter Zijlstra, Marcelo Tosatti, David Carrillo-Cisneros,
	Ravi V Shankar, Vikas Shivappa, Sai Prakhya, linux-kernel, x86



On Tue, 26 Jul 2016, Nilay Vaish wrote:

> On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
>> From: Vikas Shivappa <vikas.shivappa@linux.intel.com>
>>
>> On Intel SKUs that support Code Data Prioritization(CDP), intel_rdt
>> operates in 2 modes - legacy cache allocation mode/default or CDP mode.
>>
>> When CDP is enabled, the number of available CLOSids is halved. Hence the
>> enabling is done when less than half the number of CLOSids available are
>> used. When CDP is enabled each CLOSid maps to a
>> data cache mask and an instruction cache mask. The enabling itself is done
>> by writing to the IA32_PQOS_CFG MSR and can dynamically be enabled or
>> disabled.
>>
>> CDP is disabled when for each (dcache_cbm,icache_cbm) pair, the
>> dcache_cbm = icache_cbm.
>>
>> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> ---
>>  arch/x86/include/asm/intel_rdt.h |  7 +++++
>>  arch/x86/kernel/cpu/intel_rdt.c  | 66 ++++++++++++++++++++++++++--------------
>>  2 files changed, 51 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
>> index 6e20314..f2cb91d 100644
>> --- a/arch/x86/include/asm/intel_rdt.h
>> +++ b/arch/x86/include/asm/intel_rdt.h
>> @@ -8,6 +8,7 @@
>>  #define MAX_CBM_LENGTH                 32
>>  #define IA32_L3_CBM_BASE               0xc90
>>  #define CBM_FROM_INDEX(x)              (IA32_L3_CBM_BASE + x)
>> +#define MSR_IA32_PQOS_CFG              0xc81
>>
>>  extern struct static_key rdt_enable_key;
>>  void __intel_rdt_sched_in(void *dummy);
>> @@ -17,6 +18,12 @@ struct clos_cbm_table {
>>         unsigned int clos_refcnt;
>>  };
>>
>> +struct clos_config {
>> +       unsigned long *closmap;
>> +       u32 max_closid;
>> +       u32 closids_used;
>> +};
>> +
>
> I think most of this patch is not about CDP, but about moving from an
> independently defined closmap to one defined as part of struct
> clos_config.  I suggest we combine part of this patch with patch 03/32
> and work with struct clos_config right from the beginning.

It's moved to a struct to track the number of closids used, as the CDP mode can be
switched only when less than half of the closids are in use. It has other updates,
like the way the cctable is used: it's no longer indexed by the raw closid, but
depends on whether the closid is used for the data or the instruction cache.

Thanks,
Vikas

>
>
> Thanks
> Nilay
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
  2016-07-13 12:47   ` Thomas Gleixner
@ 2016-07-27 16:20   ` Nilay Vaish
  2016-07-27 16:57     ` Luck, Tony
  2016-08-03 22:15   ` Marcelo Tosatti
  2 siblings, 1 reply; 81+ messages in thread
From: Nilay Vaish @ 2016-07-27 16:20 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> +1. Terms
> +========
> +
> +We use the following terms and concepts in this documentation.
> +
> +RDT: Intel Resoure Director Technology
> +
> +CAT: Cache Allocation Technology
> +
> +CDP: Code and Data Prioritization
> +
> +CBM: Cache Bit Mask
> +
> +Cache ID: A cache identification. It is unique in one cache index on the
> +platform. User can find cache ID in cache sysfs interface:
> +/sys/devices/system/cpu/cpu*/cache/index*/id
> +
> +Share resource domain: A few different resources can share same QoS mask
> +MSRs array. For example, one L2 cache can share QoS MSRs with its next level
> +L3 cache. A domain number represents the L2 cache, the L3 cache, the L2
> +cache's shared cpumask, and the L3 cache's shared cpumask.
> +

I think CLOS ID should be defined here.  Is it same as Cache ID?


> +As one example, CAT L3's schema format is:
> +
> +L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> +
> +On a two socket machine, L3's schema line could be:
> +
> +L3:0=ff;1=c0
> +
> +which means this line in "schemas" file is for CAT L3, L3 cache id 0's CBM
> +is 0xff, and L3 cache id 1's CBM is 0xc0.
> +
> +If one resource is disabled, its line is not shown in schemas file.
> +
> +The schema line can be expended for situations. L3 cbms format can be
> +expended to CDP enabled L3 cbms format:
> +
> +L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
> +
> +Initial value is all ones which means all tasks use all resources initially.
> +

Your example here makes me feel that Cache ID and CLOS ID mean the same thing.


> +7. Some usage examples
> +======================
> +
> +7.1 Example 1 for sharing CLOSID on socket 0 between two partitions
> +
> +Only L3 cbm is enabled. Assume the machine is 2-socket and dual-core without
> +hyperthreading.
> +
> +#mount -t rscctrl rscctrl /sys/fs/rscctrl
> +#cd /sys/fs/rscctrl
> +#mkdir p0 p1
> +#echo "L3:0=3;1=c" > /sys/fs/rscctrl/p0/schemas
> +#echo "L3:0=3;1=3" > /sys/fs/rscctrl/p1/schemas
> +
> +In partition p0, kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0 and
> +CLOSID 0 for cbm=0xc on socket 1.
> +
> +In partition p1, kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0 and
> +CLOSID 1 for cbm=0x3 on socket 1.


And over here you have switched to using CLOS ID and you do not
mention Cache ID at all.
As I said above, I think Cache ID and CLOS ID are the same thing.  If
that is the case, I think Cache ID should be completely replaced with
CLOS ID.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code
  2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
@ 2016-07-27 16:49   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-27 16:49 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> Define two new cpuid leaves for CAT and CDP. The leaves are used in
> x86_capability to avoid hard coded index.
>
> Clean comments for RDT, CAT_L3, and CDP_L3 cpufeatures.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/cpufeature.h  |  2 ++
>  arch/x86/include/asm/cpufeatures.h |  6 +++---
>  arch/x86/include/asm/processor.h   |  6 +++---
>  arch/x86/kernel/cpu/common.c       | 11 +++++++----
>  4 files changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 483fb54..cd3b0bd 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -28,6 +28,8 @@ enum cpuid_leafs
>         CPUID_8000_000A_EDX,
>         CPUID_7_ECX,
>         CPUID_8000_0007_EBX,
> +       CPUID_10_0_EBX,
> +       CPUID_10_1_ECX,
>  };
>
>  #ifdef CONFIG_X86_FEATURE_NAMES
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 16489b3..588932a 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -220,7 +220,7 @@
>  #define X86_FEATURE_RTM                ( 9*32+11) /* Restricted Transactional Memory */
>  #define X86_FEATURE_CQM                ( 9*32+12) /* Cache QoS Monitoring */
>  #define X86_FEATURE_MPX                ( 9*32+14) /* Memory Protection Extension */
> -#define X86_FEATURE_RDT                ( 9*32+15) /* Resource Allocation */
> +#define X86_FEATURE_RDT                ( 9*32+15) /* Resource Director Technology */
>  #define X86_FEATURE_AVX512F    ( 9*32+16) /* AVX-512 Foundation */
>  #define X86_FEATURE_AVX512DQ   ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */
>  #define X86_FEATURE_RDSEED     ( 9*32+18) /* The RDSEED instruction */
> @@ -289,10 +289,10 @@
>  #define X86_FEATURE_SMCA       (17*32+ 3) /* Scalable MCA */
>
>  /* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 18 */
> -#define X86_FEATURE_CAT_L3      (18*32+ 1) /* Cache Allocation L3 */
> +#define X86_FEATURE_CAT_L3     (18*32+ 1) /* Cache Allocation L3 */
>
>  /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x00000010:1 (ecx), word 19 */
> -#define X86_FEATURE_CDP_L3     (19*32+ 2) /* Code data prioritization L3 */
> +#define X86_FEATURE_CDP_L3     (19*32+ 2) /* Code Data Prioritization L3 */
>
>  /*
>   * BUG word(s)
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 598c9bc..308aa03 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -119,9 +119,9 @@ struct cpuinfo_x86 {
>         int                     x86_cache_occ_scale;    /* scale to bytes */
>         int                     x86_power;
>         unsigned long           loops_per_jiffy;
> -       /* Cache Allocation values: */
> -       u16                     x86_cache_max_cbm_len;
> -       u16                     x86_cache_max_closid;
> +       /* Cache Allocation l3 values: */
> +       u16                     x86_l3_max_cbm_len;
> +       u16                     x86_l3_max_closid;
>         /* cpuid returned max cores value: */
>         u16                      x86_max_cores;
>         u16                     apicid;
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index a695e58..e945e70 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -716,14 +716,17 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>                 u32 eax, ebx, ecx, edx;
>
>                 cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
> -               c->x86_capability[17] = ebx;
> +               c->x86_capability[CPUID_10_0_EBX] = ebx;
>
>                 if (cpu_has(c, X86_FEATURE_CAT_L3)) {
>
>                         cpuid_count(0x00000010, 1, &eax, &ebx, &ecx, &edx);
> -                       c->x86_cache_max_closid = edx + 1;
> -                       c->x86_cache_max_cbm_len = eax + 1;
> -                       c->x86_capability[18] = ecx;
> +                       c->x86_l3_max_closid = edx + 1;
> +                       c->x86_l3_max_cbm_len = eax + 1;
> +                       c->x86_capability[CPUID_10_1_ECX] = ecx;
> +               } else {
> +                       c->x86_l3_max_closid = -1;
> +                       c->x86_l3_max_cbm_len = -1;
>                 }
>         }
>

I think this patch should come earlier in the series, at the point where the
constants 17 and 18 were first used.  Similarly, the renaming of the
variables should, I think, be merged with the patches that originally
introduced them.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-27 16:20   ` Nilay Vaish
@ 2016-07-27 16:57     ` Luck, Tony
  0 siblings, 0 replies; 81+ messages in thread
From: Luck, Tony @ 2016-07-27 16:57 UTC (permalink / raw)
  To: Nilay Vaish
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On Wed, Jul 27, 2016 at 11:20:31AM -0500, Nilay Vaish wrote:
> And over here you have switched to using CLOS ID and you do not
> mention Cache ID at all.
> As I said above, I think Cache ID and CLOS ID are the same thing.  If
> that is the case, I think Cache ID should be completely replaced with
> CLOS ID.

Thanks for the input.  We need to clarify things here.

cache id = unique number identifying a cache in the system. At the current state
of the patch series we only support L3 (a.k.a. LLC) CAT, so the cache id is
pretty much the socket id.

CLOS ID = number we program into the PQR_ASSOC MSR to define the resources
available to the currently running process. This number indexes into the
arrays of bitmasks used to constrain the process. Currently that is just the
L3_CBM MSRs ... but when more resources are added, we use the same CLOS ID to
index all of them.
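
For reference, "programming it into PQR_ASSOC" is a single MSR write on
context switch; a simplified sketch (the real code also carries the CQM
RMID in the low half of the register) looks like:

#define MSR_IA32_PQR_ASSOC      0x0c8f

static inline void set_closid(u32 closid)
{
        /* RMID lives in bits 9:0, CLOSID in bits 63:32 */
        wrmsr(MSR_IA32_PQR_ASSOC, 0, closid);
}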

So if we have a schema file that says:

L3:0=00fff,1=ff000

it means a bunch of things:

1) This is a two socket system (since we have two L3 caches to control)

2) Processes assigned to this rdtgroup will be allowed to use different
amounts of cache when they run on cpus in each of the two sockets (the
"low" 60% on cache id 0 (socket 0) and the "high" 40% on cache id 1 (socket 1)).

We can't tell from this which CLOS IDs the kernel decided to allocate
to implement this policy. If we just had the default rdtgroup and this
group as the only groups available, then it is likely that the kernel
will pick CLOS ID 1 on both sockets, and then set the L3_CBM[1]
MSR on socket 0 to 0x00fff, and the L3_CBM[1] MSR on socket 1 to the
value 0xff000.

If the default rdtgroup still has all cache available:
L3:0=fffff,1=fffff

and we add another rdtgroup and give it:

L3:0=fffff,1=ff000

the kernel will notice that we want this new rdtgroup to use the
same masks as existing groups ... so it can use CLOS ID 0 on socket
0 (same as default) and CLOS ID 1 on socket 1 (same as the first
example I gave above).

We do this sharing because there are a limited number of CLOS ID
values (limited by the size of the array of L3_CBM MSRs).
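
A minimal sketch of that reuse logic (hypothetical function, using the
clos_cbm_table type from the later patches; the real code also deals with
CDP and per-domain closmaps):

static int closid_get_or_alloc(struct clos_cbm_table *tbl,
                               int num_closids, unsigned long cbm)
{
        int i;

        /* Reuse an existing CLOSID whose bitmask already matches. */
        for (i = 0; i < num_closids; i++) {
                if (tbl[i].clos_refcnt && tbl[i].cbm == cbm) {
                        tbl[i].clos_refcnt++;
                        return i;
                }
        }
        /* Otherwise take a free slot; L3_CBM[i] gets programmed with cbm. */
        for (i = 0; i < num_closids; i++) {
                if (!tbl[i].clos_refcnt) {
                        tbl[i].cbm = cbm;
                        tbl[i].clos_refcnt = 1;
                        return i;
                }
        }
        return -ENOSPC;         /* no free CLOSIDs on this cache */
}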

Hope that is clearer (and we will clarify this in the Docs for the next
version of the series).

-Tony

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 15/32] cacheinfo: Introduce cache id
  2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
@ 2016-07-27 17:04   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-27 17:04 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> Each cache is described by cacheinfo and is unique in the same index
> across the platform. But there is no id for a cache. We introduce cache
> ID to identify a cache.
>
> Intel Cache Allocation Technology (CAT) allows some control on the
> allocation policy within each cache that it controls. We need a unique
> cache ID for each cache level to allow the user to specify which
> controls are applied to which cache. Cache id is a concise way to specify
> a cache.
>
> Cache id is first enabled on x86. It can be enabled on other platforms
> as well. The cache id is not necessary contiguous.
>
> Add an "id" entry to /sys/devices/system/cpu/cpu*/cache/index*/


Can you explain what index and platform mean here?  I think those terms are
generic in nature.  Maybe an example of how cache ids would be
assigned to different caches would help.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86
  2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
@ 2016-07-28  5:41   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-28  5:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> Enable cache id in x86. Cache id comes from APIC ID and CPUID4.
>

I think one of these patches on cache ids should refer to some
documentation from Intel on this subject, either in the commit message
or in the comments in some file.  I found one:
https://software.intel.com/sites/default/files/63/1a/Kuo_CpuTopology_rc1.rh1.final.pdf.
You would know better than me which document we should be looking at.

Thanks
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct
  2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
  2016-07-13 12:56   ` Thomas Gleixner
@ 2016-07-28  5:53   ` Nilay Vaish
  1 sibling, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-28  5:53 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> rg_list is linked list to connect to other tasks in a rdtgroup.
>
> The point of rdtgroup allows the task to access its own rdtgroup directly.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/sched.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 253538f..55adf17 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1761,6 +1761,10 @@ struct task_struct {
>         /* cg_list protected by css_set_lock and tsk->alloc_lock */
>         struct list_head cg_list;
>  #endif
> +#ifdef CONFIG_INTEL_RDT
> +       struct list_head rg_list;
> +       struct rdtgroup *rdtgroup;
> +#endif
>  #ifdef CONFIG_FUTEX
>         struct robust_list_head __user *robust_list;
>  #ifdef CONFIG_COMPAT
> --
> 2.5.0
>

I think this patch should be merged with patch 22/32, since struct
rdtgroup's definition appears in patch 22/32.  I do not recall
seeing any forward declaration either.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 20/32] magic number for rscctrl file system
  2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
@ 2016-07-28  5:57   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-28  5:57 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/uapi/linux/magic.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> index 546b388..655036a 100644
> --- a/include/uapi/linux/magic.h
> +++ b/include/uapi/linux/magic.h
> @@ -57,6 +57,8 @@
>  #define CGROUP_SUPER_MAGIC     0x27e0eb
>  #define CGROUP2_SUPER_MAGIC    0x63677270
>
> +#define RDTGROUP_SUPER_MAGIC   0x7655821
> +
>
>  #define STACK_END_MAGIC                0x57AC6E9D


Just for my information, can you tell me how you chose the magic number?

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c
  2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
@ 2016-07-28 14:07   ` Nilay Vaish
  0 siblings, 0 replies; 81+ messages in thread
From: Nilay Vaish @ 2016-07-28 14:07 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	Marcelo Tosatti, David Carrillo-Cisneros, Ravi V Shankar,
	Vikas Shivappa, Sai Prakhya, linux-kernel, x86

On 12 July 2016 at 20:02, Fenghua Yu <fenghua.yu@intel.com> wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
>
> The header mainly provides functions to call from the user interface
> file intel_rdt_rdtgroup.c.
>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/intel_rdt.h | 87 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 81 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index f2cb91d..4c5e0ac 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -3,27 +3,99 @@
>
>  #ifdef CONFIG_INTEL_RDT
>
> +#include <linux/seq_file.h>
>  #include <linux/jump_label.h>
>
> -#define MAX_CBM_LENGTH                 32
>  #define IA32_L3_CBM_BASE               0xc90
> -#define CBM_FROM_INDEX(x)              (IA32_L3_CBM_BASE + x)
> -#define MSR_IA32_PQOS_CFG              0xc81
> +#define L3_CBM_FROM_INDEX(x)           (IA32_L3_CBM_BASE + x)
> +
> +#define MSR_IA32_L3_QOS_CFG            0xc81
> +
> +enum resource_type {
> +       RESOURCE_L3  = 0,
> +       RESOURCE_NUM = 1,
> +};
> +
> +#define MAX_CACHE_LEAVES        4
> +#define MAX_CACHE_DOMAINS       64
> +
> +DECLARE_PER_CPU_READ_MOSTLY(int, cpu_l3_domain);
> +DECLARE_PER_CPU_READ_MOSTLY(struct rdtgroup *, cpu_rdtgroup);
>
>  extern struct static_key rdt_enable_key;
>  void __intel_rdt_sched_in(void *dummy);
> +extern bool use_rdtgroup_tasks;
> +
> +extern bool cdp_enabled;
> +
> +struct rdt_opts {
> +       bool cdp_enabled;
> +       bool verbose;
> +       bool simulate_cat_l3;
> +};
> +
> +struct cache_domain {
> +       cpumask_t shared_cpu_map[MAX_CACHE_DOMAINS];
> +       unsigned int max_cache_domains_num;
> +       unsigned int level;
> +       unsigned int shared_cache_id[MAX_CACHE_DOMAINS];
> +};
> +
> +extern struct rdt_opts rdt_opts;
>
>  struct clos_cbm_table {
> -       unsigned long l3_cbm;
> +       unsigned long cbm;
>         unsigned int clos_refcnt;
>  };
>
>  struct clos_config {
> -       unsigned long *closmap;
> +       unsigned long **closmap;
>         u32 max_closid;
> -       u32 closids_used;
>  };
>
> +struct shared_domain {
> +       struct cpumask cpumask;
> +       int l3_domain;
> +};
> +
> +#define for_each_cache_domain(domain, start_domain, max_domain)        \
> +       for (domain = start_domain; domain < max_domain; domain++)
> +
> +extern struct clos_config cconfig;
> +extern struct shared_domain *shared_domain;
> +extern int shared_domain_num;
> +
> +extern struct rdtgroup *root_rdtgrp;
> +extern void rdtgroup_fork(struct task_struct *child);
> +extern void rdtgroup_post_fork(struct task_struct *child);
> +
> +extern struct clos_cbm_table **l3_cctable;
> +
> +extern unsigned int min_bitmask_len;
> +extern void msr_cpu_update(void *arg);
> +extern inline void closid_get(u32 closid, int domain);
> +extern void closid_put(u32 closid, int domain);
> +extern void closid_free(u32 closid, int domain, int level);
> +extern int closid_alloc(u32 *closid, int domain);
> +extern bool cat_l3_enabled;
> +extern unsigned int get_domain_num(int level);
> +extern struct shared_domain *shared_domain;
> +extern int shared_domain_num;
> +extern inline int get_dcbm_table_index(int x);
> +extern inline int get_icbm_table_index(int x);
> +
> +extern int get_cache_leaf(int level, int cpu);
> +
> +extern void cbm_update_l3_msr(void *pindex);
> +extern int level_to_leaf(int level);
> +
> +extern void init_msrs(bool cdpenabled);
> +extern bool cat_enabled(int level);
> +extern u64 max_cbm(int level);
> +extern u32 max_cbm_len(int level);
> +
> +extern void rdtgroup_exit(struct task_struct *tsk);
> +
>  /*
>   * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
>   *
> @@ -54,6 +126,9 @@ static inline void intel_rdt_sched_in(void)
>  #else
>
>  static inline void intel_rdt_sched_in(void) {}
> +static inline void rdtgroup_fork(struct task_struct *child) {}
> +static inline void rdtgroup_post_fork(struct task_struct *child) {}
> +static inline void rdtgroup_exit(struct task_struct *tsk) {}
>
>  #endif
>  #endif
> --
> 2.5.0
>


Hi Fenghua


There are few things about this patch that I think can be improved upon.

* Some of the variables introduced in the first few patches have been
renamed in later patches.  Since these patches are not yet part of the
kernel, I think we should use the variable names we want right from the
beginning, rather than changing them midway through the patch series.

*  I think struct rdtgroup is being used before it has been declared anywhere.

* I somehow do not like this mass declaration of prototypes for all the
functions and variables that will be used later.  I think the
prototype and the implementation should be part of the same patch.  If
possible, combine this patch with the ones that have the
implementations.

--
Nilay

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
  2016-07-13 12:47   ` Thomas Gleixner
  2016-07-27 16:20   ` Nilay Vaish
@ 2016-08-03 22:15   ` Marcelo Tosatti
  2 siblings, 0 replies; 81+ messages in thread
From: Marcelo Tosatti @ 2016-08-03 22:15 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Tony Luck,
	Tejun Heo, Borislav Petkov, Stephane Eranian, Peter Zijlstra,
	David Carrillo-Cisneros, Ravi V Shankar, Vikas Shivappa,
	Sai Prakhya, linux-kernel, x86

On Tue, Jul 12, 2016 at 06:02:46PM -0700, Fenghua Yu wrote:
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> The documentation describes user interface of how to allocate resource
> in Intel RDT.
> 
> Please note that the documentation covers generic user interface. Current
> patch set code only implemente CAT L3. CAT L2 code will be sent later.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  Documentation/x86/intel_rdt_ui.txt | 268 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 268 insertions(+)
>  create mode 100644 Documentation/x86/intel_rdt_ui.txt
> 
> diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
> new file mode 100644
> index 0000000..c52baf5
> --- /dev/null
> +++ b/Documentation/x86/intel_rdt_ui.txt
> @@ -0,0 +1,268 @@
> +User Interface for Resource Allocation in Intel Resource Director Technology
> +
> +Copyright (C) 2016 Intel Corporation
> +
> +Fenghua Yu <fenghua.yu@intel.com>
> +
> +We create a new file system rscctrl in /sys/fs as user interface for Cache
> +Allocation Technology (CAT) and future resource allocations in Intel
> +Resource Director Technology (RDT). User can allocate cache or other
> +resources to tasks or cpus through this interface.
> +
> +CONTENTS
> +========
> +
> +	1. Terms
> +	2. Mount rscctrl file system
> +	3. Hierarchy in rscctrl
> +	4. Create and remove sub-directory
> +	5. Add/remove a task in a partition
> +	6. Add/remove a CPU in a partition
> +	7. Some usage examples
> +
> +
> +1. Terms
> +========
> +
> +We use the following terms and concepts in this documentation.
> +
> +RDT: Intel Resoure Director Technology
> +
> +CAT: Cache Allocation Technology
> +
> +CDP: Code and Data Prioritization
> +
> +CBM: Cache Bit Mask
> +
> +Cache ID: A cache identification. It is unique in one cache index on the
> +platform. User can find cache ID in cache sysfs interface:
> +/sys/devices/system/cpu/cpu*/cache/index*/id
> +
> +Share resource domain: A few different resources can share same QoS mask
> +MSRs array. For example, one L2 cache can share QoS MSRs with its next level
> +L3 cache. A domain number represents the L2 cache, the L3 cache, the L2
> +cache's shared cpumask, and the L3 cache's shared cpumask.
> +
> +2. Mount rscctrl file system
> +============================
> +
> +Like other file systems, the rscctrl file system needs to be mounted before
> +it can be used.
> +
> +mount -t rscctrl rscctrl <-o cdp,verbose> /sys/fs/rscctrl
> +
> +This command mounts the rscctrl file system under /sys/fs/rscctrl.
> +
> +Options are optional:
> +
> +cdp: Enable Code and Data Prioritization (CDP). Without the option, CDP
> +is disabled.
> +
> +verbose: Output more info in the "info" file under info directory and in
> +dmesg. This is mainly for debug.
> +
> +
> +3. Hierarchy in rscctrl
> +=======================
> +
> +The initial hierarchy of the rscctrl file system is as follows after mount:
> +
> +/sys/fs/rscctrl/info/info
> +		    /<resource0>/<resource0 specific info files>
> +		    /<resource1>/<resource1 specific info files>
> +			....
> +	       /tasks
> +	       /cpus
> +	       /schemas
> +
> +There are a few files and sub-directories in the hierarchy.
> +
> +3.1. info
> +---------
> +
> +The read-only sub-directory "info" in root directory has RDT related
> +system info.
> +
> +The "info" file under the info sub-directory shows general info of the system.
> +It shows shared domain and the resources within this domain.
> +
> +Each resource has its own info sub-directory. User can read the information
> +for allocation. For example, l3 directory has max_closid, max_cbm_len,
> +domain_to_cache_id.
> +
> +3.2. tasks
> +----------
> +
> +The file "tasks" has all task ids in the root directory initially. The
> +thread ids in the file will be added or removed among sub-directories or
> +partitions. A task id only stays in one directory at the same time.
> +
> +3.3. cpus
> +
> +The file "cpus" has a cpu mask that specifies the CPUs that are bound to the
> +schemas. Any tasks scheduled on the cpus will use the schemas. User can set
> +both "cpus" and "tasks" to share the same schema in one directory. But when
> +a CPU is bound to a schema, a task running on the CPU uses this schema and
> +kernel will ignore scheam set up for the task in "tasks".
                      schema
> +
> +Initial value is all zeros which means there is no CPU bound to the schemas
> +in the root directory and tasks use the schemas.
> +
> +3.4. schemas
> +------------
> +
> +The file "schemas" has default allocation masks/values for all resources on
> +each socket/cpu. Format of the file "schemas" is in multiple lines and each
> +line represents masks or values for one resource.
> +
> +Format of one resource schema line is as follows:
> +
> +<resource name>:<resource id0>=<schema>;<resource id1>=<schema>;...
> +
> +As one example, CAT L3's schema format is:
> +
> +L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> +
> +On a two socket machine, L3's schema line could be:
> +
> +L3:0=ff;1=c0
> +
> +which means this line in "schemas" file is for CAT L3, L3 cache id 0's CBM
> +is 0xff, and L3 cache id 1's CBM is 0xc0.
> +
> +If one resource is disabled, its line is not shown in schemas file.
> +
> +The schema line can be expended for situations. L3 cbms format can be
> +expended to CDP enabled L3 cbms format:
> +
> +L3:<cache_id0>=<d_cbm>,<i_cbm>;<cache_id1>=<d_cbm>,<i_cbm>;...
> +
> +Initial value is all ones which means all tasks use all resources initially.
> +
> +4. Create and remove sub-directory
> +===================================
> +
> +User can create a sub-directory under the root directory by "mkdir" command.
> +User can remove the sub-directory by "rmdir" command.
> +
> +Each sub-directory represents a resource allocation policy that user can
> +allocate resources for tasks or cpus.
> +
> +Each directory has three files "tasks", "cpus", and "schemas". The meaning
> +of each file is same as the files in the root directory.
> +
> +When a directory is created, initial contents of the files are:
> +
> +tasks: Empty. This means no task currently uses this allocation schemas.
> +cpus: All zeros. This means no CPU uses this allocation schemas.
> +schemas: All ones. This means all resources can be used in this allocation.
> +
> +5. Add/remove a task in a partition
> +===================================
> +
> +User can add/remove a task by writing its PID in "tasks" in a partition.
> +User can read PIDs stored in one "tasks" file.
> +
> +One task PID only exists in one partition/directory at the same time. If PID
> +is written in a new directory, it's removed automatically from its last
> +directory.
> +
> +6. Add/remove a CPU in a partition
> +==================================
> +
> +User can add/remove a CPU by writing its bit in "cpus" in a partition.
> +User can read CPUs stored in one "cpus" file.
> +
> +One CPU only exists in one partition/directory if user wants it to be bound
> +to any "schemas". Kernel guarantees uniqueness of the CPU in the whole
> +directory to make sure it only uses one schemas. If a CPU is written in one
> +new directory, it's automatically removed from its original directory if it
> +exists in the original directory.
> +
> +Or it doesn't exist in the whole directory if user doesn't bind it to any
> +"schemas".
> +
> +7. Some usage examples
> +======================
> +
> +7.1 Example 1 for sharing CLOSID on socket 0 between two partitions
> +
> +Only L3 cbm is enabled. Assume the machine is 2-socket and dual-core without
> +hyperthreading.
> +
> +#mount -t rscctrl rscctrl /sys/fs/rscctrl
> +#cd /sys/fs/rscctrl
> +#mkdir p0 p1
> +#echo "L3:0=3;1=c" > /sys/fs/rscctrl/p0/schemas
> +#echo "L3:0=3;1=3" > /sys/fs/rscctrl/p1/schemas
> +
> +In partition p0, kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0 and
> +CLOSID 0 for cbm=0xc on socket 1.
> +
> +In partition p1, kernel allocates CLOSID 0 for L3 cbm=0x3 on socket 0 and
> +CLOSID 1 for cbm=0x3 on socket 1.
> +
> +When p1/schemas is updated for socket 0, kernel searches existing
> +IA32_L3_QOS_MASK_n MSR registers and finds that 0x3 is in IA32_L3_QOS_MASK_0
> +register already. Therefore CLOSID 0 is shared between partition 0 and
> +partition 1 on socket 0.
> +
> +When p1/schemas is udpated for socket 1, kernel searches existing
> +IA32_L3_QOS_MASK_n registers and doesn't find a matching cbm. Therefore
> +CLOSID 1 is created and IA32_L3_QOS_MASK_1=0xc.
> +
> +7.2 Example 2 for allocating L3 cache for real-time apps
> +
> +Two real time tasks pid=1234 running on processor 0 and pid=5678 running on
> +processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
> +neighbors, each of the two real-time tasks exclusively occupies one quarter
> +of L3 cache on socket 0. Assume L3 cbm max width is 20 bits.
> +
> +#mount -t rscctrl rscctrl /sys/fs/rscctrl
> +#cd /sys/fs/rscctrl
> +#mkdir p0 p1
> +#taskset 0x1 1234
> +#taskset 0x2 5678
> +#cd /sys/fs/rscctrl/
> +#edit schemas to have following allocation:
> +L3:0=3ff;1=fffff
> +
> +which means that all tasks use whole L3 cache 1 and half of L3 cache 0.
> +
> +#cd ..
> +#mkdir p1 p2
> +#cd p1
> +#echo 1234 >tasks
> +#edit schemas to have following two lines:
> +L3:0=f8000;1=fffff
> +
> +which means task 1234 uses L3 cbm=0xf8000, i.e. one quarter of L3 cache 0
> +and whole L3 cache 1.
> +
> +Since 1234 is tied to processor 0, it actually uses the quarter of L3
> +on socket 0 only.
> +
> +#cd ../p2
> +#echo 5678 >tasks
> +#edit schemas to have following two lines:
> +L3:0=7c00;1=fffff
> +
> +Which means that task 5678 uses L3 cbm=0x7c00, another quarter of L3 cache 0
> +and whole L3 cache 1.
> +
> +Since 5678 is tied to processor 1, it actually only uses the quarter of L3
> +on socket 0.
> +
> +Internally three CLOSIDs are allocated on L3 cache 0:
> +IA32_L3_QOS_MASK_0 = 0x3ff
> +IA32_L3_QOS_MASK_1 = 0xf8000
> +IA32_L3_QOS_MASK_2 = 0x7c00.
> +
> +Each CLOSID's reference count=1 on L3 cache 0. There is no shared cbms on
> +cache 0.
> +
> +Only one CLOSID is allocated on L3 cache 1:
> +
> +IA32_L3_QOS_MASK_0=0xfffff. It's shared by root, p1 and p2.
> +
> +Therefore CLOSID 0's reference count=3 on L3 cache 1.
> -- 
> 2.5.0

This interface addresses the previously listed needs for 
multiple VMs with realtime tasks sharing L3 cache.

Thanks.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* RE: [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface
  2016-07-19 12:32           ` Thomas Gleixner
@ 2016-08-04 23:38             ` Yu, Fenghua
  0 siblings, 0 replies; 81+ messages in thread
From: Yu, Fenghua @ 2016-08-04 23:38 UTC (permalink / raw)
  To: Thomas Gleixner, Luck, Tony
  Cc: Ingo Molnar, Anvin, H Peter, Tejun Heo, Borislav Petkov,
	Stephane Eranian, Peter Zijlstra, Marcelo Tosatti,
	David Carrillo-Cisneros, Shankar, Ravi V, Vikas Shivappa,
	Prakhya, Sai Praneeth, linux-kernel, x86, Jan Kiszka

> From: Thomas Gleixner [mailto:tglx@linutronix.de]
> Sent: Tuesday, July 19, 2016 5:32 AM
> On Thu, 14 Jul 2016, Luck, Tony wrote:
> > So the core part of __intel_rdt_sched_in() will look like:
> >
> > 			rdtgrp = root_rdtgroup;
> That can be done simpler. The default cpu_rdtgroup should be
> root_rdtgroup. So you spare one conditional.
> 
> Thanks
> 
> 	tglx

Hi, Thomas et al,

Do we need to consider using a generic schemas format instead of the current
architecture-specific schemas format?

Currently we use CBM (Cache Bit Mask) in the "schemas". This is an architecture-specific
format. Feedback I got is that this may not extend to hypothetical future architecture(s),
that a user/sysadmin may not have knowledge of CBM, and that it makes VM migration
difficult between machines which have different CBM lengths.

Our current answer to the feedback is that a user tool (and user knowledge) is needed to
set up schemas. The kernel only handles CBM, i.e. architecture-level schemas. User
management tools can be designed to hide all kernel CBM details and let users allocate
cache with high-level knowledge (e.g. % of cache or size of cache, etc.). I believe a
user-level tool can handle this and the kernel only needs to handle the minimal CBM level.
This is how we have been designing this user interface.
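
For example, a user-space tool could turn "give me N% of L3" into a CBM with
something as small as this (rough sketch, assuming contiguous allocation from
the low end and a max_cbm_len of at most 32 bits):

static unsigned long percent_to_cbm(unsigned int percent,
                                    unsigned int max_cbm_len)
{
        unsigned int bits = (percent * max_cbm_len + 99) / 100;

        if (bits == 0)
                bits = 1;               /* hardware requires a non-empty mask */
        if (bits > max_cbm_len)
                bits = max_cbm_len;
        return (1UL << bits) - 1;
}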

On the other hand, we could handle high-level schemas info in the kernel user interface
as well. We could introduce allocation policies to the user interface: the user specifies
an allocation policy at resctrl file system mount time, and each policy has its own driver
in the kernel. The default policy would be the current CBM schemas implemented in this
patch set. We could implement other drivers, and the schemas format may differ per driver.
For example, a "%" policy driver could specify % of L3 allocation in the schemas; a "size"
policy driver could specify the size of L3 in the schemas. People can write other creative
policies in the future. Each driver has its own CONFIG.

If we do this, the updated user interface can address the above concerns at the kernel
level, plus it can handle cases (example??) that a user tool cannot handle in user space.
Should we update the current user interface to do this? Code and documentation changes
should be about 50 lines on top of the current patch set.
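
To make the idea concrete, a policy driver interface could be as small as
something like the following (purely hypothetical sketch, nothing like this
exists in the current patch set):

struct schema_policy {
        const char *name;       /* "cbm" (default), "percent", "size", ... */
        /* Parse one resource schema entry for one cache id into a CBM. */
        int (*parse)(const char *schema, int cache_id, unsigned long *cbm);
};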

Just raising the question for open discussion.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2016-08-04 23:38 UTC | newest]

Thread overview: 81+ messages
2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
2016-07-26 19:00   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 03/32] x86/intel_rdt: Add Class of service management Fenghua Yu
2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
2016-07-22  7:12   ` Marcelo Tosatti
2016-07-22 21:43     ` Luck, Tony
2016-07-23  4:31       ` Marcelo Tosatti
2016-07-26  3:18         ` Luck, Tony
2016-07-26 17:10         ` Shivappa Vikas
2016-07-13  1:02 ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Fenghua Yu
2016-07-25 16:25   ` Nilay Vaish
2016-07-25 16:31   ` Nilay Vaish
2016-07-25 18:05     ` Luck, Tony
2016-07-25 22:47       ` David Carrillo-Cisneros
2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
2016-07-13  9:19   ` Thomas Gleixner
2016-07-21 19:46     ` Shivappa Vikas
2016-07-14  0:40   ` David Carrillo-Cisneros
2016-07-14 22:58     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration Fenghua Yu
2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
2016-07-13 10:25   ` Thomas Gleixner
2016-07-13 18:05     ` Yu, Fenghua
2016-07-13 21:09       ` Thomas Gleixner
2016-07-13 21:18         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection Fenghua Yu
2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
2016-07-26 19:23   ` Nilay Vaish
2016-07-26 20:32     ` Shivappa Vikas
2016-07-13  1:02 ` [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP Fenghua Yu
2016-07-13  1:02 ` [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization Fenghua Yu
2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
2016-07-13 12:47   ` Thomas Gleixner
2016-07-13 17:13     ` Luck, Tony
2016-07-14  6:53       ` Thomas Gleixner
2016-07-14 17:16         ` Luck, Tony
2016-07-19 12:32           ` Thomas Gleixner
2016-08-04 23:38             ` Yu, Fenghua
2016-07-27 16:20   ` Nilay Vaish
2016-07-27 16:57     ` Luck, Tony
2016-08-03 22:15   ` Marcelo Tosatti
2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
2016-07-27 16:49   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
2016-07-27 17:04   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 16/32] Documentation, ABI: Add a document entry for " Fenghua Yu
2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
2016-07-28  5:41   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use Fenghua Yu
2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
2016-07-13 12:56   ` Thomas Gleixner
2016-07-13 17:50     ` Yu, Fenghua
2016-07-28  5:53   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
2016-07-28  5:57   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
2016-07-28 14:07   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface Fenghua Yu
2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
2016-07-13 13:07   ` Thomas Gleixner
2016-07-13 17:40     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
2016-07-13 13:14   ` Thomas Gleixner
2016-07-13 17:32     ` Yu, Fenghua
2016-07-13 21:02       ` Thomas Gleixner
2016-07-13 21:22         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
2016-07-14 12:30   ` Thomas Gleixner
2016-07-13  1:02 ` [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory Fenghua Yu
2016-07-13  1:03 ` [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands Fenghua Yu
2016-07-13  1:03 ` [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus Fenghua Yu
2016-07-13  1:03 ` [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write Fenghua Yu
2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
2016-07-14  0:41   ` David Carrillo-Cisneros
2016-07-14  6:11     ` Thomas Gleixner
2016-07-14  6:16       ` Yu, Fenghua
2016-07-14  6:32     ` Yu, Fenghua
2016-07-13  1:03 ` [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
2016-07-13  1:03 ` [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c Fenghua Yu
