* [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology
@ 2015-05-02  1:36 Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection Vikas Shivappa
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

This patch series adds a cgroup subsystem to support the new Cache Allocation
Technology (CAT) feature found in future Intel Xeon processors. CAT is
part of Resource Director Technology (RDT), or Platform Shared Resource
Control, which provides support for controlling the sharing of platform
resources such as the L3 cache.

Cache Allocation Technology (CAT) provides a way for the software
(OS/VMM) to restrict cache allocation to a defined 'subset' of cache,
which may overlap with other 'subsets'. This feature is used when
allocating a line in the cache, i.e. when pulling new data into the
cache. The hardware is programmed via MSRs. This patch series adds
support for L3 cache allocation.
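
The hardware model the series programs is small: one IA32_L3_MASK_n MSR
per CLOSid holds a cache bit mask, and the per-CPU IA32_PQR_ASSOC MSR
selects the active CLOSid in its upper 32 bits. A minimal, hedged sketch
of that model (MSR addresses are taken from the later patches; the wrmsr
helpers and the example CLOSid/mask values are only illustrative):

  #include <linux/types.h>
  #include <asm/msr.h>

  #define MSR_IA32_PQR_ASSOC	0x0c8f
  #define IA32_L3_CBM_BASE	0x0c90

  /* Let CLOSid 1 fill only the cache 'subset' described by mask 0xf. */
  static void example_setup_clos1(void)
  {
  	wrmsrl(IA32_L3_CBM_BASE + 1, 0xf);
  }

  /* At context switch, a task in that class selects CLOSid 1. */
  static void example_switch_to_clos1(u32 rmid)
  {
  	/* EAX = RMID (used by CQM), EDX = CLOSid */
  	wrmsr(MSR_IA32_PQR_ASSOC, rmid, 1);
  }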

In today's processors the number of cores keeps increasing, which in
turn increases the number of threads or workloads that can run
simultaneously. When multi-threaded applications run concurrently, they
compete for shared resources, including the L3 cache. At times this L3
cache contention may result in inefficient space utilization: for
example, a higher priority thread may end up with less L3 cache, or a
cache-sensitive application may not get optimal cache occupancy, thereby
degrading its performance.
This CAT kernel patch set provides a framework for sharing the L3 cache
so that users can allocate the resource according to their requirements.

More information about the feature can be found in the Intel SDM,
Volume 3, section 17.15. The SDM does not use the 'RDT' term yet; that
is planned to change at a later time.

*All the patches apply on 4.1-rc1*.

Changes in V6:
- Rebased to 4.1-rc1, which has the CMT (cache monitoring) support included.
- Fixed support for hot CPU handling for the IA32_L3_QOS MSRs (thanks to
  Marcelo's feedback). Although the MSRs need not be restored across deep
  C-states, this is needed when a new package is physically added.
- Coding convention changes, including renaming to cache_mask and using a
  refcnt to track the number of cgroups using a CLOSid in the clos_cbm map.
- 1-bit CBM support for non-HSW SKUs. HSW is an exception which needs the
  cache bit masks to be at least 2 bits.

Changes in v5:
- Added support to propagate the cache bit mask update for each 
package.
- Removed the cache bit mask reference in the intel_rdt structure as
  there was no need for that and we already maintain a separate
  closid<->cbm mapping.
- Made a few coding convention changes, which include adding an
  assertion while freeing the CLOSid.

Changes in V4:
- Integrated with the latest V5 CMT patches.
- Changed the naming of the cgroup to rdt (Resource Director Technology)
  from cat (Cache Allocation Technology). This was done because RDT is the
  umbrella term for platform shared resource allocation, so in the future
  it will be easier to add other resource allocation to the same cgroup.
- Naming changes also applied to a lot of other data structures/APIs.
- Added documentation on cgroup usage for cache allocation to address
  questions from academia and industry regarding cache allocation usage.

Changes in V3:
- Implements a common software cache for IA32_PQR_MSR
- Implements support for hsw CAT enumeration. This does not use brand
  strings like the earlier version but does a probe test; the probe test
  is done only on the hsw family of processors.
- Made a few coding convention and name changes
- Check for lock being held when ClosID manipulation happens

Changes in V2:
- Removed HSW specific enumeration changes. Plan to include it later as a
  separate patch.  
- Fixed the code in prep_arch_switch to be specific for x86 and removed
  x86 defines.
- Fixed cbm_write to not write all 1s when a cgroup is freed.
- Fixed one possible memory leak in init.  
- Changed some manual bitmap manipulation to use the predefined bitmap
  APIs to make the code more readable
- Changed name in sources from cqe to cat
- Global cat enable flag changed to static_key and disabled cgroup early_init
      
[PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection
[PATCH 2/7] x86/intel_rdt: Adds support for Class of service
[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
[PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
[PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR
[PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration
[PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide


* [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02 18:35   ` Peter Zijlstra
  2015-05-02  1:36 ` [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management Vikas Shivappa
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

This patch adds support for the new Cache Allocation Technology (CAT)
feature found in future Intel Xeon processors. CAT is part of Intel
Resource Director Technology (RDT), which enables control over shared
processor resources. This patch includes CPUID enumeration routines for
CAT and adds new fields to the cpuinfo_x86 structure to track the CAT
resources.

Cache Allocation Technology (CAT) provides a way for the software
(OS/VMM) to restrict cache allocation to a defined 'subset' of cache,
which may overlap with other 'subsets'. This feature is used when
allocating a line in the cache, i.e. when pulling new data into the
cache. The hardware is programmed via MSRs.

More information about CAT can be found in the Intel (R) x86 Architecture
Software Developer Manual, Volume 3, section 17.15.
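
For reference, the enumeration this patch adds to get_cpu_cap() can be
modelled from user space roughly as below. This is only a sketch: it
assumes a GCC-style <cpuid.h>, and it mirrors this patch's raw use of
EAX/EDX from leaf 0x10 sub-leaf 1 (no field masking is applied):

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
  	unsigned int eax, ebx, ecx, edx;

  	if (__get_cpuid_max(0, 0) < 0x10)
  		return 1;

  	/* Leaf 0x10, sub-leaf 0: EBX bit 1 => L3 CAT enumerated */
  	__cpuid_count(0x10, 0, eax, ebx, ecx, edx);
  	if (!(ebx & (1 << 1)))
  		return 1;

  	/* Sub-leaf 1: CBM length and number of CLOSids, as used above */
  	__cpuid_count(0x10, 1, eax, ebx, ecx, edx);
  	printf("cbm length: %u, closids: %u\n", eax + 1, edx + 1);

  	return 0;
  }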

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/cpufeature.h |  6 +++++-
 arch/x86/include/asm/processor.h  |  3 +++
 arch/x86/kernel/cpu/Makefile      |  1 +
 arch/x86/kernel/cpu/common.c      | 15 +++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c   | 44 +++++++++++++++++++++++++++++++++++++++
 init/Kconfig                      | 11 ++++++++++
 6 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt.c

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 3d6606f..30cb56c 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -12,7 +12,7 @@
 #include <asm/disabled-features.h>
 #endif
 
-#define NCAPINTS	13	/* N 32-bit words worth of info */
+#define NCAPINTS	14	/* N 32-bit words worth of info */
 #define NBUGINTS	1	/* N 32-bit bug flags */
 
 /*
@@ -229,6 +229,7 @@
 #define X86_FEATURE_RTM		( 9*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_CQM		( 9*32+12) /* Cache QoS Monitoring */
 #define X86_FEATURE_MPX		( 9*32+14) /* Memory Protection Extension */
+#define X86_FEATURE_RDT		( 9*32+15) /* Resource Allocation */
 #define X86_FEATURE_AVX512F	( 9*32+16) /* AVX-512 Foundation */
 #define X86_FEATURE_RDSEED	( 9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX		( 9*32+19) /* The ADCX and ADOX instructions */
@@ -252,6 +253,9 @@
 /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */
 #define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */
 
+/* Intel-defined CPU features, CPUID level 0x00000010:0 (ebx), word 13 */
+#define X86_FEATURE_CAT_L3	(13*32 + 1) /* Cache QOS Enforcement L3 */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 23ba676..7d9aee2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -114,6 +114,9 @@ struct cpuinfo_x86 {
 	int			x86_cache_occ_scale;	/* scale to bytes */
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
+	/* Cache Allocation Technology values */
+	u16			x86_cat_cbmlength;
+	u16			x86_cat_closs;
 	/* cpuid returned max cores value: */
 	u16			 x86_max_cores;
 	u16			apicid;
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9bff687..4ff7a1f 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -48,6 +48,7 @@ obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= perf_event_intel_uncore.o \
 					   perf_event_intel_uncore_nhmex.o
 endif
 
+obj-$(CONFIG_CGROUP_RDT) 		+= intel_rdt.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a62cf04..f39d948 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -670,6 +670,21 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		}
 	}
 
+	/* Additional Intel-defined flags: level 0x00000010 */
+	if (c->cpuid_level >= 0x00000010) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(0x00000010, 0, &eax, &ebx, &ecx, &edx);
+		c->x86_capability[13] = ebx;
+
+		if (cpu_has(c, X86_FEATURE_CAT_L3)) {
+
+			cpuid_count(0x00000010, 1, &eax, &ebx, &ecx, &edx);
+			c->x86_cat_closs = edx + 1;
+			c->x86_cat_cbmlength = eax + 1;
+		}
+	}
+
 	/* AMD-defined flags: level 0x80000001 */
 	xlvl = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = xlvl;
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
new file mode 100644
index 0000000..901b6fa
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -0,0 +1,44 @@
+/*
+ * Resource Director Technology(RDT)/Cache quality of service code.
+ *
+ * Copyright (C) 2014 Intel Corporation
+ *
+ * 2014-09-10 Written by
+ *    Vikas Shivappa <vikas.shivappa@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * More information about RDT be found in the Intel (R) x86 Architecture
+ * Software Developer Manual, volume 3, section 17.15.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/spinlock.h>
+
+static int __init rdt_late_init(void)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+	int maxid, cbm_len;
+
+	if (!cpu_has(c, X86_FEATURE_CAT_L3))
+		return -ENODEV;
+
+	maxid = c->x86_cat_closs;
+	cbm_len = c->x86_cat_cbmlength;
+
+	pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);
+
+	return 0;
+}
+
+late_initcall(rdt_late_init);
diff --git a/init/Kconfig b/init/Kconfig
index dc24dec..d97ff5e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -983,6 +983,17 @@ config CPUSETS
 
 	  Say N if unsure.
 
+config CGROUP_RDT
+	bool "Resource Director Technology cgroup subsystem"
+	depends on X86_64 && CPU_SUP_INTEL
+	help
+	  This option provides a cgroup to allocate Platform shared
+	  resources. Among the shared resources, current implementation
+	  focuses on L3 Cache. Using the interface user can specify the
+	  amount of L3 cache space into which an application can fill.
+
+	  Say N if unsure.
+
 config PROC_PID_CPUSET
 	bool "Include legacy /proc/<pid>/cpuset file"
 	depends on CPUSETS
-- 
1.9.1



* [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02 18:38   ` Peter Zijlstra
  2015-05-02  1:36 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

This patch adds a cgroup subsystem to support Intel Resource Director
Technology (RDT), or Platform Shared Resource Control. The resource that
is currently supported for sharing is the L3 cache
(Cache Allocation Technology or CAT).
When an RDT cgroup is created it has a CLOSid and a CBM associated with
it, which are inherited from its parent. A Class of Service (CLOS) in
Cache Allocation is represented by a CLOSid. The CLOSid is internal to
the kernel and not exposed to the user. The cache bitmask (CBM)
represents one global cache 'subset'. Tasks belonging to a cgroup get to
fill the L3 cache represented by the CBM. The root cgroup has all
available bits set in its CBM and is assigned CLOSid 0.

CLOSid allocation is tracked using a separate bitmap. The maximum number
of CLOSids is specified by the hardware during CPUID enumeration, and the
kernel simply returns -ENOSPC when it runs out of CLOSids.

Each CBM has an associated CLOSid. If multiple cgroups have the same CBM
they would also have the same CLOSid. The reference count parameter in
CLOSid-CBM map keeps track of how many cgroups are using each
CLOSid<->CBM mapping.
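
The CLOSid accounting described above can be modelled in user space with
a hypothetical helper like the one below (a sketch only, not the kernel
code; it just illustrates that cgroups sharing a CBM share a CLOSid via
a reference count):

  #define MAX_CLOS 16			/* assumed h/w limit for the sketch */

  struct clos_cbm_map {
  	unsigned long cbm;
  	unsigned int cgrp_count;	/* cgroups using this CLOSid */
  };

  static struct clos_cbm_map ccmap[MAX_CLOS];

  /* Reuse a CLOSid that already has this CBM, else grab a free one. */
  static int get_closid(unsigned long cbm)
  {
  	int i, free = -1;

  	for (i = 0; i < MAX_CLOS; i++) {
  		if (ccmap[i].cgrp_count && ccmap[i].cbm == cbm) {
  			ccmap[i].cgrp_count++;
  			return i;
  		}
  		if (!ccmap[i].cgrp_count && free < 0)
  			free = i;
  	}
  	if (free < 0)
  		return -1;		/* the kernel returns -ENOSPC here */

  	ccmap[free].cbm = cbm;
  	ccmap[free].cgrp_count = 1;
  	return free;
  }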

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h |  38 +++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c  | 100 ++++++++++++++++++++++++++++++++++++++-
 include/linux/cgroup_subsys.h    |   4 ++
 3 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_rdt.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
new file mode 100644
index 0000000..87af1a5
--- /dev/null
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -0,0 +1,38 @@
+#ifndef _RDT_H_
+#define _RDT_H_
+
+#ifdef CONFIG_CGROUP_RDT
+
+#include <linux/cgroup.h>
+
+struct rdt_subsys_info {
+	/* Clos Bitmap to keep track of available CLOSids.*/
+	unsigned long *closmap;
+};
+
+struct intel_rdt {
+	struct cgroup_subsys_state css;
+	/* Class of service for the cgroup.*/
+	unsigned int clos;
+};
+
+struct clos_cbm_map {
+	unsigned long cbm;
+	unsigned int cgrp_count;
+};
+
+/*
+ * Return rdt group corresponding to this container.
+ */
+static inline struct intel_rdt *css_rdt(struct cgroup_subsys_state *css)
+{
+	return css ? container_of(css, struct intel_rdt, css) : NULL;
+}
+
+static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
+{
+	return css_rdt(ir->css.parent);
+}
+
+#endif
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 901b6fa..eec57fe 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,17 +24,97 @@
 #include <linux/slab.h>
 #include <linux/err.h>
 #include <linux/spinlock.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ */
+static struct clos_cbm_map *ccmap;
+static struct rdt_subsys_info rdtss_info;
+static DEFINE_MUTEX(rdt_group_mutex);
+struct intel_rdt rdt_root_group;
+
+static inline bool cat_supported(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_CAT_L3))
+		return true;
+
+	return false;
+}
+
+/*
+* Called with the rdt_group_mutex held.
+*/
+static int rdt_free_closid(struct intel_rdt *ir)
+{
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	WARN_ON(!ccmap[ir->clos].cgrp_count);
+	ccmap[ir->clos].cgrp_count--;
+	if (!ccmap[ir->clos].cgrp_count)
+		clear_bit(ir->clos, rdtss_info.closmap);
+
+	return 0;
+}
+
+static struct cgroup_subsys_state *
+rdt_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+	struct intel_rdt *parent = css_rdt(parent_css);
+	struct intel_rdt *ir;
+
+	/*
+	 * Cannot return failure on systems with no Cache Allocation
+	 * as the cgroup_init does not handle failures gracefully.
+	 */
+	if (!parent)
+		return &rdt_root_group.css;
+
+	ir = kzalloc(sizeof(struct intel_rdt), GFP_KERNEL);
+	if (!ir)
+		return ERR_PTR(-ENOMEM);
+
+	mutex_lock(&rdt_group_mutex);
+	ir->clos = parent->clos;
+	ccmap[parent->clos].cgrp_count++;
+	mutex_unlock(&rdt_group_mutex);
+
+	return &ir->css;
+}
 
 static int __init rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+	static struct clos_cbm_map *ccm;
+	size_t sizeb;
 	int maxid, cbm_len;
 
-	if (!cpu_has(c, X86_FEATURE_CAT_L3))
+	if (!cat_supported(c)) {
+		rdt_root_group.css.ss->disabled = 1;
 		return -ENODEV;
-
+	}
 	maxid = c->x86_cat_closs;
 	cbm_len = c->x86_cat_cbmlength;
+	sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
+
+	rdtss_info.closmap = kzalloc(sizeb, GFP_KERNEL);
+	if (!rdtss_info.closmap)
+		return -ENOMEM;
+
+	sizeb = maxid * sizeof(struct clos_cbm_map);
+	ccmap = kzalloc(sizeb, GFP_KERNEL);
+	if (!ccmap) {
+		kfree(rdtss_info.closmap);
+		return -ENOMEM;
+	}
+
+	set_bit(0, rdtss_info.closmap);
+	rdt_root_group.clos = 0;
+
+	ccm = &ccmap[0];
+	ccm->cbm = (u32)((u64)(1 << cbm_len) - 1);
+	ccm->cgrp_count++;
 
 	pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);
 
@@ -42,3 +122,19 @@ static int __init rdt_late_init(void)
 }
 
 late_initcall(rdt_late_init);
+
+static void rdt_css_free(struct cgroup_subsys_state *css)
+{
+	struct intel_rdt *ir = css_rdt(css);
+
+	mutex_lock(&rdt_group_mutex);
+	rdt_free_closid(ir);
+	kfree(ir);
+	mutex_unlock(&rdt_group_mutex);
+}
+
+struct cgroup_subsys rdt_cgrp_subsys = {
+	.css_alloc			= rdt_css_alloc,
+	.css_free			= rdt_css_free,
+	.early_init			= 0,
+};
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index e4a96fb..81c803d 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -47,6 +47,10 @@ SUBSYS(net_prio)
 SUBSYS(hugetlb)
 #endif
 
+#if IS_ENABLED(CONFIG_CGROUP_RDT)
+SUBSYS(rdt)
+#endif
+
 /*
  * The following subsystems are not supported on the default hierarchy.
  */
-- 
1.9.1



* [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02 18:46   ` Peter Zijlstra
  2015-05-02  1:36 ` [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT Vikas Shivappa
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

Add support for cache bit mask manipulation. The change adds a file,
cache_mask, to the RDT cgroup which represents the CBM (cache bit mask)
for the cgroup.

An update to the CBM is done by writing to IA32_L3_MASK_n.
The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to
the cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the cache_mask from its parent.  When a user changes the
default CBM for a cgroup, a new CLOSid may be allocated if the
cache_mask was not used before. If the new CBM is one that is
already in use, the count for that CLOSid<->CBM mapping is incremented.
Changing 'cache_mask' may fail with -ENOSPC once the kernel runs out of
the maximum number of CLOSids it can support.
Users can create as many cgroups as they want, but having different CBMs
at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
The kernel maintains a CLOSid<->CBM mapping which keeps count
of the cgroups using each CLOSid.

The tasks in the CAT cgroup would get to fill the L3 cache represented
by the cgroup's cache_mask file.

Reuse of CLOSids for cgroups with the same bitmask also has the
following advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, write to PQR-MSR is done
only when a task with a different bitmask is scheduled in.

During CPU bringup due to a hotplug event, the IA32_L3_MASK_n MSRs are
synchronized from the clos_cbm map if they are used by any cgroup on the
package.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h |   7 +-
 arch/x86/kernel/cpu/intel_rdt.c  | 364 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 346 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..9e9dbbe 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+#define MAX_CBM_LENGTH			32
+#define IA32_L3_CBM_BASE		0xc90
+#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
@@ -17,8 +20,8 @@ struct intel_rdt {
 };
 
 struct clos_cbm_map {
-	unsigned long cbm;
-	unsigned int cgrp_count;
+	unsigned long cache_mask;
+	unsigned int clos_refcnt;
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index eec57fe..58b39d6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,16 +24,25 @@
 #include <linux/slab.h>
 #include <linux/err.h>
 #include <linux/spinlock.h>
+#include <linux/cpu.h>
 #include <asm/intel_rdt.h>
 
 /*
- * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ * ccmap maintains 1:1 mapping between CLOSid and cache_mask.
  */
 static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+/*
+ * Mask of CPUs for writing CBM values. We only need one per-socket.
+ */
+static cpumask_t rdt_cpumask;
+
+#define rdt_for_each_child(pos_css, parent_ir)		\
+	css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -42,22 +51,66 @@ static inline bool cat_supported(struct cpuinfo_x86 *c)
 	return false;
 }
 
+static void __clos_init(unsigned int closid)
+{
+	struct clos_cbm_map *ccm = &ccmap[closid];
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	ccm->clos_refcnt = 1;
+}
+
 /*
-* Called with the rdt_group_mutex held.
-*/
-static int rdt_free_closid(struct intel_rdt *ir)
+ * Allocates a new closid from unused closids.
+ */
+static int rdt_alloc_closid(struct intel_rdt *ir)
 {
+	unsigned int id;
+	unsigned int maxid;
 
 	lockdep_assert_held(&rdt_group_mutex);
 
-	WARN_ON(!ccmap[ir->clos].cgrp_count);
-	ccmap[ir->clos].cgrp_count--;
-	if (!ccmap[ir->clos].cgrp_count)
-		clear_bit(ir->clos, rdtss_info.closmap);
+	maxid = boot_cpu_data.x86_cat_closs;
+	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, rdtss_info.closmap);
+	__clos_init(id);
+	ir->clos = id;
 
 	return 0;
 }
 
+static void rdt_free_closid(unsigned int clos)
+{
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	clear_bit(clos, rdtss_info.closmap);
+}
+
+static void __clos_get(unsigned int closid)
+{
+	struct clos_cbm_map *ccm = &ccmap[closid];
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	ccm->clos_refcnt += 1;
+}
+
+static void __clos_put(unsigned int closid)
+{
+	struct clos_cbm_map *ccm = &ccmap[closid];
+
+	lockdep_assert_held(&rdt_group_mutex);
+	WARN_ON(!ccm->clos_refcnt);
+
+	ccm->clos_refcnt -= 1;
+	if (!ccm->clos_refcnt)
+		rdt_free_closid(closid);
+}
+
 static struct cgroup_subsys_state *
 rdt_css_alloc(struct cgroup_subsys_state *parent_css)
 {
@@ -77,27 +130,285 @@ rdt_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	mutex_lock(&rdt_group_mutex);
 	ir->clos = parent->clos;
-	ccmap[parent->clos].cgrp_count++;
+	__clos_get(ir->clos);
 	mutex_unlock(&rdt_group_mutex);
 
 	return &ir->css;
 }
 
+static void rdt_css_free(struct cgroup_subsys_state *css)
+{
+	struct intel_rdt *ir = css_rdt(css);
+
+	mutex_lock(&rdt_group_mutex);
+	__clos_put(ir->clos);
+	kfree(ir);
+	mutex_unlock(&rdt_group_mutex);
+}
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+	unsigned long first_bit, zero_bit;
+	unsigned long maxcbm = MAX_CBM_LENGTH;
+
+	if (!var)
+		return false;
+
+	first_bit = find_next_bit(&var, maxcbm, 0);
+	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+		return false;
+
+	return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+	struct intel_rdt *ir = css_rdt(seq_css(m));
+
+	seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);
+	return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+	struct intel_rdt *par, *c;
+	struct cgroup_subsys_state *css;
+	unsigned long *cbm_tmp;
+
+	if (!cbm_is_contiguous(cbmvalue)) {
+		pr_err("bitmask should have >= 1 bits and be contiguous\n");
+		return -EINVAL;
+	}
+
+	par = parent_rdt(ir);
+	cbm_tmp = &ccmap[par->clos].cache_mask;
+	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+		return -EINVAL;
+
+	rcu_read_lock();
+	rdt_for_each_child(css, ir) {
+		c = css_rdt(css);
+		cbm_tmp = &ccmap[c->clos].cache_mask;
+		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
+			rcu_read_unlock();
+			pr_err("Children's mask not a subset\n");
+			return -EINVAL;
+		}
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+	int maxid = boot_cpu_data.x86_cat_closs;
+	unsigned int i;
+
+	for (i = 0; i < maxid; i++) {
+		if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
+			*closid = i;
+			return true;
+		}
+	}
+	return false;
+}
+
+static void cbmmap_dump(void)
+{
+	int i;
+
+	pr_debug("CBMMAP\n");
+	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
+		pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
+		 (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);
+}
+
+static void __cpu_cbm_update(void *info)
+{
+	unsigned int closid = *((unsigned int *)info);
+
+	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
+}
+
+/*
+ * cbm_update_all() - Update the cache bit mask for all packages.
+ */
+static inline void cbm_update_all(unsigned int closid)
+{
+	on_each_cpu_mask(&rdt_cpumask, __cpu_cbm_update, &closid, 1);
+}
+
+/*
+ * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
+ * which are one per CLOSid, on the current package.
+ * @cpu : the cpu on which the mask is updated.
+ */
+static inline void cbm_update_msrs(int cpu)
+{
+	int maxid = boot_cpu_data.x86_cat_closs;
+	unsigned int i;
+
+	if (WARN_ON(cpu != smp_processor_id()))
+		return;
+
+	for (i = 1; i < maxid; i++) {
+		if (ccmap[i].clos_refcnt)
+			__cpu_cbm_update(&i);
+	}
+}
+
+/*
+ * rdt_cbm_write() - Validates and writes the cache bit mask(cbm)
+ * to the IA32_L3_MASK_n and also store the same in the ccmap.
+ *
+ * CLOSids are reused for cgroups which have same bitmask.
+ * - This helps to use the scant CLOSids optimally.
+ * - This also implies that at context switch write
+ * to PQR-MSR is done only when a task with a
+ * different bitmask is scheduled in.
+ */
+static int cat_cbm_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 cbmvalue)
+{
+	struct intel_rdt *ir = css_rdt(css);
+	ssize_t err = 0;
+	unsigned long cache_mask, max_mask;
+	unsigned long *cbm_tmp;
+	unsigned int closid;
+	u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;
+
+	if (ir == &rdt_root_group)
+		return -EPERM;
+	bitmap_set(&max_mask, 0, max_cbm);
+
+	/*
+	 * Need global mutex as cbm write may allocate a closid.
+	 */
+	mutex_lock(&rdt_group_mutex);
+	bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
+	cbm_tmp = &ccmap[ir->clos].cache_mask;
+
+	if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
+		goto out;
+
+	err = validate_cbm(ir, cache_mask);
+	if (err)
+		goto out;
+
+	/*
+	 * At this point we are sure to change the cache_mask.Hence release the
+	 * reference to the current CLOSid and try to get a reference for
+	 * a different CLOSid.
+	 */
+	__clos_put(ir->clos);
+
+	if (cbm_search(cache_mask, &closid)) {
+		ir->clos = closid;
+		__clos_get(closid);
+	} else {
+		err = rdt_alloc_closid(ir);
+		if (err)
+			goto out;
+
+		ccmap[ir->clos].cache_mask = cache_mask;
+		cbm_update_all(ir->clos);
+	}
+
+	cbmmap_dump();
+out:
+
+	mutex_unlock(&rdt_group_mutex);
+	return err;
+}
+
+static inline bool rdt_update_cpumask(int cpu)
+{
+	int phys_id = topology_physical_package_id(cpu);
+	struct cpumask *mask = &rdt_cpumask;
+	int i;
+
+	for_each_cpu(i, mask) {
+		if (phys_id == topology_physical_package_id(i))
+			return false;
+	}
+
+	cpumask_set_cpu(cpu, mask);
+	return true;
+}
+
+/*
+ * rdt_cpu_start() - If a new package has come up, update all
+ * the Cache bitmasks on the package.
+ */
+static inline void rdt_cpu_start(int cpu)
+{
+	mutex_lock(&rdt_group_mutex);
+	if (rdt_update_cpumask(cpu))
+		cbm_update_msrs(cpu);
+	mutex_unlock(&rdt_group_mutex);
+}
+
+static void rdt_cpu_exit(unsigned int cpu)
+{
+	int phys_id = topology_physical_package_id(cpu);
+	int i;
+
+	mutex_lock(&rdt_group_mutex);
+	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
+		mutex_unlock(&rdt_group_mutex);
+		return;
+	}
+
+	for_each_online_cpu(i) {
+		if (i == cpu)
+			continue;
+
+		if (phys_id == topology_physical_package_id(i)) {
+			cpumask_set_cpu(i, &rdt_cpumask);
+			break;
+		}
+	}
+	mutex_unlock(&rdt_group_mutex);
+}
+
+static int rdt_cpu_notifier(struct notifier_block *nb,
+				  unsigned long action, void *hcpu)
+{
+	unsigned int cpu  = (unsigned long)hcpu;
+
+	switch (action) {
+	case CPU_STARTING:
+		rdt_cpu_start(cpu);
+		break;
+	case CPU_DOWN_PREPARE:
+		rdt_cpu_exit(cpu);
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
 static int __init rdt_late_init(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	static struct clos_cbm_map *ccm;
 	size_t sizeb;
-	int maxid, cbm_len;
+	int maxid, cbm_len, i;
 
 	if (!cat_supported(c)) {
 		rdt_root_group.css.ss->disabled = 1;
 		return -ENODEV;
 	}
+
 	maxid = c->x86_cat_closs;
 	cbm_len = c->x86_cat_cbmlength;
-	sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
 
+	sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
 	rdtss_info.closmap = kzalloc(sizeb, GFP_KERNEL);
 	if (!rdtss_info.closmap)
 		return -ENOMEM;
@@ -111,11 +422,17 @@ static int __init rdt_late_init(void)
 
 	set_bit(0, rdtss_info.closmap);
 	rdt_root_group.clos = 0;
-
 	ccm = &ccmap[0];
-	ccm->cbm = (u32)((u64)(1 << cbm_len) - 1);
-	ccm->cgrp_count++;
+	bitmap_set(&ccm->cache_mask, 0, cbm_len);
+	ccm->clos_refcnt = 1;
+
+	cpu_notifier_register_begin();
+	for_each_online_cpu(i)
+		rdt_update_cpumask(i);
+
+	__hotcpu_notifier(rdt_cpu_notifier, 0);
 
+	cpu_notifier_register_done();
 	pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);
 
 	return 0;
@@ -123,18 +440,19 @@ static int __init rdt_late_init(void)
 
 late_initcall(rdt_late_init);
 
-static void rdt_css_free(struct cgroup_subsys_state *css)
-{
-	struct intel_rdt *ir = css_rdt(css);
-
-	mutex_lock(&rdt_group_mutex);
-	rdt_free_closid(ir);
-	kfree(ir);
-	mutex_unlock(&rdt_group_mutex);
-}
+static struct cftype rdt_files[] = {
+	{
+		.name = "cache_mask",
+		.seq_show = cat_cbm_read,
+		.write_u64 = cat_cbm_write,
+		.mode = 0666,
+	},
+	{ }	/* terminate */
+};
 
 struct cgroup_subsys rdt_cgrp_subsys = {
 	.css_alloc			= rdt_css_alloc,
 	.css_free			= rdt_css_free,
+	.legacy_cftypes			= rdt_files,
 	.early_init			= 0,
 };
-- 
1.9.1



* [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
                   ` (2 preceding siblings ...)
  2015-05-02  1:36 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02 18:51   ` Peter Zijlstra
  2015-05-02  1:36 ` [PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR Vikas Shivappa
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.

The high 32 bits of the per-processor MSR IA32_PQR_ASSOC represent the
CLOSid. During a context switch the kernel implements this by writing
the CLOSid of the cgroup to which the task belongs to the CPU's
IA32_PQR_ASSOC MSR.

For Cache Allocation, this lets the task fill the cache 'subset'
represented by the cgroup's cache bit mask (CBM).
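
In wrmsr(msr, lo, hi) terms, EAX takes the low word and EDX the high
word, so the CLOSid lands in the high 32 bits. A condensed sketch of the
hook added below (the real rdt_sched_in() also resolves the task's
cgroup under RCU and checks the static key; this is not the full code):

  /* IA32_PQR_ASSOC: EDX = CLOSid, EAX = RMID (left 0 in this version). */
  static void sched_in_closid(unsigned int closid)
  {
  	if (closid == this_cpu_read(x86_cpu_clos))
  		return;

  	wrmsr(MSR_IA32_PQR_ASSOC, 0, closid);
  	this_cpu_write(x86_cpu_clos, closid);
  }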

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h | 54 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/switch_to.h |  3 +++
 arch/x86/kernel/cpu/intel_rdt.c  |  3 +++
 kernel/sched/core.c              |  1 +
 kernel/sched/sched.h             |  3 +++
 5 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 9e9dbbe..2fc496f 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,9 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+
+#define MSR_IA32_PQR_ASSOC		0xc8f
 #define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
 #define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
+DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+extern struct static_key rdt_enable_key;
 
 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
@@ -24,6 +28,11 @@ struct clos_cbm_map {
 	unsigned int clos_refcnt;
 };
 
+static inline bool rdt_enabled(void)
+{
+	return static_key_false(&rdt_enable_key);
+}
+
 /*
  * Return rdt group corresponding to this container.
  */
@@ -37,5 +46,50 @@ static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
 	return css_rdt(ir->css.parent);
 }
 
+/*
+ * Return rdt group to which this task belongs.
+ */
+static inline struct intel_rdt *task_rdt(struct task_struct *task)
+{
+	return css_rdt(task_css(task, rdt_cgrp_id));
+}
+
+/*
+ * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ * if the current Closid is different than the new one.
+ */
+static inline void rdt_sched_in(struct task_struct *task)
+{
+	struct intel_rdt *ir;
+	unsigned int clos;
+
+	if (!rdt_enabled())
+		return;
+
+	/*
+	 * This needs to be fixed
+	 * to cache the whole PQR instead of just CLOSid.
+	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
+	 * Should not write a 0 to the low 10 bits of PQR
+	 * and corrupt RMID.
+	 */
+	clos = this_cpu_read(x86_cpu_clos);
+
+	rcu_read_lock();
+	ir = task_rdt(task);
+	if (ir->clos == clos) {
+		rcu_read_unlock();
+		return;
+	}
+
+	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
+	this_cpu_write(x86_cpu_clos, ir->clos);
+	rcu_read_unlock();
+}
+
+#else
+
+static inline void rdt_sched_in(struct task_struct *task) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 751bf4b..82ef4b3 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -8,6 +8,9 @@ struct tss_struct;
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss);
 
+#include <asm/intel_rdt.h>
+#define post_arch_switch(current)	rdt_sched_in(current)
+
 #ifdef CONFIG_X86_32
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 58b39d6..74b1e28 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -34,6 +34,8 @@ static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
+DEFINE_PER_CPU(unsigned int, x86_cpu_clos);
 
 /*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
@@ -433,6 +435,7 @@ static int __init rdt_late_init(void)
 	__hotcpu_notifier(rdt_cpu_notifier, 0);
 
 	cpu_notifier_register_done();
+	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Max bitmask length:%u,Max ClosIds: %u\n", cbm_len, maxid);
 
 	return 0;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..cacb490 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	finish_arch_switch(prev);
+	post_arch_switch(current);
 	perf_event_task_sched_in(prev, current);
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e0e1299..9153747 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1045,6 +1045,9 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 #ifndef finish_arch_switch
 # define finish_arch_switch(prev)	do { } while (0)
 #endif
+#ifndef post_arch_switch
+# define post_arch_switch(current)	do { } while (0)
+#endif
 #ifndef finish_arch_post_lock_switch
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
-- 
1.9.1



* [PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
                   ` (3 preceding siblings ...)
  2015-05-02  1:36 ` [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide Vikas Shivappa
  6 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

This patch implements a common software cache for the IA32_PQR_ASSOC MSR
(RMID in bits 0:9, CLOSid in bits 32:63) to be used by both CMT and CAT.
CMT updates the RMID, whereas CAT updates the CLOSid in the software
cache. When the new RMID/CLOSid value is different from the cached
value, IA32_PQR_ASSOC is updated. Since the measured rdmsr latency for
IA32_PQR_ASSOC is very high (~250 cycles), this software cache is
necessary to avoid reading the MSR just to compare against the current
CLOSid value.
During CPU hotplug the pqr cache is reset to zero.
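
A hedged sketch of the shared software cache described above (the field
names follow the struct this patch adds; the only point is that the MSR
is written when the cached CLOSid actually changes, while the cached
RMID is preserved):

  struct intel_pqr_state {
  	raw_spinlock_t	lock;
  	int		rmid;	/* low 10 bits of IA32_PQR_ASSOC */
  	int		clos;	/* high 32 bits of IA32_PQR_ASSOC */
  	int		cnt;
  };

  /* CAT side: update the CLOSid, keeping the cached RMID intact. */
  static void pqr_set_clos(struct intel_pqr_state *state, int clos)
  {
  	if (state->clos == clos)
  		return;			/* skip the ~250 cycle MSR access */

  	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, clos);
  	state->clos = clos;
  }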

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Conflicts:
	arch/x86/kernel/cpu/perf_event_intel_cqm.c
---
 arch/x86/include/asm/intel_rdt.h           | 31 +++++++++++++++---------------
 arch/x86/include/asm/rdt_common.h          | 13 +++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c            |  3 +++
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 20 +++++++------------
 4 files changed, 39 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/include/asm/rdt_common.h

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 2fc496f..6aae109 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,12 +4,13 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+#include <asm/rdt_common.h>
 
-#define MSR_IA32_PQR_ASSOC		0xc8f
 #define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
 #define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
-DECLARE_PER_CPU(unsigned int, x86_cpu_clos);
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
 extern struct static_key rdt_enable_key;
 
 struct rdt_subsys_info {
@@ -61,30 +62,30 @@ static inline struct intel_rdt *task_rdt(struct task_struct *task)
 static inline void rdt_sched_in(struct task_struct *task)
 {
 	struct intel_rdt *ir;
-	unsigned int clos;
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	unsigned long flags;
 
 	if (!rdt_enabled())
 		return;
 
-	/*
-	 * This needs to be fixed
-	 * to cache the whole PQR instead of just CLOSid.
-	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
-	 * Should not write a 0 to the low 10 bits of PQR
-	 * and corrupt RMID.
-	 */
-	clos = this_cpu_read(x86_cpu_clos);
-
+	raw_spin_lock_irqsave(&state->lock, flags);
 	rcu_read_lock();
 	ir = task_rdt(task);
-	if (ir->clos == clos) {
+	if (ir->clos == state->clos) {
 		rcu_read_unlock();
+		raw_spin_unlock_irqrestore(&state->lock, flags);
 		return;
 	}
 
-	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
-	this_cpu_write(x86_cpu_clos, ir->clos);
+	/*
+	 * PQR has closid in high 32 bits and CQM-RMID
+	 * in low 10 bits. Rewrite the existing rmid from
+	 * software cache.
+	 */
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, ir->clos);
+	state->clos = ir->clos;
 	rcu_read_unlock();
+	raw_spin_unlock_irqrestore(&state->lock, flags);
 }
 
 #else
diff --git a/arch/x86/include/asm/rdt_common.h b/arch/x86/include/asm/rdt_common.h
new file mode 100644
index 0000000..33fd8ea
--- /dev/null
+++ b/arch/x86/include/asm/rdt_common.h
@@ -0,0 +1,13 @@
+#ifndef _X86_RDT_H_
+#define _X86_RDT_H_
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+struct intel_pqr_state {
+	raw_spinlock_t	lock;
+	int		rmid;
+	int		clos;
+	int		cnt;
+};
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 74b1e28..9da61b2 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -347,6 +347,9 @@ static inline bool rdt_update_cpumask(int cpu)
  */
 static inline void rdt_cpu_start(int cpu)
 {
+	struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
+
+	state->clos = 0;
 	mutex_lock(&rdt_group_mutex);
 	if (rdt_update_cpumask(cpu))
 		cbm_update_msrs(cpu);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index e4d1b8b..fd039899 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -7,22 +7,16 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
+#include <asm/rdt_common.h>
 #include "perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
 static unsigned int cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 
-struct intel_cqm_state {
-	raw_spinlock_t		lock;
-	int			rmid;
-	int			cnt;
-};
-
-static DEFINE_PER_CPU(struct intel_cqm_state, cqm_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 
 /*
  * Protects cache_cgroups and cqm_rmid_free_lru and cqm_rmid_limbo_lru.
@@ -961,7 +955,7 @@ out:
 
 static void intel_cqm_event_start(struct perf_event *event, int mode)
 {
-	struct intel_cqm_state *state = this_cpu_ptr(&cqm_state);
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
 	unsigned int rmid = event->hw.cqm_rmid;
 	unsigned long flags;
 
@@ -978,14 +972,14 @@ static void intel_cqm_event_start(struct perf_event *event, int mode)
 		WARN_ON_ONCE(state->rmid);
 
 	state->rmid = rmid;
-	wrmsrl(MSR_IA32_PQR_ASSOC, state->rmid);
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, state->clos);
 
 	raw_spin_unlock_irqrestore(&state->lock, flags);
 }
 
 static void intel_cqm_event_stop(struct perf_event *event, int mode)
 {
-	struct intel_cqm_state *state = this_cpu_ptr(&cqm_state);
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
 	unsigned long flags;
 
 	if (event->hw.cqm_state & PERF_HES_STOPPED)
@@ -998,7 +992,7 @@ static void intel_cqm_event_stop(struct perf_event *event, int mode)
 
 	if (!--state->cnt) {
 		state->rmid = 0;
-		wrmsrl(MSR_IA32_PQR_ASSOC, 0);
+		wrmsr(MSR_IA32_PQR_ASSOC, 0, state->clos);
 	} else {
 		WARN_ON_ONCE(!state->rmid);
 	}
@@ -1243,7 +1237,7 @@ static inline void cqm_pick_event_reader(int cpu)
 
 static void intel_cqm_cpu_prepare(unsigned int cpu)
 {
-	struct intel_cqm_state *state = &per_cpu(cqm_state, cpu);
+	struct intel_pqr_state *state = &per_cpu(pqr_state, cpu);
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 	raw_spin_lock_init(&state->lock);
-- 
1.9.1



* [PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
                   ` (4 preceding siblings ...)
  2015-05-02  1:36 ` [PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  2015-05-02  1:36 ` [PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide Vikas Shivappa
  6 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

CAT (Cache Allocation Technology) on HSW needs to be enumerated
separately, as CAT is only supported on certain HSW SKUs.  This patch
does a probe test for HSW CPUs by writing a CLOSid into the high 32 bits
of IA32_PQR_ASSOC and seeing if the bits stick. The probe test is only
done after confirming that the CPU is HSW.
HSW also requires the L3 cache bitmask to be at least two bits.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 56 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 9da61b2..4c12e5b 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -38,6 +38,11 @@ struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
 DEFINE_PER_CPU(unsigned int, x86_cpu_clos);
 
 /*
+ * Minimum bits required in Cache bitmasks.
+ */
+static unsigned int min_bitmask_len = 1;
+
+/*
  * Mask of CPUs for writing CBM values. We only need one per-socket.
  */
 static cpumask_t rdt_cpumask;
@@ -45,11 +50,54 @@ static cpumask_t rdt_cpumask;
 #define rdt_for_each_child(pos_css, parent_ir)		\
 	css_for_each_child((pos_css), &(parent_ir)->css)
 
+/*
+ * hsw_probetest() - Have to do probe
+ * test for Intel haswell CPUs as it does not have
+ * CPUID enumeration support for CAT.
+ *
+ * Probes by writing to the high 32 bits(CLOSid)
+ * of the IA32_PQR_MSR and testing if the bits stick.
+ * Then hardcode the max CLOS and max bitmask length on hsw.
+ * The minimum cache bitmask length allowed for HSW is 2 bits.
+ */
+static inline bool hsw_probetest(void)
+{
+	u32 l, h_old, h_new, h_tmp;
+
+	if (rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_old))
+		return false;
+
+	/*
+	 * Default value is always 0 if feature is present.
+	 */
+	h_tmp = h_old ^ 0x1U;
+	if (wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_tmp) ||
+	    rdmsr_safe(MSR_IA32_PQR_ASSOC, &l, &h_new))
+		return false;
+
+	if (h_tmp != h_new)
+		return false;
+
+	wrmsr_safe(MSR_IA32_PQR_ASSOC, l, h_old);
+
+	boot_cpu_data.x86_cat_closs = 4;
+	boot_cpu_data.x86_cat_cbmlength = 20;
+	min_bitmask_len = 2;
+
+	return true;
+}
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
 		return true;
 
+	/*
+	 * Probe test for Haswell CPUs.
+	 */
+	if (c->x86 == 0x6 && c->x86_model == 0x3f)
+		return hsw_probetest();
+
 	return false;
 }
 
@@ -153,7 +201,7 @@ static inline bool cbm_is_contiguous(unsigned long var)
 	unsigned long first_bit, zero_bit;
 	unsigned long maxcbm = MAX_CBM_LENGTH;
 
-	if (!var)
+	if (bitmap_weight(&var, maxcbm) < min_bitmask_len)
 		return false;
 
 	first_bit = find_next_bit(&var, maxcbm, 0);
@@ -180,7 +228,8 @@ static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
 	unsigned long *cbm_tmp;
 
 	if (!cbm_is_contiguous(cbmvalue)) {
-		pr_err("bitmask should have >= 1 bits and be contiguous\n");
+		pr_err("bitmask should have >=%d bits and be contiguous\n",
+			 min_bitmask_len);
 		return -EINVAL;
 	}
 
@@ -236,7 +285,8 @@ static void __cpu_cbm_update(void *info)
 }
 
 /*
- * cbm_update_all() - Update the cache bit mask for all packages.
+ * cbm_update_all() - Update the cache bit mask for
+ * all packages.
  */
 static inline void cbm_update_all(unsigned int closid)
 {
-- 
1.9.1



* [PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide
  2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
                   ` (5 preceding siblings ...)
  2015-05-02  1:36 ` [PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration Vikas Shivappa
@ 2015-05-02  1:36 ` Vikas Shivappa
  6 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-02  1:36 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, peter.zijlstra, h.peter.anvin, kanaka.d.juvva,
	vikas.shivappa

Adds a description of Cache Allocation Technology, an overview of the
kernel implementation, and usage of the CAT cgroup interface.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 Documentation/cgroups/rdt.txt | 180 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)
 create mode 100644 Documentation/cgroups/rdt.txt

diff --git a/Documentation/cgroups/rdt.txt b/Documentation/cgroups/rdt.txt
new file mode 100644
index 0000000..6051b73
--- /dev/null
+++ b/Documentation/cgroups/rdt.txt
@@ -0,0 +1,180 @@
+        RDT
+        ---
+
+Copyright (C) 2014 Intel Corporation
+Written by vikas.shivappa@linux.intel.com
+(based on contents and format from cpusets.txt)
+
+CONTENTS:
+=========
+
+1. Cache Allocation Technology
+  1.1 What is RDT and CAT ?
+  1.2 Why is CAT needed ?
+  1.3 CAT implementation overview
+  1.4 Assignment of CBM and CLOS
+  1.5 Scheduling and Context Switch
+2. Usage Examples and Syntax
+
+1. Cache Allocation Technology(CAT)
+===================================
+
+1.1 What is RDT and CAT
+-----------------------
+
+CAT is a part of Resource Director Technology (RDT), or Platform Shared
+Resource Control, which provides support to control platform shared
+resources like cache. Currently cache is the only resource that is
+supported in RDT.
+More information can be found in the Intel SDM, section 17.15.
+
+Cache Allocation Technology provides a way for the software (OS/VMM)
+to restrict cache allocation to a defined 'subset' of cache which may
+overlap with other 'subsets'.  This feature is used when allocating a
+line in cache, i.e. when pulling new data into the cache.
+The hardware is programmed via MSRs.
+
+The different cache subsets are identified by CLOS identifier (class
+of service) and each CLOS has a CBM (cache bit mask).  The CBM is a
+contiguous set of bits which defines the amount of cache resource that
+is available for each 'subset'.
+
+1.2 Why is CAT needed
+---------------------
+
+CAT enables more cache resources to be made available for higher
+priority applications based on guidance from the execution
+environment.
+
+The architecture also allows dynamically changing these subsets during
+runtime to further optimize the performance of the higher priority
+application with minimal degradation to the low priority app.
+Additionally, resources can be rebalanced for system throughput
+benefit.  (Refer to Section 17.15 in the Intel SDM)
+
+This technique may be useful in managing large computer systems with a
+large LLC. Examples may be large servers running instances of
+webservers or database servers. In such complex systems, these subsets
+can be used for more careful placement of the available cache
+resources.
+
+1.3 CAT implementation Overview
+-------------------------------
+
+Kernel implements a cgroup subsystem to support cache allocation.
+
+Each cgroup has a CLOSid <-> CBM(cache bit mask) mapping.
+A CLOS (Class of Service) is represented by a CLOSid. CLOSid is internal
+to the kernel and not exposed to the user.  Each cgroup would have one CBM
+and would just represent one cache 'subset'.
+
+The cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
+cgroup never fail.  When a child cgroup is created it inherits the
+CLOSid and the CBM from its parent.  When a user changes the default
+CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
+used before.  The changing of 'cache_mask' may fail with -ENOSPC once the
+kernel runs out of the maximum CLOSids it can support.
+Users can create as many cgroups as they want but having different CBMs
+at the same time is restricted by the maximum number of CLOSids
+(multiple cgroups can have the same CBM).
+Kernel maintains a CLOSid<->cbm mapping which keeps reference counter
+for each cgroup using a CLOSid.
+
+The tasks in the cgroup would get to fill the LLC cache represented by
+the cgroup's 'cache_mask' file.
+
+Root directory would have all available  bits set in 'cache_mask' file by
+default.
+
+1.4 Assignment of CBM,CLOS
+--------------------------
+
+The 'cache_mask' needs to be a subset of the parent node's 'cache_mask'.
+Any contiguous subset of these bits (with a minimum of 2 bits) may be
+set to indicate the cache mapping desired.  The 'cache_mask' between 2
+directories can overlap. The 'cache_mask' would represent the cache 'subset'
+of the CAT cgroup.  For example, on a system with 16 max cbm bits,
+if the directory has the least significant 4 bits set in its 'cache_mask'
+file (meaning the 'cache_mask' is 0xf), it would be allocated the right
+quarter of the last level cache, which means the tasks belonging to
+this CAT cgroup can fill the right quarter of the cache. If it
+has the most significant 8 bits set, it would be allocated the left
+half of the cache (8 bits out of 16 represents 50%).
+
+The cache portion defined in the CBM file is available to all tasks
+within the cgroup to fill and these task are not allowed to allocate
+space in other parts of the cache.
+
+1.5 Scheduling and Context Switch
+---------------------------------
+
+During a context switch the kernel implements this by writing the
+CLOSid (internally maintained by kernel) of the cgroup to which the
+task belongs to the CPU's IA32_PQR_ASSOC MSR. The MSR is only written
+when there is a change in the CLOSid for the CPU in order to minimize
+the latency incurred during context switch.
+
+2. Usage examples and syntax
+============================
+
+To check if CAT was enabled on your system
+
+dmesg | grep -i intel_rdt
+should output: intel_rdt: Max bitmask length: xx,Max ClosIds: xx
+The length of cache_mask and the number of CLOSids depend on the system you use.
+
+
+Following would mount the cache allocation cgroup subsystem and create
+2 directories. Please refer to Documentation/cgroups/cgroups.txt on
+details about how to use cgroups.
+
+  cd /sys/fs/cgroup
+  mkdir rdt
+  mount -t cgroup -ordt rdt /sys/fs/cgroup/rdt
+  cd rdt
+
+Create 2 rdt cgroups
+
+  mkdir group1
+  mkdir group2
+
+Following are some of the Files in the directory
+
+  ls
+  rdt.cache_mask
+  tasks
+
+Say the cache is 2MB and the cbm supports 16 bits; then setting the
+below allocates the 'right 1/4th (512KB)' of the cache to group2.
+
+Edit the CBM for group2 to set the least significant 4 bits.  This
+allocates the 'right quarter' of the cache.
+
+  cd group2
+  /bin/echo 0xf > rdt.cache_mask
+
+
+Edit the CBM for group2 to set the least significant 8 bits.  This
+allocates the right half of the cache to 'group2'.
+
+  cd group2
+  /bin/echo 0xff > rdt.cache_mask
+
+Assign tasks to the group2
+
+  /bin/echo PID1 > tasks
+  /bin/echo PID2 > tasks
+
+  Meaning now threads
+  PID1 and PID2 get to fill the 'right half' of
+  the cache as they belong to cgroup group2.
+
+Create a group under group2
+
+  cd group2
+  mkdir group21
+  cat rdt.cache_mask
+   0xff - inherits parent's mask.
+
+  /bin/echo 0xfff > rdt.cache_mask - throws an error as the mask has to be a subset of the parent's mask
+
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection
  2015-05-02  1:36 ` [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection Vikas Shivappa
@ 2015-05-02 18:35   ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-02 18:35 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Fri, May 01, 2015 at 06:36:35PM -0700, Vikas Shivappa wrote:
> +#define X86_FEATURE_RDT		( 9*32+15) /* Resource Allocation */
> +#define X86_FEATURE_CAT_L3	(13*32 + 1) /* Cache QOS Enforcement L3 */
> +	/* Cache Allocation Technology values */
> +	u16			x86_cat_cbmlength;
> +	u16			x86_cat_closs;

At the very least be consistent with the silly TLA nonsense.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management
  2015-05-02  1:36 ` [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management Vikas Shivappa
@ 2015-05-02 18:38   ` Peter Zijlstra
  2015-05-04 17:31     ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-02 18:38 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Fri, May 01, 2015 at 06:36:36PM -0700, Vikas Shivappa wrote:
> +static inline bool cat_supported(struct cpuinfo_x86 *c)

Is search and replace really that hard?

> +/*
> +* Called with the rdt_group_mutex held.
> +*/

Whitespace damaged and pointless comment.

> +static int rdt_free_closid(struct intel_rdt *ir)
> +{
> +

superfluous whitespace

> +	lockdep_assert_held(&rdt_group_mutex);
> +
> +	WARN_ON(!ccmap[ir->clos].cgrp_count);
> +	ccmap[ir->clos].cgrp_count--;
> +	if (!ccmap[ir->clos].cgrp_count)
> +		clear_bit(ir->clos, rdtss_info.closmap);
> +
> +	return 0;
> +}


These patches really are very sloppy.. 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-02  1:36 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
@ 2015-05-02 18:46   ` Peter Zijlstra
  2015-05-04 17:30     ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-02 18:46 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva


There's CAT in your subject, make up your minds already on what you want
to call this stuff.

On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:
> +static void rdt_free_closid(unsigned int clos)
> +{
> +

superfluous whitespace

> +	lockdep_assert_held(&rdt_group_mutex);
> +
> +	clear_bit(clos, rdtss_info.closmap);
> +}

> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> +	unsigned long first_bit, zero_bit;
> +	unsigned long maxcbm = MAX_CBM_LENGTH;

flip these two lines

> +
> +	if (!var)
> +		return false;
> +
> +	first_bit = find_next_bit(&var, maxcbm, 0);

What was wrong with find_first_bit() ?

> +	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> +	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> +		return false;
> +
> +	return true;
> +}
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> +	struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> +	seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);

inconsistent spacing, you mostly have a whilespace before the return
statement, but here you have not.

> +	return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> +	struct intel_rdt *par, *c;
> +	struct cgroup_subsys_state *css;
> +	unsigned long *cbm_tmp;

No reason no to order these on line length is there?

> +
> +	if (!cbm_is_contiguous(cbmvalue)) {
> +		pr_err("bitmask should have >= 1 bits and be contiguous\n");
> +		return -EINVAL;
> +	}
> +
> +	par = parent_rdt(ir);
> +	cbm_tmp = &ccmap[par->clos].cache_mask;
> +	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	rdt_for_each_child(css, ir) {
> +		c = css_rdt(css);
> +		cbm_tmp = &ccmap[c->clos].cache_mask;
> +		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
> +			rcu_read_unlock();
> +			pr_err("Children's mask not a subset\n");
> +			return -EINVAL;
> +		}
> +	}
> +
> +	rcu_read_unlock();

Daft whitespace again.

> +	return 0;
> +}
> +
> +static bool cbm_search(unsigned long cbm, int *closid)
> +{
> +	int maxid = boot_cpu_data.x86_cat_closs;
> +	unsigned int i;
> +
> +	for (i = 0; i < maxid; i++) {
> +		if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
> +			*closid = i;
> +			return true;
> +		}
> +	}

and again

> +	return false;
> +}
> +
> +static void cbmmap_dump(void)
> +{
> +	int i;
> +
> +	pr_debug("CBMMAP\n");
> +	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
> +		pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
> +		 (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);

This is missing {}

> +}
> +
> +static void __cpu_cbm_update(void *info)
> +{
> +	unsigned int closid = *((unsigned int *)info);
> +
> +	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
> +}

> +static int cat_cbm_write(struct cgroup_subsys_state *css,
> +				 struct cftype *cft, u64 cbmvalue)
> +{
> +	struct intel_rdt *ir = css_rdt(css);
> +	ssize_t err = 0;
> +	unsigned long cache_mask, max_mask;
> +	unsigned long *cbm_tmp;
> +	unsigned int closid;
> +	u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;

That's just a right mess isn't it?

> +
> +	if (ir == &rdt_root_group)
> +		return -EPERM;
> +	bitmap_set(&max_mask, 0, max_cbm);
> +
> +	/*
> +	 * Need global mutex as cbm write may allocate a closid.
> +	 */
> +	mutex_lock(&rdt_group_mutex);
> +	bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
> +	cbm_tmp = &ccmap[ir->clos].cache_mask;
> +
> +	if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
> +		goto out;
> +
> +	err = validate_cbm(ir, cache_mask);
> +	if (err)
> +		goto out;
> +
> +	/*
> +	 * At this point we are sure to change the cache_mask.Hence release the
> +	 * reference to the current CLOSid and try to get a reference for
> +	 * a different CLOSid.
> +	 */
> +	__clos_put(ir->clos);
> +
> +	if (cbm_search(cache_mask, &closid)) {
> +		ir->clos = closid;
> +		__clos_get(closid);
> +	} else {
> +		err = rdt_alloc_closid(ir);
> +		if (err)
> +			goto out;
> +
> +		ccmap[ir->clos].cache_mask = cache_mask;
> +		cbm_update_all(ir->clos);
> +	}
> +
> +	cbmmap_dump();
> +out:
> +

Daft whitespace again.. Also inconsistent return paradigm, here you use
an out label, where in validate_cbm() you did rcu_read_unlock() and
return from the middle.

> +	mutex_unlock(&rdt_group_mutex);
> +	return err;
> +}
> +
> +static inline bool rdt_update_cpumask(int cpu)
> +{
> +	int phys_id = topology_physical_package_id(cpu);
> +	struct cpumask *mask = &rdt_cpumask;
> +	int i;
> +
> +	for_each_cpu(i, mask) {
> +		if (phys_id == topology_physical_package_id(i))
> +			return false;
> +	}
> +
> +	cpumask_set_cpu(cpu, mask);

More daft whitespace

> +	return true;
> +}
> +
> +/*
> + * rdt_cpu_start() - If a new package has come up, update all
> + * the Cache bitmasks on the package.
> + */
> +static inline void rdt_cpu_start(int cpu)
> +{
> +	mutex_lock(&rdt_group_mutex);
> +	if (rdt_update_cpumask(cpu))
> +		cbm_update_msrs(cpu);
> +	mutex_unlock(&rdt_group_mutex);
> +}
> +
> +static void rdt_cpu_exit(unsigned int cpu)
> +{
> +	int phys_id = topology_physical_package_id(cpu);
> +	int i;
> +
> +	mutex_lock(&rdt_group_mutex);
> +	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
> +		mutex_unlock(&rdt_group_mutex);
> +		return;

And here we return from the middle again..

> +	}
> +
> +	for_each_online_cpu(i) {
> +		if (i == cpu)
> +			continue;
> +
> +		if (phys_id == topology_physical_package_id(i)) {
> +			cpumask_set_cpu(i, &rdt_cpumask);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&rdt_group_mutex);
> +}

/me tired and gives up..

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-02  1:36 ` [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT Vikas Shivappa
@ 2015-05-02 18:51   ` Peter Zijlstra
  2015-05-04 18:39     ` Vikas Shivappa
  2015-05-06  0:19     ` Vikas Shivappa
  0 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-02 18:51 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
> Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
> 
> The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
> CLOSid. During context switch kernel implements this by writing the
> CLOSid of the cgroup to which the task belongs to the CPU's
> IA32_PQR_ASSOC MSR.
> 
> For Cache Allocation, this would let the task fill in the cache 'subset'
> represented by the cgroup's Cache bit mask(CBM).
> 

Are you guys for real? Have you even looked at the trainwreck this
makes?

> +static inline bool rdt_enabled(void)
> +{
> +	return static_key_false(&rdt_enable_key);
> +}

> +/*
> + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
> + * if the current Closid is different than the new one.
> + */
> +static inline void rdt_sched_in(struct task_struct *task)
> +{
> +	struct intel_rdt *ir;
> +	unsigned int clos;
> +
> +	if (!rdt_enabled())
> +		return;
> +
> +	/*
> +	 * This needs to be fixed
> +	 * to cache the whole PQR instead of just CLOSid.
> +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
> +	 * Should not write a 0 to the low 10 bits of PQR
> +	 * and corrupt RMID.
> +	 */
> +	clos = this_cpu_read(x86_cpu_clos);
> +
> +	rcu_read_lock();
> +	ir = task_rdt(task);
> +	if (ir->clos == clos) {
> +		rcu_read_unlock();
> +		return;
> +	}
> +
> +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
> +	this_cpu_write(x86_cpu_clos, ir->clos);
> +	rcu_read_unlock();
> +}

You inject _ALL_ that into the scheduler hot path. Insane much?

> +
> +#else
> +
> +static inline void rdt_sched_in(struct task_struct *task) {}
> +
>  #endif
>  #endif
> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
> index 751bf4b..82ef4b3 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -8,6 +8,9 @@ struct tss_struct;
>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  		      struct tss_struct *tss);
>  
> +#include <asm/intel_rdt.h>
> +#define post_arch_switch(current)	rdt_sched_in(current)
> +
>  #ifdef CONFIG_X86_32
>  
>  #ifdef CONFIG_CC_STACKPROTECTOR

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f9123a8..cacb490 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	prev_state = prev->state;
>  	vtime_task_switch(prev);
>  	finish_arch_switch(prev);
> +	post_arch_switch(current);
>  	perf_event_task_sched_in(prev, current);
>  	finish_lock_switch(rq, prev);
>  	finish_arch_post_lock_switch();

Not a word in the Changelog on this hook; that's double fail.

Please _THINK_ before writing code.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-02 18:46   ` Peter Zijlstra
@ 2015-05-04 17:30     ` Vikas Shivappa
  2015-05-06  8:09       ` Peter Zijlstra
  2015-05-06  8:11       ` Peter Zijlstra
  0 siblings, 2 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-04 17:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, Matt Fleming, Auld, Will, peter.zijlstra,
	h.peter.anvin, Juvva, Kanaka D



On Sat, 2 May 2015, Peter Zijlstra wrote:

>
> There's CAT in your subject, make up your minds already on what you want
> to call this stuff.

We don't have control over the names. It is clear from patch 0/7, where it
is explained that RDT is the umbrella term, CAT is a part of it, and this
patch series is only for CAT. It also mentions what exact section of the
Intel manual this refers to. Is there still some lack of clarification here?
Or is it just that you dislike a term that is already known?

We would have received suggestions for some fancy names like vanilla or
icecream or burger and spent time deciding on the names, but we can't do
that if we want to stay consistent with the rest of the naming.

>
> On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:
>> +static void rdt_free_closid(unsigned int clos)
>> +{
>> +
>
> superfluous whitespace

Will fix the whitespace issues (including before return) and other possible
coding convention issues.

It would make more sense to have this in checkpatch rather than having to
point it out manually. If you want to have fun with that, go for it though.
An automated approach would take much less of the reviewer's time, and of
the time of the people submitting the code as well.
The patches are all run against checkpatch.

>
>> +	lockdep_assert_held(&rdt_group_mutex);
>> +
>> +	clear_bit(clos, rdtss_info.closmap);
>> +}
>
>> +static inline bool cbm_is_contiguous(unsigned long var)
>> +{
>> +	unsigned long first_bit, zero_bit;
>> +	unsigned long maxcbm = MAX_CBM_LENGTH;
>
> flip these two lines
>
>> +
>> +	if (!var)
>> +		return false;
>> +
>> +	first_bit = find_next_bit(&var, maxcbm, 0);
>
> What was wrong with find_first_bit() ?
>
>> +	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
>> +
>> +	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static int cat_cbm_read(struct seq_file *m, void *v)
>> +{
>> +	struct intel_rdt *ir = css_rdt(seq_css(m));
>> +
>> +	seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);
>
> inconsistent spacing, you mostly have a whilespace before the return
> statement, but here you have not.
>
>> +	return 0;
>> +}
>> +
>> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
>> +{
>> +	struct intel_rdt *par, *c;
>> +	struct cgroup_subsys_state *css;
>> +	unsigned long *cbm_tmp;
>
> No reason no to order these on line length is there?
>
>> +
>> +	if (!cbm_is_contiguous(cbmvalue)) {
>> +		pr_err("bitmask should have >= 1 bits and be contiguous\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	par = parent_rdt(ir);
>> +	cbm_tmp = &ccmap[par->clos].cache_mask;
>> +	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
>> +		return -EINVAL;
>> +
>> +	rcu_read_lock();
>> +	rdt_for_each_child(css, ir) {
>> +		c = css_rdt(css);
>> +		cbm_tmp = &ccmap[c->clos].cache_mask;
>> +		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
>> +			rcu_read_unlock();
>> +			pr_err("Children's mask not a subset\n");
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	rcu_read_unlock();
>
> Daft whitespace again.
>
>> +	return 0;
>> +}
>> +
>> +static bool cbm_search(unsigned long cbm, int *closid)
>> +{
>> +	int maxid = boot_cpu_data.x86_cat_closs;
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < maxid; i++) {
>> +		if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
>> +			*closid = i;
>> +			return true;
>> +		}
>> +	}
>
> and again
>
>> +	return false;
>> +}
>> +
>> +static void cbmmap_dump(void)
>> +{
>> +	int i;
>> +
>> +	pr_debug("CBMMAP\n");
>> +	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
>> +		pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
>> +		 (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);
>
> This is missing {}
>
>> +}
>> +
>> +static void __cpu_cbm_update(void *info)
>> +{
>> +	unsigned int closid = *((unsigned int *)info);
>> +
>> +	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
>> +}
>
>> +static int cat_cbm_write(struct cgroup_subsys_state *css,
>> +				 struct cftype *cft, u64 cbmvalue)
>> +{
>> +	struct intel_rdt *ir = css_rdt(css);
>> +	ssize_t err = 0;
>> +	unsigned long cache_mask, max_mask;
>> +	unsigned long *cbm_tmp;
>> +	unsigned int closid;
>> +	u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;
>
> That's just a right mess isn't it?
>
>> +
>> +	if (ir == &rdt_root_group)
>> +		return -EPERM;
>> +	bitmap_set(&max_mask, 0, max_cbm);
>> +
>> +	/*
>> +	 * Need global mutex as cbm write may allocate a closid.
>> +	 */
>> +	mutex_lock(&rdt_group_mutex);
>> +	bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
>> +	cbm_tmp = &ccmap[ir->clos].cache_mask;
>> +
>> +	if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
>> +		goto out;
>> +
>> +	err = validate_cbm(ir, cache_mask);
>> +	if (err)
>> +		goto out;
>> +
>> +	/*
>> +	 * At this point we are sure to change the cache_mask.Hence release the
>> +	 * reference to the current CLOSid and try to get a reference for
>> +	 * a different CLOSid.
>> +	 */
>> +	__clos_put(ir->clos);
>> +
>> +	if (cbm_search(cache_mask, &closid)) {
>> +		ir->clos = closid;
>> +		__clos_get(closid);
>> +	} else {
>> +		err = rdt_alloc_closid(ir);
>> +		if (err)
>> +			goto out;
>> +
>> +		ccmap[ir->clos].cache_mask = cache_mask;
>> +		cbm_update_all(ir->clos);
>> +	}
>> +
>> +	cbmmap_dump();
>> +out:
>> +
>
> Daft whitespace again.. Also inconsistent return paradigm, here you use
> an out label, where in validate_cbm() you did rcu_read_unlock() and
> return from the middle.
>
>> +	mutex_unlock(&rdt_group_mutex);
>> +	return err;
>> +}
>> +
>> +static inline bool rdt_update_cpumask(int cpu)
>> +{
>> +	int phys_id = topology_physical_package_id(cpu);
>> +	struct cpumask *mask = &rdt_cpumask;
>> +	int i;
>> +
>> +	for_each_cpu(i, mask) {
>> +		if (phys_id == topology_physical_package_id(i))
>> +			return false;
>> +	}
>> +
>> +	cpumask_set_cpu(cpu, mask);
>
> More daft whitespace
>
>> +	return true;
>> +}
>> +
>> +/*
>> + * rdt_cpu_start() - If a new package has come up, update all
>> + * the Cache bitmasks on the package.
>> + */
>> +static inline void rdt_cpu_start(int cpu)
>> +{
>> +	mutex_lock(&rdt_group_mutex);
>> +	if (rdt_update_cpumask(cpu))
>> +		cbm_update_msrs(cpu);
>> +	mutex_unlock(&rdt_group_mutex);
>> +}
>> +
>> +static void rdt_cpu_exit(unsigned int cpu)
>> +{
>> +	int phys_id = topology_physical_package_id(cpu);
>> +	int i;
>> +
>> +	mutex_lock(&rdt_group_mutex);
>> +	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
>> +		mutex_unlock(&rdt_group_mutex);
>> +		return;
>
> And here we return from the middle again..
>
>> +	}
>> +
>> +	for_each_online_cpu(i) {
>> +		if (i == cpu)
>> +			continue;
>> +
>> +		if (phys_id == topology_physical_package_id(i)) {
>> +			cpumask_set_cpu(i, &rdt_cpumask);
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&rdt_group_mutex);
>> +}
>
> /me tired and gives up..
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management
  2015-05-02 18:38   ` Peter Zijlstra
@ 2015-05-04 17:31     ` Vikas Shivappa
  0 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-04 17:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, matt.fleming, will.auld, peter.zijlstra,
	h.peter.anvin, kanaka.d.juvva



On Sat, 2 May 2015, Peter Zijlstra wrote:

> On Fri, May 01, 2015 at 06:36:36PM -0700, Vikas Shivappa wrote:
>> +static inline bool cat_supported(struct cpuinfo_x86 *c)
>
> Is search and replace really that hard?
>
>> +/*
>> +* Called with the rdt_group_mutex held.
>> +*/
>
> Whitespace damaged and pointless comment.
>
>> +static int rdt_free_closid(struct intel_rdt *ir)
>> +{
>> +
>
> superfluous whitespace
>
>> +	lockdep_assert_held(&rdt_group_mutex);
>> +
>> +	WARN_ON(!ccmap[ir->clos].cgrp_count);
>> +	ccmap[ir->clos].cgrp_count--;
>> +	if (!ccmap[ir->clos].cgrp_count)
>> +		clear_bit(ir->clos, rdtss_info.closmap);
>> +
>> +	return 0;
>> +}
>
>
> These patches really are very sloppy..

Will resend the updated patch - most of the changes here are removed in the
next patch.

Thanks,
Vikas

>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-02 18:51   ` Peter Zijlstra
@ 2015-05-04 18:39     ` Vikas Shivappa
  2015-05-06  7:48       ` Peter Zijlstra
  2015-05-06  0:19     ` Vikas Shivappa
  1 sibling, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-04 18:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, matt.fleming, will.auld, peter.zijlstra,
	h.peter.anvin, kanaka.d.juvva



On Sat, 2 May 2015, Peter Zijlstra wrote:

> On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
>> Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
>>
>> The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
>> CLOSid. During context switch kernel implements this by writing the
>> CLOSid of the cgroup to which the task belongs to the CPU's
>> IA32_PQR_ASSOC MSR.
>>
>> For Cache Allocation, this would let the task fill in the cache 'subset'
>> represented by the cgroup's Cache bit mask(CBM).
>>
>
> Are you guys for real? Have you even looked at the trainwreck this
> makes?
>
>> +static inline bool rdt_enabled(void)
>> +{
>> +	return static_key_false(&rdt_enable_key);
>> +}
>
>> +/*
>> + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
>> + * if the current Closid is different than the new one.
>> + */
>> +static inline void rdt_sched_in(struct task_struct *task)
>> +{
>> +	struct intel_rdt *ir;
>> +	unsigned int clos;
>> +
>> +	if (!rdt_enabled())
>> +		return;
>> +
>> +	/*
>> +	 * This needs to be fixed
>> +	 * to cache the whole PQR instead of just CLOSid.
>> +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
>> +	 * Should not write a 0 to the low 10 bits of PQR
>> +	 * and corrupt RMID.
>> +	 */
>> +	clos = this_cpu_read(x86_cpu_clos);
>> +
>> +	rcu_read_lock();
>> +	ir = task_rdt(task);
>> +	if (ir->clos == clos) {
>> +		rcu_read_unlock();
>> +		return;
>> +	}
>> +
>> +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
>> +	this_cpu_write(x86_cpu_clos, ir->clos);
>> +	rcu_read_unlock();
>> +}
>
> You inject _ALL_ that into the scheduler hot path. Insane much?

At some point I had an #ifdef around rdt_sched_in; will fix this.

>
>> +
>> +#else
>> +
>> +static inline void rdt_sched_in(struct task_struct *task) {}
>> +
>>  #endif
>>  #endif
>> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
>> index 751bf4b..82ef4b3 100644
>> --- a/arch/x86/include/asm/switch_to.h
>> +++ b/arch/x86/include/asm/switch_to.h
>> @@ -8,6 +8,9 @@ struct tss_struct;
>>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>>  		      struct tss_struct *tss);
>>
>> +#include <asm/intel_rdt.h>
>> +#define post_arch_switch(current)	rdt_sched_in(current)
>> +
>>  #ifdef CONFIG_X86_32
>>
>>  #ifdef CONFIG_CC_STACKPROTECTOR
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f9123a8..cacb490 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>>  	prev_state = prev->state;
>>  	vtime_task_switch(prev);
>>  	finish_arch_switch(prev);
>> +	post_arch_switch(current);
>>  	perf_event_task_sched_in(prev, current);
>>  	finish_lock_switch(rq, prev);
>>  	finish_arch_post_lock_switch();
>
> Not a word in the Changelog on this hook; that's double fail.

Will add the changelog. We want the current task, which no other existing
hook provides.

Thanks,
Vikas

>
> Please _THINK_ before writing code.
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-02 18:51   ` Peter Zijlstra
  2015-05-04 18:39     ` Vikas Shivappa
@ 2015-05-06  0:19     ` Vikas Shivappa
  2015-05-06  7:50       ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-06  0:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, vikas.shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, matt.fleming, will.auld, peter.zijlstra,
	h.peter.anvin, kanaka.d.juvva



On Sat, 2 May 2015, Peter Zijlstra wrote:

> On Fri, May 01, 2015 at 06:36:38PM -0700, Vikas Shivappa wrote:
>> Adds support for IA32_PQR_ASSOC MSR writes during task scheduling.
>>
>> The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
>> CLOSid. During context switch kernel implements this by writing the
>> CLOSid of the cgroup to which the task belongs to the CPU's
>> IA32_PQR_ASSOC MSR.
>>
>> For Cache Allocation, this would let the task fill in the cache 'subset'
>> represented by the cgroup's Cache bit mask(CBM).
>>
>
> Are you guys for real? Have you even looked at the trainwreck this
> makes?

The h/w tags the cache lines with the CLOSid, and hence we associate the id
with the tasks as they are scheduled.

The following considerations were made so that the PQR MSR write has minimal
effect on the scheduling hot path:
- This would not exist on any non-Intel platforms.
- On Intel platforms, this would not exist by default unless CGROUP_RDT
is enabled.
- It remains a no-op when CGROUP_RDT is enabled and the Intel hardware does
not support the feature.
- When the feature is available and RDT is enabled, no MSR write is done
until the user manually creates a cgroup *and* assigns a new cache mask.
Since the child node inherits the parent's cache mask, cgroup creation by
itself has no scheduling hot path impact.
- Per-CPU PQR values are cached and the MSR write is only done when a task
with a different PQR is scheduled on the CPU. Typically, if the task groups
are bound to be scheduled on a set of CPUs, the number of MSR writes is
greatly reduced.

However, Matt pointed out I could improve this to
if (static_key_false) { rdt_sched_in(); }
instead of a static inline, which I will update. Will update the commit
message to include these details.
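
A call-site guard along these lines (illustrative sketch only; rdt_enable_key
is the static key from this series):

	/* branch is patched out when rdt_enable_key is disabled */
	if (static_key_false(&rdt_enable_key))
		rdt_sched_in(current);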

And this can be improved further to not enable the feature till a user
creates a new cgroup and sets a new bitmask.

Thanks,
Vikas

>
>> +static inline bool rdt_enabled(void)
>> +{
>> +	return static_key_false(&rdt_enable_key);
>> +}
>
>> +/*
>> + * rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
>> + * if the current Closid is different than the new one.
>> + */
>> +static inline void rdt_sched_in(struct task_struct *task)
>> +{
>> +	struct intel_rdt *ir;
>> +	unsigned int clos;
>> +
>> +	if (!rdt_enabled())
>> +		return;
>> +
>> +	/*
>> +	 * This needs to be fixed
>> +	 * to cache the whole PQR instead of just CLOSid.
>> +	 * PQR has closid in high 32 bits and CQM-RMID in low 10 bits.
>> +	 * Should not write a 0 to the low 10 bits of PQR
>> +	 * and corrupt RMID.
>> +	 */
>> +	clos = this_cpu_read(x86_cpu_clos);
>> +
>> +	rcu_read_lock();
>> +	ir = task_rdt(task);
>> +	if (ir->clos == clos) {
>> +		rcu_read_unlock();
>> +		return;
>> +	}
>> +
>> +	wrmsr(MSR_IA32_PQR_ASSOC, 0, ir->clos);
>> +	this_cpu_write(x86_cpu_clos, ir->clos);
>> +	rcu_read_unlock();
>> +}
>
> You inject _ALL_ that into the scheduler hot path. Insane much?
>
>> +
>> +#else
>> +
>> +static inline void rdt_sched_in(struct task_struct *task) {}
>> +
>>  #endif
>>  #endif
>> diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
>> index 751bf4b..82ef4b3 100644
>> --- a/arch/x86/include/asm/switch_to.h
>> +++ b/arch/x86/include/asm/switch_to.h
>> @@ -8,6 +8,9 @@ struct tss_struct;
>>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>>  		      struct tss_struct *tss);
>>
>> +#include <asm/intel_rdt.h>
>> +#define post_arch_switch(current)	rdt_sched_in(current)
>> +
>>  #ifdef CONFIG_X86_32
>>
>>  #ifdef CONFIG_CC_STACKPROTECTOR
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f9123a8..cacb490 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>>  	prev_state = prev->state;
>>  	vtime_task_switch(prev);
>>  	finish_arch_switch(prev);
>> +	post_arch_switch(current);
>>  	perf_event_task_sched_in(prev, current);
>>  	finish_lock_switch(rq, prev);
>>  	finish_arch_post_lock_switch();
>
> Not a word in the Changelog on this hook; that's double fail.
>
> Please _THINK_ before writing code.
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-04 18:39     ` Vikas Shivappa
@ 2015-05-06  7:48       ` Peter Zijlstra
  2015-05-07 23:15         ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-06  7:48 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Mon, May 04, 2015 at 11:39:21AM -0700, Vikas Shivappa wrote:

> >>--- a/arch/x86/include/asm/switch_to.h
> >>+++ b/arch/x86/include/asm/switch_to.h
> >>@@ -8,6 +8,9 @@ struct tss_struct;
> >> void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> >> 		      struct tss_struct *tss);
> >>
> >>+#include <asm/intel_rdt.h>
> >>+#define post_arch_switch(current)	rdt_sched_in(current)
> >>+
> >> #ifdef CONFIG_X86_32
> >>
> >> #ifdef CONFIG_CC_STACKPROTECTOR
> >
> >>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>index f9123a8..cacb490 100644
> >>--- a/kernel/sched/core.c
> >>+++ b/kernel/sched/core.c
> >>@@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> >> 	prev_state = prev->state;
> >> 	vtime_task_switch(prev);
> >> 	finish_arch_switch(prev);
> >>+	post_arch_switch(current);
> >> 	perf_event_task_sched_in(prev, current);
> >> 	finish_lock_switch(rq, prev);
> >> 	finish_arch_post_lock_switch();
> >
> >Not a word in the Changelog on this hook; that's double fail.
> 
> will add the changelog. we want the current task which no other existing
> hook provides.

No.

 1) two arch hooks right after one another is FAIL
 1a) just 'fix' the existing hook
 2) current is cheap and easily obtainable without passing it as
    an argument
 3) why do you need the hook in the first place?
 3a) why can't you put this in __switch_to()? This is very much x86 only
     code.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-06  0:19     ` Vikas Shivappa
@ 2015-05-06  7:50       ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-06  7:50 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Tue, May 05, 2015 at 05:19:40PM -0700, Vikas Shivappa wrote:
> However Matt pointed out I could improve this to
> if (static_key_false)
> 	{ rdt_sched_in(); }
> instead of a static inline which i will update. Will update the commit
> message to include these details.

Indeed, that causes minimal I$ bloat / impact for people not using this
feature.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-04 17:30     ` Vikas Shivappa
@ 2015-05-06  8:09       ` Peter Zijlstra
  2015-05-06  8:30         ` Matt Fleming
  2015-05-06 16:48         ` Vikas Shivappa
  2015-05-06  8:11       ` Peter Zijlstra
  1 sibling, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-06  8:09 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	Matt Fleming, Auld, Will, peter.zijlstra, h.peter.anvin, Juvva,
	Kanaka D

On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> On Sat, 2 May 2015, Peter Zijlstra wrote:
> 
> >
> >There's CAT in your subject, make up your minds already on what you want
> >to call this stuff.
> 
> We dont have control over the names.It is clear from the patch 0/7 where its

If I read 0/n its _after_ I've read all the other patches. The thing is,
0/n should not contain anything persistent. Patches should stand on
their own.

> explained that RDT is the umbrella term and CAT is a part of it and this
> patch series is only for CAT ... It also mentions what exact section of the
> Intel manual this refers to. Is there still some lack of clarification here
> ?

But we're not implementing an umbrella right? We're implementing Cache
QoS Enforcement (CQE aka. CAT).

Why confuse things with calling it random other names?

From what I understand the whole RDT thing is the umbrella term for
Cache QoS Monitoring and Enforcement together. CQM is implemented
elsewhere, this part is only implementing CQE.

So just call it that, calling it RDT is actively misleading, because it
explicitly does _NOT_ do the monitoring half of it.

> If its just your disliking the term thats already known.

I think its crazy to go CQE no CAT no RDT, but I could get over that in
time. But now it turns out you need _both_, and that's even more silly.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-04 17:30     ` Vikas Shivappa
  2015-05-06  8:09       ` Peter Zijlstra
@ 2015-05-06  8:11       ` Peter Zijlstra
  2015-05-06 18:09         ` Vikas Shivappa
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-06  8:11 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	Matt Fleming, Auld, Will, peter.zijlstra, h.peter.anvin, Juvva,
	Kanaka D

On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> Will fix the whitespace issues (including before return) or other possible
> coding convention issues.
> 
> It could be more of a common sense to have this in checkpatch rather that
> manually having to pointing out. If you want to have fun with that go for it
> though.

My main objection was that your coding style is entirely inconsistent
with itself.

Sometimes you have a whitespace before return, sometimes you do not.

Sometimes you have exit labels with locks, sometimes you do not.

etc..

Pick one stick to it; although we'd all much prefer if you pick the one
that's common to the kernel.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-06  8:09       ` Peter Zijlstra
@ 2015-05-06  8:30         ` Matt Fleming
  2015-05-06 16:48         ` Vikas Shivappa
  1 sibling, 0 replies; 35+ messages in thread
From: Matt Fleming @ 2015-05-06  8:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, Auld, Will, peter.zijlstra, h.peter.anvin, Juvva,
	Kanaka D

On Wed, 2015-05-06 at 10:09 +0200, Peter Zijlstra wrote:
> 
> But we're not implementing an umbrella right? We're implementing Cache
> QoS Enforcement (CQE aka. CAT).
> 
> Why confuse things with calling it random other names?
> 
> From what I understand the whole RDT thing is the umbrella term for
> Cache QoS Monitoring and Enforcement together. CQM is implemented
> elsewhere, this part is only implementing CQE.
> 
> So just call it that, calling it RDT is actively misleading, because it
> explicitly does _NOT_ do the monitoring half of it.

Right, and we're already running into this problem where some of the
function names contain "rdt" and some contain "cat".

How about we go with "intel cache alloc"? We avoid the dreaded TLA-fest,
it clearly matches up with what's in the Software Developer's Manual
(Cache Allocation Technology) and it's pretty simple for people who
haven't read the SDM to understand.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-06  8:09       ` Peter Zijlstra
  2015-05-06  8:30         ` Matt Fleming
@ 2015-05-06 16:48         ` Vikas Shivappa
  1 sibling, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-06 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, Matt Fleming, Auld, Will, peter.zijlstra,
	h.peter.anvin, Juvva, Kanaka D



On Wed, 6 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
>> On Sat, 2 May 2015, Peter Zijlstra wrote:
>>
>>>
>>> There's CAT in your subject, make up your minds already on what you want
>>> to call this stuff.
>>
>> We dont have control over the names.It is clear from the patch 0/7 where its
>
> If I read 0/n its _after_ I've read all the other patches. The thing is,
> 0/n should not contain anything persistent. Patches should stand on
> their own.
>
>> explained that RDT is the umbrella term and CAT is a part of it and this
>> patch series is only for CAT ... It also mentions what exact section of the
>> Intel manual this refers to. Is there still some lack of clarification here
>> ?
>
> But we're not implementing an umbrella right? We're implementing Cache
> QoS Enforcement (CQE aka. CAT).

In some sense we are - the idea was that the same rdt cgroup would include
other features in RDT, and cache allocation is one of them. Hence the cgroup
name is RDT. Like Matt just commented, we just found some naming issues with
respect to the APIs, i.e. whether to use cat or rdt. I can plan to remove
the 'cat' altogether and use cache alloc, as I just learnt it may not be
liked because cat also means an animal.. in English :)

The only reason to do it now is that we can't change the cgroup name later.

>
> Why confuse things with calling it random other names?
>
> From what I understand the whole RDT thing is the umbrella term for
> Cache QoS Monitoring and Enforcement together. CQM is implemented
> elsewhere, this part is only implementing CQE.
>
> So just call it that, calling it RDT is actively misleading, because it
> explicitly does _NOT_ do the monitoring half of it.
>
>> If its just your disliking the term thats already known.
>
> I think its crazy to go CQE no CAT no RDT, but I could get over that in
> time. But now it turns out you need _both_, and that's even more silly.

I agree the changing of names has led to enough confusion, and we can try to
make this better in coming features.
I will send fixes where I will try to be clearer on names.
Basically, use rdt for things which would be common to cache alloc and all
other features, and keep cache alloc for things which are specific to cache
alloc (like the cache bit mask, which is only for cache alloc).

>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-05-06  8:11       ` Peter Zijlstra
@ 2015-05-06 18:09         ` Vikas Shivappa
  0 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-06 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, Matt Fleming, Auld, Will, peter.zijlstra,
	h.peter.anvin, Juvva, Kanaka D



On Wed, 6 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
>> Will fix the whitespace issues (including before return) or other possible
>> coding convention issues.
>>
>> It could be more of a common sense to have this in checkpatch rather that
>> manually having to pointing out. If you want to have fun with that go for it
>> though.
>
> My main objection was that your coding style is entirely inconsistent
> with itself.
>
> Sometimes you have a whitespace before return, sometimes you do not.
>
> Sometimes you have exit labels with locks, sometimes you do not.
>
> etc..
>
> Pick one stick to it; although we'd all much prefer if you pick the one
> that's common to the kernel.

Will fix the convention issues.

Thanks,
Vikas

>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-06  7:48       ` Peter Zijlstra
@ 2015-05-07 23:15         ` Vikas Shivappa
  2015-05-08  8:59           ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-07 23:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, matt.fleming, will.auld, peter.zijlstra,
	h.peter.anvin, kanaka.d.juvva



On Wed, 6 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 11:39:21AM -0700, Vikas Shivappa wrote:
>
>>>> --- a/arch/x86/include/asm/switch_to.h
>>>> +++ b/arch/x86/include/asm/switch_to.h
>>>> @@ -8,6 +8,9 @@ struct tss_struct;
>>>> void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>>>> 		      struct tss_struct *tss);
>>>>
>>>> +#include <asm/intel_rdt.h>
>>>> +#define post_arch_switch(current)	rdt_sched_in(current)
>>>> +
>>>> #ifdef CONFIG_X86_32
>>>>
>>>> #ifdef CONFIG_CC_STACKPROTECTOR
>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index f9123a8..cacb490 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -2241,6 +2241,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>>>> 	prev_state = prev->state;
>>>> 	vtime_task_switch(prev);
>>>> 	finish_arch_switch(prev);
>>>> +	post_arch_switch(current);
>>>> 	perf_event_task_sched_in(prev, current);
>>>> 	finish_lock_switch(rq, prev);
>>>> 	finish_arch_post_lock_switch();
>>>
>>> Not a word in the Changelog on this hook; that's double fail.
>>
>> will add the changelog. we want the current task which no other existing
>> hook provides.
>
> No.
>
> 1) two arch hooks right after one another is FAIL
> 1a) just 'fix' the existing hook
> 2) current is cheap and easily obtainable without passing it as
>    an argument

Will fix to just use an existing hook in finish_task_switch and
current (get_current), since the stack would already have been switched.

Thanks,
Vikas

> 3) why do you need the hook in the first place?
> 3a) why can't you put this in __switch_to()? This is very much x86 only
>     code.
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-07 23:15         ` Vikas Shivappa
@ 2015-05-08  8:59           ` Peter Zijlstra
  2015-05-08 20:55             ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2015-05-08  8:59 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, x86, hpa, tglx, mingo, tj,
	matt.fleming, will.auld, peter.zijlstra, h.peter.anvin,
	kanaka.d.juvva

On Thu, May 07, 2015 at 04:15:41PM -0700, Vikas Shivappa wrote:
> >No.
> >
> >1) two arch hooks right after one another is FAIL
> >1a) just 'fix' the existing hook
> >2) current is cheap and easily obtainable without passing it as
> >   an argument
> 
> will fix to just use an existing hook in finish_task_switch and
> current(get_current) since the stack would already be changed ..
> 
> Thanks,
> Vikas
> 
> >3) why do you need the hook in the first place?
> >3a) why can't you put this in __switch_to()? This is very much x86 only
> >    code.

^ please also answer 3, why can't this go in __switch_to()?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT
  2015-05-08  8:59           ` Peter Zijlstra
@ 2015-05-08 20:55             ` Vikas Shivappa
  0 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-05-08 20:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, x86, hpa, tglx,
	mingo, tj, Matt Fleming, Auld, Will, peter.zijlstra,
	h.peter.anvin, Juvva, Kanaka D



On Fri, 8 May 2015, Peter Zijlstra wrote:

> On Thu, May 07, 2015 at 04:15:41PM -0700, Vikas Shivappa wrote:
>>> No.
>>>
>>> 1) two arch hooks right after one another is FAIL
>>> 1a) just 'fix' the existing hook
>>> 2) current is cheap and easily obtainable without passing it as
>>>   an argument
>>
>> will fix to just use an existing hook in finish_task_switch and
>> current(get_current) since the stack would already be changed ..
>>
>> Thanks,
>> Vikas
>>
>>> 3) why do you need the hook in the first place?
>>> 3a) why can't you put this in __switch_to()? This is very much x86 only
>>>    code.
>
> ^ please also answer 3, why can't this go in __swtich_to()?

perf uses a similar (#1a) hook to update its MSRs (including for cache
monitoring).
Also, since switch_to is for register state and the stack, maybe it is a
safer option to use it in finish_arch_switch? That is kind of why we had it
there at first, but with a separate hook:

#define finish_arch_switch(prev)					\
do {									\
 	intel_rdt_sched_in();						\
} while (0)


Hpa , any comments ?

>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-04-09 20:56   ` Marcelo Tosatti
@ 2015-04-13  2:36     ` Vikas Shivappa
  0 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-04-13  2:36 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Vikas Shivappa, vikas.shivappa, x86, linux-kernel, hpa, tglx,
	mingo, tj, peterz, matt.fleming, will.auld, glenn.p.williamson,
	kanaka.d.juvva



On Thu, 9 Apr 2015, Marcelo Tosatti wrote:

> On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:
>> Add support for cache bit mask manipulation. The change adds a file to
>> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
>>
>> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
>> cgroup never fails.  When a child cgroup is created it inherits the
>> CLOSid and the CBM from its parent.  When a user changes the default
>> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
>> used before. If the new CBM is the one that is already used, the
>> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
>> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
>> can support.
>> User can create as many cgroups as he wants but having different CBMs
>> at the same time is restricted by the maximum number of CLOSids
>> (multiple cgroups can have the same CBM).
>> Kernel maintains a CLOSid<->cbm mapping which keeps count
>> of cgroups using a CLOSid.
>>
>> The tasks in the CAT cgroup would get to fill the LLC cache represented
>> by the cgroup's 'cbm' file.
>>
>> Reuse of CLOSids for cgroups with same bitmask also has following
>> advantages:
>> - This helps to use the scant CLOSids optimally.
>> - This also implies that during context switch, write to PQR-MSR is done
>> only when a task with a different bitmask is scheduled in.
>>
>> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
>> ---
>>  arch/x86/include/asm/intel_rdt.h |   3 +
>>  arch/x86/kernel/cpu/intel_rdt.c  | 205 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 208 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
>> index 87af1a5..0ed28d9 100644
>> --- a/arch/x86/include/asm/intel_rdt.h
>> +++ b/arch/x86/include/asm/intel_rdt.h
>> @@ -4,6 +4,9 @@
>>  #ifdef CONFIG_CGROUP_RDT
>>
>>  #include <linux/cgroup.h>
>> +#define MAX_CBM_LENGTH			32
>> +#define IA32_L3_CBM_BASE		0xc90
>> +#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
>>
>>  struct rdt_subsys_info {
>>  	/* Clos Bitmap to keep track of available CLOSids.*/
>> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
>> index 3726f41..495497a 100644
>> --- a/arch/x86/kernel/cpu/intel_rdt.c
>> +++ b/arch/x86/kernel/cpu/intel_rdt.c
>> @@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
>>  static DEFINE_MUTEX(rdt_group_mutex);
>>  struct intel_rdt rdt_root_group;
>>
>> +#define rdt_for_each_child(pos_css, parent_ir)		\
>> +	css_for_each_child((pos_css), &(parent_ir)->css)
>> +
>>  static inline bool cat_supported(struct cpuinfo_x86 *c)
>>  {
>>  	if (cpu_has(c, X86_FEATURE_CAT_L3))
>> @@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
>>  late_initcall(rdt_late_init);
>>
>>  /*
>> + * Allocates a new closid from unused closids.
>> + * Called with the rdt_group_mutex held.
>> + */
>> +
>> +static int rdt_alloc_closid(struct intel_rdt *ir)
>> +{
>> +	unsigned int id;
>> +	unsigned int maxid;
>> +
>> +	lockdep_assert_held(&rdt_group_mutex);
>> +
>> +	maxid = boot_cpu_data.x86_cat_closs;
>> +	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
>> +	if (id == maxid)
>> +		return -ENOSPC;
>> +
>> +	set_bit(id, rdtss_info.closmap);
>> +	WARN_ON(ccmap[id].cgrp_count);
>> +	ccmap[id].cgrp_count++;
>> +	ir->clos = id;
>> +
>> +	return 0;
>> +}
>> +
>> +/*
>>  * Called with the rdt_group_mutex held.
>>  */
>>  static int rdt_free_closid(struct intel_rdt *ir)
>> @@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
>>  	mutex_unlock(&rdt_group_mutex);
>>  }
>>
>> +/*
>> + * Tests if atleast two contiguous bits are set.
>> + */
>> +
>> +static inline bool cbm_is_contiguous(unsigned long var)
>> +{
>> +	unsigned long first_bit, zero_bit;
>> +	unsigned long maxcbm = MAX_CBM_LENGTH;
>> +
>> +	if (bitmap_weight(&var, maxcbm) < 2)
>> +		return false;
>> +
>> +	first_bit = find_next_bit(&var, maxcbm, 0);
>> +	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
>> +
>> +	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static int cat_cbm_read(struct seq_file *m, void *v)
>> +{
>> +	struct intel_rdt *ir = css_rdt(seq_css(m));
>> +
>> +	seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
>> +	return 0;
>> +}
>> +
>> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
>> +{
>> +	struct intel_rdt *par, *c;
>> +	struct cgroup_subsys_state *css;
>> +	unsigned long *cbm_tmp;
>> +
>> +	if (!cbm_is_contiguous(cbmvalue)) {
>> +		pr_info("cbm should have >= 2 bits and be contiguous\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	par = parent_rdt(ir);
>> +	cbm_tmp = &ccmap[par->clos].cbm;
>> +	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
>> +		return -EINVAL;
>
> Can you have different errors for the different cases?

Could use -EPERM.

>
>> +	rcu_read_lock();
>> +	rdt_for_each_child(css, ir) {
>> +		c = css_rdt(css);
>> +		cbm_tmp = &ccmap[c->clos].cbm;
>> +		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
>> +			pr_info("Children's mask not a subset\n");
>> +			rcu_read_unlock();
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	rcu_read_unlock();
>> +	return 0;
>> +}
>> +
>> +static bool cbm_search(unsigned long cbm, int *closid)
>> +{
>> +	int maxid = boot_cpu_data.x86_cat_closs;
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < maxid; i++)
>> +		if (bitmap_equal(&cbm, &ccmap[i].cbm, MAX_CBM_LENGTH)) {
>> +			*closid = i;
>> +			return true;
>> +		}
>> +
>> +	return false;
>> +}
>> +
>> +static void cbmmap_dump(void)
>> +{
>> +	int i;
>> +
>> +	pr_debug("CBMMAP\n");
>> +	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
>> +		pr_debug("cbm: 0x%x,cgrp_count: %u\n",
>> +		 (unsigned int)ccmap[i].cbm, ccmap[i].cgrp_count);
>> +}
>> +
>> +static void cpu_cbm_update(void *info)
>> +{
>> +	unsigned int closid = *((unsigned int *)info);
>> +
>> +	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cbm);
>> +}
>> +
>> +static inline void cbm_update(unsigned int closid)
>> +{
>> +	int pkg_id = -1;
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		if (pkg_id == topology_physical_package_id(cpu))
>> +			continue;
>> +		smp_call_function_single(cpu, cpu_cbm_update, &closid, 1);
>> +		pkg_id = topology_physical_package_id(cpu);
>> +
>> +
>
> Can use smp_call_function_many, once, more efficient.
>
> Can this race with CPU hotplug? BTW, on CPU hotplug, where are
> the IA32_L3_MASK_n initialized for the new CPU ?
>

Thanks for pointing this out, will fix this. I think when I changed the
design to not use the cpuset I did not change the hot cpu update; I
remembered another similar update is needed: the S3 resume needs a fix to
the software cache, as we used the MSR before.

Thanks,
Vikas

>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-03-12 23:16 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
@ 2015-04-09 20:56   ` Marcelo Tosatti
  2015-04-13  2:36     ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2015-04-09 20:56 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, x86, linux-kernel, hpa, tglx, mingo, tj, peterz,
	matt.fleming, will.auld, glenn.p.williamson, kanaka.d.juvva

On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:
> Add support for cache bit mask manipulation. The change adds a file to
> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
> 
> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
> cgroup never fails.  When a child cgroup is created it inherits the
> CLOSid and the CBM from its parent.  When a user changes the default
> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> used before. If the new CBM is the one that is already used, the
> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
> can support.
> User can create as many cgroups as he wants but having different CBMs
> at the same time is restricted by the maximum number of CLOSids
> (multiple cgroups can have the same CBM).
> Kernel maintains a CLOSid<->cbm mapping which keeps count
> of cgroups using a CLOSid.
> 
> The tasks in the CAT cgroup would get to fill the LLC cache represented
> by the cgroup's 'cbm' file.
> 
> Reuse of CLOSids for cgroups with same bitmask also has following
> advantages:
> - This helps to use the scant CLOSids optimally.
> - This also implies that during context switch, write to PQR-MSR is done
> only when a task with a different bitmask is scheduled in.
> 
> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
> ---
>  arch/x86/include/asm/intel_rdt.h |   3 +
>  arch/x86/kernel/cpu/intel_rdt.c  | 205 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 208 insertions(+)
> 
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 87af1a5..0ed28d9 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -4,6 +4,9 @@
>  #ifdef CONFIG_CGROUP_RDT
>  
>  #include <linux/cgroup.h>
> +#define MAX_CBM_LENGTH			32
> +#define IA32_L3_CBM_BASE		0xc90
> +#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
>  
>  struct rdt_subsys_info {
>  	/* Clos Bitmap to keep track of available CLOSids.*/
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 3726f41..495497a 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
>  static DEFINE_MUTEX(rdt_group_mutex);
>  struct intel_rdt rdt_root_group;
>  
> +#define rdt_for_each_child(pos_css, parent_ir)		\
> +	css_for_each_child((pos_css), &(parent_ir)->css)
> +
>  static inline bool cat_supported(struct cpuinfo_x86 *c)
>  {
>  	if (cpu_has(c, X86_FEATURE_CAT_L3))
> @@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
>  late_initcall(rdt_late_init);
>  
>  /*
> + * Allocates a new closid from unused closids.
> + * Called with the rdt_group_mutex held.
> + */
> +
> +static int rdt_alloc_closid(struct intel_rdt *ir)
> +{
> +	unsigned int id;
> +	unsigned int maxid;
> +
> +	lockdep_assert_held(&rdt_group_mutex);
> +
> +	maxid = boot_cpu_data.x86_cat_closs;
> +	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
> +	if (id == maxid)
> +		return -ENOSPC;
> +
> +	set_bit(id, rdtss_info.closmap);
> +	WARN_ON(ccmap[id].cgrp_count);
> +	ccmap[id].cgrp_count++;
> +	ir->clos = id;
> +
> +	return 0;
> +}
> +
> +/*
>  * Called with the rdt_group_mutex held.
>  */
>  static int rdt_free_closid(struct intel_rdt *ir)
> @@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
>  	mutex_unlock(&rdt_group_mutex);
>  }
>  
> +/*
> + * Tests if atleast two contiguous bits are set.
> + */
> +
> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> +	unsigned long first_bit, zero_bit;
> +	unsigned long maxcbm = MAX_CBM_LENGTH;
> +
> +	if (bitmap_weight(&var, maxcbm) < 2)
> +		return false;
> +
> +	first_bit = find_next_bit(&var, maxcbm, 0);
> +	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> +	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> +		return false;
> +
> +	return true;
> +}
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> +	struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> +	seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
> +	return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> +	struct intel_rdt *par, *c;
> +	struct cgroup_subsys_state *css;
> +	unsigned long *cbm_tmp;
> +
> +	if (!cbm_is_contiguous(cbmvalue)) {
> +		pr_info("cbm should have >= 2 bits and be contiguous\n");
> +		return -EINVAL;
> +	}
> +
> +	par = parent_rdt(ir);
> +	cbm_tmp = &ccmap[par->clos].cbm;
> +	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> +		return -EINVAL;

Can you have different errors for the different cases?

> +	rcu_read_lock();
> +	rdt_for_each_child(css, ir) {
> +		c = css_rdt(css);
> +		cbm_tmp = &ccmap[c->clos].cbm;
> +		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
> +			pr_info("Children's mask not a subset\n");
> +			rcu_read_unlock();
> +			return -EINVAL;
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +	return 0;
> +}
> +
> +static bool cbm_search(unsigned long cbm, int *closid)
> +{
> +	int maxid = boot_cpu_data.x86_cat_closs;
> +	unsigned int i;
> +
> +	for (i = 0; i < maxid; i++)
> +		if (bitmap_equal(&cbm, &ccmap[i].cbm, MAX_CBM_LENGTH)) {
> +			*closid = i;
> +			return true;
> +		}
> +
> +	return false;
> +}
> +
> +static void cbmmap_dump(void)
> +{
> +	int i;
> +
> +	pr_debug("CBMMAP\n");
> +	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
> +		pr_debug("cbm: 0x%x,cgrp_count: %u\n",
> +		 (unsigned int)ccmap[i].cbm, ccmap[i].cgrp_count);
> +}
> +
> +static void cpu_cbm_update(void *info)
> +{
> +	unsigned int closid = *((unsigned int *)info);
> +
> +	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cbm);
> +}
> +
> +static inline void cbm_update(unsigned int closid)
> +{
> +	int pkg_id = -1;
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		if (pkg_id == topology_physical_package_id(cpu))
> +			continue;
> +		smp_call_function_single(cpu, cpu_cbm_update, &closid, 1);
> +		pkg_id = topology_physical_package_id(cpu);
> +
> +

You could use smp_call_function_many() here, called once; that would be
more efficient.

Can this race with CPU hotplug? BTW, on CPU hotplug, where are the
IA32_L3_MASK_n MSRs initialized for the new CPU?
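
A minimal sketch of that suggestion (illustrative only: cpu_cbm_update() and
the closid argument are from the patch, everything else is an assumption;
on_each_cpu_mask() is used because it wraps smp_call_function_many() and also
runs the callback on the calling CPU when it is in the mask):

	static void cbm_update_all(unsigned int closid)
	{
		cpumask_var_t pkg_mask;
		int cpu, pkg_id = -1;

		if (!zalloc_cpumask_var(&pkg_mask, GFP_KERNEL))
			return;

		/* Keep the online map stable against CPU hotplug. */
		get_online_cpus();

		/* Pick one online CPU per physical package. */
		for_each_online_cpu(cpu) {
			if (pkg_id == topology_physical_package_id(cpu))
				continue;
			cpumask_set_cpu(cpu, pkg_mask);
			pkg_id = topology_physical_package_id(cpu);
		}

		/* One call for all packages instead of one IPI per package. */
		on_each_cpu_mask(pkg_mask, cpu_cbm_update, &closid, 1);

		put_online_cpus();
		free_cpumask_var(pkg_mask);
	}

The hotplug path would still need a callback that restores the IA32_L3_MASK_n
MSRs when a CPU in a previously offline package comes online.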



* [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-03-12 23:16 [PATCH V5 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
@ 2015-03-12 23:16 ` Vikas Shivappa
  2015-04-09 20:56   ` Marcelo Tosatti
  0 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-03-12 23:16 UTC (permalink / raw)
  To: vikas.shivappa
  Cc: x86, linux-kernel, hpa, tglx, mingo, tj, peterz, matt.fleming,
	will.auld, glenn.p.williamson, kanaka.d.juvva, vikas.shivappa

Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM (cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to
the cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if that CBM was not in
use before. If the new CBM is one that is already in use, the reference
count for that CLOSid<->CBM pairing is incremented. Changing 'cbm' may
fail with -ENOSPC once the kernel runs out of the maximum number of
CLOSids it can support.
Users can create as many cgroups as they want, but the number of
different CBMs in use at any one time is limited by the maximum number
of CLOSids (multiple cgroups can share the same CBM).
The kernel maintains a CLOSid<->cbm mapping which keeps a count of the
cgroups using each CLOSid.

The tasks in a CAT cgroup get to fill the LLC cache represented by the
cgroup's 'cbm' file.

Reusing CLOSids for cgroups with the same bitmask also has the
following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch the write to the PQR MSR
is done only when a task with a different bitmask is scheduled in (see
the sketch below).
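
(The actual scheduling hook is added in a later patch of this series; the
fragment below is only a rough sketch of the idea, and the per-cpu cache and
helper names are illustrative assumptions, not code from that patch.)

	#define MSR_IA32_PQR_ASSOC	0x0c8f	/* RMID in low half, CLOSid in high half */

	static DEFINE_PER_CPU(unsigned int, cpu_closid);
	static DEFINE_PER_CPU(unsigned int, cpu_rmid);	/* owned by cache monitoring */

	static inline void rdt_sched_in(struct task_struct *task)
	{
		struct intel_rdt *ir = task_rdt(task);	/* cgroup state of @task */

		/*
		 * cgroups with the same bitmask share a CLOSid, so the
		 * (expensive) MSR write is skipped whenever the incoming
		 * task uses the same bitmask as the one already loaded.
		 */
		if (ir->clos == this_cpu_read(cpu_closid))
			return;

		this_cpu_write(cpu_closid, ir->clos);
		wrmsr(MSR_IA32_PQR_ASSOC, this_cpu_read(cpu_rmid), ir->clos);
	}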

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 205 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..0ed28d9 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+#define MAX_CBM_LENGTH			32
+#define IA32_L3_CBM_BASE		0xc90
+#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3726f41..495497a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+#define rdt_for_each_child(pos_css, parent_ir)		\
+	css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);
 
 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+	unsigned int id;
+	unsigned int maxid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	maxid = boot_cpu_data.x86_cat_closs;
+	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, rdtss_info.closmap);
+	WARN_ON(ccmap[id].cgrp_count);
+	ccmap[id].cgrp_count++;
+	ir->clos = id;
+
+	return 0;
+}
+
+/*
 * Called with the rdt_group_mutex held.
 */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
 	mutex_unlock(&rdt_group_mutex);
 }
 
+/*
+ * Tests if atleast two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+	unsigned long first_bit, zero_bit;
+	unsigned long maxcbm = MAX_CBM_LENGTH;
+
+	if (bitmap_weight(&var, maxcbm) < 2)
+		return false;
+
+	first_bit = find_next_bit(&var, maxcbm, 0);
+	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+		return false;
+
+	return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+	struct intel_rdt *ir = css_rdt(seq_css(m));
+
+	seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
+	return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+	struct intel_rdt *par, *c;
+	struct cgroup_subsys_state *css;
+	unsigned long *cbm_tmp;
+
+	if (!cbm_is_contiguous(cbmvalue)) {
+		pr_info("cbm should have >= 2 bits and be contiguous\n");
+		return -EINVAL;
+	}
+
+	par = parent_rdt(ir);
+	cbm_tmp = &ccmap[par->clos].cbm;
+	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+		return -EINVAL;
+
+	rcu_read_lock();
+	rdt_for_each_child(css, ir) {
+		c = css_rdt(css);
+		cbm_tmp = &ccmap[c->clos].cbm;
+		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
+			pr_info("Children's mask not a subset\n");
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+	int maxid = boot_cpu_data.x86_cat_closs;
+	unsigned int i;
+
+	for (i = 0; i < maxid; i++)
+		if (bitmap_equal(&cbm, &ccmap[i].cbm, MAX_CBM_LENGTH)) {
+			*closid = i;
+			return true;
+		}
+
+	return false;
+}
+
+static void cbmmap_dump(void)
+{
+	int i;
+
+	pr_debug("CBMMAP\n");
+	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
+		pr_debug("cbm: 0x%x,cgrp_count: %u\n",
+		 (unsigned int)ccmap[i].cbm, ccmap[i].cgrp_count);
+}
+
+static void cpu_cbm_update(void *info)
+{
+	unsigned int closid = *((unsigned int *)info);
+
+	wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cbm);
+}
+
+static inline void cbm_update(unsigned int closid)
+{
+	int pkg_id = -1;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (pkg_id == topology_physical_package_id(cpu))
+			continue;
+		smp_call_function_single(cpu, cpu_cbm_update, &closid, 1);
+		pkg_id = topology_physical_package_id(cpu);
+
+	}
+}
+
+/*
+ * rdt_cbm_write() - Validates and writes the cache bit mask(cbm)
+ * to the IA32_L3_MASK_n and also store the same in the ccmap.
+ *
+ * CLOSids are reused for cgroups which have same bitmask.
+ * - This helps to use the scant CLOSids optimally.
+ * - This also implies that at context switch write
+ * to PQR-MSR is done only when a task with a
+ * different bitmask is scheduled in.
+ */
+
+static int cat_cbm_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 cbmvalue)
+{
+	struct intel_rdt *ir = css_rdt(css);
+	ssize_t err = 0;
+	unsigned long cbm;
+	unsigned long *cbm_tmp;
+	unsigned int closid;
+	u32 cbm_mask =
+		(u32)((u64)(1 << boot_cpu_data.x86_cat_cbmlength) - 1);
+
+	if (ir == &rdt_root_group)
+		return -EPERM;
+
+	/*
+	* Need global mutex as cbm write may allocate a closid.
+	*/
+	mutex_lock(&rdt_group_mutex);
+	cbm = cbmvalue & cbm_mask;
+	cbm_tmp = &ccmap[ir->clos].cbm;
+
+	if (bitmap_equal(&cbm, cbm_tmp, MAX_CBM_LENGTH))
+		goto out;
+
+	err = validate_cbm(ir, cbm);
+	if (err)
+		goto out;
+
+	rdt_free_closid(ir);
+
+	if (cbm_search(cbm, &closid)) {
+		ir->clos = closid;
+		ccmap[ir->clos].cgrp_count++;
+	} else {
+		err = rdt_alloc_closid(ir);
+		if (err)
+			goto out;
+
+		ccmap[ir->clos].cbm = cbm;
+		cbm_update(ir->clos);
+	}
+
+	cbmmap_dump();
+
+out:
+
+	mutex_unlock(&rdt_group_mutex);
+	return err;
+}
+
+static struct cftype rdt_files[] = {
+	{
+		.name = "cbm",
+		.seq_show = cat_cbm_read,
+		.write_u64 = cat_cbm_write,
+		.mode = 0666,
+	},
+	{ }	/* terminate */
+};
+
 struct cgroup_subsys rdt_cgrp_subsys = {
 	.css_alloc			= rdt_css_alloc,
 	.css_free			= rdt_css_free,
+	.legacy_cftypes			= rdt_files,
 	.early_init			= 0,
 };
-- 
1.9.1



* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-27 19:42       ` Tejun Heo
@ 2015-02-27 21:38         ` Vikas Shivappa
  0 siblings, 0 replies; 35+ messages in thread
From: Vikas Shivappa @ 2015-02-27 21:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vikas Shivappa, Vikas Shivappa, linux-kernel, matt.fleming, hpa,
	tglx, mingo, peterz, will.auld, dave.hansen, andi.kleen,
	tony.luck, kanaka.d.juvva



On Fri, 27 Feb 2015, Tejun Heo wrote:

> Hello, Vikas.
>
> On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote:
>> This cgroup subsystem would basically let the user partition one of the
>> Platform shared resource , the LLC cache. This could be extended in future
>
> I suppose LLC means last level cache?  It'd be great if you can spell
> out the full term when the abbreviation is first referenced in the
> comments or documentation.
>

Yes, that's last level cache. Will spell out the full term in the documentation/comments.

>> to partition more shared resources when there is hardware support that way
>> we may eventually have more files in the cgroup. RDT is a generic term for
>> platform resource sharing.
>
>> For more information you can refer to section 17.15 of Intel SDM.
>> We did go through quite a bit of discussion on lkml regarding adding the
>> cgroup interface for CAT and the patches were posted only after that.
>> This cgroup would not interact with other cgroups in the sense would not
>> modify or add any elements to existing cgroups - there was such a proposal
>> but was removed as we did not get agreement on lkml.
>>
>> the original lkml thread is here from 10/2014 for your reference -
>> https://lkml.org/lkml/2014/10/16/568
>
> Yeap, I followed that thread and this being a separate controller
> definitely makes a lot more sense.
>
>>   I
>>> take it that the feature implemented is too coarse to allow for weight
>>> based distribution?
>>>
>> Could you please clarify more on this ? However there is a limitation from
>> hardware that there have to be a minimum of 2 bits in the cbm if thats what
>> you referred to. Otherwise the bits in the cbm directly map to the number of
>> cache ways and hence the cache capacity ..
>
> Right, so the granularity is fairly coarse and specifying things like
> "distribute cache in 4:2:1 (or even in absolute bytes) to these three
> cgroups" wouldn't work at all.

Specifying the allocation in an arbitrary number of cache bytes is not possible
because the minimum granularity has to be at least one cache way - the entire
memory can be indexed into a single cache way.
Exposing the bit mask granularity means users do not have to worry about how
many bytes a cache way is and can specify the allocation in terms of the
bitmask. If we wanted to provide a cgroup interface where users specify the
size in bytes, we would also need to show the user the minimum granularity in
bytes. Also note that the bit masks can overlap, so users have a way to specify
overlapping regions of the cache, which is very useful in scenarios where
multiple cgroups want to share capacity.

The minimum granularity is 2 bits on the pre-production SKUs, and that does
limit the scenarios you describe. We will issue a patch update if this is
relaxed in later SKUs. But note that the SDM also recommends using at least
2 bits for performance reasons, because an application restricted to a single
cache way would see a lot more conflicts.
For example, if the maximum CBM is 20 bits, then the minimum granularity is
10% of the total cache.
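
To make that arithmetic concrete, here is a small user-space sketch; the cache
size and way count below are made-up example numbers, not values from this
thread:

	#include <stdio.h>

	int main(void)
	{
		unsigned int  cbm_len  = 20;           /* bits in the max CBM, one bit per way */
		unsigned int  min_bits = 2;            /* minimum CBM weight on these SKUs */
		unsigned long l3_bytes = 20UL << 20;   /* assume a 20 MB L3 for illustration */
		unsigned long way_bytes = l3_bytes / cbm_len;

		printf("one way        = %lu bytes (%.0f%% of the cache)\n",
		       way_bytes, 100.0 / cbm_len);
		printf("min allocation = %lu bytes (%.0f%% of the cache)\n",
		       min_bits * way_bytes, min_bits * 100.0 / cbm_len);

		/*
		 * Overlap example: masks may share ways, e.g.
		 *   cgroup A: 0x000ff  (ways 0-7)
		 *   cgroup B: 0x0ff00  (ways 8-15, isolated from A)
		 *   cgroup C: 0x00ff0  (ways 4-11, shares capacity with A and B)
		 * Each mask must still be contiguous and at least 2 bits wide.
		 */
		return 0;
	}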

>
> Thanks.
>
> -- 
> tejun
>


* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-27 19:34     ` Vikas Shivappa
@ 2015-02-27 19:42       ` Tejun Heo
  2015-02-27 21:38         ` Vikas Shivappa
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2015-02-27 19:42 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: Vikas Shivappa, linux-kernel, matt.fleming, hpa, tglx, mingo,
	peterz, will.auld, dave.hansen, andi.kleen, tony.luck,
	kanaka.d.juvva

Hello, Vikas.

On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote:
> This cgroup subsystem would basically let the user partition one of the
> Platform shared resource , the LLC cache. This could be extended in future

I suppose LLC means last level cache?  It'd be great if you can spell
out the full term when the abbreviation is first referenced in the
comments or documentation.

> to partition more shared resources when there is hardware support that way
> we may eventually have more files in the cgroup. RDT is a generic term for
> platform resource sharing.

> For more information you can refer to section 17.15 of Intel SDM.
> We did go through quite a bit of discussion on lkml regarding adding the
> cgroup interface for CAT and the patches were posted only after that.
> This cgroup would not interact with other cgroups in the sense would not
> modify or add any elements to existing cgroups - there was such a proposal
> but was removed as we did not get agreement on lkml.
>
> the original lkml thread is here from 10/2014 for your reference -
> https://lkml.org/lkml/2014/10/16/568

Yeap, I followed that thread and this being a separate controller
definitely makes a lot more sense.

>   I
> >take it that the feature implemented is too coarse to allow for weight
> >based distribution?
> >
> Could you please clarify more on this ? However there is a limitation from
> hardware that there have to be a minimum of 2 bits in the cbm if thats what
> you referred to. Otherwise the bits in the cbm directly map to the number of
> cache ways and hence the cache capacity ..

Right, so the granularity is fairly coarse and specifying things like
"distribute cache in 4:2:1 (or even in absolute bytes) to these three
cgroups" wouldn't work at all.

Thanks.

-- 
tejun


* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-27 12:12   ` Tejun Heo
  2015-02-27 12:18     ` Tejun Heo
@ 2015-02-27 19:34     ` Vikas Shivappa
  2015-02-27 19:42       ` Tejun Heo
  1 sibling, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-02-27 19:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Vikas Shivappa, linux-kernel, vikas.shivappa, matt.fleming, hpa,
	tglx, mingo, peterz, will.auld, dave.hansen, andi.kleen,
	tony.luck, kanaka.d.juvva


Hello Tejun,

On Fri, 27 Feb 2015, Tejun Heo wrote:

> Hello,
>
> On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote:
>> Add support for cache bit mask manipulation. The change adds a file to
>> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
>>
>> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
>> cgroup never fails.  When a child cgroup is created it inherits the
>> CLOSid and the CBM from its parent.  When a user changes the default
>> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
>> used before. If the new CBM is the one that is already used, the
>> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
>> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
>> can support.
>> User can create as many cgroups as he wants but having different CBMs
>> at the same time is restricted by the maximum number of CLOSids
>> (multiple cgroups can have the same CBM).
>> Kernel maintains a CLOSid<->cbm mapping which keeps count
>> of cgroups using a CLOSid.
>>
>> The tasks in the CAT cgroup would get to fill the LLC cache represented
>> by the cgroup's 'cbm' file.
>>
>> Reuse of CLOSids for cgroups with same bitmask also has following
>> advantages:
>> - This helps to use the scant CLOSids optimally.
>> - This also implies that during context switch, write to PQR-MSR is done
>> only when a task with a different bitmask is scheduled in.
>
> I feel a bit underwhelmed about this new controller and its interface.
> It is evidently at a lot lower level and way more niche than what
> other controllers are doing, even cpuset.  At the same time, as long
> as it's well isolated, it piggybacking on cgroup should be okay.

This cgroup subsystem basically lets the user partition one of the platform
shared resources, the LLC (last level cache). It could be extended in the
future to partition more shared resources as hardware support becomes
available, in which case we may eventually have more files in the cgroup. RDT
is the generic term for platform resource sharing.
For more information you can refer to section 17.15 of the Intel SDM.
We went through quite a bit of discussion on lkml regarding adding the cgroup
interface for CAT, and the patches were posted only after that.
This cgroup does not interact with other cgroups in the sense that it does not
modify or add any elements to existing cgroups - there was such a proposal,
but it was dropped as we did not get agreement on lkml.

The original lkml thread from 10/2014 is here for your reference:
https://lkml.org/lkml/2014/10/16/568

>   I
> take it that the feature implemented is too coarse to allow for weight
> based distribution?
>
Could you please clarify this a bit more? There is a hardware limitation that
the CBM must contain a minimum of 2 bits, if that is what you are referring
to. Otherwise the bits in the CBM map directly to the number of cache ways and
hence to the cache capacity.

Thanks,
Vikas

> Thanks.
>
> -- 
> tejun
>


* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-27 12:12   ` Tejun Heo
@ 2015-02-27 12:18     ` Tejun Heo
  2015-02-27 19:34     ` Vikas Shivappa
  1 sibling, 0 replies; 35+ messages in thread
From: Tejun Heo @ 2015-02-27 12:18 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: linux-kernel, vikas.shivappa, matt.fleming, hpa, tglx, mingo,
	peterz, will.auld, dave.hansen, andi.kleen, tony.luck,
	kanaka.d.juvva

On Fri, Feb 27, 2015 at 07:12:22AM -0500, Tejun Heo wrote:
> I feel a bit underwhelmed about this new controller and its interface.
> It is evidently at a lot lower level and way more niche than what
> other controllers are doing, even cpuset.  At the same time, as long
> as it's well isolated, it piggybacking on cgroup should be okay.  I
> take it that the feature implemented is too coarse to allow for weight
> based distribution?

And, Ingo, Peter, are you guys in general agreeing with this addition?
As Tony said, we don't wanna be left way behind but that doesn't mean
we wanna jump on everything giving off the faintest sign of movement,
which sadly has happened often enough in the storage area at least.

Thanks.

-- 
tejun


* Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-24 23:16 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
@ 2015-02-27 12:12   ` Tejun Heo
  2015-02-27 12:18     ` Tejun Heo
  2015-02-27 19:34     ` Vikas Shivappa
  0 siblings, 2 replies; 35+ messages in thread
From: Tejun Heo @ 2015-02-27 12:12 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: linux-kernel, vikas.shivappa, matt.fleming, hpa, tglx, mingo,
	peterz, will.auld, dave.hansen, andi.kleen, tony.luck,
	kanaka.d.juvva

Hello,

On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote:
> Add support for cache bit mask manipulation. The change adds a file to
> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
> 
> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
> cgroup never fails.  When a child cgroup is created it inherits the
> CLOSid and the CBM from its parent.  When a user changes the default
> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> used before. If the new CBM is the one that is already used, the
> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
> can support.
> User can create as many cgroups as he wants but having different CBMs
> at the same time is restricted by the maximum number of CLOSids
> (multiple cgroups can have the same CBM).
> Kernel maintains a CLOSid<->cbm mapping which keeps count
> of cgroups using a CLOSid.
> 
> The tasks in the CAT cgroup would get to fill the LLC cache represented
> by the cgroup's 'cbm' file.
> 
> Reuse of CLOSids for cgroups with same bitmask also has following
> advantages:
> - This helps to use the scant CLOSids optimally.
> - This also implies that during context switch, write to PQR-MSR is done
> only when a task with a different bitmask is scheduled in.

I feel a bit underwhelmed about this new controller and its interface.
It is evidently at a lot lower level and way more niche than what
other controllers are doing, even cpuset.  At the same time, as long
as it's well isolated, it piggybacking on cgroup should be okay.  I
take it that the feature implemented is too coarse to allow for weight
based distribution?

Thanks.

-- 
tejun


* [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
  2015-02-24 23:16 [PATCH V4 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
@ 2015-02-24 23:16 ` Vikas Shivappa
  2015-02-27 12:12   ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Vikas Shivappa @ 2015-02-24 23:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: vikas.shivappa, vikas.shivappa, matt.fleming, hpa, tglx, mingo,
	tj, peterz, will.auld, dave.hansen, andi.kleen, tony.luck,
	kanaka.d.juvva

Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM (cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to
the cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if that CBM was not in
use before. If the new CBM is one that is already in use, the reference
count for that CLOSid<->CBM pairing is incremented. Changing 'cbm' may
fail with -ENOSPC once the kernel runs out of the maximum number of
CLOSids it can support.
Users can create as many cgroups as they want, but the number of
different CBMs in use at any one time is limited by the maximum number
of CLOSids (multiple cgroups can share the same CBM).
The kernel maintains a CLOSid<->cbm mapping which keeps a count of the
cgroups using each CLOSid.

The tasks in a CAT cgroup get to fill the LLC cache represented by the
cgroup's 'cbm' file.

Reusing CLOSids for cgroups with the same bitmask also has the
following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch the write to the PQR MSR
is done only when a task with a different bitmask is scheduled in.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 179 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 182 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index ecd9664..a414771 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include <linux/cgroup.h>
+#define MAX_CBM_LENGTH			32
+#define IA32_L3_CBM_BASE		0xc90
+#define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6cf1a16..dd090a7 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+#define rdt_for_each_child(pos_css, parent_ir)		\
+	css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -84,6 +87,30 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);
 
 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+	unsigned int id;
+	unsigned int maxid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	maxid = boot_cpu_data.x86_cat_closs;
+	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, rdtss_info.closmap);
+	ccmap[id].cgrp_count++;
+	ir->clos = id;
+
+	return 0;
+}
+
+/*
 * Called with the rdt_group_mutex held.
 */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -135,8 +162,160 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
 	mutex_unlock(&rdt_group_mutex);
 }
 
+/*
+ * Tests if atleast two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+	unsigned long first_bit, zero_bit;
+	unsigned long maxcbm = MAX_CBM_LENGTH;
+
+	if (bitmap_weight(&var, maxcbm) < 2)
+		return false;
+
+	first_bit = find_next_bit(&var, maxcbm, 0);
+	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+		return false;
+
+	return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+	struct intel_rdt *ir = css_rdt(seq_css(m));
+
+	seq_bitmap(m, ir->cbm, MAX_CBM_LENGTH);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+	struct intel_rdt *par, *c;
+	struct cgroup_subsys_state *css;
+
+	if (!cbm_is_contiguous(cbmvalue)) {
+		pr_info("cbm should have >= 2 bits and be contiguous\n");
+		return -EINVAL;
+	}
+
+	par = parent_rdt(ir);
+	if (!bitmap_subset(&cbmvalue, par->cbm, MAX_CBM_LENGTH))
+		return -EINVAL;
+
+	rcu_read_lock();
+	rdt_for_each_child(css, ir) {
+		c = css_rdt(css);
+		if (!bitmap_subset(c->cbm, &cbmvalue, MAX_CBM_LENGTH)) {
+			pr_info("Children's mask not a subset\n");
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+	int maxid = boot_cpu_data.x86_cat_closs;
+	unsigned int i;
+
+	for (i = 0; i < maxid; i++)
+		if (bitmap_equal(&cbm, &ccmap[i].cbm, MAX_CBM_LENGTH)) {
+			*closid = i;
+			return true;
+		}
+
+	return false;
+}
+
+static void cbmmap_dump(void)
+{
+	int i;
+
+	pr_debug("CBMMAP\n");
+	for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
+		pr_debug("cbm: 0x%x,cgrp_count: %u\n",
+		 (unsigned int)ccmap[i].cbm, ccmap[i].cgrp_count);
+}
+
+/*
+ * rdt_cbm_write() - Validates and writes the cache bit mask(cbm)
+ * to the IA32_L3_MASK_n and also store the same in the ccmap.
+ *
+ * CLOSids are reused for cgroups which have same bitmask.
+ * - This helps to use the scant CLOSids optimally.
+ * - This also implies that at context switch write
+ * to PQR-MSR is done only when a task with a
+ * different bitmask is scheduled in.
+ */
+
+static int cat_cbm_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 cbmvalue)
+{
+	struct intel_rdt *ir = css_rdt(css);
+	ssize_t err = 0;
+	unsigned long cbm;
+	unsigned int closid;
+	u32 cbm_mask =
+		(u32)((u64)(1 << boot_cpu_data.x86_cat_cbmlength) - 1);
+
+	if (ir == &rdt_root_group)
+		return -EPERM;
+
+	/*
+	* Need global mutex as cbm write may allocate a closid.
+	*/
+	mutex_lock(&rdt_group_mutex);
+	cbm = cbmvalue & cbm_mask;
+
+	if (bitmap_equal(&cbm, ir->cbm, MAX_CBM_LENGTH))
+		goto out;
+
+	err = validate_cbm(ir, cbm);
+	if (err)
+		goto out;
+
+	rdt_free_closid(ir);
+	if (cbm_search(cbm, &closid)) {
+		ir->clos = closid;
+		ccmap[ir->clos].cgrp_count++;
+	} else {
+		err = rdt_alloc_closid(ir);
+		if (err)
+			goto out;
+
+		wrmsrl(CBM_FROM_INDEX(ir->clos), cbm);
+	}
+
+	ccmap[ir->clos].cbm = cbm;
+	ir->cbm = &ccmap[ir->clos].cbm;
+	cbmmap_dump();
+
+out:
+
+	mutex_unlock(&rdt_group_mutex);
+	return err;
+}
+
+static struct cftype rdt_files[] = {
+	{
+		.name = "cbm",
+		.seq_show = cat_cbm_read,
+		.write_u64 = cat_cbm_write,
+		.mode = 0666,
+	},
+	{ }	/* terminate */
+};
+
 struct cgroup_subsys rdt_cgrp_subsys = {
 	.css_alloc			= rdt_css_alloc,
 	.css_free			= rdt_css_free,
+	.legacy_cftypes			= rdt_files,
 	.early_init			= 0,
 };
-- 
1.9.1



end of thread, other threads:[~2015-05-08 20:57 UTC | newest]

Thread overview: 35+ messages
2015-05-02  1:36 [PATCH V6 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
2015-05-02  1:36 ` [PATCH 1/7] x86/intel_rdt: Intel Cache Allocation Technology detection Vikas Shivappa
2015-05-02 18:35   ` Peter Zijlstra
2015-05-02  1:36 ` [PATCH 2/7] x86/intel_rdt: Adds support for Class of service management Vikas Shivappa
2015-05-02 18:38   ` Peter Zijlstra
2015-05-04 17:31     ` Vikas Shivappa
2015-05-02  1:36 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
2015-05-02 18:46   ` Peter Zijlstra
2015-05-04 17:30     ` Vikas Shivappa
2015-05-06  8:09       ` Peter Zijlstra
2015-05-06  8:30         ` Matt Fleming
2015-05-06 16:48         ` Vikas Shivappa
2015-05-06  8:11       ` Peter Zijlstra
2015-05-06 18:09         ` Vikas Shivappa
2015-05-02  1:36 ` [PATCH 4/7] x86/intel_rdt: Implement scheduling support for Intel RDT Vikas Shivappa
2015-05-02 18:51   ` Peter Zijlstra
2015-05-04 18:39     ` Vikas Shivappa
2015-05-06  7:48       ` Peter Zijlstra
2015-05-07 23:15         ` Vikas Shivappa
2015-05-08  8:59           ` Peter Zijlstra
2015-05-08 20:55             ` Vikas Shivappa
2015-05-06  0:19     ` Vikas Shivappa
2015-05-06  7:50       ` Peter Zijlstra
2015-05-02  1:36 ` [PATCH 5/7] x86/intel_rdt: Software Cache for IA32_PQR_MSR Vikas Shivappa
2015-05-02  1:36 ` [PATCH 6/7] x86/intel_rdt: Intel haswell CAT enumeration Vikas Shivappa
2015-05-02  1:36 ` [PATCH 7/7] x86/intel_rdt: Add CAT documentation and usage guide Vikas Shivappa
  -- strict thread matches above, loose matches on Subject: below --
2015-03-12 23:16 [PATCH V5 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
2015-03-12 23:16 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
2015-04-09 20:56   ` Marcelo Tosatti
2015-04-13  2:36     ` Vikas Shivappa
2015-02-24 23:16 [PATCH V4 0/7] x86/intel_rdt: Intel Cache Allocation Technology Vikas Shivappa
2015-02-24 23:16 ` [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT Vikas Shivappa
2015-02-27 12:12   ` Tejun Heo
2015-02-27 12:18     ` Tejun Heo
2015-02-27 19:34     ` Vikas Shivappa
2015-02-27 19:42       ` Tejun Heo
2015-02-27 21:38         ` Vikas Shivappa
