All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
@ 2017-11-13 16:39 Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 01/20] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
                   ` (20 more replies)
  0 siblings, 21 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Dear Maintainers,

Cache Allocation Technology (CAT), part of Intel(R) Resource Director
Technology (Intel(R) RDT), enables a user to specify the amount of cache
space into which an application can fill. Cache pseudo-locking builds on
the fact that a CPU can still read and write data pre-allocated outside
its current allocated area on cache hit. With cache pseudo-locking data
can be preloaded into a reserved portion of cache that no application can
fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an application
can map it into its virtual address space and thus have a region of
memory with reduced average read latency.

The cache pseudo-locking approach relies on generation-specific behavior
of processors. It may provide benefits on certain processor generations,
but is not guaranteed to be supported in the future. It is not a guarantee
that data will remain in the cache. It is not a guarantee that data will
remain in certain levels or certain regions of the cache. Rather, cache
pseudo-locking increases the probability that data will remain in a certain
level of the cache via carefully configuring the CAT feature and carefully
controlling application behavior.

Known limitations:
Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict pseudo-locked
memory from the cache. Power management C-states may still shrink or power
off cache causing eviction of cache pseudo-locked memory. We utilize
PM QoS to prevent entering deeper C-states on cores associated with cache
pseudo-locked regions at the time they (the pseudo-locked regions) are
created.

Known software limitation:
Cache pseudo-locked regions are currently limited to 4MB, even on
platforms that support larger cache sizes. Work is in progress to
support larger regions.

Graphs visualizing the benefits of cache pseudo-locking on an Intel(R)
NUC NUC6CAYS (it has an Intel(R) Celeron(R) Processor J3455) with the
default 2GB DDR3L-1600 memory are available. In these tests the patches
from this series were applied on the x86/cache branch of tip.git at the
time the HEAD was:

commit 87943db7dfb0c5ee5aa74a9ac06346fadd9695c8 (tip/x86/cache)
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Fri Oct 20 02:16:59 2017 -0700
    x86/intel_rdt: Fix potential deadlock during resctrl mount

DISCLAIMER: Tests document performance of components on a particular test,
in specific systems. Differences in hardware, software, or configuration
will affect actual performance. Performance varies depending on system
configuration.

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/perfcount.png
Above shows the few L2 cache misses possible with cache pseudo-locking
on the Intel(R) NUC with default configuration. Each test, which is
repeated 100 times, pseudo-locks schemata shown and then measure from
the kernel via precision counters the number of cache misses when
accessing the memory afterwards. This test is run on an idle system as
well as a system with significant noise (using stress-ng) from a
neighboring core associated with the same cache. This plot shows us that:
(1) the number of cache misses remain consistent irrespective of the size
of region being pseudo-locked, and (2) the number of cache misses for a
pseudo-locked region remains low when traversing memory regions ranging
in size from 256KB (4096 cache lines) to 896KB (14336 cache lines).

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_malloc_with_load.png
Above shows the read latency experienced by an application running with
default CAT CLOS after it allocated 256KB memory with malloc() (and using
mlockall()). In this example the application reads randomly (to not trigger
hardware prefetcher) from its entire allocated region at 2 second intervals
while there is a noisy neighbor present. Each individual access is 32 bytes
in size and the latency of each access is measured using the rdtsc
instruction. In this visualization we can observe two groupings of data,
the group with lower latency indicating cache hits, and the group with
higher latency indicating cache misses. We can see a significant portion
of memory reads experience larger latencies.

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_psl_with_load.png
Above plots a similar test as the previous, but instead of the application
reading from a 256KB malloc() region it reads from a 256KB pseudo-locked
region that was mmap()'ed into its address space. When comparing these
latencies to that of regular malloc() latencies we do see a significant
improvement in latencies experienced.

https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_malloc_and_cat_with_load_clos0_fixed.png
Applications that are sensitive to latencies may use existing CAT
technology to isolate the sensitive application. In this plot we show an
application running with a dedicated CAT CLOS double the size (512KB) of
the memory being tested (256KB). A dedicated CLOS with CBM 0x0f is created and
the default CLOS changed to CBM 0xf0. We see in this plot that even though
the application runs within a dedicated portion of cache it still
experiences significant latency accessing its memory (when compared to
pseudo-locking).

Your feedback about this proposal for enabling of Cache Pseudo-Locking
will be greatly appreciated.

Regards,

Reinette

Reinette Chatre (20):
  x86/intel_rdt: Documentation for Cache Pseudo-Locking
  x86/intel_rdt: Make useful functions available internally
  x86/intel_rdt: Introduce hooks to create pseudo-locking files
  x86/intel_rdt: Introduce test to determine if closid is in use
  x86/intel_rdt: Print more accurate pseudo-locking availability
  x86/intel_rdt: Create pseudo-locked regions
  x86/intel_rdt: Connect pseudo-locking directory to operations
  x86/intel_rdt: Introduce pseudo-locking resctrl files
  x86/intel_rdt: Discover supported platforms via prefetch disable bits
  x86/intel_rdt: Disable pseudo-locking if CDP enabled
  x86/intel_rdt: Associate pseudo-locked regions with its domain
  x86/intel_rdt: Support CBM checking from value and character buffer
  x86/intel_rdt: Support schemata write - pseudo-locking core
  x86/intel_rdt: Enable testing for pseudo-locked region
  x86/intel_rdt: Prevent new allocations from pseudo-locked regions
  x86/intel_rdt: Create debugfs files for pseudo-locking testing
  x86/intel_rdt: Create character device exposing pseudo-locked region
  x86/intel_rdt: More precise L2 hit/miss measurements
  x86/intel_rdt: Support L3 cache performance event of Broadwell
  x86/intel_rdt: Limit C-states dynamically when pseudo-locking active

 Documentation/x86/intel_rdt_ui.txt                |  229 ++-
 arch/x86/Kconfig                                  |   11 +
 arch/x86/kernel/cpu/Makefile                      |    4 +-
 arch/x86/kernel/cpu/intel_rdt.h                   |   24 +
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c       |   44 +-
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 1846 +++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |   54 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c          |   46 +-
 8 files changed, 2238 insertions(+), 20 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h

-- 
2.13.5

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 01/20] x86/intel_rdt: Documentation for Cache Pseudo-Locking
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 02/20] x86/intel_rdt: Make useful functions available internally Reinette Chatre
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Add description of Cache Pseudo-Locking feature, its interface,
as well as an example of its usage.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 229 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 228 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 6851854cf69d..9924f7146c63 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -18,7 +18,10 @@ mount options are:
 "cdp": Enable code/data prioritization in L3 cache allocations.
 
 RDT features are orthogonal. A particular system may support only
-monitoring, only control, or both monitoring and control.
+monitoring, only control, or both monitoring and control. Cache
+pseudo-locking is a unique way of using cache control to "pin" or
+"lock" data in the cache. Details can be found in
+"Cache Pseudo-Locking".
 
 The mount succeeds if either of allocation or monitoring is present, but
 only those files and directories supported by the system will be created.
@@ -320,6 +323,149 @@ L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Cache Pseudo-Locking
+--------------------
+CAT enables a user to specify the amount of cache space into which an
+application can fill. Cache pseudo-locking builds on the fact that a
+CPU can still read and write data pre-allocated outside its current
+allocated area on a cache hit. With cache pseudo-locking, data can be
+preloaded into a reserved portion of cache that no application can
+fill, and from that point on will only serve cache hits. The cache
+pseudo-locked memory is made accessible to user space where an
+application can map it into its virtual address space and thus have
+a region of memory with reduced average read latency.
+
+Cache pseudo-locking increases the probability that data will remain
+in the cache via carefully configuring the CAT feature and controlling
+application behavior. There is no guarantee that data is placed in
+cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
+“locked” data from cache. Power management C-states may shrink or
+power off cache. It is thus recommended to limit the processor maximum
+C-state, for example, by setting the processor.max_cstate kernel parameter.
+
+It is required that an application using a pseudo-locked region runs
+with affinity to the cores (or a subset of the cores) associated
+with the cache on which the pseudo-locked region resides. This is
+enforced by the implementation.
+
+Pseudo-locking is accomplished in two stages:
+1) During the first stage the system administrator allocates a portion
+   of cache that should be dedicated to pseudo-locking. At this time an
+   equivalent portion of memory is allocated, loaded into allocated
+   cache portion, and exposed as a character device.
+2) During the second stage a user-space application maps (mmap()) the
+   pseudo-locked memory into its address space.
+
+Cache Pseudo-Locking Interface
+------------------------------
+Platforms supporting cache pseudo-locking will expose a new
+"/sys/fs/restrl/pseudo_lock" directory after successful mount of the
+resctrl filesystem. Initially this directory will contain a single file,
+"avail" that contains the schemata, one line per resource, of cache region
+available for pseudo-locking.
+
+A pseudo-locked region is created by creating a new directory within
+/sys/fs/resctrl/pseudo_lock. On success two new files will appear in
+the directory:
+
+"schemata":
+	Shows the schemata representing the pseudo-locked cache region.
+	User writes schemata of requested locked area to file.
+	Only one id of single resource accepted - can only lock from
+	single cache instance. Writing of schemata to this file will
+	return success on successful pseudo-locked region setup.
+"size":
+	After successful pseudo-locked region setup this read-only file
+	will contain the size in bytes of pseudo-locked region.
+
+Cache Pseudo-Locking Debugging Interface
+---------------------------------------
+The pseudo-locking debugging interface is enabled with
+CONFIG_INTEL_RDT_DEBUGFS and can be found in
+/sys/kernel/debug/resctrl/pseudo_lock.
+
+There is no explicit way for the kernel to test if a provided memory
+location is present in the cache. The pseudo-locking debugging interface uses
+the tracing infrastructure to provide two ways to measure cache residency of
+the pseudo-locked region:
+1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
+   from these measurements are best visualized using a hist trigger (see
+   example below). In this test the pseudo-locked region is traversed at
+   a stride of 32 bytes while hardware prefetchers, preemption, and interrupts
+   are disabled. This also provides a substitute visualization of cache
+   hits and misses.
+2) Cache hit and miss measurements using model specific precision counters if
+   available. Depending on the levels of cache on the system the following
+   tracepoints are available: pseudo_lock_l2_hits, pseudo_lock_l2_miss,
+   pseudo_lock_l3_miss, and pseudo_lock_l3_hits. WARNING: triggering this
+   measurement uses from two (for just L2 measurements) to four (for L2 and L3
+   measurements) precision counters on the system, if any other
+   measurements are in progress the counters and their corresponding event
+   registers will be clobbered.
+
+When a pseudo-locked region is created a new debugfs directory is created for
+it in debugfs as /sys/kernel/debug/resctrl/pseudo_lock/<newdir>. A single
+write-only file, measure_trigger, is present in this directory. The
+measurement on the pseudo-locked region depends on the number, 1 or 2,
+written to this debugfs file. Since the measurements are recorded with the
+tracing infrastructure the relevant tracepoints need to be enabled before the
+measurement is triggered.
+
+Example of latency debugging interface:
+In this example a pseudo-locked region named "newlock" was created. Here is
+how we can measure the latency in cycles of reading from this region:
+# :> /sys/kernel/debug/tracing/trace
+# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_mem_latency/trigger
+# echo 1 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_mem_latency/enable
+# echo 1 > /sys/kernel/debug/resctrl/pseudo_lock/newlock/measure_trigger
+# echo 0 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_mem_latency/enable
+# cat /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_mem_latency/hist
+
+# event histogram
+#
+# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
+#
+
+{ latency:        456 } hitcount:          1
+{ latency:         50 } hitcount:         83
+{ latency:         36 } hitcount:         96
+{ latency:         44 } hitcount:        174
+{ latency:         48 } hitcount:        195
+{ latency:         46 } hitcount:        262
+{ latency:         42 } hitcount:        693
+{ latency:         40 } hitcount:       3204
+{ latency:         38 } hitcount:       3484
+
+Totals:
+    Hits: 8192
+    Entries: 9
+    Dropped: 0
+
+Example of cache hits/misses debugging:
+In this example a pseudo-locked region named "newlock" was created on the L2
+cache of a platform. Here is how we can obtain details of the cache hits
+and misses using the platform's precision counters.
+
+# :> /sys/kernel/debug/tracing/trace
+# echo 1 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_l2_hits/enable
+# echo 1 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_l2_miss/enable
+# echo 2 > /sys/kernel/debug/resctrl/pseudo_lock/newlock/measure_trigger
+# echo 0 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_l2_hits/enable
+# echo 0 > /sys/kernel/debug/tracing/events/pseudo_lock/pseudo_lock_l2_miss/enable
+# cat /sys/kernel/debug/tracing/trace
+
+# tracer: nop
+#
+#                              _-----=> irqs-off
+#                             / _----=> need-resched
+#                            | / _---=> hardirq/softirq
+#                            || / _--=> preempt-depth
+#                            ||| /     delay
+#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
+#              | |       |   ||||       |         |
+ pseudo_lock_mea-1039  [002] ....  1598.825180: pseudo_lock_l2_hits: L2 hits=4097
+ pseudo_lock_mea-1039  [002] ....  1598.825184: pseudo_lock_l2_miss: L2 miss=2
+
 Examples for RDT allocation usage:
 
 Example 1
@@ -434,6 +580,87 @@ siblings and only the real time threads are scheduled on the cores 4-7.
 
 # echo F0 > p0/cpus
 
+Example of Cache Pseudo-Locking
+-------------------------------
+Lock portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
+region is exposed at /dev/pseudo_lock/newlock that can be provided to
+application for argument to mmap().
+
+# cd /sys/fs/resctrl/pseudo_lock
+# cat avail
+L2:0=ff;1=ff
+# mkdir newlock
+# cd newlock
+# cat schemata
+# L2:uninitialized
+# echo ‘L2:1=3’ > schemata
+# ls -l /dev/pseudo_lock/newlock
+crw------- 1 root root 244, 0 Mar 30 03:00 /dev/pseudo_lock/newlock
+
+/*
+ * Example code to access one page of pseudo-locked cache region
+ * from user space.
+ */
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+/*
+ * It is required that the application runs with affinity to only
+ * cores associated with the pseudo-locked region. Here the cpu
+ * is hardcoded for convenience of example.
+ */
+static int cpuid = 2;
+
+int main(int argc, char *argv[])
+{
+	cpu_set_t cpuset;
+	long page_size;
+	void *mapping;
+	int dev_fd;
+	int ret;
+
+	page_size = sysconf(_SC_PAGESIZE);
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpuid, &cpuset);
+	ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
+	if (ret < 0) {
+		perror("sched_setaffinity");
+		exit(EXIT_FAILURE);
+	}
+
+	dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
+	if (dev_fd < 0) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+		       dev_fd, 0);
+	if (mapping == MAP_FAILED) {
+		perror("mmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Application interacts with pseudo-locked memory @mapping */
+
+	ret = munmap(mapping, page_size);
+	if (ret < 0) {
+		perror("munmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	close(dev_fd);
+	exit(EXIT_SUCCESS);
+}
+
 4) Locking between applications
 
 Certain operations on the resctrl filesystem, composed of read/writes
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 02/20] x86/intel_rdt: Make useful functions available internally
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 01/20] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 03/20] x86/intel_rdt: Introduce hooks to create pseudo-locking files Reinette Chatre
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

In preparation for support of pseudo-locking we move some static
functions to be available for sharing amongst all RDT components.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             | 5 +++++
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 2 +-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 8 ++++----
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 3397244984f5..8cb42523894e 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -425,7 +425,12 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
+int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags);
+int rdtgroup_kn_set_ugid(struct kernfs_node *kn);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+int closid_alloc(void);
+void closid_free(int closid);
+int update_domains(struct rdt_resource *r, int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 30aeb267cbd2..d2bb1a30c6a1 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -174,7 +174,7 @@ static int parse_line(char *line, struct rdt_resource *r)
 	return -EINVAL;
 }
 
-static int update_domains(struct rdt_resource *r, int closid)
+int update_domains(struct rdt_resource *r, int closid)
 {
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 64c5ff97ee0d..f8dc41e15d13 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -109,7 +109,7 @@ static void closid_init(void)
 	closid_free_map &= ~1;
 }
 
-static int closid_alloc(void)
+int closid_alloc(void)
 {
 	u32 closid = ffs(closid_free_map);
 
@@ -121,13 +121,13 @@ static int closid_alloc(void)
 	return closid;
 }
 
-static void closid_free(int closid)
+void closid_free(int closid)
 {
 	closid_free_map |= 1 << closid;
 }
 
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
-static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
+int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
 	struct iattr iattr = { .ia_valid = ATTR_UID | ATTR_GID,
 				.ia_uid = current_fsuid(),
@@ -855,7 +855,7 @@ static struct rftype res_common_files[] = {
 	},
 };
 
-static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
+int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
 {
 	struct rftype *rfts, *rft;
 	int ret, len;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 03/20] x86/intel_rdt: Introduce hooks to create pseudo-locking files
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 01/20] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 02/20] x86/intel_rdt: Make useful functions available internally Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 04/20] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

We create a new file to host pseudo-locking specific code. The first of
this code are the functions that create the initial pseudo_lock
directory with its first file, "avail", starting by reporting zero. This
will be expanded in future commits.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/Makefile                |   3 +-
 arch/x86/kernel/cpu/intel_rdt.h             |   2 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 105 ++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 236999c54edc..ce704c5fe1da 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -34,7 +34,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
-obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o intel_rdt_ctrlmondata.o
+obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
+obj-$(CONFIG_INTEL_RDT)	+= intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 8cb42523894e..4d3b03685fda 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -449,5 +449,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
+int rdt_pseudo_lock_fs_init(struct kernfs_node *root);
+void rdt_pseudo_lock_fs_remove(void);
 
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
new file mode 100644
index 000000000000..ad8b97747024
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -0,0 +1,105 @@
+/*
+ * Resource Director Technology(RDT)
+ *
+ * Pseudo-locking support built on top of Cache Allocation Technology (CAT)
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * Author: Reinette Chatre <reinette.chatre@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/kernfs.h>
+#include <linux/seq_file.h>
+#include <linux/stat.h>
+#include "intel_rdt.h"
+
+static struct kernfs_node *pseudo_lock_kn;
+
+static int pseudo_lock_avail_show(struct seq_file *sf, void *v)
+{
+	seq_puts(sf, "0\n");
+	return 0;
+}
+
+static struct kernfs_ops pseudo_lock_avail_ops = {
+	.seq_show		= pseudo_lock_avail_show,
+};
+
+/**
+ * rdt_pseudo_lock_fs_init - Create and initialize pseudo-locking files
+ * @root: location in kernfs where directory and files should be created
+ *
+ * The pseudo_lock directory and the pseudo-locking related files and
+ * directories will live within the structure created here.
+ *
+ * LOCKING:
+ * rdtgroup_mutex is expected to be held when called
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure
+ */
+int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
+{
+	struct kernfs_node *kn;
+	int ret;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	pseudo_lock_kn = kernfs_create_dir(root, "pseudo_lock",
+					   root->mode, NULL);
+	if (IS_ERR(pseudo_lock_kn))
+		return PTR_ERR(pseudo_lock_kn);
+
+	kn = __kernfs_create_file(pseudo_lock_kn, "avail", 0444,
+				  0, &pseudo_lock_avail_ops,
+				  NULL, NULL, NULL);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto error;
+	}
+
+	ret = rdtgroup_kn_set_ugid(pseudo_lock_kn);
+	if (ret)
+		goto error;
+
+	kernfs_activate(pseudo_lock_kn);
+
+	ret = 0;
+	goto out;
+
+error:
+	kernfs_remove(pseudo_lock_kn);
+	pseudo_lock_kn = NULL;
+out:
+	return ret;
+}
+
+/**
+ * rdt_pseudo_lock_fs_remove - Remove all pseudo-locking files
+ *
+ * All pseudo-locking related files and directories are removed.
+ *
+ * LOCKING:
+ * rdtgroup_mutex is expected to be held when called
+ *
+ * RETURNS:
+ * none
+ */
+void rdt_pseudo_lock_fs_remove(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	kernfs_remove(pseudo_lock_kn);
+	pseudo_lock_kn = NULL;
+}
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 04/20] x86/intel_rdt: Introduce test to determine if closid is in use
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (2 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 03/20] x86/intel_rdt: Introduce hooks to create pseudo-locking files Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 05/20] x86/intel_rdt: Print more accurate pseudo-locking availability Reinette Chatre
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

During CAT feature discovery the capacity bitmasks (CBMs) associated
with all the classes of service are initialized to all ones, even if the
class of service is not in use. Introduce a test that can be used to
determine if a class of service is in use. This test enables code
interested in parsing the CBMs to know if its values are meaningful or
can be ignored.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          | 1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 4d3b03685fda..48efcc2e27dd 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -430,6 +430,7 @@ int rdtgroup_kn_set_ugid(struct kernfs_node *kn);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int closid_alloc(void);
 void closid_free(int closid);
+bool closid_allocated(unsigned int closid);
 int update_domains(struct rdt_resource *r, int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index f8dc41e15d13..c711df201e57 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -126,6 +126,12 @@ void closid_free(int closid)
 	closid_free_map |= 1 << closid;
 }
 
+/* closid_allocated - test if provided closid is in use */
+bool closid_allocated(unsigned int closid)
+{
+	return (closid_free_map & (1 << closid)) == 0;
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 05/20] x86/intel_rdt: Print more accurate pseudo-locking availability
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (3 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 04/20] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 06/20] x86/intel_rdt: Create pseudo-locked regions Reinette Chatre
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

A region of cache is considered available for pseudo-locking when:
 * Cache area is in use by default COS.
 * Cache area is NOT in use by any other (other than default) COS.
 * Cache area is not shared with any other entity. Specifically, the
   cache area does not appear in "Bitmask of Shareable Resource with Other
   executing entities" found in EBX during CAT enumeration.
 * Cache area is not currently pseudo-locked.

At this time the first three tests are possible and we update the "avail"
file associated with pseudo-locking to print a more accurate reflection
of pseudo-locking availability.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 62 ++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index ad8b97747024..a787a103c432 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -26,9 +26,69 @@
 
 static struct kernfs_node *pseudo_lock_kn;
 
+/**
+ * pseudo_lock_avail_get - return bitmask of cache available for locking
+ * @r: resource to which this cache instance belongs
+ * @d: domain representing the cache instance
+ *
+ * Availability for pseudo-locking is determined as follows:
+ * * Cache area is in use by default COS.
+ * * Cache area is NOT in use by any other (other than default) COS.
+ * * Cache area is not shared with any other entity. Specifically, the
+ *   cache area does not appear in "Bitmask of Shareable Resource with Other
+ *   executing entities" found in EBX during CAT enumeration.
+ *
+ * Below is also required to determine availability and will be
+ * added in later:
+ * * Cache area is not currently pseudo-locked.
+ *
+ * LOCKING:
+ * rdtgroup_mutex is expected to be held when called
+ *
+ * RETURNS:
+ * Bitmask representing region of cache that can be locked, zero if nothing
+ * available.
+ */
+static u32 pseudo_lock_avail_get(struct rdt_resource *r, struct rdt_domain *d)
+{
+	u32 avail;
+	int i;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	avail = d->ctrl_val[0];
+	for (i = 1; i < r->num_closid; i++) {
+		if (closid_allocated(i))
+			avail &= ~d->ctrl_val[i];
+	}
+	avail &= ~r->cache.shareable_bits;
+
+	return avail;
+}
+
 static int pseudo_lock_avail_show(struct seq_file *sf, void *v)
 {
-	seq_puts(sf, "0\n");
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+	bool sep;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		sep = false;
+		seq_printf(sf, "%s:", r->name);
+		list_for_each_entry(d, &r->domains, list) {
+			if (sep)
+				seq_puts(sf, ";");
+			seq_printf(sf, "%d=%x", d->id,
+				   pseudo_lock_avail_get(r, d));
+			sep = true;
+		}
+		seq_puts(sf, "\n");
+	}
+
+	mutex_unlock(&rdtgroup_mutex);
+
 	return 0;
 }
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 06/20] x86/intel_rdt: Create pseudo-locked regions
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (4 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 05/20] x86/intel_rdt: Print more accurate pseudo-locking availability Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 07/20] x86/intel_rdt: Connect pseudo-locking directory to operations Reinette Chatre
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

System administrator creates/removes pseudo-locked regions by
creating/removing directories in the pseudo-lock subdirectory of the
resctrl filesystem. Here we add directory creation and removal support.

A "pseudo-lock region" is introduced, which represents an
instance of a pseudo-locked cache region. During mkdir a new region is
created but since we do not know which cache it belongs to at that time
we maintain a global pointer to it from where it will be moved to the cache
(rdt_domain) it belongs to after initialization. This implies that
we only support one uninitialized pseudo-locked region at a time.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |   3 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 220 +++++++++++++++++++++++++++-
 2 files changed, 222 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 48efcc2e27dd..260f09f872b0 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -349,6 +349,7 @@ extern struct mutex rdtgroup_mutex;
 extern struct rdt_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
+extern struct kernfs_node *pseudo_lock_kn;
 
 int __init rdtgroup_init(void);
 
@@ -452,5 +453,7 @@ bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
 int rdt_pseudo_lock_fs_init(struct kernfs_node *root);
 void rdt_pseudo_lock_fs_remove(void);
+int rdt_pseudo_lock_mkdir(const char *name, umode_t mode);
+int rdt_pseudo_lock_rmdir(struct kernfs_node *kn);
 
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index a787a103c432..7a22e367b82f 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -20,11 +20,142 @@
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
 #include <linux/kernfs.h>
+#include <linux/kref.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
+#include <linux/slab.h>
 #include "intel_rdt.h"
 
-static struct kernfs_node *pseudo_lock_kn;
+struct kernfs_node *pseudo_lock_kn;
+
+/*
+ * Protect the pseudo_lock_region access. Since we will link to
+ * pseudo_lock_region from rdt domains rdtgroup_mutex should be obtained
+ * first if needed.
+ */
+static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
+
+/**
+ * struct pseudo_lock_region - pseudo-lock region information
+ * @kn:			kernfs node representing this region in the resctrl
+ *			filesystem
+ * @cbm:		bitmask of the pseudo-locked region
+ * @cpu:		core associated with the cache on which the setup code
+ *			will be run
+ * @minor:		minor number of character device associated with this
+ *			region
+ * @locked:		state indicating if this region has been locked or not
+ * @refcount:		how many are waiting to access this pseudo-lock
+ *			region via kernfs
+ * @deleted:		user requested removal of region via rmdir on kernfs
+ */
+struct pseudo_lock_region {
+	struct kernfs_node	*kn;
+	u32			cbm;
+	int			cpu;
+	unsigned int		minor;
+	bool			locked;
+	struct kref		refcount;
+	bool			deleted;
+};
+
+/*
+ * Only one uninitialized pseudo-locked region can exist at a time. An
+ * uninitialized pseudo-locked region is created when the user creates a
+ * new directory within the pseudo_lock subdirectory of the resctrl
+ * filsystem. The user will initialize the pseudo-locked region by writing
+ * to its schemata file at which point this structure will be moved to the
+ * cache domain it belongs to.
+ */
+static struct pseudo_lock_region *new_plr;
+
+static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
+{
+	bool is_new_plr = (plr == new_plr);
+
+	WARN_ON(!plr->deleted);
+	if (!plr->deleted)
+		return;
+
+	kfree(plr);
+	if (is_new_plr)
+		new_plr = NULL;
+}
+
+static void pseudo_lock_region_release(struct kref *ref)
+{
+	struct pseudo_lock_region *plr = container_of(ref,
+						      struct pseudo_lock_region,
+						      refcount);
+
+	mutex_lock(&rdt_pseudo_lock_mutex);
+	__pseudo_lock_region_release(plr);
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+}
+
+/**
+ * pseudo_lock_region_kn_lock - Obtain lock to pseudo-lock region kernfs node
+ *
+ * This is called from the kernfs related functions which are called with
+ * an active reference to the kernfs_node that contains a valid pointer to
+ * the pseudo-lock region it represents. We can thus safely take an active
+ * reference to the pseudo-lock region before dropping the reference to the
+ * kernfs_node.
+ *
+ * We need to handle the scenarios where the kernfs directory representing
+ * this pseudo-lock region can be removed while an application still has an
+ * open handle to one of the directory's files and operations on this
+ * handle are attempted.
+ * To support this we allow a file operation to drop its reference to the
+ * kernfs_node so that the removal can proceed, while using the mutex to
+ * ensure these operations on the pseudo-lock region are serialized. At the
+ * time an operation does obtain access to the region it may thus have been
+ * deleted.
+ */
+static struct pseudo_lock_region *pseudo_lock_region_kn_lock(
+						struct kernfs_node *kn)
+{
+	struct pseudo_lock_region *plr = (kernfs_type(kn) == KERNFS_DIR) ?
+						kn->priv : kn->parent->priv;
+
+	WARN_ON(!plr);
+	if (!plr)
+		return NULL;
+
+	kref_get(&plr->refcount);
+	kernfs_break_active_protection(kn);
+
+	mutex_lock(&rdtgroup_mutex);
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	if (plr->deleted)
+		return NULL;
+
+	return plr;
+}
+
+/**
+ * pseudo_lock_region_kn_unlock - Release lock to pseudo-lock region kernfs node
+ *
+ * The pseudo-lock region's kernfs_node did not have protection against
+ * removal while the lock was held. Here we do actual cleanup if the region
+ * was removed while the lock was held.
+ */
+static void pseudo_lock_region_kn_unlock(struct kernfs_node *kn)
+{
+	struct pseudo_lock_region *plr = (kernfs_type(kn) == KERNFS_DIR) ?
+						kn->priv : kn->parent->priv;
+
+	WARN_ON(!plr);
+	if (!plr)
+		return;
+
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+	mutex_unlock(&rdtgroup_mutex);
+
+	kernfs_unbreak_active_protection(kn);
+	kref_put(&plr->refcount, pseudo_lock_region_release);
+}
 
 /**
  * pseudo_lock_avail_get - return bitmask of cache available for locking
@@ -96,6 +227,87 @@ static struct kernfs_ops pseudo_lock_avail_ops = {
 	.seq_show		= pseudo_lock_avail_show,
 };
 
+int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
+{
+	struct pseudo_lock_region *plr;
+	struct kernfs_node *kn;
+	int ret = 0;
+
+	mutex_lock(&rdtgroup_mutex);
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	if (new_plr) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	plr = kzalloc(sizeof(*plr), GFP_KERNEL);
+	if (!plr) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	kn = kernfs_create_dir(pseudo_lock_kn, name, mode, plr);
+	if (IS_ERR(kn)) {
+		ret = PTR_ERR(kn);
+		goto out_free;
+	}
+
+	plr->kn = kn;
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_remove;
+
+	kref_init(&plr->refcount);
+	kernfs_activate(kn);
+	new_plr = plr;
+	ret = 0;
+	goto out;
+
+out_remove:
+	kernfs_remove(kn);
+out_free:
+	kfree(plr);
+out:
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+	mutex_unlock(&rdtgroup_mutex);
+	return ret;
+}
+
+/*
+ * rdt_pseudo_lock_rmdir - Remove pseudo-lock region
+ *
+ * LOCKING:
+ * Since the pseudo-locked region can be associated with a RDT domain at
+ * removal we take both rdtgroup_mutex and rdt_pseudo_lock_mutex to protect
+ * the rdt_domain access as well as the pseudo_lock_region access.
+ */
+int rdt_pseudo_lock_rmdir(struct kernfs_node *kn)
+{
+	struct kernfs_node *parent_kn = kn->parent;
+	struct pseudo_lock_region *plr;
+	int ret = 0;
+
+	plr = pseudo_lock_region_kn_lock(kn);
+	if (!plr) {
+		ret = -EPERM;
+		goto out;
+	}
+
+	if (parent_kn != pseudo_lock_kn) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	kernfs_remove(kn);
+	plr->deleted = true;
+	kref_put(&plr->refcount, pseudo_lock_region_release);
+
+out:
+	pseudo_lock_region_kn_unlock(kn);
+	return ret;
+}
+
 /**
  * rdt_pseudo_lock_fs_init - Create and initialize pseudo-locking files
  * @root: location in kernfs where directory and files should be created
@@ -159,7 +371,13 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 void rdt_pseudo_lock_fs_remove(void)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
+	mutex_lock(&rdt_pseudo_lock_mutex);
 
+	if (new_plr) {
+		new_plr->deleted = true;
+		__pseudo_lock_region_release(new_plr);
+	}
 	kernfs_remove(pseudo_lock_kn);
 	pseudo_lock_kn = NULL;
+	mutex_unlock(&rdt_pseudo_lock_mutex);
 }
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 07/20] x86/intel_rdt: Connect pseudo-locking directory to operations
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (5 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 06/20] x86/intel_rdt: Create pseudo-locked regions Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 08/20] x86/intel_rdt: Introduce pseudo-locking resctrl files Reinette Chatre
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

As a dependent of RDT/CAT we hook up the pseudo-locking files
initialization to that of RDT/CAT. The initial operations of mkdir/rmdir
used to create pseudo-locked regions are now hooked up also.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index c711df201e57..afa3795a124e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1179,13 +1179,19 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 		goto out_cdp;
 	}
 
+	ret = rdt_pseudo_lock_fs_init(rdtgroup_default.kn);
+	if (ret) {
+		dentry = ERR_PTR(ret);
+		goto out_info;
+	}
+
 	if (rdt_mon_capable) {
 		ret = mongroup_create_dir(rdtgroup_default.kn,
 					  NULL, "mon_groups",
 					  &kn_mongrp);
 		if (ret) {
 			dentry = ERR_PTR(ret);
-			goto out_info;
+			goto out_psl;
 		}
 		kernfs_get(kn_mongrp);
 
@@ -1226,6 +1232,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 out_mongrp:
 	if (rdt_mon_capable)
 		kernfs_remove(kn_mongrp);
+out_psl:
+	rdt_pseudo_lock_fs_remove();
 out_info:
 	kernfs_remove(kn_info);
 out_cdp:
@@ -1374,6 +1382,7 @@ static void rmdir_all_sub(void)
 	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
 	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
 
+	rdt_pseudo_lock_fs_remove();
 	kernfs_remove(kn_info);
 	kernfs_remove(kn_mongrp);
 	kernfs_remove(kn_mondata);
@@ -1796,6 +1805,9 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
 	if (strchr(name, '\n'))
 		return -EINVAL;
 
+	if (parent_kn == pseudo_lock_kn)
+		return rdt_pseudo_lock_mkdir(name, mode);
+
 	/*
 	 * If the parent directory is the root directory and RDT
 	 * allocation is supported, add a control and monitoring
@@ -1905,6 +1917,9 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 	cpumask_var_t tmpmask;
 	int ret = 0;
 
+	if (parent_kn == pseudo_lock_kn)
+		return rdt_pseudo_lock_rmdir(kn);
+
 	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
 		return -ENOMEM;
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 08/20] x86/intel_rdt: Introduce pseudo-locking resctrl files
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (6 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 07/20] x86/intel_rdt: Connect pseudo-locking directory to operations Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 09/20] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Each sub-directory within the pseudo-lock directory represents a
pseudo-locked region. Each of these sub-directories now receive the
files that will be used by the user to specify requirements for the
particular region and for the kernel to communicate some details about
the region.

Only support reading operations on these files in this commit. Since
writing to these files will trigger the locking of a region we also just
support reading of unlocked region data.

Two files are created:
schemata:
	Print the details of the portion of cache locked. If this has
	not yet been locked all resources will be listed as uninitialized.
size:
	Print the size in bytes of the memory region pseudo-locked to
	the cache. Value is not yet initialized.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  5 +++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 49 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 14 +++++++++
 3 files changed, 68 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 260f09f872b0..3216488f72df 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -131,6 +131,7 @@ struct rdtgroup {
 #define RFTYPE_CTRL			BIT(RF_CTRLSHIFT)
 #define RFTYPE_MON			BIT(RF_MONSHIFT)
 #define RFTYPE_TOP			BIT(RF_TOPSHIFT)
+#define RF_PSEUDO_LOCK			BIT(7)
 #define RFTYPE_RES_CACHE		BIT(8)
 #define RFTYPE_RES_MB			BIT(9)
 #define RF_CTRL_INFO			(RFTYPE_INFO | RFTYPE_CTRL)
@@ -455,5 +456,9 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root);
 void rdt_pseudo_lock_fs_remove(void);
 int rdt_pseudo_lock_mkdir(const char *name, umode_t mode);
 int rdt_pseudo_lock_rmdir(struct kernfs_node *kn);
+int pseudo_lock_schemata_show(struct kernfs_open_file *of,
+			      struct seq_file *seq, void *v);
+int pseudo_lock_size_show(struct kernfs_open_file *of,
+			  struct seq_file *seq, void *v);
 
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 7a22e367b82f..94bd1b4fbfee 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -40,6 +40,7 @@ static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
  * @kn:			kernfs node representing this region in the resctrl
  *			filesystem
  * @cbm:		bitmask of the pseudo-locked region
+ * @size:		size of pseudo-locked region in bytes
  * @cpu:		core associated with the cache on which the setup code
  *			will be run
  * @minor:		minor number of character device associated with this
@@ -52,6 +53,7 @@ static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
 struct pseudo_lock_region {
 	struct kernfs_node	*kn;
 	u32			cbm;
+	unsigned int		size;
 	int			cpu;
 	unsigned int		minor;
 	bool			locked;
@@ -227,6 +229,49 @@ static struct kernfs_ops pseudo_lock_avail_ops = {
 	.seq_show		= pseudo_lock_avail_show,
 };
 
+int pseudo_lock_schemata_show(struct kernfs_open_file *of,
+			      struct seq_file *seq, void *v)
+{
+	struct pseudo_lock_region *plr;
+	struct rdt_resource *r;
+	int ret = 0;
+
+	plr = pseudo_lock_region_kn_lock(of->kn);
+	if (!plr) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	if (!plr->locked) {
+		for_each_alloc_enabled_rdt_resource(r) {
+			seq_printf(seq, "%s:uninitialized\n", r->name);
+		}
+	}
+
+out:
+	pseudo_lock_region_kn_unlock(of->kn);
+	return ret;
+}
+
+int pseudo_lock_size_show(struct kernfs_open_file *of,
+			  struct seq_file *seq, void *v)
+{
+	struct pseudo_lock_region *plr;
+	int ret = 0;
+
+	plr = pseudo_lock_region_kn_lock(of->kn);
+	if (!plr) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	seq_printf(seq, "%u\n", plr->size);
+
+out:
+	pseudo_lock_region_kn_unlock(of->kn);
+	return ret;
+}
+
 int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 {
 	struct pseudo_lock_region *plr;
@@ -258,6 +303,10 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 	if (ret)
 		goto out_remove;
 
+	ret = rdtgroup_add_files(kn, RF_PSEUDO_LOCK);
+	if (ret)
+		goto out_remove;
+
 	kref_init(&plr->refcount);
 	kernfs_activate(kn);
 	new_plr = plr;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index afa3795a124e..947e9d706864 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -859,6 +859,20 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_schemata_show,
 		.fflags		= RF_CTRL_BASE,
 	},
+	{
+		.name		= "schemata",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= pseudo_lock_schemata_show,
+		.fflags		= RF_PSEUDO_LOCK,
+	},
+	{
+		.name		= "size",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= pseudo_lock_size_show,
+		.fflags		= RF_PSEUDO_LOCK,
+	},
 };
 
 int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 09/20] x86/intel_rdt: Discover supported platforms via prefetch disable bits
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (7 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 08/20] x86/intel_rdt: Introduce pseudo-locking resctrl files Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 10/20] x86/intel_rdt: Disable pseudo-locking if CDP enabled Reinette Chatre
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Knowing the model specific prefetch disable bits is required to support
cache pseudo-locking because the hardware prefetchers need to be disabled
when the kernel memory is pseudo-locked to cache. We add these bits only
for platforms known to support cache pseudo-locking.

If we have not validated pseudo-locking on a platform that does support
RDT/CAT this should not be seen as a failure of CAT, the pseudo-locking
interface will just not be set up.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 80 +++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 94bd1b4fbfee..a0c144b5b09b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -24,8 +24,22 @@
 #include <linux/seq_file.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
+#include <asm/intel-family.h>
 #include "intel_rdt.h"
 
+/*
+ * MSR_MISC_FEATURE_CONTROL register enables the modification of hardware
+ * prefetcher state. Details about this register can be found in the MSR
+ * tables for specific platforms found in Intel's SDM.
+ */
+#define MSR_MISC_FEATURE_CONTROL	0x000001a4
+
+/*
+ * The bits needed to disable hardware prefetching varies based on the
+ * platform. During initialization we will discover which bits to use.
+ */
+static u64 prefetch_disable_bits;
+
 struct kernfs_node *pseudo_lock_kn;
 
 /*
@@ -358,6 +372,57 @@ int rdt_pseudo_lock_rmdir(struct kernfs_node *kn)
 }
 
 /**
+ * get_prefetch_disable_bits - prefetch disable bits of supported platforms
+ *
+ * Here we capture the list of platforms that have been validated to support
+ * pseudo-locking. This includes testing to ensure pseudo-locked regions
+ * with low cache miss rates can be created under variety of load conditions
+ * as well as that these pseudo-locked regions can maintain their low cache
+ * miss rates under variety of load conditions for significant lengths of time.
+ *
+ * After a platform has been validated to support pseudo-locking its
+ * hardware prefetch disable bits are included here as they are documented
+ * in the SDM.
+ *
+ * RETURNS
+ * If platform is supported, the bits to disable hardware prefetchers, 0
+ * if platform is not supported.
+ */
+static u64 get_prefetch_disable_bits(void)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+	    boot_cpu_data.x86 != 6)
+		return 0;
+
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_BROADWELL_X:
+		/*
+		 * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+		 * as:
+		 * 0    L2 Hardware Prefetcher Disable (R/W)
+		 * 1    L2 Adjacent Cache Line Prefetcher Disable (R/W)
+		 * 2    DCU Hardware Prefetcher Disable (R/W)
+		 * 3    DCU IP Prefetcher Disable (R/W)
+		 * 63:4 Reserved
+		 */
+		return 0xF;
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		/*
+		 * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+		 * as:
+		 * 0     L2 Hardware Prefetcher Disable (R/W)
+		 * 1     Reserved
+		 * 2     DCU Hardware Prefetcher Disable (R/W)
+		 * 63:3  Reserved
+		 */
+		return 0x5;
+	}
+
+	return 0;
+}
+
+/**
  * rdt_pseudo_lock_fs_init - Create and initialize pseudo-locking files
  * @root: location in kernfs where directory and files should be created
  *
@@ -377,6 +442,17 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
+	/*
+	 * Not knowing the bits to disable prefetching is not a failure
+	 * that should be propagated since we only return prefetching bits
+	 * for those platforms pseudo-locking has been tested on. If
+	 * pseudo-locking has not been tested to work on this platform the
+	 * other RDT features should continue to be available.
+	 */
+	prefetch_disable_bits = get_prefetch_disable_bits();
+	if (prefetch_disable_bits == 0)
+		return 0;
+
 	pseudo_lock_kn = kernfs_create_dir(root, "pseudo_lock",
 					   root->mode, NULL);
 	if (IS_ERR(pseudo_lock_kn))
@@ -420,6 +496,10 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 void rdt_pseudo_lock_fs_remove(void)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (!pseudo_lock_kn)
+		return;
+
 	mutex_lock(&rdt_pseudo_lock_mutex);
 
 	if (new_plr) {
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 10/20] x86/intel_rdt: Disable pseudo-locking if CDP enabled
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (8 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 09/20] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 11/20] x86/intel_rdt: Associate pseudo-locked regions with its domain Reinette Chatre
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Pseudo-locking can work when Code and Data Prioritization (CDP) is enabled,
but there are a few additional checks and actions involved. At this time
it is not clear if users would want to use pseudo-locking and CDP at the
same time so the support of this is delayed until we understand the
usage better.

Disable pseudo-locking if CDP is enabled. Add the details of things to
keep in mind for anybody considering enabling this support.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index a0c144b5b09b..d2fe8ef7e74a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -443,6 +443,34 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	/*
+	 * Pseudo-locking not supported when CDP is enabled.
+	 *
+	 * Some things to consider if you would like to enable this support:
+	 * - When CDP is enabled two separate resources are exposed, L3DATA
+	 *   and L3CODE, but they are actually on the same cache. The
+	 *   implication for pseudo-locking is that if a pseudo-locked
+	 *   region is created on a domain of one resource (eg. L3CODE),
+	 *   then a pseudo-locked region cannot be created on that same
+	 *   domain of the other resource (eg. L3DATA). This is because
+	 *   the creation of a pseudo-locked region involves a call to
+	 *   wbinvd that will affect all cache allocations on particular
+	 *   domain.
+	 * - Considering the previous, it may be possible to only expose
+	 *   one of the CDP resources to pseudo-locking and hide the other.
+	 *   For example, we could consider to only expose L3DATA and since
+	 *   the L3 cache is unified it is still possible to place
+	 *   instructions there are execute it.
+	 * - If only one region is exposed to pseudo-locking we should still
+	 *   keep in mind that availability of a portion of cache for
+	 *   pseudo-locking should take into account both resources. Similarly,
+	 *   if a pseudo-locked region is created in one resource, the portion
+	 *   of cache used by it should be made unavailable to all future
+	 *   allocations from both resources.
+	 */
+	if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled)
+		return 0;
+
+	/*
 	 * Not knowing the bits to disable prefetching is not a failure
 	 * that should be propagated since we only return prefetching bits
 	 * for those platforms pseudo-locking has been tested on. If
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 11/20] x86/intel_rdt: Associate pseudo-locked regions with its domain
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (9 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 10/20] x86/intel_rdt: Disable pseudo-locking if CDP enabled Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 12/20] x86/intel_rdt: Support CBM checking from value and character buffer Reinette Chatre
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

After a pseudo-locked region is locked it needs to be associated with
the RDT domain representing the pseudo-locked cache so that its life
cycle can be managed correctly.

Only a single pseudo-locked region can exist on any cache instance so we
maintain a single pointer to a pseudo-locked region from each RDT
domain.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 3216488f72df..120a25fdf1e8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -184,6 +184,8 @@ struct mbm_state {
 	u64	prev_msr;
 };
 
+struct pseudo_lock_region;
+
 /**
  * struct rdt_domain - group of cpus sharing an RDT resource
  * @list:	all instances of this resource
@@ -202,6 +204,7 @@ struct mbm_state {
  * @ctrl_val:	array of cache or mem ctrl values (indexed by CLOSID)
  * @new_ctrl:	new ctrl value to be loaded
  * @have_new_ctrl: did user provide new_ctrl for this domain
+ * @plr:	pseudo-locked region associated with this domain
  */
 struct rdt_domain {
 	struct list_head	list;
@@ -217,6 +220,7 @@ struct rdt_domain {
 	u32			*ctrl_val;
 	u32			new_ctrl;
 	bool			have_new_ctrl;
+	struct pseudo_lock_region	*plr;
 };
 
 /**
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 12/20] x86/intel_rdt: Support CBM checking from value and character buffer
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (10 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 11/20] x86/intel_rdt: Associate pseudo-locked regions with its domain Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 13/20] x86/intel_rdt: Support schemata write - pseudo-locking core Reinette Chatre
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Validity check of capacity bitmask (CBM) is currently only done on
character buffer when user writes new schemata to resctrl file.

In preparation for support of CBM checking within other areas of the RDT
code the CBM validity check is split up to support checking with CBM
provided as character buffer as well as the value itself.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  1 +
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 34 ++++++++++++++++++++---------
 2 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 120a25fdf1e8..27d3b01b5f07 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -437,6 +437,7 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int closid_alloc(void);
 void closid_free(int closid);
 bool closid_allocated(unsigned int closid);
+bool cbm_validate_val(unsigned long val, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index d2bb1a30c6a1..50d3743ed79e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -86,17 +86,10 @@ int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
  *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
  * Additionally Haswell requires at least two bits set.
  */
-static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+bool cbm_validate_val(unsigned long val, struct rdt_resource *r)
 {
-	unsigned long first_bit, zero_bit, val;
+	unsigned long first_bit, zero_bit;
 	unsigned int cbm_len = r->cache.cbm_len;
-	int ret;
-
-	ret = kstrtoul(buf, 16, &val);
-	if (ret) {
-		rdt_last_cmd_printf("non-hex character in mask %s\n", buf);
-		return false;
-	}
 
 	if (val == 0 || val > r->default_ctrl) {
 		rdt_last_cmd_puts("mask out of range\n");
@@ -117,11 +110,32 @@ static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 		return false;
 	}
 
-	*data = val;
 	return true;
 }
 
 /*
+ * Validate CBM as provided in character buffer. If CBM is valid
+ * true will be returned as well as number representation pointed to by
+ * @data. If CBM is invalid, return false.
+ */
+static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+{
+	unsigned long val;
+	bool ret;
+
+	if (kstrtoul(buf, 16, &val)) {
+		rdt_last_cmd_printf("non-hex character in mask %s\n", buf);
+		return false;
+	}
+
+	ret = cbm_validate_val(val, r);
+	if (ret)
+		*data = val;
+
+	return ret;
+}
+
+/*
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 13/20] x86/intel_rdt: Support schemata write - pseudo-locking core
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (11 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 12/20] x86/intel_rdt: Support CBM checking from value and character buffer Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 14/20] x86/intel_rdt: Enable testing for pseudo-locked region Reinette Chatre
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

When a user writes the requested pseudo-locking schemata it will trigger
the pseudo-locking of equivalent sized memory. A successful return from
this schemata write means that the pseudo-locking succeeded.

To support the pseudo-locking we first initialize as much as we can
about the region that will be pseudo-locked. This includes, how much
memory does the requested bitmask represent, which CPU the requested
region is associated with, and what is the cache line size of that cache
(so that we know what stride to use for locking). At this point a
contiguous block of memory matching the requested bitmask is allocated.

After initialization the pseudo-locking is performed. A temporary CAT
allocation is made to reflect the requested bitmask and with this new
class of service active and interference minimized, the allocated memory
is loaded into the cache. This completes the pseudo-locking of kernel
memory.

As part of the pseudo-locking the pseudo-locked region is moved to
the RDT domain to which it belongs. We thus also need to ensure that
cleanups happen in this area when there is a directory removal or
unmount request.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |   2 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 573 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    |   3 +-
 3 files changed, 571 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 27d3b01b5f07..b4426a997516 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -463,6 +463,8 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode);
 int rdt_pseudo_lock_rmdir(struct kernfs_node *kn);
 int pseudo_lock_schemata_show(struct kernfs_open_file *of,
 			      struct seq_file *seq, void *v);
+ssize_t pseudo_lock_schemata_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off);
 int pseudo_lock_size_show(struct kernfs_open_file *of,
 			  struct seq_file *seq, void *v);
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index d2fe8ef7e74a..c859ff7dea6e 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -19,12 +19,18 @@
 
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <linux/kernfs.h>
 #include <linux/kref.h>
+#include <linux/kthread.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
+#include <asm/cacheflush.h>
 #include <asm/intel-family.h>
+#include <asm/intel_rdt_sched.h>
 #include "intel_rdt.h"
 
 /*
@@ -43,6 +49,20 @@ static u64 prefetch_disable_bits;
 struct kernfs_node *pseudo_lock_kn;
 
 /*
+ * Only one pseudo-locked region can be set up at a time and that is
+ * enforced by taking the rdt_pseudo_lock_mutex when the user writes the
+ * requested schemata to the resctrl file and releasing the mutex on
+ * completion. The thread locking the kernel memory into the cache starts
+ * and completes during this time so we can be sure that only one thread
+ * can run at any time.
+ * The functions starting the pseudo-locking thread needs to wait for its
+ * completion and since there can only be one we have a global workqueue
+ * and variable to support this.
+ */
+static DECLARE_WAIT_QUEUE_HEAD(wq);
+static int thread_done;
+
+/*
  * Protect the pseudo_lock_region access. Since we will link to
  * pseudo_lock_region from rdt domains rdtgroup_mutex should be obtained
  * first if needed.
@@ -53,26 +73,39 @@ static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
  * struct pseudo_lock_region - pseudo-lock region information
  * @kn:			kernfs node representing this region in the resctrl
  *			filesystem
+ * @r:			point back to the rdt_resource to which this
+ *			pseudo-locked region belongs
+ * @d:			point back to the rdt_domain to which this
+ *			pseudo-locked region belongs
  * @cbm:		bitmask of the pseudo-locked region
  * @size:		size of pseudo-locked region in bytes
+ * @line_size:		size of the cache lines
  * @cpu:		core associated with the cache on which the setup code
  *			will be run
+ * @closid:		CAT class of service that will be used temporarily
+ *			to initialize this pseudo-locked region
  * @minor:		minor number of character device associated with this
  *			region
  * @locked:		state indicating if this region has been locked or not
  * @refcount:		how many are waiting to access this pseudo-lock
  *			region via kernfs
  * @deleted:		user requested removal of region via rmdir on kernfs
+ * @kmem:		the kernel memory associated with pseudo-locked region
  */
 struct pseudo_lock_region {
 	struct kernfs_node	*kn;
+	struct rdt_resource	*r;
+	struct rdt_domain	*d;
 	u32			cbm;
 	unsigned int		size;
+	unsigned int		line_size;
 	int			cpu;
+	int			closid;
 	unsigned int		minor;
 	bool			locked;
 	struct kref		refcount;
 	bool			deleted;
+	void			*kmem;
 };
 
 /*
@@ -85,6 +118,55 @@ struct pseudo_lock_region {
  */
 static struct pseudo_lock_region *new_plr;
 
+/*
+ * Helper to write 64bit value to MSR without tracing. Used when
+ * use of the cache should be restricted and use of registers used
+ * for local variables should be avoided.
+ */
+static inline void pseudo_wrmsrl_notrace(unsigned int msr, u64 val)
+{
+	__wrmsr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+}
+
+/**
+ * pseudo_lock_clos_set - Program requested class of service
+ * @plr:    pseudo-locked region identifying cache that will have its
+ *          class of service modified
+ * @closid: class of service that should be modified
+ * @bm:     new bitmask for @closid
+ */
+static int pseudo_lock_clos_set(struct pseudo_lock_region *plr,
+				int closid, u32 bm)
+{
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+	int ret;
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list)
+			d->have_new_ctrl = false;
+	}
+
+	r = plr->r;
+	d = plr->d;
+	d->new_ctrl = bm;
+	d->have_new_ctrl = true;
+
+	ret = update_domains(r, closid);
+
+	return ret;
+}
+
+static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
+{
+	plr->size = 0;
+	plr->line_size = 0;
+	kfree(plr->kmem);
+	plr->kmem = NULL;
+	plr->r = NULL;
+	plr->d = NULL;
+}
+
 static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 {
 	bool is_new_plr = (plr == new_plr);
@@ -93,6 +175,23 @@ static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 	if (!plr->deleted)
 		return;
 
+	if (plr->locked) {
+		plr->d->plr = NULL;
+		/*
+		 * Resource groups come and go. Simply returning this
+		 * pseudo-locked region's bits to the default CLOS may
+		 * result in default CLOS to become fragmented, causing
+		 * the setting of its bitmask to fail. Ensure it is valid
+		 * first. If this check does fail we cannot return the bits
+		 * to the default CLOS and userspace intervention would be
+		 * required to ensure portions of the cache do not go
+		 * unused.
+		 */
+		if (cbm_validate_val(plr->d->ctrl_val[0] | plr->cbm, plr->r))
+			pseudo_lock_clos_set(plr, 0,
+					     plr->d->ctrl_val[0] | plr->cbm);
+		pseudo_lock_region_clear(plr);
+	}
 	kfree(plr);
 	if (is_new_plr)
 		new_plr = NULL;
@@ -178,17 +277,17 @@ static void pseudo_lock_region_kn_unlock(struct kernfs_node *kn)
  * @r: resource to which this cache instance belongs
  * @d: domain representing the cache instance
  *
- * Availability for pseudo-locking is determined as follows:
+ * Pseudo-locked regions are set up with wbinvd, limiting us to one region
+ * per cache instance.
+ *
+ * If no other pseudo-locked region present on this cache instance
+ * availability for pseudo-locking is determined as follows:
  * * Cache area is in use by default COS.
  * * Cache area is NOT in use by any other (other than default) COS.
  * * Cache area is not shared with any other entity. Specifically, the
  *   cache area does not appear in "Bitmask of Shareable Resource with Other
  *   executing entities" found in EBX during CAT enumeration.
  *
- * Below is also required to determine availability and will be
- * added in later:
- * * Cache area is not currently pseudo-locked.
- *
  * LOCKING:
  * rdtgroup_mutex is expected to be held when called
  *
@@ -203,6 +302,13 @@ static u32 pseudo_lock_avail_get(struct rdt_resource *r, struct rdt_domain *d)
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
+	/*
+	 * Nothing available if a pseudo-locked region already associated
+	 * with this cache instance.
+	 */
+	if (d->plr)
+		return 0;
+
 	avail = d->ctrl_val[0];
 	for (i = 1; i < r->num_closid; i++) {
 		if (closid_allocated(i))
@@ -213,6 +319,34 @@ static u32 pseudo_lock_avail_get(struct rdt_resource *r, struct rdt_domain *d)
 	return avail;
 }
 
+/**
+ * pseudo_lock_space_avail - returns if any space available for pseudo-locking
+ *
+ * Checks all cache instances on system for any regions available for
+ * pseudo-locking.
+ *
+ * LOCKING:
+ * rdtgroup_mutex is expected to be held when called
+ *
+ * RETURNS:
+ * true if any cache instance has space available for pseudo-locking, false
+ * otherwise
+ */
+static bool pseudo_lock_space_avail(void)
+{
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list) {
+			if (pseudo_lock_avail_get(r, d) > 0)
+				return true;
+		}
+	}
+	return false;
+}
+
 static int pseudo_lock_avail_show(struct seq_file *sf, void *v)
 {
 	struct rdt_resource *r;
@@ -260,6 +394,9 @@ int pseudo_lock_schemata_show(struct kernfs_open_file *of,
 		for_each_alloc_enabled_rdt_resource(r) {
 			seq_printf(seq, "%s:uninitialized\n", r->name);
 		}
+	} else {
+		seq_printf(seq, "%s:%d=%x\n", plr->r->name,
+			   plr->d->id, plr->cbm);
 	}
 
 out:
@@ -267,6 +404,418 @@ int pseudo_lock_schemata_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+/**
+ * init_from_cache_details - Initialize pseudo-lock region info from cache data
+ *
+ * When a user requests a cache region to be locked the request is provided
+ * as a bitmask. We need to allocate memory of matching size so here we
+ * translate the requested bitmask into how many bytes it represents. This
+ * is done by dividing the total cache size by the CBM len to first
+ * determine how many bytes each bit in bitmask represents, then
+ * multiply that with how many bits were set in requested bitmask.
+ *
+ * Also set the cache line size to know the stride with which data needs to
+ * be accessed to be pseudo-locked.
+ */
+static int init_from_cache_details(struct pseudo_lock_region *plr,
+				   struct rdt_resource *r)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(plr->cpu);
+	unsigned int cbm_len = r->cache.cbm_len;
+	int num_b;
+	int i;
+
+	num_b = bitmap_weight((unsigned long *)&plr->cbm, cbm_len);
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == r->cache_level) {
+			plr->size = ci->info_list[i].size / cbm_len * num_b;
+			plr->line_size = ci->info_list[i].coherency_line_size;
+			return 0;
+		}
+	}
+
+	return -1;
+}
+
+static int pseudo_lock_region_init(struct pseudo_lock_region *plr,
+				   struct rdt_resource *r,
+				   struct rdt_domain *d)
+{
+	unsigned long b_req = plr->cbm;
+	unsigned long b_avail;
+	int ret;
+
+	b_avail = pseudo_lock_avail_get(r, d);
+
+	if (!bitmap_subset(&b_req, &b_avail, r->cache.cbm_len)) {
+		rdt_last_cmd_puts("requested bitmask not available\n");
+		return -ENOSPC;
+	}
+
+	/*
+	 * Use the first cpu we find that is associated with the
+	 * cache selected.
+	 */
+	plr->cpu = cpumask_first(&d->cpu_mask);
+
+	if (!cpu_online(plr->cpu)) {
+		rdt_last_cmd_printf("cpu %u associated with cache not online\n",
+				    plr->cpu);
+		return -ENODEV;
+	}
+
+	ret = init_from_cache_details(plr, r);
+	if (ret < 0) {
+		rdt_last_cmd_puts("unable to lookup cache details\n");
+		return -ENOSPC;
+	}
+
+	/*
+	 * We do not yet support contiguous regions larger than
+	 * KMALLOC_MAX_SIZE
+	 */
+	if (plr->size > KMALLOC_MAX_SIZE) {
+		rdt_last_cmd_puts("requested region exceeds maximum size\n");
+		return -E2BIG;
+	}
+
+	plr->kmem = kzalloc(plr->size, GFP_KERNEL);
+	if (!plr->kmem) {
+		rdt_last_cmd_puts("unable to allocate memory\n");
+		return -ENOMEM;
+	}
+
+	plr->r = r;
+	plr->d = d;
+
+	return 0;
+}
+
+/**
+ * pseudo_lock_fn - Load kernel memory into cache
+ *
+ * This is the core pseudo-locking function.
+ *
+ * First we ensure that the kernel memory cannot be found in the cache.
+ * Then, while taking care that there will be as little interference as
+ * possible, each cache line of the memory to be loaded is touched while
+ * core is running with class of service set to the bitmask of the
+ * pseudo-locked region. After this is complete no future CAT allocations
+ * will be allowed to overlap with this bitmask.
+ *
+ * Local register variables are utilized to ensure that the memory region
+ * to be locked is the only memory access made during the critical locking
+ * loop.
+ */
+static int pseudo_lock_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	u32 rmid_p, closid_p;
+	unsigned long flags;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use a regular
+	 * variable to ensure we always use a valid pointer, but the cost
+	 * is that this variable will enter the cache through evicting the
+	 * memory we are trying to lock into the cache. Thus expect lower
+	 * pseudo-locking success rate when KASAN is active.
+	 */
+	unsigned int line_size;
+	unsigned int size;
+	void *mem_r;
+#else
+	register unsigned int line_size asm("esi");
+	register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	/*
+	 * Make sure none of the allocated memory is cached. If it is we
+	 * will get a cache hit in below loop from outside of pseudo-locked
+	 * region.
+	 * wbinvd (as opposed to clflush/clflushopt) is required to
+	 * increase likelihood that allocated cache portion will be filled
+	 * with associated memory
+	 */
+	wbinvd();
+
+	preempt_disable();
+	local_irq_save(flags);
+	/*
+	 * Call wrmsr and rdmsr as directly as possible to avoid tracing
+	 * clobbering local register variables or affecting cache accesses.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	closid_p = this_cpu_read(pqr_state.cur_closid);
+	rmid_p = this_cpu_read(pqr_state.cur_rmid);
+	mem_r = plr->kmem;
+	size = plr->size;
+	line_size = plr->line_size;
+	__wrmsr(IA32_PQR_ASSOC, rmid_p, plr->closid);
+	/*
+	 * Cache was flushed earlier. Now access kernel memory to read it
+	 * into cache region associated with just activated plr->closid.
+	 * Loop over data twice:
+	 * - In first loop the cache region is shared with the page walker
+	 *   as it populates the paging structure caches (including TLB).
+	 * - In the second loop the paging structure caches are used and
+	 *   cache region is populated with the memory being referenced.
+	 */
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			:
+			: "r" (mem_r), "r" (i)
+			: "%eax", "memory");
+	}
+	for (i = 0; i < size; i += line_size) {
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			:
+			: "r" (mem_r), "r" (i)
+			: "%eax", "memory");
+	}
+	__wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_restore(flags);
+	preempt_enable();
+
+	thread_done = 1;
+	wake_up_interruptible(&wq);
+	return 0;
+}
+
+static int pseudo_lock_doit(struct pseudo_lock_region *plr,
+			    struct rdt_resource *r,
+			    struct rdt_domain *d)
+{
+	struct task_struct *thread;
+	int closid;
+	int ret, i;
+
+	/*
+	 * With the usage of wbinvd we can only support one pseudo-locked
+	 * region per domain at this time.
+	 */
+	if (d->plr) {
+		rdt_last_cmd_puts("pseudo-locked region exists on cache\n");
+		return -ENOSPC;
+	}
+
+	ret = pseudo_lock_region_init(plr, r, d);
+	if (ret < 0)
+		return ret;
+
+	closid = closid_alloc();
+	if (closid < 0) {
+		ret = closid;
+		rdt_last_cmd_puts("unable to obtain free closid\n");
+		goto out_region;
+	}
+
+	/*
+	 * Ensure we end with a valid default CLOS. If a pseudo-locked
+	 * region in middle of possible bitmasks is selected it will split
+	 * up default CLOS which would be a fault and for which handling
+	 * is unclear so we fail back to userspace. Validation will also
+	 * ensure that default CLOS is not zero, keeping some cache
+	 * available to rest of system.
+	 */
+	if (!cbm_validate_val(d->ctrl_val[0] & ~plr->cbm, r)) {
+		ret = -EINVAL;
+		rdt_last_cmd_printf("bm 0x%x causes invalid clos 0 bm 0x%x\n",
+				    plr->cbm, d->ctrl_val[0] & ~plr->cbm);
+		goto out_closid;
+	}
+
+	ret = pseudo_lock_clos_set(plr, 0, d->ctrl_val[0] & ~plr->cbm);
+	if (ret < 0) {
+		rdt_last_cmd_printf("unable to set clos 0 bitmask to 0x%x\n",
+				    d->ctrl_val[0] & ~plr->cbm);
+		goto out_closid;
+	}
+
+	ret = pseudo_lock_clos_set(plr, closid, plr->cbm);
+	if (ret < 0) {
+		rdt_last_cmd_printf("unable to set closid %d bitmask to 0x%x\n",
+				    closid, plr->cbm);
+		goto out_clos_def;
+	}
+
+	plr->closid = closid;
+
+	thread_done = 0;
+
+	thread = kthread_create_on_node(pseudo_lock_fn, plr,
+					cpu_to_node(plr->cpu),
+					"pseudo_lock/%u", plr->cpu);
+	if (IS_ERR(thread)) {
+		ret = PTR_ERR(thread);
+		rdt_last_cmd_printf("locking thread returned error %d\n", ret);
+		/*
+		 * We do not return CBM to newly allocated CLOS here on
+		 * error path since that will result in a CBM of all
+		 * zeroes which is an illegal MSR write.
+		 */
+		goto out_clos_def;
+	}
+
+	kthread_bind(thread, plr->cpu);
+	wake_up_process(thread);
+
+	ret = wait_event_interruptible(wq, thread_done == 1);
+	if (ret < 0) {
+		rdt_last_cmd_puts("locking thread interrupted\n");
+		goto out_clos_def;
+	}
+
+	/*
+	 * closid will be released soon but its CBM as well as CBM of not
+	 * yet allocated CLOS as stored in the array will remain. Ensure
+	 * that CBM will be what is currently the default CLOS, which
+	 * excludes pseudo-locked region.
+	 */
+	for (i = 1; i < r->num_closid; i++) {
+		if (i == closid || !closid_allocated(i))
+			pseudo_lock_clos_set(plr, i, d->ctrl_val[0]);
+	}
+
+	plr->locked = true;
+	d->plr = plr;
+	new_plr = NULL;
+
+	/*
+	 * We do not return CBM to CLOS here since that will result in a
+	 * CBM of all zeroes which is an illegal MSR write.
+	 */
+	closid_free(closid);
+	ret = 0;
+	goto out;
+
+out_clos_def:
+	pseudo_lock_clos_set(plr, 0, d->ctrl_val[0] | plr->cbm);
+out_closid:
+	closid_free(closid);
+out_region:
+	pseudo_lock_region_clear(plr);
+out:
+	return ret;
+}
+
+/**
+ * pseudo_lock_schemata_write - process user's pseudo-locking request
+ *
+ * User provides a schemata in format of RESOURCE:ID=BITMASK with the
+ * following meaning:
+ * RESOURCE - Name of the RDT resource (rdt_resource->name) that will be
+ *            pseudo-locked.
+ * ID       - id of the particular instace of RESOURCE that will be
+ *            pseudo-locked. This maps to rdt_domain->id.
+ * BITMASK  - The bitmask specifying the region of cache that should be
+ *            pseudo-locked.
+ *
+ * RETURNS:
+ * On success the user's requested region has been pseudo-locked
+ */
+ssize_t pseudo_lock_schemata_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	struct pseudo_lock_region *plr;
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+	char *resname, *dom;
+	bool found = false;
+	int ret = -EINVAL;
+	int dom_id;
+	u32 b_req;
+
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+
+	cpus_read_lock();
+
+	plr = pseudo_lock_region_kn_lock(of->kn);
+	if (!plr) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	rdt_last_cmd_clear();
+
+	/* Do not lock a region twice. */
+	if (plr->locked) {
+		ret = -EEXIST;
+		rdt_last_cmd_puts("region is already locked\n");
+		goto out;
+	}
+
+	if (plr != new_plr) {
+		rdt_last_cmd_puts("region has already been initialized\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	buf[nbytes - 1] = '\0';
+
+	resname = strsep(&buf, ":");
+	if (!buf) {
+		rdt_last_cmd_puts("schemata missing ':'\n");
+		goto out;
+	}
+
+	dom = strsep(&buf, "=");
+	if (!buf) {
+		rdt_last_cmd_puts("schemata missing '='\n");
+		goto out;
+	}
+
+	ret = kstrtoint(dom, 10, &dom_id);
+	if (ret < 0 || dom_id < 0) {
+		rdt_last_cmd_puts("unable to parse cache id\n");
+		goto out;
+	}
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		if (!strcmp(resname, r->name)) {
+			found = true;
+			ret = kstrtou32(buf, 16, &b_req);
+			if (ret) {
+				rdt_last_cmd_puts("unable to parse bitmask\n");
+				goto out;
+			}
+			if (!cbm_validate_val(b_req, r)) {
+				ret = -EINVAL;
+				goto out;
+			}
+			plr->cbm = b_req;
+			list_for_each_entry(d, &r->domains, list) {
+				if (d->id == dom_id) {
+					ret = pseudo_lock_doit(plr, r, d);
+					goto out;
+				}
+			}
+			rdt_last_cmd_puts("no matching cache instance\n");
+			ret = -EINVAL;
+			break;
+		}
+	}
+
+	if (!found) {
+		rdt_last_cmd_puts("invalid resource name\n");
+		ret = -EINVAL;
+	}
+
+out:
+	pseudo_lock_region_kn_unlock(of->kn);
+	cpus_read_unlock();
+	return ret ?: nbytes;
+}
+
 int pseudo_lock_size_show(struct kernfs_open_file *of,
 			  struct seq_file *seq, void *v)
 {
@@ -295,7 +844,7 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 	mutex_lock(&rdtgroup_mutex);
 	mutex_lock(&rdt_pseudo_lock_mutex);
 
-	if (new_plr) {
+	if (new_plr || !pseudo_lock_space_avail()) {
 		ret = -ENOSPC;
 		goto out;
 	}
@@ -523,6 +1072,9 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
  */
 void rdt_pseudo_lock_fs_remove(void)
 {
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	if (!pseudo_lock_kn)
@@ -534,6 +1086,15 @@ void rdt_pseudo_lock_fs_remove(void)
 		new_plr->deleted = true;
 		__pseudo_lock_region_release(new_plr);
 	}
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list) {
+			if (d->plr) {
+				d->plr->deleted = true;
+				__pseudo_lock_region_release(d->plr);
+			}
+		}
+	}
 	kernfs_remove(pseudo_lock_kn);
 	pseudo_lock_kn = NULL;
 	mutex_unlock(&rdt_pseudo_lock_mutex);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 947e9d706864..a7126949a9aa 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -861,9 +861,10 @@ static struct rftype res_common_files[] = {
 	},
 	{
 		.name		= "schemata",
-		.mode		= 0444,
+		.mode		= 0644,
 		.kf_ops		= &rdtgroup_kf_single_ops,
 		.seq_show	= pseudo_lock_schemata_show,
+		.write		= pseudo_lock_schemata_write,
 		.fflags		= RF_PSEUDO_LOCK,
 	},
 	{
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 14/20] x86/intel_rdt: Enable testing for pseudo-locked region
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (12 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 13/20] x86/intel_rdt: Support schemata write - pseudo-locking core Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 15/20] x86/intel_rdt: Prevent new allocations from pseudo-locked regions Reinette Chatre
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Introduce a new test that can be used to determine if a provided CBM
intersects with an existing pseudo-locked region of cache domain.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  1 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index b4426a997516..529fe535b357 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -457,6 +457,7 @@ void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_resource *r, struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
+bool cbm_pseudo_locked(unsigned long cbm, struct rdt_domain *d);
 int rdt_pseudo_lock_fs_init(struct kernfs_node *root);
 void rdt_pseudo_lock_fs_remove(void);
 int rdt_pseudo_lock_mkdir(const char *name, umode_t mode);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index c859ff7dea6e..9c0f002c08ea 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -273,6 +273,25 @@ static void pseudo_lock_region_kn_unlock(struct kernfs_node *kn)
 }
 
 /**
+ * cbm_pseudo_locked - Test if all or portion of CBM is pseudo-locked
+ * @cbm:	bitmask to be tested
+ * @d:		rdt_domain for which @cbm was provided
+ *
+ * RETURNS:
+ * True if bits from @cbm intersects with what has been pseudo-locked in
+ * rdt_domain @d, false otherwise.
+ */
+bool cbm_pseudo_locked(unsigned long cbm, struct rdt_domain *d)
+{
+	if (d->plr &&
+	    bitmap_intersects(&cbm, (unsigned long *)&d->plr->cbm,
+			      d->plr->r->cache.cbm_len))
+		return true;
+
+	return false;
+}
+
+/**
  * pseudo_lock_avail_get - return bitmask of cache available for locking
  * @r: resource to which this cache instance belongs
  * @d: domain representing the cache instance
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 15/20] x86/intel_rdt: Prevent new allocations from pseudo-locked regions
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (13 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 14/20] x86/intel_rdt: Enable testing for pseudo-locked region Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 16/20] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

When a user requests a new cache allocation we need to enforce that it
does not intersect with an existing pseudo-locked region. An allocation
with a bitmask intersection with a pseudo-locked region will enable
cache allocations to that region and thus evict pseudo-locked data.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 50d3743ed79e..fb3ecc864b2a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -136,8 +136,10 @@ static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 }
 
 /*
- * Read one cache bit mask (hex). Check that it is valid for the current
- * resource type.
+ * Read one cache bit mask (hex). Check that it is valid and available for
+ * the current resource type. While CAT allows CBM to overlap amongst
+ * classes of service we do not allow a CBM to overlap with a region that has
+ * been pseudo-locked.
  */
 int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
 {
@@ -150,6 +152,8 @@ int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
 
 	if(!cbm_validate(buf, &data, r))
 		return -EINVAL;
+	if (cbm_pseudo_locked(data, d))
+		return -EINVAL;
 	d->new_ctrl = data;
 	d->have_new_ctrl = true;
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 16/20] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (14 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 15/20] x86/intel_rdt: Prevent new allocations from pseudo-locked regions Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 17/20] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

There is no simple yes/no test to determine if pseudo-locking was
successful. In order to test pseudo-locking we expose a debugfs file for
each pseudo-locked region that will record the latency of reading the
pseudo-locked memory at a stride of 32 bytes (hardcoded). These numbers
will give us an idea of locking was successful or not since they will
reflect cache hits and cache misses (hardware prefetching is disabled
during the test).

The new debugfs file "measure_trigger" will, when the
pseudo_lock_mem_latency tracepoint is enabled, record the latency of
accessing each cache line twice.

Kernel tracepoints offer us histograms that is a simple way to visualize
the memory access latency and immediately see any cache misses. For
example, the hist trigger below before trigger of the measurement
will display the memory access latency and instances at each
latency:
echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/pseudo_lock/\
                           pseudo_lock_mem_latency/trigger

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/Kconfig                                  |  11 ++
 arch/x86/kernel/cpu/Makefile                      |   1 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 204 ++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |  22 +++
 4 files changed, 238 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2fdb23313dd5..b66a583e410f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -442,6 +442,17 @@ config INTEL_RDT
 
 	  Say N if unsure.
 
+config INTEL_RDT_DEBUGFS
+	bool "Intel RDT debugfs interface"
+	depends on INTEL_RDT
+	select HIST_TRIGGERS
+	select DEBUG_FS
+	---help---
+	  Enable the creation of Intel RDT debugfs files. In support of
+	  debugging and validation of Intel RDT sub-features that use it.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_EXTENDED_PLATFORM
 	bool "Support for extended (non-PC) x86 platforms"
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index ce704c5fe1da..995269e702bc 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -36,6 +36,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
 obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
 obj-$(CONFIG_INTEL_RDT)	+= intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
+CFLAGS_intel_rdt_pseudo_lock.o = -I$(src)
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 9c0f002c08ea..e9187d5a70f0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -22,6 +22,7 @@
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/debugfs.h>
 #include <linux/kernfs.h>
 #include <linux/kref.h>
 #include <linux/kthread.h>
@@ -33,6 +34,11 @@
 #include <asm/intel_rdt_sched.h>
 #include "intel_rdt.h"
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+#define CREATE_TRACE_POINTS
+#include "intel_rdt_pseudo_lock_event.h"
+#endif
+
 /*
  * MSR_MISC_FEATURE_CONTROL register enables the modification of hardware
  * prefetcher state. Details about this register can be found in the MSR
@@ -69,6 +75,17 @@ static int thread_done;
  */
 static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+/*
+ * Pointers to debugfs directories. @debugfs_resctrl points to the top-level
+ * directory named resctrl. This can be moved to a central area when other
+ * RDT components start using it.
+ * @debugfs_pseudo points to the pseudo_lock directory under resctrl.
+ */
+static struct dentry *debugfs_resctrl;
+static struct dentry *debugfs_pseudo;
+#endif
+
 /**
  * struct pseudo_lock_region - pseudo-lock region information
  * @kn:			kernfs node representing this region in the resctrl
@@ -91,6 +108,8 @@ static DEFINE_MUTEX(rdt_pseudo_lock_mutex);
  *			region via kernfs
  * @deleted:		user requested removal of region via rmdir on kernfs
  * @kmem:		the kernel memory associated with pseudo-locked region
+ * @debugfs_dir:	pointer to this region's directory in the debugfs
+ *			filesystem
  */
 struct pseudo_lock_region {
 	struct kernfs_node	*kn;
@@ -106,6 +125,9 @@ struct pseudo_lock_region {
 	struct kref		refcount;
 	bool			deleted;
 	void			*kmem;
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	struct dentry		*debugfs_dir;
+#endif
 };
 
 /*
@@ -192,6 +214,9 @@ static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 					     plr->d->ctrl_val[0] | plr->cbm);
 		pseudo_lock_region_clear(plr);
 	}
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	debugfs_remove_recursive(plr->debugfs_dir);
+#endif
 	kfree(plr);
 	if (is_new_plr)
 		new_plr = NULL;
@@ -291,6 +316,136 @@ bool cbm_pseudo_locked(unsigned long cbm, struct rdt_domain *d)
 	return false;
 }
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+static int measure_cycles_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	unsigned long flags;
+	u64 start, end;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use a regular
+	 * variable to ensure we always use a valid pointer to access memory.
+	 * The cost is that accessing this pointer, which could be in
+	 * cache, will be included in the measurement of memory read latency.
+	 */
+	void *mem_r;
+#else
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	preempt_disable();
+	local_irq_save(flags);
+	/*
+	 * The wrmsr call may be reordered with the assignment below it.
+	 * Call wrmsr as directly as possible to avoid tracing clobbering
+	 * local register variable used for memory pointer.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	mem_r = plr->kmem;
+	for (i = 0; i < plr->size; i += 32) {
+		start = rdtsc_ordered();
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			     :
+			     : "r" (mem_r), "r" (i)
+			     : "%eax", "memory");
+		end = rdtsc_ordered();
+		trace_pseudo_lock_mem_latency((u32)(end - start));
+	}
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_restore(flags);
+	preempt_enable();
+	thread_done = 1;
+	wake_up_interruptible(&wq);
+	return 0;
+}
+
+static int pseudo_measure_cycles(struct pseudo_lock_region *plr)
+{
+	struct task_struct *thread;
+	unsigned int cpu;
+	int ret;
+
+	cpus_read_lock();
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	if (!plr->locked || plr->deleted) {
+		ret = 0;
+		goto out;
+	}
+
+	thread_done = 0;
+	cpu = cpumask_first(&plr->d->cpu_mask);
+	if (!cpu_online(cpu)) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	thread = kthread_create_on_node(measure_cycles_fn, plr,
+					cpu_to_node(cpu),
+					"pseudo_lock_measure/%u", cpu);
+	if (IS_ERR(thread)) {
+		ret = PTR_ERR(thread);
+		goto out;
+	}
+	kthread_bind(thread, cpu);
+	wake_up_process(thread);
+
+	ret = wait_event_interruptible(wq, thread_done == 1);
+	if (ret < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+	cpus_read_unlock();
+	return ret;
+}
+
+static ssize_t pseudo_measure_trigger(struct file *file,
+				      const char __user *user_buf,
+				      size_t count, loff_t *ppos)
+{
+	struct pseudo_lock_region *plr = file->private_data;
+	size_t buf_size;
+	char buf[32];
+	int srcu_idx;
+	int ret;
+	bool bv;
+
+	buf_size = min(count, (sizeof(buf) - 1));
+	if (copy_from_user(buf, user_buf, buf_size))
+		return -EFAULT;
+
+	buf[buf_size] = '\0';
+	ret = strtobool(buf, &bv);
+	if (ret == 0) {
+		ret = debugfs_use_file_start(file->f_path.dentry, &srcu_idx);
+		if (ret == 0 && bv) {
+			ret = pseudo_measure_cycles(plr);
+			if (ret == 0)
+				ret = count;
+		}
+		debugfs_use_file_finish(srcu_idx);
+	}
+
+	return ret;
+}
+
+static const struct file_operations pseudo_measure_fops = {
+	.write = pseudo_measure_trigger,
+	.open = simple_open,
+	.llseek = default_llseek,
+};
+#endif /* CONFIG_INTEL_RDT_DEBUGFS */
+
 /**
  * pseudo_lock_avail_get - return bitmask of cache available for locking
  * @r: resource to which this cache instance belongs
@@ -858,6 +1013,9 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 {
 	struct pseudo_lock_region *plr;
 	struct kernfs_node *kn;
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	struct dentry *entry;
+#endif
 	int ret = 0;
 
 	mutex_lock(&rdtgroup_mutex);
@@ -889,12 +1047,32 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 	if (ret)
 		goto out_remove;
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	plr->debugfs_dir = debugfs_create_dir(plr->kn->name, debugfs_pseudo);
+	if (IS_ERR(plr->debugfs_dir)) {
+		ret = PTR_ERR(plr->debugfs_dir);
+		plr->debugfs_dir = NULL;
+		goto out_remove;
+	}
+
+	entry = debugfs_create_file("measure_trigger", 0200, plr->debugfs_dir,
+				    plr, &pseudo_measure_fops);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto out_debugfs;
+	}
+#endif
+
 	kref_init(&plr->refcount);
 	kernfs_activate(kn);
 	new_plr = plr;
 	ret = 0;
 	goto out;
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+out_debugfs:
+	debugfs_remove_recursive(plr->debugfs_dir);
+#endif
 out_remove:
 	kernfs_remove(kn);
 out_free:
@@ -990,6 +1168,23 @@ static u64 get_prefetch_disable_bits(void)
 	return 0;
 }
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+static int pseudo_lock_debugfs_create(void)
+{
+	debugfs_resctrl = debugfs_create_dir("resctrl", NULL);
+	if (IS_ERR(debugfs_resctrl))
+		return PTR_ERR(debugfs_resctrl);
+
+	debugfs_pseudo = debugfs_create_dir("pseudo_lock", debugfs_resctrl);
+	if (IS_ERR(debugfs_pseudo)) {
+		debugfs_remove_recursive(debugfs_resctrl);
+		return PTR_ERR(debugfs_pseudo);
+	}
+
+	return 0;
+}
+#endif
+
 /**
  * rdt_pseudo_lock_fs_init - Create and initialize pseudo-locking files
  * @root: location in kernfs where directory and files should be created
@@ -1066,6 +1261,12 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 	if (ret)
 		goto error;
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	ret = pseudo_lock_debugfs_create();
+	if (ret < 0)
+		goto error;
+#endif
+
 	kernfs_activate(pseudo_lock_kn);
 
 	ret = 0;
@@ -1114,6 +1315,9 @@ void rdt_pseudo_lock_fs_remove(void)
 			}
 		}
 	}
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	debugfs_remove_recursive(debugfs_resctrl);
+#endif
 	kernfs_remove(pseudo_lock_kn);
 	pseudo_lock_kn = NULL;
 	mutex_unlock(&rdt_pseudo_lock_mutex);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
new file mode 100644
index 000000000000..cd74d1a0f592
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -0,0 +1,22 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pseudo_lock
+
+#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PSEUDO_LOCK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(pseudo_lock_mem_latency,
+	    TP_PROTO(u32 latency),
+	    TP_ARGS(latency),
+	    TP_STRUCT__entry(__field(u32, latency)),
+	    TP_fast_assign(__entry->latency = latency),
+	    TP_printk("latency=%u", __entry->latency)
+	   );
+
+#endif /* _TRACE_PSEUDO_LOCK_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE intel_rdt_pseudo_lock_event
+#include <trace/define_trace.h>
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 17/20] x86/intel_rdt: Create character device exposing pseudo-locked region
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (15 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 16/20] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 18/20] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Once a pseudo-locked region has been created it needs to be made
available to user space to provide benefit there.

A character device supporting mmap() is created for each pseudo-locked
region. A user space application can now use mmap() system call to map
pseudo-locked region into its virtual address space.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 267 +++++++++++++++++++++++++++-
 1 file changed, 265 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index e9187d5a70f0..4b562823c0ca 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -26,6 +26,7 @@
 #include <linux/kernfs.h>
 #include <linux/kref.h>
 #include <linux/kthread.h>
+#include <linux/mman.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
@@ -52,6 +53,14 @@
  */
 static u64 prefetch_disable_bits;
 
+/*
+ * Major number assigned to and shared by all devices exposing
+ * pseudo-locked regions.
+ */
+static unsigned int pseudo_lock_major;
+static unsigned long pseudo_lock_minor_avail = GENMASK(MINORBITS, 0);
+static struct class *pseudo_lock_class;
+
 struct kernfs_node *pseudo_lock_kn;
 
 /*
@@ -189,6 +198,15 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
 	plr->d = NULL;
 }
 
+/**
+ * pseudo_lock_minor_release - Return minor number to available
+ * @minor: The minor number being released
+ */
+static void pseudo_lock_minor_release(unsigned int minor)
+{
+	__set_bit(minor, &pseudo_lock_minor_avail);
+}
+
 static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 {
 	bool is_new_plr = (plr == new_plr);
@@ -199,6 +217,9 @@ static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 
 	if (plr->locked) {
 		plr->d->plr = NULL;
+		device_destroy(pseudo_lock_class,
+			       MKDEV(pseudo_lock_major, plr->minor));
+		pseudo_lock_minor_release(plr->minor);
 		/*
 		 * Resource groups come and go. Simply returning this
 		 * pseudo-locked region's bits to the default CLOS may
@@ -764,11 +785,74 @@ static int pseudo_lock_fn(void *_plr)
 	return 0;
 }
 
+/**
+ * pseudo_lock_minor_get - Obtain available minor number
+ * @minor: Pointer to where new minor number will be stored
+ *
+ * A bitmask is used to track available minor numbers. Here the next free
+ * minor number is allocated and returned.
+ *
+ * RETURNS:
+ * Zero on success, error on failure.
+ */
+static int pseudo_lock_minor_get(unsigned int *minor)
+{
+	unsigned long first_bit;
+
+	first_bit = find_first_bit(&pseudo_lock_minor_avail, MINORBITS);
+
+	if (first_bit == MINORBITS)
+		return -ENOSPC;
+
+	__clear_bit(first_bit, &pseudo_lock_minor_avail);
+	*minor = first_bit;
+
+	return 0;
+}
+
+/**
+ * region_find_by_minor - Locate a pseudo-lock region by inode minor number
+ * @minor: The minor number of the device representing pseudo-locked region
+ *
+ * When the character device is accessed we need to determine which
+ * pseudo-locked region it belongs to. This is done by matching the minor
+ * number of the device to the pseudo-locked region it belongs.
+ *
+ * Minor numbers are assigned at the time a pseudo-locked region is associated
+ * with a cache instance.
+ *
+ * LOCKING:
+ * rdt_pseudo_lock_mutex must be held
+ *
+ * RETURNS:
+ * On success returns pointer to pseudo-locked region, NULL on failure.
+ */
+static struct pseudo_lock_region *region_find_by_minor(unsigned int minor)
+{
+	struct pseudo_lock_region *plr_match = NULL;
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+
+	lockdep_assert_held(&rdt_pseudo_lock_mutex);
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list) {
+			if (d->plr && d->plr->minor == minor) {
+				plr_match = d->plr;
+				break;
+			}
+		}
+	}
+	return plr_match;
+}
+
 static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 			    struct rdt_resource *r,
 			    struct rdt_domain *d)
 {
 	struct task_struct *thread;
+	unsigned int new_minor;
+	struct device *dev;
 	int closid;
 	int ret, i;
 
@@ -859,11 +943,45 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 			pseudo_lock_clos_set(plr, i, d->ctrl_val[0]);
 	}
 
+	ret = pseudo_lock_minor_get(&new_minor);
+	if (ret < 0) {
+		rdt_last_cmd_puts("unable to obtain a new minor number\n");
+		goto out_clos_def;
+	}
+
 	plr->locked = true;
 	d->plr = plr;
 	new_plr = NULL;
 
 	/*
+	 * Unlock access but do not release the reference. The
+	 * pseudo-locked region will still be here when we return.
+	 * If anything else attempts to access the region while we do not
+	 * have the mutex the region would be considered locked.
+	 *
+	 * We need to release the mutex temporarily to avoid a potential
+	 * deadlock with the mm->mmap_sem semaphore which is obtained in
+	 * the device_create() callpath below as well as before our mmap()
+	 * callback is called.
+	 */
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+
+	dev = device_create(pseudo_lock_class, NULL,
+			    MKDEV(pseudo_lock_major, new_minor),
+			    plr, "%s", plr->kn->name);
+
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		rdt_last_cmd_printf("failed to created character device: %d\n",
+				    ret);
+		goto out_minor;
+	}
+
+	plr->minor = new_minor;
+
+	/*
 	 * We do not return CBM to CLOS here since that will result in a
 	 * CBM of all zeroes which is an illegal MSR write.
 	 */
@@ -871,6 +989,8 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 	ret = 0;
 	goto out;
 
+out_minor:
+	pseudo_lock_minor_release(new_minor);
 out_clos_def:
 	pseudo_lock_clos_set(plr, 0, d->ctrl_val[0] | plr->cbm);
 out_closid:
@@ -1185,6 +1305,127 @@ static int pseudo_lock_debugfs_create(void)
 }
 #endif
 
+static int pseudo_lock_dev_open(struct inode *inode, struct file *filp)
+{
+	struct pseudo_lock_region *plr;
+
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	plr = region_find_by_minor(iminor(inode));
+	if (!plr) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -ENODEV;
+	}
+
+	filp->private_data = plr;
+	/* Perform a non-seekable open - llseek is not supported */
+	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+
+	return 0;
+}
+
+static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
+{
+	mutex_lock(&rdt_pseudo_lock_mutex);
+	filp->private_data = NULL;
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+	return 0;
+}
+
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
+{
+	/* Not supported */
+	return -EINVAL;
+}
+
+static const struct vm_operations_struct pseudo_mmap_ops = {
+	.mremap = pseudo_lock_dev_mremap,
+};
+
+static int pseudo_lock_dev_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned long vsize = vma->vm_end - vma->vm_start;
+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+	struct pseudo_lock_region *plr;
+	unsigned long physical;
+	unsigned long psize;
+
+	mutex_lock(&rdt_pseudo_lock_mutex);
+
+	plr = file->private_data;
+	WARN_ON(!plr);
+	if (!plr) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -ENODEV;
+	}
+
+	/*
+	 * Task is required to run with affinity to the cpus associated
+	 * with the pseudo-locked region. If this is not the case the task
+	 * may be scheduled elsewhere and invalidate entries in the
+	 * pseudo-locked region.
+	 */
+	if (!cpumask_subset(&current->cpus_allowed, &plr->d->cpu_mask)) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -EINVAL;
+	}
+
+	physical = __pa(plr->kmem) >> PAGE_SHIFT;
+	psize = plr->size - off;
+
+	if (off > plr->size) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -ENOSPC;
+	}
+
+	/*
+	 * Ensure changes are carried directly to the memory being mapped,
+	 * do not allow copy-on-write mapping.
+	 */
+	if (!(vma->vm_flags & VM_SHARED)) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -EINVAL;
+	}
+
+	if (vsize > psize) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -ENOSPC;
+	}
+
+	memset(plr->kmem + off, 0, vsize);
+
+	if (remap_pfn_range(vma, vma->vm_start, physical + vma->vm_pgoff,
+			    vsize, vma->vm_page_prot)) {
+		mutex_unlock(&rdt_pseudo_lock_mutex);
+		return -EAGAIN;
+	}
+	vma->vm_ops = &pseudo_mmap_ops;
+	mutex_unlock(&rdt_pseudo_lock_mutex);
+	return 0;
+}
+
+static const struct file_operations pseudo_lock_dev_fops = {
+	.owner =	THIS_MODULE,
+	.llseek =	no_llseek,
+	.read =		NULL,
+	.write =	NULL,
+	.open =		pseudo_lock_dev_open,
+	.release =	pseudo_lock_dev_release,
+	.mmap =		pseudo_lock_dev_mmap,
+};
+
+static char *pseudo_lock_devnode(struct device *dev, umode_t *mode)
+{
+	struct pseudo_lock_region *plr;
+
+	plr = dev_get_drvdata(dev);
+	if (mode)
+		*mode = 0600;
+	return kasprintf(GFP_KERNEL, "pseudo_lock/%s", plr->kn->name);
+}
+
 /**
  * rdt_pseudo_lock_fs_init - Create and initialize pseudo-locking files
  * @root: location in kernfs where directory and files should be created
@@ -1244,10 +1485,26 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 	if (prefetch_disable_bits == 0)
 		return 0;
 
+	ret = register_chrdev(0, "pseudo_lock", &pseudo_lock_dev_fops);
+	if (ret < 0)
+		return ret;
+
+	pseudo_lock_major = ret;
+
+	pseudo_lock_class = class_create(THIS_MODULE, "pseudo_lock");
+	if (IS_ERR(pseudo_lock_class)) {
+		ret = PTR_ERR(pseudo_lock_class);
+		goto out_char;
+	}
+
+	pseudo_lock_class->devnode = pseudo_lock_devnode;
+
 	pseudo_lock_kn = kernfs_create_dir(root, "pseudo_lock",
 					   root->mode, NULL);
-	if (IS_ERR(pseudo_lock_kn))
-		return PTR_ERR(pseudo_lock_kn);
+	if (IS_ERR(pseudo_lock_kn)) {
+		ret = PTR_ERR(pseudo_lock_kn);
+		goto out_class;
+	}
 
 	kn = __kernfs_create_file(pseudo_lock_kn, "avail", 0444,
 				  0, &pseudo_lock_avail_ops,
@@ -1275,6 +1532,10 @@ int rdt_pseudo_lock_fs_init(struct kernfs_node *root)
 error:
 	kernfs_remove(pseudo_lock_kn);
 	pseudo_lock_kn = NULL;
+out_class:
+	class_destroy(pseudo_lock_class);
+out_char:
+	unregister_chrdev(pseudo_lock_major, "pseudo_lock");
 out:
 	return ret;
 }
@@ -1320,5 +1581,7 @@ void rdt_pseudo_lock_fs_remove(void)
 #endif
 	kernfs_remove(pseudo_lock_kn);
 	pseudo_lock_kn = NULL;
+	class_destroy(pseudo_lock_class);
+	unregister_chrdev(pseudo_lock_major, "pseudo_lock");
 	mutex_unlock(&rdt_pseudo_lock_mutex);
 }
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 18/20] x86/intel_rdt: More precise L2 hit/miss measurements
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (16 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 17/20] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 19/20] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Intel Goldmont processors supports non-architectural precise events that
can be used to give us more insight into the success of L2 cache
pseudo-locking on these platforms.

Introduce a new measurement trigger that will enable two precise events,
MEM_LOAD_UOPS_RETIRED.L2_HIT and MEM_LOAD_UOPS_RETIRED.L2_MISS, while
accessing pseudo-locked data. Two new tracepoints, pseudo_lock_l2_hits
and pseudo_lock_l2_miss, are created to make these results visible to
the user.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 141 ++++++++++++++++++++--
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |  16 +++
 2 files changed, 146 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 4b562823c0ca..6c5c310476c3 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -36,6 +36,7 @@
 #include "intel_rdt.h"
 
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
+#include <asm/perf_event.h>
 #define CREATE_TRACE_POINTS
 #include "intel_rdt_pseudo_lock_event.h"
 #endif
@@ -338,7 +339,7 @@ bool cbm_pseudo_locked(unsigned long cbm, struct rdt_domain *d)
 }
 
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
-static int measure_cycles_fn(void *_plr)
+static int measure_cycles_hist_fn(void *_plr)
 {
 	struct pseudo_lock_region *plr = _plr;
 	unsigned long flags;
@@ -387,11 +388,116 @@ static int measure_cycles_fn(void *_plr)
 	return 0;
 }
 
-static int pseudo_measure_cycles(struct pseudo_lock_region *plr)
+static int measure_cycles_perf_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	unsigned long long l2_hits, l2_miss;
+	u64 l2_hit_bits, l2_miss_bits;
+	unsigned long flags;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use regular variables
+	 * at the cost of including cache access latency to these variables
+	 * in the measurements.
+	 */
+	unsigned int line_size;
+	unsigned int size;
+	void *mem_r;
+#else
+	register unsigned int line_size asm("esi");
+	register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	/*
+	 * Non-architectural event for the Goldmont Microarchitecture
+	 * from Intel x86 Architecture Software Developer Manual (SDM):
+	 * MEM_LOAD_UOPS_RETIRED D1H (event number)
+	 * Umask values:
+	 *     L1_HIT   01H
+	 *     L2_HIT   02H
+	 *     L1_MISS  08H
+	 *     L2_MISS  10H
+	 */
+
+	/*
+	 * Start by setting flags for IA32_PERFEVTSELx:
+	 *     OS  (Operating system mode)  0x2
+	 *     INT (APIC interrupt enable)  0x10
+	 *     EN  (Enable counter)         0x40
+	 *
+	 * Then add the Umask value and event number to select performance
+	 * event.
+	 */
+
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
+		l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
+		break;
+	default:
+		goto out;
+	}
+
+	preempt_disable();
+	local_irq_save(flags);
+	/*
+	 * Call wrmsr direcly to avoid the local register variables from
+	 * being overwritten due to reordering of their assignment with
+	 * the wrmsr calls.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	/* Disable events and reset counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+	/* Set and enable the L2 counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+	mem_r = plr->kmem;
+	size = plr->size;
+	line_size = plr->line_size;
+	for (i = 0; i < size; i += line_size) {
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			     :
+			     : "r" (mem_r), "r" (i)
+			     : "%eax", "memory");
+	}
+	/*
+	 * Call wrmsr directly (no tracing) to not influence
+	 * the cache access counters as they are disabled.
+	 */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0,
+			      l2_hit_bits & ~(0x40ULL << 16));
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
+			      l2_miss_bits & ~(0x40ULL << 16));
+	l2_hits = native_read_pmc(0);
+	l2_miss = native_read_pmc(1);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_restore(flags);
+	preempt_enable();
+	trace_pseudo_lock_l2_hits(l2_hits);
+	trace_pseudo_lock_l2_miss(l2_miss);
+
+out:
+	thread_done = 1;
+	wake_up_interruptible(&wq);
+	return 0;
+}
+
+static int pseudo_measure_cycles(struct pseudo_lock_region *plr, int sel)
 {
 	struct task_struct *thread;
 	unsigned int cpu;
-	int ret;
+	int ret = -1;
 
 	cpus_read_lock();
 	mutex_lock(&rdt_pseudo_lock_mutex);
@@ -408,9 +514,19 @@ static int pseudo_measure_cycles(struct pseudo_lock_region *plr)
 		goto out;
 	}
 
-	thread = kthread_create_on_node(measure_cycles_fn, plr,
-					cpu_to_node(cpu),
-					"pseudo_lock_measure/%u", cpu);
+	if (sel == 1)
+		thread = kthread_create_on_node(measure_cycles_hist_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else if (sel == 2)
+		thread = kthread_create_on_node(measure_cycles_perf_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else
+		goto out;
+
 	if (IS_ERR(thread)) {
 		ret = PTR_ERR(thread);
 		goto out;
@@ -439,18 +555,18 @@ static ssize_t pseudo_measure_trigger(struct file *file,
 	char buf[32];
 	int srcu_idx;
 	int ret;
-	bool bv;
+	int sel;
 
 	buf_size = min(count, (sizeof(buf) - 1));
 	if (copy_from_user(buf, user_buf, buf_size))
 		return -EFAULT;
 
 	buf[buf_size] = '\0';
-	ret = strtobool(buf, &bv);
-	if (ret == 0) {
+	ret = kstrtoint(buf, 10, &sel);
+	if (ret == 0 && (sel == 1 || sel == 2)) {
 		ret = debugfs_use_file_start(file->f_path.dentry, &srcu_idx);
-		if (ret == 0 && bv) {
-			ret = pseudo_measure_cycles(plr);
+		if (ret == 0) {
+			ret = pseudo_measure_cycles(plr, sel);
 			if (ret == 0)
 				ret = count;
 		}
@@ -1250,6 +1366,9 @@ int rdt_pseudo_lock_rmdir(struct kernfs_node *kn)
  * hardware prefetch disable bits are included here as they are documented
  * in the SDM.
  *
+ * When adding a platform here also add support for its cache events to
+ * measure_cycles_perf_fn()
+ *
  * RETURNS
  * If platform is supported, the bits to disable hardware prefetchers, 0
  * if platform is not supported.
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
index cd74d1a0f592..d117a0b8451d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -14,6 +14,22 @@ TRACE_EVENT(pseudo_lock_mem_latency,
 	    TP_printk("latency=%u", __entry->latency)
 	   );
 
+TRACE_EVENT(pseudo_lock_l2_hits,
+	    TP_PROTO(u64 l2_hits),
+	    TP_ARGS(l2_hits),
+	    TP_STRUCT__entry(__field(u64, l2_hits)),
+	    TP_fast_assign(__entry->l2_hits = l2_hits),
+	    TP_printk("L2 hits=%llu", __entry->l2_hits)
+	   );
+
+TRACE_EVENT(pseudo_lock_l2_miss,
+	    TP_PROTO(u64 l2_miss),
+	    TP_ARGS(l2_miss),
+	    TP_STRUCT__entry(__field(u64, l2_miss)),
+	    TP_fast_assign(__entry->l2_miss = l2_miss),
+	    TP_printk("L2 miss=%llu", __entry->l2_miss)
+	   );
+
 #endif /* _TRACE_PSEUDO_LOCK_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 19/20] x86/intel_rdt: Support L3 cache performance event of Broadwell
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (17 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 18/20] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-13 16:39 ` [RFC PATCH 20/20] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre
  2017-11-18  0:48 ` [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Thomas Gleixner
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Broadwell microarchitecture supports pseudo-locking. Add support for
the L3 cache related performance events of these systems so that we can
measure the success of pseudo-locking.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 57 +++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h | 16 +++++++
 2 files changed, 73 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 6c5c310476c3..2eb309374ec9 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -390,6 +390,8 @@ static int measure_cycles_hist_fn(void *_plr)
 
 static int measure_cycles_perf_fn(void *_plr)
 {
+	unsigned long long l3_hits = 0, l3_miss = 0;
+	u64 l3_hit_bits = 0, l3_miss_bits = 0;
 	struct pseudo_lock_region *plr = _plr;
 	unsigned long long l2_hits, l2_miss;
 	u64 l2_hit_bits, l2_miss_bits;
@@ -424,6 +426,16 @@ static int measure_cycles_perf_fn(void *_plr)
 	 *     L2_HIT   02H
 	 *     L1_MISS  08H
 	 *     L2_MISS  10H
+	 *
+	 * On Broadwell Microarchitecture the MEM_LOAD_UOPS_RETIRED event
+	 * has two "no fix" errata associated with it: BDM35 and BDM100. On
+	 * this platform we use the following events instead:
+	 *  L2_RQSTS 24H (Documented in https://download.01.org/perfmon/BDW/)
+	 *       REFERENCES FFH
+	 *       MISS       3FH
+	 *  LONGEST_LAT_CACHE 2EH (Documented in SDM)
+	 *       REFERENCE 4FH
+	 *       MISS      41H
 	 */
 
 	/*
@@ -442,6 +454,14 @@ static int measure_cycles_perf_fn(void *_plr)
 		l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
 		l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
 		break;
+	case INTEL_FAM6_BROADWELL_X:
+		/* On BDW the l2_hit_bits count references, not hits */
+		l2_hit_bits = (0x52ULL << 16) | (0xff << 8) | 0x24;
+		l2_miss_bits = (0x52ULL << 16) | (0x3f << 8) | 0x24;
+		/* On BDW the l3_hit_bits count references, not hits */
+		l3_hit_bits = (0x52ULL << 16) | (0x4f << 8) | 0x2e;
+		l3_miss_bits = (0x52ULL << 16) | (0x41 << 8) | 0x2e;
+		break;
 	default:
 		goto out;
 	}
@@ -459,9 +479,21 @@ static int measure_cycles_perf_fn(void *_plr)
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 2, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 3, 0x0);
+	}
 	/* Set and enable the L2 counters */
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+				      l3_hit_bits);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+				      l3_miss_bits);
+	}
 	mem_r = plr->kmem;
 	size = plr->size;
 	line_size = plr->line_size;
@@ -479,13 +511,38 @@ static int measure_cycles_perf_fn(void *_plr)
 			      l2_hit_bits & ~(0x40ULL << 16));
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
 			      l2_miss_bits & ~(0x40ULL << 16));
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+				      l3_hit_bits & ~(0x40ULL << 16));
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+				      l3_miss_bits & ~(0x40ULL << 16));
+	}
 	l2_hits = native_read_pmc(0);
 	l2_miss = native_read_pmc(1);
+	if (l3_hit_bits > 0) {
+		l3_hits = native_read_pmc(2);
+		l3_miss = native_read_pmc(3);
+	}
 	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
 	local_irq_restore(flags);
 	preempt_enable();
+	/*
+	 * On BDW we count references and misses, need to adjust. Sometimes
+	 * the "hits" counter is a bit more than the references, for
+	 * example, x references but x + 1 hits. To not report invalid
+	 * hit values in this case we treat that as misses eaqual to
+	 * references.
+	 */
+	if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+		l2_hits -= (l2_miss > l2_hits ? l2_hits : l2_miss);
 	trace_pseudo_lock_l2_hits(l2_hits);
 	trace_pseudo_lock_l2_miss(l2_miss);
+	if (l3_hit_bits > 0) {
+		if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+			l3_hits -= (l3_miss > l3_hits ? l3_hits : l3_miss);
+		trace_pseudo_lock_l3_hits(l3_hits);
+		trace_pseudo_lock_l3_miss(l3_miss);
+	}
 
 out:
 	thread_done = 1;
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
index d117a0b8451d..f2ee0fb36b64 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -30,6 +30,22 @@ TRACE_EVENT(pseudo_lock_l2_miss,
 	    TP_printk("L2 miss=%llu", __entry->l2_miss)
 	   );
 
+TRACE_EVENT(pseudo_lock_l3_hits,
+	    TP_PROTO(u64 l3_hits),
+	    TP_ARGS(l3_hits),
+	    TP_STRUCT__entry(__field(u64, l3_hits)),
+	    TP_fast_assign(__entry->l3_hits = l3_hits),
+	    TP_printk("L3 hits=%llu", __entry->l3_hits)
+	   );
+
+TRACE_EVENT(pseudo_lock_l3_miss,
+	    TP_PROTO(u64 l3_miss),
+	    TP_ARGS(l3_miss),
+	    TP_STRUCT__entry(__field(u64, l3_miss)),
+	    TP_fast_assign(__entry->l3_miss = l3_miss),
+	    TP_printk("L3 miss=%llu", __entry->l3_miss)
+	   );
+
 #endif /* _TRACE_PSEUDO_LOCK_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 20/20] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (18 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 19/20] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
@ 2017-11-13 16:39 ` Reinette Chatre
  2017-11-18  0:48 ` [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Thomas Gleixner
  20 siblings, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2017-11-13 16:39 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck
  Cc: vikas.shivappa, dave.hansen, mingo, hpa, x86, linux-kernel,
	Reinette Chatre

Deeper C-states impact cache content through shrinking of the cache or
flushing entire cache to memory before reducing power to the cache.
Deeper C-states will thus negatively impact the pseudo-locked regions.

To avoid impacting pseudo-locked regions we limit C-states on
pseudo-locked region creation so that cores associated with the
pseudo-locked region are prevented from entering deeper C-states.
This is accomplished by requesting a CPU latency target which will
prevent the core from entering C6 across all supported platforms.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---

This patch depends on device resume latency PM QoS API fixes that was
recently submitted:
http://lkml.kernel.org/r/3615978.1OBBRZE5lx@aspire.rjw.lan
http://lkml.kernel.org/r/5398562.D1bMImPBRK@aspire.rjw.lan

 Documentation/x86/intel_rdt_ui.txt          |  4 +-
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 89 ++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 9924f7146c63..df6ba21c38bb 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -340,8 +340,8 @@ in the cache via carefully configuring the CAT feature and controlling
 application behavior. There is no guarantee that data is placed in
 cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
 “locked” data from cache. Power management C-states may shrink or
-power off cache. It is thus recommended to limit the processor maximum
-C-state, for example, by setting the processor.max_cstate kernel parameter.
+power off cache. Deeper C-states will automatically be restricted on
+pseudo-locked region creation.
 
 It is required that an application using a pseudo-locked region runs
 with affinity to the cores (or a subset of the cores) associated
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 2eb309374ec9..c33637677ce7 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -27,6 +27,7 @@
 #include <linux/kref.h>
 #include <linux/kthread.h>
 #include <linux/mman.h>
+#include <linux/pm_qos.h>
 #include <linux/seq_file.h>
 #include <linux/stat.h>
 #include <linux/slab.h>
@@ -120,6 +121,7 @@ static struct dentry *debugfs_pseudo;
  * @kmem:		the kernel memory associated with pseudo-locked region
  * @debugfs_dir:	pointer to this region's directory in the debugfs
  *			filesystem
+ * @pm_reqs:		Power management QoS requests related to this region
  */
 struct pseudo_lock_region {
 	struct kernfs_node	*kn;
@@ -138,6 +140,17 @@ struct pseudo_lock_region {
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	struct dentry		*debugfs_dir;
 #endif
+	struct list_head	pm_reqs;
+};
+
+/**
+ * pseudo_lock_pm_req - A power management QoS request list entry
+ * @list:	Entry within the @pm_reqs list for a pseudo-locked region
+ * @req:	PM QoS request
+ */
+struct pseudo_lock_pm_req {
+	struct list_head list;
+	struct dev_pm_qos_request req;
 };
 
 /*
@@ -208,6 +221,66 @@ static void pseudo_lock_minor_release(unsigned int minor)
 	__set_bit(minor, &pseudo_lock_minor_avail);
 }
 
+static void pseudo_lock_cstates_relax(struct pseudo_lock_region *plr)
+{
+	struct pseudo_lock_pm_req *pm_req, *next;
+
+	list_for_each_entry_safe(pm_req, next, &plr->pm_reqs, list) {
+		dev_pm_qos_remove_request(&pm_req->req);
+		list_del(&pm_req->list);
+		kfree(pm_req);
+	}
+}
+
+/**
+ * pseudo_lock_cstates_constrain - Restrict cores from entering C6
+ *
+ * To prevent the cache from being affected by power management we have to
+ * avoid entering C6. We accomplish this by requesting a latency
+ * requirement lower than lowest C6 exit latency of all supported
+ * platforms as found in the cpuidle state tables in the intel_idle driver.
+ * At this time it is possible to do so with a single latency requirement
+ * for all supported platforms.
+ *
+ * Since we do support Goldmont, which is affected by X86_BUG_MONITOR, we
+ * need to consider the ACPI latencies while keeping in mind that C2 may be
+ * set to map to deeper sleep states. In this case the latency requirement
+ * needs to prevent entering C2 also.
+ */
+static int pseudo_lock_cstates_constrain(struct pseudo_lock_region *plr)
+{
+	struct pseudo_lock_pm_req *pm_req;
+	int cpu;
+	int ret;
+
+	for_each_cpu(cpu, &plr->d->cpu_mask) {
+		pm_req = kzalloc(sizeof(*pm_req), GFP_KERNEL);
+		if (!pm_req) {
+			rdt_last_cmd_puts("fail allocating mem for PM QoS\n");
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		ret = dev_pm_qos_add_request(get_cpu_device(cpu),
+					     &pm_req->req,
+					     DEV_PM_QOS_RESUME_LATENCY,
+					     30);
+		if (ret < 0) {
+			rdt_last_cmd_printf("fail to add latency req cpu%d\n",
+					    cpu);
+			kfree(pm_req);
+			ret = -1;
+			goto out_err;
+		}
+		list_add(&pm_req->list, &plr->pm_reqs);
+	}
+
+	return 0;
+
+out_err:
+	pseudo_lock_cstates_relax(plr);
+	return ret;
+}
+
 static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 {
 	bool is_new_plr = (plr == new_plr);
@@ -218,6 +291,7 @@ static void __pseudo_lock_region_release(struct pseudo_lock_region *plr)
 
 	if (plr->locked) {
 		plr->d->plr = NULL;
+		pseudo_lock_cstates_relax(plr);
 		device_destroy(pseudo_lock_class,
 			       MKDEV(pseudo_lock_major, plr->minor));
 		pseudo_lock_minor_release(plr->minor);
@@ -1078,6 +1152,12 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 		goto out_clos_def;
 	}
 
+	ret = pseudo_lock_cstates_constrain(plr);
+	if (ret < 0) {
+		ret = -EINVAL;
+		goto out_clos_def;
+	}
+
 	plr->closid = closid;
 
 	thread_done = 0;
@@ -1093,7 +1173,7 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 		 * error path since that will result in a CBM of all
 		 * zeroes which is an illegal MSR write.
 		 */
-		goto out_clos_def;
+		goto out_cstates;
 	}
 
 	kthread_bind(thread, plr->cpu);
@@ -1102,7 +1182,7 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 	ret = wait_event_interruptible(wq, thread_done == 1);
 	if (ret < 0) {
 		rdt_last_cmd_puts("locking thread interrupted\n");
-		goto out_clos_def;
+		goto out_cstates;
 	}
 
 	/*
@@ -1119,7 +1199,7 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 	ret = pseudo_lock_minor_get(&new_minor);
 	if (ret < 0) {
 		rdt_last_cmd_puts("unable to obtain a new minor number\n");
-		goto out_clos_def;
+		goto out_cstates;
 	}
 
 	plr->locked = true;
@@ -1164,6 +1244,8 @@ static int pseudo_lock_doit(struct pseudo_lock_region *plr,
 
 out_minor:
 	pseudo_lock_minor_release(new_minor);
+out_cstates:
+	pseudo_lock_cstates_relax(plr);
 out_clos_def:
 	pseudo_lock_clos_set(plr, 0, d->ctrl_val[0] | plr->cbm);
 out_closid:
@@ -1356,6 +1438,7 @@ int rdt_pseudo_lock_mkdir(const char *name, umode_t mode)
 	}
 #endif
 
+	INIT_LIST_HEAD(&plr->pm_reqs);
 	kref_init(&plr->refcount);
 	kernfs_activate(kn);
 	new_plr = plr;
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (19 preceding siblings ...)
  2017-11-13 16:39 ` [RFC PATCH 20/20] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre
@ 2017-11-18  0:48 ` Thomas Gleixner
  2017-11-18  6:42   ` Reinette Chatre
  20 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2017-11-18  0:48 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: fenghua.yu, tony.luck, vikas.shivappa, dave.hansen, mingo, hpa,
	x86, linux-kernel

Reinette,

On Mon, 13 Nov 2017, Reinette Chatre wrote:

thanks for that interesting work. Before I start looking into the details
in the next days let me ask a few general questions first.

> Cache Allocation Technology (CAT), part of Intel(R) Resource Director
> Technology (Intel(R) RDT), enables a user to specify the amount of cache
> space into which an application can fill. Cache pseudo-locking builds on
> the fact that a CPU can still read and write data pre-allocated outside
> its current allocated area on cache hit. With cache pseudo-locking data
> can be preloaded into a reserved portion of cache that no application can
> fill, and from that point on will only serve cache hits. The cache
> pseudo-locked memory is made accessible to user space where an application
> can map it into its virtual address space and thus have a region of
> memory with reduced average read latency.

Did you compare that against the good old cache coloring mechanism,
e.g. palloc ?

> The cache pseudo-locking approach relies on generation-specific behavior
> of processors. It may provide benefits on certain processor generations,
> but is not guaranteed to be supported in the future.

Hmm, are you saying that the CAT mechanism might change radically in the
future so that access to cached data in an allocated area which does not
belong to the current executing context wont work anymore?

> It is not a guarantee that data will remain in the cache. It is not a
> guarantee that data will remain in certain levels or certain regions of
> the cache. Rather, cache pseudo-locking increases the probability that
> data will remain in a certain level of the cache via carefully
> configuring the CAT feature and carefully controlling application
> behavior.

Which kind of applications are you targeting with that?

Are there real world use cases which actually can benefit from this and
what are those applications supposed to do once the feature breaks with
future generations of processors?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2017-11-18  0:48 ` [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Thomas Gleixner
@ 2017-11-18  6:42   ` Reinette Chatre
  2018-01-14 22:54     ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Reinette Chatre @ 2017-11-18  6:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: fenghua.yu, tony.luck, vikas.shivappa, dave.hansen, mingo, hpa,
	x86, linux-kernel

Hi Thomas,

On 11/17/2017 4:48 PM, Thomas Gleixner wrote:
> On Mon, 13 Nov 2017, Reinette Chatre wrote:
> 
> thanks for that interesting work. Before I start looking into the details
> in the next days let me ask a few general questions first.

Thank you very much for taking a look. I look forward to your feedback.

> 
>> Cache Allocation Technology (CAT), part of Intel(R) Resource Director
>> Technology (Intel(R) RDT), enables a user to specify the amount of cache
>> space into which an application can fill. Cache pseudo-locking builds on
>> the fact that a CPU can still read and write data pre-allocated outside
>> its current allocated area on cache hit. With cache pseudo-locking data
>> can be preloaded into a reserved portion of cache that no application can
>> fill, and from that point on will only serve cache hits. The cache
>> pseudo-locked memory is made accessible to user space where an application
>> can map it into its virtual address space and thus have a region of
>> memory with reduced average read latency.
> 
> Did you compare that against the good old cache coloring mechanism,
> e.g. palloc ?

I understand where your question originates. I have not compared against PALLOC for two reasons:
1) PALLOC is not upstream and while inquiring about the status of this work (please see https://github.com/heechul/palloc/issues/4 for details) we learned that one reason for this is that recent Intel processors are not well supported.
2) The most recent kernel supported by PALLOC is v4.4 and also mentioned in the above link there is currently no plan to upstream this work for a less divergent comparison of PALLOC and the more recent RDT/CAT enabling on which Cache Pseudo-Locking is built.

>> The cache pseudo-locking approach relies on generation-specific behavior
>> of processors. It may provide benefits on certain processor generations,
>> but is not guaranteed to be supported in the future.
> 
> Hmm, are you saying that the CAT mechanism might change radically in the
> future so that access to cached data in an allocated area which does not
> belong to the current executing context wont work anymore?

Most devices that publicly support CAT in the Linux mainline can take advantage of Cache Pseudo-Locking.  However, Cache Pseudo-Locking is a model-specific feature so there may be some variation in if, or to what extent, current and future devices can support Cache Pseudo-Locking. CAT remains architectural.

>> It is not a guarantee that data will remain in the cache. It is not a
>> guarantee that data will remain in certain levels or certain regions of
>> the cache. Rather, cache pseudo-locking increases the probability that
>> data will remain in a certain level of the cache via carefully
>> configuring the CAT feature and carefully controlling application
>> behavior.
> 
> Which kind of applications are you targeting with that?
>
> Are there real world use cases which actually can benefit from this and

To ensure I answer your question I will consider two views. First, the "carefully controlling application behavior" referred to above refers to applications/OS/VMs running after the pseudo-locked regions have been set up. These applications should take care to not do anything, for example call wbinvd, that would affect the Cache Pseudo-Locked regions. Second, what you are also asking about is the applications using these Cache Pseudo-Locked regions. We do see a clear performance benefit to applications using these pseudo-locked regions. Latency sensitive applications could relocate their code as well as data to pseudo-locked regions for improved performance.

> what are those applications supposed to do once the feature breaks with
> future generations of processors?

This feature is model specific with a few platforms supporting it at this time. Only platforms known to support Cache Pseudo-Locking will expose its resctrl interface.

Reinette

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2017-11-18  6:42   ` Reinette Chatre
@ 2018-01-14 22:54     ` Thomas Gleixner
  2018-01-15 16:23       ` Hindman, Gavin
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-01-14 22:54 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: fenghua.yu, tony.luck, vikas.shivappa, dave.hansen, mingo, hpa,
	x86, linux-kernel

On Fri, 17 Nov 2017, Reinette Chatre wrote:

Sorry for the delay. You know why :)

> On 11/17/2017 4:48 PM, Thomas Gleixner wrote:
> > On Mon, 13 Nov 2017, Reinette Chatre wrote:
> > Did you compare that against the good old cache coloring mechanism,
> > e.g. palloc ?
> 

> I understand where your question originates. I have not compared against
> PALLOC for two reasons:
>
> 1) PALLOC is not upstream and while inquiring about the status of this
> work (please see https://github.com/heechul/palloc/issues/4 for details)
> we learned that one reason for this is that recent Intel processors are
> not well supported.

So if I understand Heechul correctly then recent CPUs cannot be supported
easily due to changes in the memory controllers and the cache. I assume the
latter is related to CAT.

> 2) The most recent kernel supported by PALLOC is v4.4 and also mentioned
> in the above link there is currently no plan to upstream this work for a
> less divergent comparison of PALLOC and the more recent RDT/CAT enabling
> on which Cache Pseudo-Locking is built.

Well, that's not a really good excuse for not trying. You at Intel should
be able to get to the parameters easy enough :)

> >> The cache pseudo-locking approach relies on generation-specific behavior
> >> of processors. It may provide benefits on certain processor generations,
> >> but is not guaranteed to be supported in the future.
> > 
> > Hmm, are you saying that the CAT mechanism might change radically in the
> > future so that access to cached data in an allocated area which does not
> > belong to the current executing context wont work anymore?
> 
> Most devices that publicly support CAT in the Linux mainline can take
> advantage of Cache Pseudo-Locking.  However, Cache Pseudo-Locking is a
> model-specific feature so there may be some variation in if, or to what
> extent, current and future devices can support Cache Pseudo-Locking. CAT
> remains architectural.

Sure, but that does NOT answer my question at all.

> >> It is not a guarantee that data will remain in the cache. It is not a
> >> guarantee that data will remain in certain levels or certain regions of
> >> the cache. Rather, cache pseudo-locking increases the probability that
> >> data will remain in a certain level of the cache via carefully
> >> configuring the CAT feature and carefully controlling application
> >> behavior.
> > 
> > Which kind of applications are you targeting with that?
> >
> > Are there real world use cases which actually can benefit from this and
> 
> To ensure I answer your question I will consider two views. First, the
>"carefully controlling application behavior" referred to above refers to
> applications/OS/VMs running after the pseudo-locked regions have been set
> up. These applications should take care to not do anything, for example
> call wbinvd, that would affect the Cache Pseudo-Locked regions. Second,
> what you are also asking about is the applications using these Cache
> Pseudo-Locked regions. We do see a clear performance benefit to
> applications using these pseudo-locked regions. Latency sensitive
> applications could relocate their code as well as data to pseudo-locked
> regions for improved performance.

This is again a marketing pitch and not answering my question about real
world use cases.

> > what are those applications supposed to do once the feature breaks with
> > future generations of processors?
> 
> This feature is model specific with a few platforms supporting it at this
> time. Only platforms known to support Cache Pseudo-Locking will expose
> its resctrl interface.

And you deliberately avoided to answer my question again.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2018-01-14 22:54     ` Thomas Gleixner
@ 2018-01-15 16:23       ` Hindman, Gavin
  2018-01-16 11:38         ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Hindman, Gavin @ 2018-01-15 16:23 UTC (permalink / raw)
  To: Thomas Gleixner, Chatre, Reinette
  Cc: Yu, Fenghua, Luck, Tony, vikas.shivappa, Hansen, Dave, mingo,
	hpa, x86, linux-kernel

Thanks for the feedback, Thomas.

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Thomas Gleixner
> Sent: Sunday, January 14, 2018 2:54 PM
> To: Chatre, Reinette <reinette.chatre@intel.com>
> Cc: Yu, Fenghua <fenghua.yu@intel.com>; Luck, Tony
> <tony.luck@intel.com>; vikas.shivappa@linux.intel.com; Hansen, Dave
> <dave.hansen@intel.com>; mingo@redhat.com; hpa@zytor.com;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache
> Pseudo-Locking enabling
> 
> On Fri, 17 Nov 2017, Reinette Chatre wrote:
> 
> Sorry for the delay. You know why :)
> 
> > On 11/17/2017 4:48 PM, Thomas Gleixner wrote:
> > > On Mon, 13 Nov 2017, Reinette Chatre wrote:
> > > Did you compare that against the good old cache coloring mechanism,
> > > e.g. palloc ?
> >
> 
> > I understand where your question originates. I have not compared
> > against PALLOC for two reasons:
> >
> > 1) PALLOC is not upstream and while inquiring about the status of this
> > work (please see https://github.com/heechul/palloc/issues/4 for
> > details) we learned that one reason for this is that recent Intel
> > processors are not well supported.
> 
> So if I understand Heechul correctly then recent CPUs cannot be supported
> easily due to changes in the memory controllers and the cache. I assume the
> latter is related to CAT.
> 
> > 2) The most recent kernel supported by PALLOC is v4.4 and also
> > mentioned in the above link there is currently no plan to upstream
> > this work for a less divergent comparison of PALLOC and the more
> > recent RDT/CAT enabling on which Cache Pseudo-Locking is built.
> 
> Well, that's not a really good excuse for not trying. You at Intel should be able
> to get to the parameters easy enough :)
> 
We can run the comparison, but I'm not sure that I understand the intent - my understanding of Palloc is that it's intended to allow allocation of memory to specific physical memory banks.  While that might result in reduced cache-misses since processes are more separated, it's not explicitly intended to reduce cache-misses, and Palloc's benefits would only hold as long as you have few enough processes to be able to dedicate/isolate memory accordingly.  Am I misunderstanding the intent/usage of palloc?

> > >> The cache pseudo-locking approach relies on generation-specific
> > >> behavior of processors. It may provide benefits on certain
> > >> processor generations, but is not guaranteed to be supported in the
> future.
> > >
> > > Hmm, are you saying that the CAT mechanism might change radically in
> > > the future so that access to cached data in an allocated area which
> > > does not belong to the current executing context wont work anymore?
> >

No, I don't see any scenario in which devices that currently support pseudo-locking would stop working, but until support is architectural support in a current generation of a product line doesn't imply support in a future generation.  Certainly we'll make every effort to carry support forward, and would adjust to any changes in CAT support, but we can't account for unforeseen future architectural changes that might block pseudo-locking use-cases on top of CAT.

> > Most devices that publicly support CAT in the Linux mainline can take
> > advantage of Cache Pseudo-Locking.  However, Cache Pseudo-Locking is a
> > model-specific feature so there may be some variation in if, or to
> > what extent, current and future devices can support Cache
> > Pseudo-Locking. CAT remains architectural.
> 
> Sure, but that does NOT answer my question at all.
> 
> > >> It is not a guarantee that data will remain in the cache. It is not
> > >> a guarantee that data will remain in certain levels or certain
> > >> regions of the cache. Rather, cache pseudo-locking increases the
> > >> probability that data will remain in a certain level of the cache
> > >> via carefully configuring the CAT feature and carefully controlling
> > >> application behavior.
> > >
> > > Which kind of applications are you targeting with that?
> > >
> > > Are there real world use cases which actually can benefit from this
> > > and
> >
> > To ensure I answer your question I will consider two views. First, the
> >"carefully controlling application behavior" referred to above refers
> >to  applications/OS/VMs running after the pseudo-locked regions have
> >been set  up. These applications should take care to not do anything,
> >for example  call wbinvd, that would affect the Cache Pseudo-Locked
> >regions. Second,  what you are also asking about is the applications
> >using these Cache  Pseudo-Locked regions. We do see a clear performance
> >benefit to  applications using these pseudo-locked regions. Latency
> >sensitive  applications could relocate their code as well as data to
> >pseudo-locked  regions for improved performance.
> 
> This is again a marketing pitch and not answering my question about real
> world use cases.
> 
There are a number of real-world use-cases that are already making use of hacked-up ad-hoc versions of pseudo-locking - this corner case has been available in hardware for some time - and this patch-set is intended to bring it more into the mainstream and more supportable.  Primary usages right now are industrial PLCs/automation and high-frequency trading/financial enterprise systems, but anything with relatively small repeating data structures should see benefit.

> > > what are those applications supposed to do once the feature breaks
> > > with future generations of processors?
> >
> > This feature is model specific with a few platforms supporting it at
> > this time. Only platforms known to support Cache Pseudo-Locking will
> > expose its resctrl interface.
> 
> And you deliberately avoided to answer my question again.
> 
Reinette's not trying to avoid the questions, we just don't necessarily have definitive answers at this time.  Currently pseudo-locking requires manual setup on the part of the integrator, so there will not be any invisible breakage when trying to port software expecting pseudo-locking to new devices, and we'll certainly do everything we can to minimize user-space/configuration impact on migration if things change going forward, but these are unknowns.  We are in a bit of chicken/egg where people aren't broadly using it because it's not architectural, and it's not architectural because people aren't broadly using it.  We could publicly carry the patches out of mainline, but our intent for pushing the patches to mainline are to a) increase exposure/usage b) reduce divergence across people already using hacked versions, and c) ease the overhead in keep patches in sync with the larger CAT infrastructure as it evolves - we are clear on the potential support burden being incurred by submitting a non-architectural feature, and there's certainly no intent to dump a science-experiment into mainline. 

> Thanks,
> 
> 	tglx

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2018-01-15 16:23       ` Hindman, Gavin
@ 2018-01-16 11:38         ` Thomas Gleixner
  2018-01-17  0:53           ` Reinette Chatre
  2018-02-12 19:07           ` Reinette Chatre
  0 siblings, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2018-01-16 11:38 UTC (permalink / raw)
  To: Hindman, Gavin
  Cc: Chatre, Reinette, Yu, Fenghua, Luck, Tony, vikas.shivappa,
	Hansen, Dave, mingo, hpa, x86, linux-kernel

On Mon, 15 Jan 2018, Hindman, Gavin wrote:
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Thomas Gleixner
> > On Fri, 17 Nov 2017, Reinette Chatre wrote:
> > >
> > > 1) PALLOC is not upstream and while inquiring about the status of this
> > > work (please see https://github.com/heechul/palloc/issues/4 for
> > > details) we learned that one reason for this is that recent Intel
> > > processors are not well supported.
> > 
> > So if I understand Heechul correctly then recent CPUs cannot be supported
> > easily due to changes in the memory controllers and the cache. I assume the
> > latter is related to CAT.

Is that assumption correct?

> > > 2) The most recent kernel supported by PALLOC is v4.4 and also
> > > mentioned in the above link there is currently no plan to upstream
> > > this work for a less divergent comparison of PALLOC and the more
> > > recent RDT/CAT enabling on which Cache Pseudo-Locking is built.
> > 
> > Well, that's not a really good excuse for not trying. You at Intel should be able
> > to get to the parameters easy enough :)
> >
> We can run the comparison, but I'm not sure that I understand the intent
> - my understanding of Palloc is that it's intended to allow allocation of
> memory to specific physical memory banks.  While that might result in
> reduced cache-misses since processes are more separated, it's not
> explicitly intended to reduce cache-misses, and Palloc's benefits would
> only hold as long as you have few enough processes to be able to
> dedicate/isolate memory accordingly.  Am I misunderstanding the
> intent/usage of palloc?

Right. It comes with its own set of restrictions as does the pseudo-locking.

> > > >> The cache pseudo-locking approach relies on generation-specific
> > > >> behavior of processors. It may provide benefits on certain
> > > >> processor generations, but is not guaranteed to be supported in the
> > future.
> > > >
> > > > Hmm, are you saying that the CAT mechanism might change radically in
> > > > the future so that access to cached data in an allocated area which
> > > > does not belong to the current executing context wont work anymore?
> > >
> 
> No, I don't see any scenario in which devices that currently support
> pseudo-locking would stop working, but until support is architectural
> support in a current generation of a product line doesn't imply support
> in a future generation.  Certainly we'll make every effort to carry
> support forward, and would adjust to any changes in CAT support, but we
> can't account for unforeseen future architectural changes that might
> block pseudo-locking use-cases on top of CAT.

So and that's the real problem. We add something which gives us some form
of isolation, but we don't know whether next generation CPUs will
work. From a maintainability and usefulnes POV that's not a really great
prospect.

> > This is again a marketing pitch and not answering my question about real
> > world use cases.
> > 
> There are a number of real-world use-cases that are already making use of
> hacked-up ad-hoc versions of pseudo-locking - this corner case has been
> available in hardware for some time - and this patch-set is intended to
> bring it more into the mainstream and more supportable.  Primary usages
> right now are industrial PLCs/automation and high-frequency
> trading/financial enterprise systems, but anything with relatively small
> repeating data structures should see benefit.

Ok,

> > > > what are those applications supposed to do once the feature breaks
> > > > with future generations of processors?
> > >
> > > This feature is model specific with a few platforms supporting it at
> > > this time. Only platforms known to support Cache Pseudo-Locking will
> > > expose its resctrl interface.
> > 
> > And you deliberately avoided to answer my question again.
> >
> Reinette's not trying to avoid the questions, we just don't necessarily
> have definitive answers at this time.  Currently pseudo-locking requires
> manual setup on the part of the integrator, so there will not be any
> invisible breakage when trying to port software expecting pseudo-locking
> to new devices, and we'll certainly do everything we can to minimize
> user-space/configuration impact on migration if things change going
> forward, but these are unknowns.  We are in a bit of chicken/egg where
> people aren't broadly using it because it's not architectural, and it's
> not architectural because people aren't broadly using it.  We could
> publicly carry the patches out of mainline, but our intent for pushing
> the patches to mainline are to a) increase exposure/usage b) reduce
> divergence across people already using hacked versions, and c) ease the
> overhead in keep patches in sync with the larger CAT infrastructure as it
> evolves - we are clear on the potential support burden being incurred by
> submitting a non-architectural feature, and there's certainly no intent
> to dump a science-experiment into mainline.

Ok. So what you are saying is that 'official' support should broaden the
user base which in turn might push it into the architectural realm.

I'll go through the patch set with this in mind.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2018-01-16 11:38         ` Thomas Gleixner
@ 2018-01-17  0:53           ` Reinette Chatre
  2018-02-12 19:07           ` Reinette Chatre
  1 sibling, 0 replies; 29+ messages in thread
From: Reinette Chatre @ 2018-01-17  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Hindman, Gavin
  Cc: Yu, Fenghua, Luck, Tony, vikas.shivappa, Hansen, Dave, mingo,
	hpa, x86, linux-kernel

Hi Thomas,

On 1/16/2018 3:38 AM, Thomas Gleixner wrote:
> On Mon, 15 Jan 2018, Hindman, Gavin wrote:
>>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>>> owner@vger.kernel.org] On Behalf Of Thomas Gleixner
>>> On Fri, 17 Nov 2017, Reinette Chatre wrote:
>>>>
>>>> 1) PALLOC is not upstream and while inquiring about the status of this
>>>> work (please see https://github.com/heechul/palloc/issues/4 for
>>>> details) we learned that one reason for this is that recent Intel
>>>> processors are not well supported.
>>>
>>> So if I understand Heechul correctly then recent CPUs cannot be supported
>>> easily due to changes in the memory controllers and the cache. I assume the
>>> latter is related to CAT.
> 
> Is that assumption correct?

>From what I understand to be able to allocate memory from a specific
DRAM bank or cache set PALLOC requires knowing exactly which DRAM bank
or cache set a physical address maps to. The PALLOC implementation
relies on user space code that times a variety of memory accesses to
guess which bits determine DRAM bank or cache set placement. These bits
are then provided to the kernel implementation as the page coloring input.

The comments at https://github.com/heechul/palloc/issues/4 point out
that it is this user space guessing of physical address to specific DRAM
bank and cache set mapping that is harder in recent Intel processors.
This is not related to CAT. CAT could be used to limit the number of
ways to which the contents of a physical address can be allocated, CAT
does not modify the set to which the physical address maps.

Without possibility of using PALLOC I do not currently know how to
answer your request for a comparison with a cache coloring mechanism. I
will surely ask around and do more research.

Reinette

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2018-01-16 11:38         ` Thomas Gleixner
  2018-01-17  0:53           ` Reinette Chatre
@ 2018-02-12 19:07           ` Reinette Chatre
  2018-02-13 10:27             ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Reinette Chatre @ 2018-02-12 19:07 UTC (permalink / raw)
  To: Thomas Gleixner, Hindman, Gavin
  Cc: Yu, Fenghua, Luck, Tony, vikas.shivappa, Hansen, Dave, mingo,
	hpa, x86, linux-kernel

Hi Thomas,

On 1/16/2018 3:38 AM, Thomas Gleixner wrote:
> On Mon, 15 Jan 2018, Hindman, Gavin wrote:
>>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>>> owner@vger.kernel.org] On Behalf Of Thomas Gleixner
>>> On Fri, 17 Nov 2017, Reinette Chatre wrote:

>>>> 2) The most recent kernel supported by PALLOC is v4.4 and also
>>>> mentioned in the above link there is currently no plan to upstream
>>>> this work for a less divergent comparison of PALLOC and the more
>>>> recent RDT/CAT enabling on which Cache Pseudo-Locking is built.
>>>
>>> Well, that's not a really good excuse for not trying. You at Intel should be able
>>> to get to the parameters easy enough :)
>>>
>> We can run the comparison, but I'm not sure that I understand the intent
>> - my understanding of Palloc is that it's intended to allow allocation of
>> memory to specific physical memory banks.  While that might result in
>> reduced cache-misses since processes are more separated, it's not
>> explicitly intended to reduce cache-misses, and Palloc's benefits would
>> only hold as long as you have few enough processes to be able to
>> dedicate/isolate memory accordingly.  Am I misunderstanding the
>> intent/usage of palloc?
> 
> Right. It comes with its own set of restrictions as does the pseudo-locking.

Reporting results of comparison between PALLOC, CAT, and Cache
Pseudo-Locking. CAT is a hardware supported and Linux enabled cache
partitioning mechanism while PALLOC is an out of tree software cache
partitioning mechanism. Neither CAT nor PALLOC protects against eviction
from a cache partition. Cache Pseudo-Locking builds on CAT by adding
protection against eviction from cache.

Latest PALLOC available is a patch against kernel v4.4. PALLOC data was
collected with latest PALLOC v4.4 patch(*) applied against v4.4.113. CAT
and Cache Pseudo-Locking data was collected with a rebase of this patch
series against x86/cache of tip.git (based on v4.15-rc8) when the HEAD was:

commit 31516de306c0c9235156cdc7acb976ea21f1f646
Author: Fenghua Yu <fenghua.yu@intel.com>
Date:   Wed Dec 20 14:57:24 2017 -0800

    x86/intel_rdt: Add command line parameter to control L2_CDP

All tests involve a user space application that allocates (malloc() with
mlockall()) or in the case of Cache Pseudo-Locking maps using mmap()) a
256KB region of memory. The application then randomly accesses this
region, 32 bytes at a time, measuring the latency in cycles of each
access using the rdtsc instruction. Each time a test is run it is
repeated ten times.

As with the previous tests from this thread, testing was done on an
Intel(R) NUC NUC6CAYS (it has an Intel(R) Celeron(R) Processor J3455).
The system has two 1MB L2 cache (1024 sets and 16 ways).

A few extra tests were done with PALLOC to establish a baseline that I
got it working right before comparing it against CAT and Cache
Pseudo-Locking. Each test was run on an idle system as well as a system
where significant interference was introduced on a core sharing the L2 cache
with the core on which the test was running (referred to as "noisy
neighbor").


TEST1) PALLOC: Enable PALLOC but do not do any cache partitioning.

TEST2) PALLOC: Designate four bits to be used for page coloring, thus
creating four bins. Bits were chosen as the only four bits that overlap
between page and cache set addressing. Run application in a cgroup that
has access to one bin with rest of system accessing the three remaining
bins.

TEST3) PALLOC: With the same four bits used for page coloring as in
TEST2. Let application run in cgroup with dedicated access to two bins,
rest of system the remaining two bins.

TEST4) CAT: Same CAT test as in original cover letter where application
runs with dedicated CLOS with CBM of 0xf. Default CLOS CBM changed to
non-overlapping 0xf0.

TEST5) Cache Pseudo-Locking: Application reads from 256KB Cache Pseudo
Locked region.

Data visualizations plot the cumulative (of ten tests) counts of the
number of instances (y axis) a particular number of cycles (x axis) were
measured. Each plot is accompanied by a boxplot used to visualize the
descriptive statistics (whiskers represent 0 to 99th percentile, inter
quartile range q1 to q3 with black rectangle, median is orange line,
green is average).

Visualization
https://github.com/rchatre/data/blob/master/cache_pseudo_locking/palloc/palloc_baseline.png
presents the PALLOC only results for TEST1 through TEST3. The most
prominent improvement when using PALLOC is when the application obtains
dedicated access to two bins (half of the cache, double the size of
memory being accessed) - in this environment its first quartile is
significantly lower than all the other partitionings. The application
thus experiences more instances where memory access latency is low. We
can see though that the average latency experienced by the application
is not affected significantly by this.

Visualization
https://github.com/rchatre/data/blob/master/cache_pseudo_locking/palloc/palloc_cat_pseudo.png
presents the same PALLOC two bins (TEST3) seen in previous visualization
with the CAT and Cache Pseudo-Locking results. The visualization shows
with all descriptive statistics a significant improved latency when
using CAT compared to PALLOC. The additional comparison with Cache
Pseudo-Locking shows the improved average access latency when compared
to both CAT and PALLOC.

In both the PALLOC and CAT tests there was improvement (CAT most
significant) in latency accessing a 256KB memory region but in both
(PALLOC and CAT) 512KB of cache was set aside for application to obtain
these results. Using Cache Pseudo-Locking to access the 256KB memory
region only 256KB of cache was set aside while also reducing the access
latency when compared to both PALLOC and CAT.

I do hope these results establishes the value of Cache Pseudo-Locking to
you. The rebased patch series used in this testing will be sent out
this week.

Regards,

Reinette

(*) A one line change was made as documented in
https://github.com/heechul/palloc/issues/8

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
  2018-02-12 19:07           ` Reinette Chatre
@ 2018-02-13 10:27             ` Thomas Gleixner
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2018-02-13 10:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Hindman, Gavin, Yu, Fenghua, Luck, Tony, vikas.shivappa, Hansen,
	Dave, mingo, hpa, x86, linux-kernel

On Mon, 12 Feb 2018, Reinette Chatre wrote:
> On 1/16/2018 3:38 AM, Thomas Gleixner wrote:
> All tests involve a user space application that allocates (malloc() with
> mlockall()) or in the case of Cache Pseudo-Locking maps using mmap()) a
> 256KB region of memory. The application then randomly accesses this
> region, 32 bytes at a time, measuring the latency in cycles of each
> access using the rdtsc instruction. Each time a test is run it is
> repeated ten times.
> In both the PALLOC and CAT tests there was improvement (CAT most
> significant) in latency accessing a 256KB memory region but in both
> (PALLOC and CAT) 512KB of cache was set aside for application to obtain
> these results. Using Cache Pseudo-Locking to access the 256KB memory
> region only 256KB of cache was set aside while also reducing the access
> latency when compared to both PALLOC and CAT.
> 
> I do hope these results establishes the value of Cache Pseudo-Locking to
> you. 

Very nice. Thank you so much for doing this. That kind of data is really
valuable.

My take away from this: All of the mechanisms are only delivering best
effort and the real benefit is the reduction of average latency. The worst
case outliers are in the same ballpark at seems.

> The rebased patch series used in this testing will be sent out
> this week.

I'll make sure to have cycles available for review.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-02-13 10:27 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-13 16:39 [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 01/20] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 02/20] x86/intel_rdt: Make useful functions available internally Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 03/20] x86/intel_rdt: Introduce hooks to create pseudo-locking files Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 04/20] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 05/20] x86/intel_rdt: Print more accurate pseudo-locking availability Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 06/20] x86/intel_rdt: Create pseudo-locked regions Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 07/20] x86/intel_rdt: Connect pseudo-locking directory to operations Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 08/20] x86/intel_rdt: Introduce pseudo-locking resctrl files Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 09/20] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 10/20] x86/intel_rdt: Disable pseudo-locking if CDP enabled Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 11/20] x86/intel_rdt: Associate pseudo-locked regions with its domain Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 12/20] x86/intel_rdt: Support CBM checking from value and character buffer Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 13/20] x86/intel_rdt: Support schemata write - pseudo-locking core Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 14/20] x86/intel_rdt: Enable testing for pseudo-locked region Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 15/20] x86/intel_rdt: Prevent new allocations from pseudo-locked regions Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 16/20] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 17/20] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 18/20] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 19/20] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
2017-11-13 16:39 ` [RFC PATCH 20/20] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre
2017-11-18  0:48 ` [RFC PATCH 00/20] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Thomas Gleixner
2017-11-18  6:42   ` Reinette Chatre
2018-01-14 22:54     ` Thomas Gleixner
2018-01-15 16:23       ` Hindman, Gavin
2018-01-16 11:38         ` Thomas Gleixner
2018-01-17  0:53           ` Reinette Chatre
2018-02-12 19:07           ` Reinette Chatre
2018-02-13 10:27             ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.