* [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling
@ 2018-05-22 11:28 Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 01/38] x86/intel_rdt: Document new mode, size, and bit_usage Reinette Chatre
                   ` (37 more replies)
  0 siblings, 38 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Dear Maintainers,

This fourth revision of the Cache Pseudo-Locking enabling series addresses all
feedback received up to and including the review of v3. The time you spend
reviewing this work and the valuable feedback you provide are greatly appreciated.

Changes since v3:
- Rebase series on top of tip::x86/cache with HEAD:
  commit de73f38f768021610bd305cf74ef3702fcf6a1eb (tip/x86/cache)
  Author: Vikas Shivappa <vikas.shivappa@linux.intel.com>
  Date:   Fri Apr 20 15:36:21 2018 -0700

    x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
- The final patch from the v3 submission is not included in this series.
  The large contiguous allocation work it depends on is still under active
  discussion, with patches now at v2:
  http://lkml.kernel.org/r/20180503232935.22539-1-mike.kravetz@oracle.com
  Since the large contiguous allocation API may still change in future
  versions, I plan to resubmit the final patch once that API is finalized.
  Until then we are limited to Cache Pseudo-Locked regions of 4MB.
- rdtgroup_cbm_overlaps() now returns bool instead of int.
- rdtgroup_mode_test_exclusive() now returns bool instead of int.
- Respect the tabular formatting in rdt_cbm_parse_data struct declaration.
- In rdt_bit_usage_show() use a test to exit the loop earlier and thus spare an
  indentation level for the code that follows.
- Follow recommendations of recent additions to checkpatch.pl:
  -- Prefer 'help' over '---help---' for new Kconfig help texts.
  -- Include SPDX-License-Identifier tag in new files.

Everything below is verbatim from the previous submission (except for the
diffstat at the end, which reflects v4).

The last patch of this series depends on the series:
"[RFC PATCH 0/3] Interface for higher order contiguous allocations"
submitted at:
http://lkml.kernel.org/r/20180212222056.9735-1-mike.kravetz@oracle.com
A new version of this was submitted recently and is currently being discussed
at:
http://lkml.kernel.org/r/20180417020915.11786-1-mike.kravetz@oracle.com
Without this upstream MM work (and patch 39/39 of this series) it would
just not be possible to create pseudo-locked regions larger than 4MB. To
simplify this work we could temporarily drop the last patch of this
series until the upstream MM work is complete.

Changes since v2:
- Introduce resource group "modes" and a new resctrl file "mode" associated
  with each resource group that exposes the associated resource group's mode.
  A resource group's mode is used by the system administrator to enable or
  disable resource sharing between resource groups. A resource group in
  "shareable" mode allows its allocations to be shared with other resource
  groups. This is the default mode and reflects existing behavior. A resource
  group in "exclusive" mode does not allow any sharing of its allocated
  resources. When a schemata is written to any resource group it is not
  allowed to overlap with allocations of any resource group that is in
  "exclusive" mode. A resource group's allocations are not allowed to overlap
  at the time it is set to be "exclusive". Cache pseudo-locking builds on
  "exclusive" mode and is supported using two new modes: "pseudo-locksetup"
  lets the user indicate that this resource group will be used for a
  pseudo-locked region; a subsequent write to the "schemata" file will create
  the corresponding pseudo-locked region and the mode will then automatically
  change to "pseudo-locked". A minimal user-space sketch of this flow follows
  this change list.
- A resource group's mode can only be changed to "pseudo-locksetup" if the
  platform has been verified to support cache pseudo-locking and the
  resource group is unused. Unused means that no monitoring is in progress
  and no tasks or CPUs are assigned to the resource group. Once a resource
  group enters "pseudo-locksetup" it becomes "locked down" such that no
  new tasks or CPUs can be assigned to it. Neither can any new monitoring
  be started.
- Each resource group obtains a new "size" file that mirrors the schemata
  file to display the size in bytes of each allocation. There is a difference
  in the implementation from the review feedback. In the review feedback an
  example of output was:
     L2:0=128K;1=256K;
     L3:0=1M;1=2M;
  Within the kernel I could find many examples of support for user _input_
  with mem suffixes. This is broadly supported with lib/cmdline.c:memparse().
  I was not able to find equally clear support for, or usage of, such flexible
  _output_ of sizes. My conclusion was that size output tends to always use
  the same unit. I also found that printing the size in one unit, in this
  case bytes, does simplify validation.
- A new "bit_usage" file within the info/<resource> sub-directories contain
  annotated bitmaps of how the resources are used.
- Cache pseudo-locked regions are now associated 1:1 with a resource group.
- Do not make any changes to capacity bitmask (CBM) associated with the
  default class-of-service (CLOS). If a pseudo-locked region is requested its
  cache region has to be unused at the time of request.
- Second mutex removed.
- Tabular fashion respected when making struct changes.
- Lifetime of pseudo-locked region (by extension the resource group it
  belongs to) connected to mmap region.
- Do not call preempt_disable() or local_irq_save(); call only
  local_irq_disable().
- Improve comments in the pseudo-locking loop to explain why the prefetcher
  needs to be disabled.
- Ensure that the determination of whether a pseudo-locked region can succeed
  takes into account all levels of cache in the hierarchy, not just the level
  at which it is requested.
- Preloading of code was suggested in review to improve pseudo-locking
  success. We have since been able to connect a hardware debugger to our
  target platform and, with the current locking flow, we are able to lock 100%
  of kernel memory into the cache of an Intel(R) Celeron(R) Processor J3455.
- The above testing with the hardware debugger revealed that speculative
  execution of the loop loads data beyond the end of the buffer. Add a read
  barrier to the locking loops to prevent this speculation.
- The name of the debugfs file used to trigger measurements was changed
  from "measure_trigger" to "pseudo_lock_measure".

Changes since v1:
- Enable allocation of contiguous regions larger than what SLAB allocators
  can support. This removes the 4MB Cache Pseudo-Locking limitation
  documented in v1 submission.
  This depends on "mm: drop hotplug lock from lru_add_drain_all",
  now in v4.16-rc1 as 9852a7212324fd25f896932f4f4607ce47b0a22f.
- Convert to debugfs_file_get() and -put() from the now obsolete
  debugfs_use_file_start() and debugfs_use_file_finish() calls.
- Rebase on top of, and take into account, recent L2 CDP enabling.
- Simplify tracing output to print cache hits and miss counts on same line.


Dear Maintainers,

Cache Allocation Technology (CAT), part of Intel(R) Resource Director
Technology (Intel(R) RDT), enables a user to specify the amount of cache
space that an application can fill. Cache pseudo-locking builds on the
fact that a CPU can still read and write data pre-allocated outside its
current allocated area on a cache hit. With cache pseudo-locking, data
can be preloaded into a reserved portion of cache that no application can
fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an application
can map it into its virtual address space and thus have a region of
memory with reduced average read latency.
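
To make the consumption side concrete, below is a minimal sketch of an
application mapping a pseudo-locked region into its address space. The
character device path and group name are assumptions for illustration only;
the series exposes each pseudo-locked region through its own character device.

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 256 * 1024;	/* size of the pseudo-locked region */
	volatile char *mem;
	size_t i;
	int fd;

	fd = open("/dev/pseudo_lock/plock0", O_RDWR);	/* hypothetical path */
	if (fd < 0)
		return 1;

	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED)
		return 1;

	/* Reads from this mapping are expected to be served from the cache. */
	for (i = 0; i < size; i += 64)
		(void)mem[i];

	munmap((void *)mem, size);
	close(fd);
	return 0;
}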

The cache pseudo-locking approach relies on generation-specific behavior
of processors. It may provide benefits on certain processor generations,
but is not guaranteed to be supported in the future. It is not a guarantee
that data will remain in the cache. It is not a guarantee that data will
remain in certain levels or certain regions of the cache. Rather, cache
pseudo-locking increases the probability that data will remain in a certain
level of the cache by carefully configuring the CAT feature and carefully
controlling application behavior.

Known limitations:
Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict pseudo-locked
memory from the cache. Power management C-states may still shrink or power
off cache, causing eviction of cache pseudo-locked memory. We utilize
PM QoS to prevent cores associated with cache pseudo-locked regions from
entering deeper C-states; this constraint is put in place at the time the
pseudo-locked regions are created.
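
The series applies this PM QoS constraint from within the kernel, per core.
As a rough user-space analogue of the concept (not the mechanism this series
uses), holding /dev/cpu_dma_latency open with a low latency value prevents
deep C-states system-wide:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
	int32_t latency_us = 0;	/* request the shallowest C-state */
	int fd = open("/dev/cpu_dma_latency", O_WRONLY);

	if (fd < 0)
		return 1;
	if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us))
		return 1;

	pause();	/* the constraint holds for as long as the fd stays open */
	close(fd);
	return 0;
}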

Known software limitation (FIXED IN V2):
Cache pseudo-locked regions are currently limited to 4MB, even on
platforms that support larger cache sizes. Work is in progress to
support larger regions.

Graphs visualizing the benefits of cache pseudo-locking on an Intel(R)
NUC NUC6CAYS (it has an Intel(R) Celeron(R) Processor J3455) with the
default 2GB DDR3L-1600 memory are available. In these tests the patches
from this series were applied on the x86/cache branch of tip.git at the
time the HEAD was:

commit 87943db7dfb0c5ee5aa74a9ac06346fadd9695c8 (tip/x86/cache)
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Fri Oct 20 02:16:59 2017 -0700
    x86/intel_rdt: Fix potential deadlock during resctrl mount

DISCLAIMER: Tests document performance of components on a particular test,
in specific systems. Differences in hardware, software, or configuration
will affect actual performance. Performance varies depending on system
configuration.

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/perfcount.png
Above shows the few L2 cache misses possible with cache pseudo-locking
on the Intel(R) NUC with default configuration. Each test, which is
repeated 100 times, pseudo-locks the schemata shown and then measures,
from the kernel via precision counters, the number of cache misses when
accessing the memory afterwards. This test is run on an idle system as
well as a system with significant noise (using stress-ng) from a
neighboring core associated with the same cache. This plot shows that:
(1) the number of cache misses remains consistent irrespective of the size
of the region being pseudo-locked, and (2) the number of cache misses for a
pseudo-locked region remains low when traversing memory regions ranging
in size from 256KB (4096 cache lines) to 896KB (14336 cache lines).

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_malloc_with_load.png
Above shows the read latency experienced by an application running with
the default CAT CLOS after it allocated 256KB of memory with malloc() (and
using mlockall()). In this example the application reads randomly (so as not
to trigger the hardware prefetcher) from its entire allocated region at 2
second intervals while there is a noisy neighbor present. Each individual
access is 32 bytes in size and the latency of each access is measured using
the rdtsc instruction. In this visualization we can observe two groupings of
data: the group with lower latency indicating cache hits, and the group with
higher latency indicating cache misses. We can see that a significant portion
of memory reads experience the larger latencies.
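
A simplified sketch of this measurement loop, assuming the buffer is either
the malloc()/mlockall() region or the mmap()'ed pseudo-locked region, is shown
below. Buffer setup, randomization and reporting are placeholders, and the
32-byte access is approximated as a short byte loop:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

#define BUF_SIZE	(256 * 1024)
#define ACCESS_BYTES	32

int main(void)
{
	char *buf = malloc(BUF_SIZE);	/* stand-in for the measured region */
	volatile char sink;
	unsigned int i, j;

	if (!buf)
		return 1;

	for (i = 0; i < BUF_SIZE; i++)	/* touch the buffer once */
		buf[i] = (char)i;

	for (i = 0; i < 1000; i++) {
		size_t off = ((size_t)rand() % (BUF_SIZE / ACCESS_BYTES)) *
			     ACCESS_BYTES;
		uint64_t start, end;

		start = __rdtsc();
		for (j = 0; j < ACCESS_BYTES; j++)	/* read 32 bytes */
			sink = buf[off + j];
		end = __rdtsc();

		printf("%llu\n", (unsigned long long)(end - start));
	}

	free(buf);
	return 0;
}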

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_psl_with_load.png
Above plots a test similar to the previous one, but instead of the
application reading from a 256KB malloc() region it reads from a 256KB
pseudo-locked region that was mmap()'ed into its address space. Comparing
these latencies to those of the regular malloc() test shows a significant
improvement in the latencies experienced.

- https://github.com/rchatre/data/blob/master/cache_pseudo_locking/rfc_v1/userspace_malloc_and_cat_with_load_clos0_fixed.png
Applications that are sensitive to latencies may use existing CAT
technology to isolate the sensitive application. In this plot we show an
application running with a dedicated CAT CLOS double the size (512KB) of
the memory being tested (256KB). A dedicated CLOS with CBM 0x0f is created and
the default CLOS changed to CBM 0xf0. We see in this plot that even though
the application runs within a dedicated portion of cache it still
experiences significant latency accessing its memory (when compared to
pseudo-locking).

Your feedback on this proposal for enabling Cache Pseudo-Locking will be
greatly appreciated.

Regards,

Reinette

Reinette Chatre (38):
  x86/intel_rdt: Document new mode, size, and bit_usage
  x86/intel_rdt: Introduce RDT resource group mode
  x86/intel_rdt: Associate mode with each RDT resource group
  x86/intel_rdt: Introduce resource group's mode resctrl file
  x86/intel_rdt: Introduce test to determine if closid is in use
  x86/intel_rdt: Make useful functions available internally
  x86/intel_rdt: Initialize new resource group with sane defaults
  x86/intel_rdt: Introduce new "exclusive" mode
  x86/intel_rdt: Enable setting of exclusive mode
  x86/intel_rdt: Making CBM name and type more explicit
  x86/intel_rdt: Support flexible data to parsing callbacks
  x86/intel_rdt: Ensure requested schemata respects mode
  x86/intel_rdt: Introduce "bit_usage" to display cache allocations
    details
  x86/intel_rdt: Display resource groups' allocations' size in bytes
  x86/intel_rdt: Documentation for Cache Pseudo-Locking
  x86/intel_rdt: Introduce the Cache Pseudo-Locking modes
  x86/intel_rdt: Respect read and write access
  x86/intel_rdt: Add utility to test if tasks assigned to resource group
  x86/intel_rdt: Add utility to restrict/restore access to resctrl files
  x86/intel_rdt: Protect against resource group changes during locking
  x86/intel_rdt: Utilities to restrict/restore access to specific files
  x86/intel_rdt: Add check to determine if monitoring in progress
  x86/intel_rdt: Introduce pseudo-locked region
  x86/intel_rdt: Support enter/exit of locksetup mode
  x86/intel_rdt: Enable entering of pseudo-locksetup mode
  x86/intel_rdt: Split resource group removal in two
  x86/intel_rdt: Add utilities to test pseudo-locked region possibility
  x86/intel_rdt: Discover supported platforms via prefetch disable bits
  x86/intel_rdt: Pseudo-lock region creation/removal core
  x86/intel_rdt: Support creation/removal of pseudo-locked region
  x86/intel_rdt: resctrl files reflect pseudo-locked information
  x86/intel_rdt: Ensure RDT cleanup on exit
  x86/intel_rdt: Create resctrl debug area
  x86/intel_rdt: Create debugfs files for pseudo-locking testing
  x86/intel_rdt: Create character device exposing pseudo-locked region
  x86/intel_rdt: More precise L2 hit/miss measurements
  x86/intel_rdt: Support L3 cache performance event of Broadwell
  x86/intel_rdt: Limit C-states dynamically when pseudo-locking active

 Documentation/x86/intel_rdt_ui.txt                |  375 ++++-
 arch/x86/Kconfig                                  |   11 +
 arch/x86/kernel/cpu/Makefile                      |    4 +-
 arch/x86/kernel/cpu/intel_rdt.c                   |   11 +
 arch/x86/kernel/cpu/intel_rdt.h                   |  146 +-
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c       |  129 +-
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 1507 +++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |   43 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c          |  765 ++++++++++-
 9 files changed, 2918 insertions(+), 73 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h

-- 
2.13.6


* [PATCH V4 01/38] x86/intel_rdt: Document new mode, size, and bit_usage
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 02/38] x86/intel_rdt: Introduce RDT resource group mode Reinette Chatre
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

By default resource groups allow sharing of their cache allocations.
There is nothing that prevents a resource group from configuring
a cache allocation that overlaps with that of an existing resource
group.

To enable resource groups to specify that their cache allocations cannot
be shared, a resource group "mode" is introduced with two possible
modes: "shareable" and "exclusive". A "shareable" resource group allows
sharing of its cache allocations, an "exclusive" resource group does
not. A new resctrl file, "mode", associated with each resource group is
used to display the resource group's mode setting and to allow the mode
to be changed. The new "mode" file as well as two other resctrl files,
"bit_usage" and "size", are introduced in this series.

Add documentation for the three new resctrl files as well as one example
demonstrating their use.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 99 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index a16aa2113840..de913e00e922 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -65,6 +65,27 @@ related to allocation:
 			some platforms support devices that have their
 			own settings for cache use which can over-ride
 			these bits.
+"bit_usage":		Annotated capacity bitmasks showing how all
+			instances of the resource are used. The legend is:
+			"0" - Corresponding region is unused. When the system's
+			      resources have been allocated and a "0" is found
+			      in "bit_usage" it is a sign that resources are
+			      wasted.
+			"H" - Corresponding region is used by hardware only
+			      but available for software use. If a resource
+			      has bits set in "shareable_bits" but not all
+			      of these bits appear in the resource groups'
+			schematas then the bits that appear in
+			"shareable_bits" but in no resource group will
+			be marked as "H".
+			"X" - Corresponding region is available for sharing and
+			      used by hardware and software. These are the
+			      bits that appear in "shareable_bits" as
+			      well as a resource group's allocation.
+			"S" - Corresponding region is used by software
+			      and available for sharing.
+			"E" - Corresponding region is used exclusively by
+			      one resource group. No sharing allowed.
 
 Memory bandwitdh(MB) subdirectory contains the following files
 with respect to allocation:
@@ -163,6 +184,16 @@ When control is enabled all CTRL_MON groups will also contain:
 	A list of all the resources available to this group.
 	Each resource has its own line and format - see below for details.
 
+"size":
+	Mirrors the display of the "schemata" file to display the size in
+	bytes of each allocation instead of the bits representing the
+	allocation.
+
+"mode":
+	The "mode" of the resource group dictates the sharing of its
+	allocations. A "shareable" resource group allows sharing of its
+	allocations while an "exclusive" resource group does not.
+
 When monitoring is enabled all MON groups will also contain:
 
 "mon_data":
@@ -502,7 +533,71 @@ siblings and only the real time threads are scheduled on the cores 4-7.
 
 # echo F0 > p0/cpus
 
-4) Locking between applications
+Example 4
+---------
+
+The resource groups in previous examples were all in the default "shareable"
+mode allowing sharing of their cache allocations. If one resource group
+configures a cache allocation then nothing prevents another resource group
+to overlap with that allocation.
+
+In this example a new exclusive resource group will be created on a L2 CAT
+system with two L2 cache instances that can be configured with an 8-bit
+capacity bitmask. The new exclusive resource group will be configured to use
+25% of each cache instance.
+
+# mount -t resctrl resctrl /sys/fs/resctrl/
+# cd /sys/fs/resctrl
+
+First, we observe that the default group is configured to allocate to all L2
+cache:
+
+# cat schemata
+L2:0=ff;1=ff
+
+We could attempt to create the new resource group at this point, but it will
+fail because of the overlap with the schemata of the default group:
+# mkdir p0
+# echo 'L2:0=0x3;1=0x3' > p0/schemata
+# cat p0/mode
+shareable
+# echo exclusive > p0/mode
+-sh: echo: write error: Invalid argument
+# cat info/last_cmd_status
+schemata overlaps
+
+To ensure that there is no overlap with another resource group the default
+resource group's schemata has to change, making it possible for the new
+resource group to become exclusive.
+# echo 'L2:0=0xfc;1=0xfc' > schemata
+# echo exclusive > p0/mode
+# grep . p0/*
+p0/cpus:0
+p0/mode:exclusive
+p0/schemata:L2:0=03;1=03
+p0/size:L2:0=262144;1=262144
+
+A new resource group will on creation not overlap with an exclusive resource
+group:
+# mkdir p1
+# grep . p1/*
+p1/cpus:0
+p1/mode:shareable
+p1/schemata:L2:0=fc;1=fc
+p1/size:L2:0=786432;1=786432
+
+The bit_usage will reflect how the cache is used:
+# cat info/L2/bit_usage
+0=SSSSSSEE;1=SSSSSSEE
+
+A resource group cannot be forced to overlap with an exclusive resource group:
+# echo 'L2:0=0x1;1=0x1' > p1/schemata
+-sh: echo: write error: Invalid argument
+# cat info/last_cmd_status
+overlaps with exclusive group
+
+Locking between applications
+----------------------------
 
 Certain operations on the resctrl filesystem, composed of read/writes
 to/from multiple files, must be atomic.
@@ -510,7 +605,7 @@ to/from multiple files, must be atomic.
 As an example, the allocation of an exclusive reservation of L3 cache
 involves:
 
-  1. Read the cbmmasks from each directory
+  1. Read the cbmmasks from each directory or the per-resource "bit_usage"
   2. Find a contiguous set of bits in the global CBM bitmask that is clear
      in any of the directory cbmmasks
   3. Create a new directory
-- 
2.13.6


* [PATCH V4 02/38] x86/intel_rdt: Introduce RDT resource group mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 01/38] x86/intel_rdt: Document new mode, size, and bit_usage Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 03/38] x86/intel_rdt: Associate mode with each RDT resource group Reinette Chatre
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

At this time there are no constraints on how the bitmasks represented by
schemata can be associated with the closids represented by resource groups.
The bitmask of one class of service can overlap with the bitmask of
another class of service without any restriction.

The concept of "mode" is introduced in preparation for support of
control over whether cache regions can be shared between classes of
service. At this time the only mode, "shareable", reflects the current
behavior where all cache allocations can potentially be shared.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 39752825e376..c08eee73ecd3 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -81,6 +81,22 @@ enum rdt_group_type {
 };
 
 /**
+ * enum rdtgrp_mode - Mode of a RDT resource group
+ * @RDT_MODE_SHAREABLE: This resource group allows sharing of its allocations
+ *
+ * The mode of a resource group enables control over the allowed overlap
+ * between allocations associated with different resource groups (classes
+ * of service). User is able to modify the mode of a resource group by
+ * writing to the "mode" resctrl file associated with the resource group.
+ */
+enum rdtgrp_mode {
+	RDT_MODE_SHAREABLE = 0,
+
+	/* Must be last */
+	RDT_NUM_MODES,
+};
+
+/**
  * struct mongroup - store mon group's data in resctrl fs.
  * @mon_data_kn		kernlfs node for the mon_data directory
  * @parent:			parent rdtgrp
-- 
2.13.6


* [PATCH V4 03/38] x86/intel_rdt: Associate mode with each RDT resource group
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 01/38] x86/intel_rdt: Document new mode, size, and bit_usage Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 02/38] x86/intel_rdt: Introduce RDT resource group mode Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 04/38] x86/intel_rdt: Introduce resource group's mode resctrl file Reinette Chatre
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Each RDT resource group is associated with a mode that will reflect
the level of sharing of its allocations. The default, shareable, will be
associated with each resource group on creation since it is zero and
resource groups are created with kzalloc. Management of a resource group's
mode will follow in later patches. The default resource group always
remains, though, so ensure that it is reset to the default mode when the
resctrl filesystem is unmounted.

Also introduce a utility that can be used to determine the mode of a
resource group by looking it up based on its class of service.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          |  3 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c08eee73ecd3..7ff1f633bebe 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -122,6 +122,7 @@ struct mongroup {
  * @type:			indicates type of this rdtgroup - either
  *				monitor only or ctrl_mon group
  * @mon:			mongroup related data
+ * @mode:			mode of resource group
  */
 struct rdtgroup {
 	struct kernfs_node	*kn;
@@ -132,6 +133,7 @@ struct rdtgroup {
 	atomic_t		waitcount;
 	enum rdt_group_type	type;
 	struct mongroup		mon;
+	enum rdtgrp_mode	mode;
 };
 
 /* rdtgroup.flags */
@@ -461,6 +463,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
+enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 749856a2e736..70d7846041fe 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -126,6 +126,27 @@ static void closid_free(int closid)
 	closid_free_map |= 1 << closid;
 }
 
+/**
+ * rdtgroup_mode_by_closid - Return mode of resource group with closid
+ * @closid: closid of the resource group
+ *
+ * Each resource group is associated with a @closid. Here the mode
+ * of a resource group can be queried by searching for it using its closid.
+ *
+ * Return: mode as &enum rdtgrp_mode of resource group with closid @closid
+ */
+enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
+{
+	struct rdtgroup *rdtgrp;
+
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (rdtgrp->closid == closid)
+			return rdtgrp->mode;
+	}
+
+	return RDT_NUM_MODES;
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
@@ -1483,6 +1504,7 @@ static void rdt_kill_sb(struct super_block *sb)
 		reset_all_ctrls(r);
 	cdp_disable_all();
 	rmdir_all_sub();
+	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
 	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
 	static_branch_disable_cpuslocked(&rdt_enable_key);
-- 
2.13.6


* [PATCH V4 04/38] x86/intel_rdt: Introduce resource group's mode resctrl file
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (2 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 03/38] x86/intel_rdt: Associate mode with each RDT resource group Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 05/38] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

A new resctrl file "mode" associated with each resource group
is introduced. This file will display the resource group's current mode
and an administrator can also use it to modify the resource group's
mode.

Only shareable mode is currently supported.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 83 ++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 70d7846041fe..eac60602c87b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -147,6 +147,24 @@ enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
 	return RDT_NUM_MODES;
 }
 
+static const char * const rdt_mode_str[] = {
+	[RDT_MODE_SHAREABLE]	= "shareable",
+};
+
+/**
+ * rdtgroup_mode_str - Return the string representation of mode
+ * @mode: the resource group mode as &enum rdtgroup_mode
+ *
+ * Return: string representation of valid mode, "unknown" otherwise
+ */
+static const char *rdtgroup_mode_str(enum rdtgrp_mode mode)
+{
+	if (mode < RDT_MODE_SHAREABLE || mode >= RDT_NUM_MODES)
+		return "unknown";
+
+	return rdt_mode_str[mode];
+}
+
 /* set uid and gid of rdtgroup dirs and files to that of the creator */
 static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
 {
@@ -761,6 +779,63 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+/*
+ * rdtgroup_mode_show - Display mode of this resource group
+ */
+static int rdtgroup_mode_show(struct kernfs_open_file *of,
+			      struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+
+	seq_printf(s, "%s\n", rdtgroup_mode_str(rdtgrp->mode));
+
+	rdtgroup_kn_unlock(of->kn);
+	return 0;
+}
+
+static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	struct rdtgroup *rdtgrp;
+	enum rdtgrp_mode mode;
+	int ret = 0;
+
+	/* Valid input requires a trailing newline */
+	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+		return -EINVAL;
+	buf[nbytes - 1] = '\0';
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+
+	rdt_last_cmd_clear();
+
+	mode = rdtgrp->mode;
+
+	if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE))
+		goto out;
+
+	if (!strcmp(buf, "shareable")) {
+		rdtgrp->mode = RDT_MODE_SHAREABLE;
+	} else {
+		rdt_last_cmd_printf("unknown/unsupported mode\n");
+		ret = -EINVAL;
+	}
+
+out:
+	rdtgroup_kn_unlock(of->kn);
+	return ret ?: nbytes;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -874,6 +949,14 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_schemata_show,
 		.fflags		= RF_CTRL_BASE,
 	},
+	{
+		.name		= "mode",
+		.mode		= 0644,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.write		= rdtgroup_mode_write,
+		.seq_show	= rdtgroup_mode_show,
+		.fflags		= RF_CTRL_BASE,
+	},
 };
 
 static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
-- 
2.13.6


* [PATCH V4 05/38] x86/intel_rdt: Introduce test to determine if closid is in use
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (3 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 04/38] x86/intel_rdt: Introduce resource group's mode resctrl file Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 06/38] x86/intel_rdt: Make useful functions available internally Reinette Chatre
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

During CAT feature discovery the capacity bitmasks (CBMs) associated
with all the classes of service are initialized to all ones, even if the
class of service is not in use. Introduce a test that can be used to
determine if a class of service is in use. This test enables code
interested in parsing the CBMs to know whether their values are meaningful or
can be ignored.

Temporarily mark the function as unused to silence compile warnings
until it is used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index eac60602c87b..fdc55357251b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -127,6 +127,18 @@ static void closid_free(int closid)
 }
 
 /**
+ * closid_allocated - test if provided closid is in use
+ * @closid: closid to be tested
+ *
+ * Return: true if @closid is currently associated with a resource group,
+ * false if @closid is free
+ */
+static bool __attribute__ ((unused)) closid_allocated(unsigned int closid)
+{
+	return (closid_free_map & (1 << closid)) == 0;
+}
+
+/**
  * rdtgroup_mode_by_closid - Return mode of resource group with closid
  * @closid: closid of the resource group
  *
-- 
2.13.6


* [PATCH V4 06/38] x86/intel_rdt: Make useful functions available internally
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (4 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 05/38] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 07/38] x86/intel_rdt: Initialize new resource group with sane defaults Reinette Chatre
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

In support of the work done to enable resource groups to have different
modes, some static functions need to be available for sharing amongst
all RDT components.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             | 2 ++
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 2 +-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 7ff1f633bebe..5f3915c2e599 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -465,6 +465,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
 enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
+int update_domains(struct rdt_resource *r, int closid);
+void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 116d57b248d3..2c23bb136ccc 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -175,7 +175,7 @@ static int parse_line(char *line, struct rdt_resource *r)
 	return -EINVAL;
 }
 
-static int update_domains(struct rdt_resource *r, int closid)
+int update_domains(struct rdt_resource *r, int closid)
 {
 	struct msr_param msr_param;
 	cpumask_var_t cpu_mask;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index fdc55357251b..35e538eed977 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -121,7 +121,7 @@ static int closid_alloc(void)
 	return closid;
 }
 
-static void closid_free(int closid)
+void closid_free(int closid)
 {
 	closid_free_map |= 1 << closid;
 }
-- 
2.13.6


* [PATCH V4 07/38] x86/intel_rdt: Initialize new resource group with sane defaults
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (5 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 06/38] x86/intel_rdt: Make useful functions available internally Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 08/38] x86/intel_rdt: Introduce new "exclusive" mode Reinette Chatre
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Currently, when a new resource group is created, its allocations are
those that belonged to the resource group that previously used its
closid.

That is, we can encounter a case like:
mkdir newgroup
cat newgroup/schemata
L2:0=ff;1=ff
echo 'L2:0=0xf0;1=0xf0' > newgroup/schemata
cat newgroup/schemata
L2:0=0xf0;1=0xf0
rmdir newgroup
mkdir newnewgroup
cat newnewgroup/schemata
L2:0=0xf0;1=0xf0

When the new group is created it would be reasonable to expect its
allocations to be initialized with all regions that it can possibly use.
At this time these regions would be all that are shareable by other
resource groups as well as regions that are not currently used.

When a new resource group is created the hardware is initialized with
these new default allocations.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 69 ++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 35e538eed977..b2008c697ce0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -133,7 +133,7 @@ void closid_free(int closid)
  * Return: true if @closid is currently associated with a resource group,
  * false if @closid is free
  */
-static bool __attribute__ ((unused)) closid_allocated(unsigned int closid)
+static bool closid_allocated(unsigned int closid)
 {
 	return (closid_free_map & (1 << closid)) == 0;
 }
@@ -1799,6 +1799,64 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
 	return ret;
 }
 
+/**
+ * rdtgroup_init_alloc - Initialize the new RDT group's allocations
+ *
+ * A new RDT group is being created on an allocation capable (CAT)
+ * supporting system. Set this group up to start off with all usable
+ * allocations. That is, all shareable and unused bits.
+ *
+ * All-zero CBM is invalid. If there are no more shareable bits available
+ * on any domain then the entire allocation will fail.
+ */
+static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
+{
+	u32 used_b = 0, unused_b = 0;
+	u32 closid = rdtgrp->closid;
+	struct rdt_resource *r;
+	enum rdtgrp_mode mode;
+	struct rdt_domain *d;
+	int i, ret;
+	u32 *ctrl;
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list) {
+			d->have_new_ctrl = false;
+			d->new_ctrl = r->cache.shareable_bits;
+			used_b = r->cache.shareable_bits;
+			ctrl = d->ctrl_val;
+			for (i = 0; i < r->num_closid; i++, ctrl++) {
+				if (closid_allocated(i) && i != closid) {
+					mode = rdtgroup_mode_by_closid(i);
+					used_b |= *ctrl;
+					if (mode == RDT_MODE_SHAREABLE)
+						d->new_ctrl |= *ctrl;
+				}
+			}
+			unused_b = used_b ^ (BIT_MASK(r->cache.cbm_len) - 1);
+			unused_b &= BIT_MASK(r->cache.cbm_len) - 1;
+			d->new_ctrl |= unused_b;
+			if (d->new_ctrl == 0) {
+				rdt_last_cmd_printf("no space on %s:%d\n",
+						    r->name, d->id);
+				return -ENOSPC;
+			}
+			d->have_new_ctrl = true;
+		}
+	}
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		ret = update_domains(r, rdtgrp->closid);
+		if (ret < 0) {
+			rdt_last_cmd_puts("failed to initialize allocations\n");
+			return ret;
+		}
+		rdtgrp->mode = RDT_MODE_SHAREABLE;
+	}
+
+	return 0;
+}
+
 static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 			     struct kernfs_node *prgrp_kn,
 			     const char *name, umode_t mode,
@@ -1957,6 +2015,10 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 	ret = 0;
 
 	rdtgrp->closid = closid;
+	ret = rdtgroup_init_alloc(rdtgrp);
+	if (ret < 0)
+		goto out_id_free;
+
 	list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);
 
 	if (rdt_mon_capable) {
@@ -1967,15 +2029,16 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
 		ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
 		if (ret) {
 			rdt_last_cmd_puts("kernfs subdir error\n");
-			goto out_id_free;
+			goto out_del_list;
 		}
 	}
 
 	goto out_unlock;
 
+out_del_list:
+	list_del(&rdtgrp->rdtgroup_list);
 out_id_free:
 	closid_free(closid);
-	list_del(&rdtgrp->rdtgroup_list);
 out_common_fail:
 	mkdir_rdt_prepare_clean(rdtgrp);
 out_unlock:
-- 
2.13.6


* [PATCH V4 08/38] x86/intel_rdt: Introduce new "exclusive" mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (6 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 07/38] x86/intel_rdt: Initialize new resource group with sane defaults Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 09/38] x86/intel_rdt: Enable setting of exclusive mode Reinette Chatre
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

At the moment all allocations are shareable. There is no way for a user
to designate that an allocation associated with a resource group cannot be
shared by another.

Introduce the new mode "exclusive". When a resource group is marked as
such it implies that no overlap is allowed between its allocation and that
of another resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          | 2 ++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 5f3915c2e599..399bb94e865b 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -83,6 +83,7 @@ enum rdt_group_type {
 /**
  * enum rdtgrp_mode - Mode of a RDT resource group
  * @RDT_MODE_SHAREABLE: This resource group allows sharing of its allocations
+ * @RDT_MODE_EXCLUSIVE: No sharing of this resource group's allocations allowed
  *
  * The mode of a resource group enables control over the allowed overlap
  * between allocations associated with different resource groups (classes
@@ -91,6 +92,7 @@ enum rdt_group_type {
  */
 enum rdtgrp_mode {
 	RDT_MODE_SHAREABLE = 0,
+	RDT_MODE_EXCLUSIVE,
 
 	/* Must be last */
 	RDT_NUM_MODES,
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b2008c697ce0..739b1ecc2b80 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -161,6 +161,7 @@ enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
 
 static const char * const rdt_mode_str[] = {
 	[RDT_MODE_SHAREABLE]	= "shareable",
+	[RDT_MODE_EXCLUSIVE]	= "exclusive",
 };
 
 /**
-- 
2.13.6


* [PATCH V4 09/38] x86/intel_rdt: Enable setting of exclusive mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (7 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 08/38] x86/intel_rdt: Introduce new "exclusive" mode Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 10/38] x86/intel_rdt: Making CBM name and type more explicit Reinette Chatre
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The new "mode" file now accepts "exclusive" that means that the
allocations of this resource group cannot be shared.

Enable users to modify a resource group's mode to "exclusive". To
succeed it is required that there is no overlap between resource group's
current schemata and that of all the other active resource groups as
well as cache regions potentially used by other hardware entities.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 97 +++++++++++++++++++++++++++++++-
 1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 739b1ecc2b80..3701b1cb0f73 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -812,6 +812,93 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+/**
+ * rdtgroup_cbm_overlaps - Does CBM for intended closid overlap with other
+ * @r: Resource to which domain instance @d belongs.
+ * @d: The domain instance for which @closid is being tested.
+ * @cbm: Capacity bitmask being tested.
+ * @closid: Intended closid for @cbm.
+ * @exclusive: Only check if overlaps with exclusive resource groups
+ *
+ * Checks if provided @cbm intended to be used for @closid on domain
+ * @d overlaps with any other closids or other hardware usage associated
+ * with this domain. If @exclusive is true then only overlaps with
+ * resource groups in exclusive mode will be considered. If @exclusive
+ * is false then overlaps with any resource group or hardware entities
+ * will be considered.
+ *
+ * Return: false if CBM does not overlap, true if it does.
+ */
+static bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+				  u32 _cbm, int closid, bool exclusive)
+{
+	unsigned long *cbm = (unsigned long *)&_cbm;
+	unsigned long *ctrl_b;
+	enum rdtgrp_mode mode;
+	u32 *ctrl;
+	int i;
+
+	/* Check for any overlap with regions used by hardware directly */
+	if (!exclusive) {
+		if (bitmap_intersects(cbm,
+				      (unsigned long *)&r->cache.shareable_bits,
+				      r->cache.cbm_len))
+			return true;
+	}
+
+	/* Check for overlap with other resource groups */
+	ctrl = d->ctrl_val;
+	for (i = 0; i < r->num_closid; i++, ctrl++) {
+		ctrl_b = (unsigned long *)ctrl;
+		if (closid_allocated(i) && i != closid) {
+			if (bitmap_intersects(cbm, ctrl_b, r->cache.cbm_len)) {
+				mode = rdtgroup_mode_by_closid(i);
+				if (exclusive) {
+					if (mode == RDT_MODE_EXCLUSIVE)
+						return true;
+					continue;
+				}
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
+/**
+ * rdtgroup_mode_test_exclusive - Test if this resource group can be exclusive
+ *
+ * An exclusive resource group implies that there should be no sharing of
+ * its allocated resources. At the time this group is considered to be
+ * exclusive this test can determine if its current schemata supports this
+ * setting by testing for overlap with all other resource groups.
+ *
+ * Return: true if resource group can be exclusive, false if there is overlap
+ * with allocations of other resource groups and thus this resource group
+ * cannot be exclusive.
+ */
+static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
+{
+	int closid = rdtgrp->closid;
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d, &r->domains, list) {
+			if (rdtgroup_cbm_overlaps(r, d, d->ctrl_val[closid],
+						  rdtgrp->closid, false))
+				return false;
+		}
+	}
+
+	return true;
+}
+
+/**
+ * rdtgroup_mode_write - Modify the resource group's mode
+ *
+ */
 static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
 				   char *buf, size_t nbytes, loff_t off)
 {
@@ -834,11 +921,19 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
 
 	mode = rdtgrp->mode;
 
-	if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE))
+	if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE) ||
+	    (!strcmp(buf, "exclusive") && mode == RDT_MODE_EXCLUSIVE))
 		goto out;
 
 	if (!strcmp(buf, "shareable")) {
 		rdtgrp->mode = RDT_MODE_SHAREABLE;
+	} else if (!strcmp(buf, "exclusive")) {
+		if (!rdtgroup_mode_test_exclusive(rdtgrp)) {
+			rdt_last_cmd_printf("schemata overlaps\n");
+			ret = -EINVAL;
+			goto out;
+		}
+		rdtgrp->mode = RDT_MODE_EXCLUSIVE;
 	} else {
 		rdt_last_cmd_printf("unknown/unsupported mode\n");
 		ret = -EINVAL;
-- 
2.13.6


* [PATCH V4 10/38] x86/intel_rdt: Making CBM name and type more explicit
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (8 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 09/38] x86/intel_rdt: Enable setting of exclusive mode Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:28 ` [PATCH V4 11/38] x86/intel_rdt: Support flexible data to parsing callbacks Reinette Chatre
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

cbm_validate() receives a pointer to the variable that will be
initialized with a validated capacity bitmask. The pointer points to a
variable of type unsigned long that is immediately assigned to a
variable of type u32 by the caller on return from cbm_validate().

Let cbm_validate() initialize a variable of type u32 directly.

At this time also change the variable name "data" within parse_cbm() to
a name more reflective of the content: "cbm_val". This frees up the
generic "data" to be used later when it is indeed used for a collection
of input.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 2c23bb136ccc..b3da5b981dd8 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -87,7 +87,7 @@ int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
  *	are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
  * Additionally Haswell requires at least two bits set.
  */
-static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
+static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
 {
 	unsigned long first_bit, zero_bit, val;
 	unsigned int cbm_len = r->cache.cbm_len;
@@ -128,16 +128,17 @@ static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
  */
 int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
 {
-	unsigned long data;
+	u32 cbm_val;
 
 	if (d->have_new_ctrl) {
 		rdt_last_cmd_printf("duplicate domain %d\n", d->id);
 		return -EINVAL;
 	}
 
-	if(!cbm_validate(buf, &data, r))
+	if (!cbm_validate(buf, &cbm_val, r))
 		return -EINVAL;
-	d->new_ctrl = data;
+
+	d->new_ctrl = cbm_val;
 	d->have_new_ctrl = true;
 
 	return 0;
-- 
2.13.6


* [PATCH V4 11/38] x86/intel_rdt: Support flexible data to parsing callbacks
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (9 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 10/38] x86/intel_rdt: Making CBM name and type more explicit Reinette Chatre
@ 2018-05-22 11:28 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 12/38] x86/intel_rdt: Ensure requested schemata respects mode Reinette Chatre
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:28 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Each resource is associated with a configurable callback that should be
used to parse the information provided for the particular resource from
user space. In addition to the resource and domain pointers this callback
is provided with just the character buffer being parsed.

In support of flexible parsing, the callback is modified to take a
void pointer as argument. This enables resources that need more data
than just the user-provided data to pass their required data to the
callback without affecting the signatures of the callbacks of all the
other resources.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             | 6 +++---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 6 ++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 399bb94e865b..c0c0ef817f11 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -371,7 +371,7 @@ struct rdt_resource {
 	struct rdt_cache	cache;
 	struct rdt_membw	membw;
 	const char		*format_str;
-	int (*parse_ctrlval)	(char *buf, struct rdt_resource *r,
+	int (*parse_ctrlval)	(void *data, struct rdt_resource *r,
 				 struct rdt_domain *d);
 	struct list_head	evt_list;
 	int			num_rmid;
@@ -379,8 +379,8 @@ struct rdt_resource {
 	unsigned long		fflags;
 };
 
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d);
-int parse_bw(char *buf, struct rdt_resource *r,  struct rdt_domain *d);
+int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d);
+int parse_bw(void *_buf, struct rdt_resource *r,  struct rdt_domain *d);
 
 extern struct mutex rdtgroup_mutex;
 
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index b3da5b981dd8..ab4bb8731825 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -64,9 +64,10 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 	return true;
 }
 
-int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_bw(void *_buf, struct rdt_resource *r, struct rdt_domain *d)
 {
 	unsigned long data;
+	char *buf = _buf;
 
 	if (d->have_new_ctrl) {
 		rdt_last_cmd_printf("duplicate domain %d\n", d->id);
@@ -126,8 +127,9 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
-int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
+int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
 {
+	char *buf = _data;
 	u32 cbm_val;
 
 	if (d->have_new_ctrl) {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 12/38] x86/intel_rdt: Ensure requested schemata respects mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (10 preceding siblings ...)
  2018-05-22 11:28 ` [PATCH V4 11/38] x86/intel_rdt: Support flexible data to parsing callbacks Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details Reinette Chatre
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

When the administrator requests a change to a resource group's schemata
we have to ensure that the new schemata respects the mode of the resource
group being changed as well as the modes and schemata of the other active
resource groups.

The new schemata is not allowed to overlap with the schemata of any
exclusive resource group. Similarly, if the resource group being
changed is exclusive then its new schemata is not allowed to overlap
with the schemata of any other active resource group.
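
The core of the overlap test is a bitwise AND of two capacity bitmasks.
A minimal self-contained sketch (illustrative only; the kernel helper
additionally walks the other closids and considers their modes) is:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Two CBMs overlap if they share at least one set bit. */
static bool cbm_overlaps(uint32_t a, uint32_t b)
{
	return (a & b) != 0;
}

int main(void)
{
	uint32_t exclusive_cbm = 0x0f0;	/* bits 4-7 claimed by an exclusive group */
	uint32_t requested_cbm = 0x03c;	/* bits 2-5 requested by another group */

	printf("overlap: %s\n",
	       cbm_overlaps(exclusive_cbm, requested_cbm) ? "yes" : "no");
	return 0;
}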

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  2 ++
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 49 ++++++++++++++++++++++-------
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    |  4 +--
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c0c0ef817f11..68d398bc2942 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -465,6 +465,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 				char *buf, size_t nbytes, loff_t off);
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
+bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+			   u32 _cbm, int closid, bool exclusive);
 enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index ab4bb8731825..0e6210a043f0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -123,13 +123,19 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
 	return true;
 }
 
+struct rdt_cbm_parse_data {
+	struct rdtgroup		*rdtgrp;
+	char			*buf;
+};
+
 /*
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
 int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
 {
-	char *buf = _data;
+	struct rdt_cbm_parse_data *data = _data;
+	struct rdtgroup *rdtgrp = data->rdtgrp;
 	u32 cbm_val;
 
 	if (d->have_new_ctrl) {
@@ -137,8 +143,24 @@ int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
 		return -EINVAL;
 	}
 
-	if (!cbm_validate(buf, &cbm_val, r))
+	if (!cbm_validate(data->buf, &cbm_val, r))
+		return -EINVAL;
+
+	/*
+	 * The CBM may not overlap with the CBM of another closid if
+	 * either is exclusive.
+	 */
+	if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
+		rdt_last_cmd_printf("overlaps with exclusive group\n");
 		return -EINVAL;
+	}
+
+	if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
+		if (rdtgrp->mode == RDT_MODE_EXCLUSIVE) {
+			rdt_last_cmd_printf("overlaps with other group\n");
+			return -EINVAL;
+		}
+	}
 
 	d->new_ctrl = cbm_val;
 	d->have_new_ctrl = true;
@@ -152,8 +174,10 @@ int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
  * separated by ";". The "id" is in decimal, and must match one of
  * the "id"s for this resource.
  */
-static int parse_line(char *line, struct rdt_resource *r)
+static int parse_line(char *line, struct rdt_resource *r,
+		      struct rdtgroup *rdtgrp)
 {
+	struct rdt_cbm_parse_data data;
 	char *dom = NULL, *id;
 	struct rdt_domain *d;
 	unsigned long dom_id;
@@ -170,7 +194,9 @@ static int parse_line(char *line, struct rdt_resource *r)
 	dom = strim(dom);
 	list_for_each_entry(d, &r->domains, list) {
 		if (d->id == dom_id) {
-			if (r->parse_ctrlval(dom, r, d))
+			data.buf = dom;
+			data.rdtgrp = rdtgrp;
+			if (r->parse_ctrlval(&data, r, d))
 				return -EINVAL;
 			goto next;
 		}
@@ -223,13 +249,14 @@ int update_domains(struct rdt_resource *r, int closid)
 	return 0;
 }
 
-static int rdtgroup_parse_resource(char *resname, char *tok, int closid)
+static int rdtgroup_parse_resource(char *resname, char *tok,
+				   struct rdtgroup *rdtgrp)
 {
 	struct rdt_resource *r;
 
 	for_each_alloc_enabled_rdt_resource(r) {
-		if (!strcmp(resname, r->name) && closid < r->num_closid)
-			return parse_line(tok, r);
+		if (!strcmp(resname, r->name) && rdtgrp->closid < r->num_closid)
+			return parse_line(tok, r, rdtgrp);
 	}
 	rdt_last_cmd_printf("unknown/unsupported resource name '%s'\n", resname);
 	return -EINVAL;
@@ -242,7 +269,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 	struct rdt_domain *dom;
 	struct rdt_resource *r;
 	char *tok, *resname;
-	int closid, ret = 0;
+	int ret = 0;
 
 	/* Valid input requires a trailing newline */
 	if (nbytes == 0 || buf[nbytes - 1] != '\n')
@@ -256,8 +283,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 	}
 	rdt_last_cmd_clear();
 
-	closid = rdtgrp->closid;
-
 	for_each_alloc_enabled_rdt_resource(r) {
 		list_for_each_entry(dom, &r->domains, list)
 			dom->have_new_ctrl = false;
@@ -275,13 +300,13 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 			ret = -EINVAL;
 			goto out;
 		}
-		ret = rdtgroup_parse_resource(resname, tok, closid);
+		ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
 		if (ret)
 			goto out;
 	}
 
 	for_each_alloc_enabled_rdt_resource(r) {
-		ret = update_domains(r, closid);
+		ret = update_domains(r, rdtgrp->closid);
 		if (ret)
 			goto out;
 	}
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 3701b1cb0f73..5b78b4aa1a50 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -829,8 +829,8 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
  *
  * Return: false if CBM does not overlap, true if it does.
  */
-static bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
-				  u32 _cbm, int closid, bool exclusive)
+bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
+			   u32 _cbm, int closid, bool exclusive)
 {
 	unsigned long *cbm = (unsigned long *)&_cbm;
 	unsigned long *ctrl_b;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (11 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 12/38] x86/intel_rdt: Ensure requested schemata respects mode Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 21:03   ` Randy Dunlap
  2018-05-22 11:29 ` [PATCH V4 14/38] x86/intel_rdt: Display resource groups' allocations' size in bytes Reinette Chatre
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

With cache regions now explicitly marked as "shareable" or "exclusive",
we would like to communicate to the user how portions of the cache
are used.

Introduce "bit_usage", a new file that indicates for each resource
how portions of the cache are configured to be used.

To assist the user in distinguishing whether the sharing is from software
or hardware we add the following annotation:

0 - currently unused
X - currently available for sharing and used by software and hardware
H - currently used by hardware only but available for software use
S - currently used and shareable by software only
E - currently used exclusively by one resource group
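
The annotation can be derived from three bitmasks per domain. The sketch
below (user-space, illustrative only; not the kernel implementation, which
walks the per-closid control values under rdtgroup_mutex) shows the mapping:

#include <stdint.h>
#include <stdio.h>

static void print_bit_usage(uint32_t hw_shareable, uint32_t sw_shareable,
			    uint32_t exclusive, int cbm_len)
{
	int i;

	for (i = cbm_len - 1; i >= 0; i--) {
		int hwb = (hw_shareable >> i) & 1;
		int swb = (sw_shareable >> i) & 1;
		int excl = (exclusive >> i) & 1;

		if (hwb && swb)
			putchar('X');	/* shared by hardware and software */
		else if (hwb)
			putchar('H');	/* hardware only */
		else if (swb)
			putchar('S');	/* software only */
		else if (excl)
			putchar('E');	/* exclusive to one resource group */
		else
			putchar('0');	/* unused */
	}
	putchar('\n');
}

int main(void)
{
	/* hypothetical 11-bit CBM: hw shares bit 10, sw uses bits 4-9, bits 0-3 exclusive */
	print_bit_usage(0x400, 0x3f0, 0x00f, 11);
	return 0;
}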

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 79 ++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 5b78b4aa1a50..d0040f83532d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -714,6 +714,78 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+/**
+ * rdt_bit_usage_show - Display current usage of resources
+ *
+ * A domain is a shared resource that can now be allocated differently. Here
+ * we display the current regions of the domain as an annotated bitmask.
+ * For each domain of this resource its allocation bitmask
+ * is annotated as below to indicate the current usage of the corresponding bit:
+ *   0 - currently unused
+ *   X - currently available for sharing and used by software and hardware
+ *   H - currently used by hardware only but available for software use
+ *   S - currently used and shareable by software only
+ *   E - currently used exclusively by one resource group
+ */
+static int rdt_bit_usage_show(struct kernfs_open_file *of,
+			      struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = of->kn->parent->priv;
+	u32 sw_shareable, hw_shareable, exclusive;
+	struct rdt_domain *dom;
+	int i, hwb, swb, excl;
+	enum rdtgrp_mode mode;
+	bool sep = false;
+	u32 *ctrl;
+
+	mutex_lock(&rdtgroup_mutex);
+	hw_shareable = r->cache.shareable_bits;
+	list_for_each_entry(dom, &r->domains, list) {
+		if (sep)
+			seq_puts(seq, ";");
+		ctrl = dom->ctrl_val;
+		sw_shareable = 0;
+		exclusive = 0;
+		seq_printf(seq, "%d=", dom->id);
+		for (i = 0; i < r->num_closid; i++, ctrl++) {
+			if (!closid_allocated(i))
+				continue;
+			mode = rdtgroup_mode_by_closid(i);
+			switch (mode) {
+			case RDT_MODE_SHAREABLE:
+				sw_shareable |= *ctrl;
+				break;
+			case RDT_MODE_EXCLUSIVE:
+				exclusive |= *ctrl;
+				break;
+			case RDT_NUM_MODES:
+				WARN(1,
+				     "invalid mode for closid %d\n", i);
+				break;
+			}
+		}
+		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
+			hwb = test_bit(i, (unsigned long *)&hw_shareable);
+			swb = test_bit(i, (unsigned long *)&sw_shareable);
+			excl = test_bit(i, (unsigned long *)&exclusive);
+			if (hwb && swb)
+				seq_puts(seq, "X");
+			else if (hwb && !swb)
+				seq_puts(seq, "H");
+			else if (!hwb && swb)
+				seq_puts(seq, "S");
+			else if (excl)
+				seq_puts(seq, "E");
+			else /* Unused bits remain */
+				seq_puts(seq, "0");
+		}
+		sep = true;
+	}
+	seq_puts(seq, "\n");
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
 static int rdt_min_bw_show(struct kernfs_open_file *of,
 			     struct seq_file *seq, void *v)
 {
@@ -996,6 +1068,13 @@ static struct rftype res_common_files[] = {
 		.fflags		= RF_CTRL_INFO | RFTYPE_RES_CACHE,
 	},
 	{
+		.name		= "bit_usage",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_bit_usage_show,
+		.fflags		= RF_CTRL_INFO | RFTYPE_RES_CACHE,
+	},
+	{
 		.name		= "min_bandwidth",
 		.mode		= 0444,
 		.kf_ops		= &rdtgroup_kf_single_ops,
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 14/38] x86/intel_rdt: Display resource groups' allocations' size in bytes
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (12 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 15/38] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The schemata file displays the allocations associated with each domain
of each resource. The syntax of this file reflects the capacity bitmask
(CBM) of the actual allocation. In order to determine the actual size of
an allocation the user needs to dig through three different files to query
the variables needed to compute it (the cache size, the CBM length, and
the schemata).

Introduce a new file "size" associated with each resource group that
will mirror the schemata file syntax and display the size in bytes of
each allocation.
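
The computation itself is straightforward: divide the cache size by the
CBM length to obtain the bytes covered by one bit, then multiply by the
number of set bits. A self-contained sketch with made-up numbers (not the
kernel code, which looks the cache size up via cacheinfo):

#include <stdint.h>
#include <stdio.h>

/* Bytes represented by a CBM: each bit covers cache_size / cbm_len bytes. */
static unsigned int cbm_to_size(unsigned int cache_size, int cbm_len, uint32_t cbm)
{
	return cache_size / cbm_len * __builtin_popcount(cbm);
}

int main(void)
{
	/* hypothetical 1 MiB cache with a 16-bit CBM and 4 bits allocated */
	printf("%u bytes\n", cbm_to_size(1024 * 1024, 16, 0x000f));
	return 0;
}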

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          |  2 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 81 ++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 68d398bc2942..8bbb047bf37c 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -467,6 +467,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
 bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
 			   u32 _cbm, int closid, bool exclusive);
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
+				  u32 cbm);
 enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index d0040f83532d..2af99e03faae 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -20,6 +20,7 @@
 
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
+#include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/fs.h>
 #include <linux/sysfs.h>
@@ -1016,6 +1017,78 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+/**
+ * rdtgroup_cbm_to_size - Translate CBM to size in bytes
+ * @r: RDT resource to which @d belongs.
+ * @d: RDT domain instance.
+ * @cbm: bitmask for which the size should be computed.
+ *
+ * The bitmask provided, associated with the RDT domain instance @d, will be
+ * translated into how many bytes it represents. The size in bytes is
+ * computed by first dividing the total cache size by the CBM length to
+ * determine how many bytes each bit in the bitmask represents. The result
+ * is multiplied by the number of bits set in the bitmask.
+ */
+unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
+				  struct rdt_domain *d, u32 cbm)
+{
+	struct cpu_cacheinfo *ci;
+	unsigned int size = 0;
+	int num_b, i;
+
+	num_b = bitmap_weight((unsigned long *)&cbm, r->cache.cbm_len);
+	ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == r->cache_level) {
+			size = ci->info_list[i].size / r->cache.cbm_len * num_b;
+			break;
+		}
+	}
+
+	return size;
+}
+
+/**
+ * rdtgroup_size_show - Display size in bytes of allocated regions
+ *
+ * The "size" file mirrors the layout of the "schemata" file, printing the
+ * size in bytes of each region instead of the capacity bitmask.
+ *
+ */
+static int rdtgroup_size_show(struct kernfs_open_file *of,
+			      struct seq_file *s, void *v)
+{
+	struct rdtgroup *rdtgrp;
+	struct rdt_resource *r;
+	struct rdt_domain *d;
+	unsigned int size;
+	bool sep = false;
+	u32 cbm;
+
+	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		seq_printf(s, "%*s:", max_name_width, r->name);
+		list_for_each_entry(d, &r->domains, list) {
+			if (sep)
+				seq_puts(s, ";");
+			cbm = d->ctrl_val[rdtgrp->closid];
+			size = rdtgroup_cbm_to_size(r, d, cbm);
+			seq_printf(s, "%d=%u", d->id, size);
+			sep = true;
+		}
+		seq_puts(s, "\n");
+	}
+
+	rdtgroup_kn_unlock(of->kn);
+
+	return 0;
+}
+
 /* rdtgroup information files for one cache resource. */
 static struct rftype res_common_files[] = {
 	{
@@ -1144,6 +1217,14 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdtgroup_mode_show,
 		.fflags		= RF_CTRL_BASE,
 	},
+	{
+		.name		= "size",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdtgroup_size_show,
+		.fflags		= RF_CTRL_BASE,
+	},
+
 };
 
 static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 15/38] x86/intel_rdt: Documentation for Cache Pseudo-Locking
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (13 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 14/38] x86/intel_rdt: Display resource groups' allocations' size in bytes Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 16/38] x86/intel_rdt: Introduce the Cache Pseudo-Locking modes Reinette Chatre
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Add description of Cache Pseudo-Locking feature, its interface,
as well as an example of its usage.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt | 278 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 276 insertions(+), 2 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index de913e00e922..70ec9c44b1ee 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -29,7 +29,11 @@ mount options are:
 L2 and L3 CDP are controlled separately.
 
 RDT features are orthogonal. A particular system may support only
-monitoring, only control, or both monitoring and control.
+monitoring, only control, or both monitoring and control.  Cache
+pseudo-locking is a unique way of using cache control to "pin" or
+"lock" data in the cache. Details can be found in
+"Cache Pseudo-Locking".
+
 
 The mount succeeds if either of allocation or monitoring is present, but
 only those files and directories supported by the system will be created.
@@ -86,6 +90,8 @@ related to allocation:
 			      and available for sharing.
 			"E" - Corresponding region is used exclusively by
 			      one resource group. No sharing allowed.
+			"P" - Corresponding region is pseudo-locked. No
+			      sharing allowed.
 
 Memory bandwidth(MB) subdirectory contains the following files
 with respect to allocation:
@@ -192,7 +198,12 @@ When control is enabled all CTRL_MON groups will also contain:
 "mode":
 	The "mode" of the resource group dictates the sharing of its
 	allocations. A "shareable" resource group allows sharing of its
-	allocations while an "exclusive" resource group does not.
+	allocations while an "exclusive" resource group does not. A
+	cache pseudo-locked region is created by first writing
+	"pseudo-locksetup" to the "mode" file before writing the cache
+	pseudo-locked region's schemata to the resource group's "schemata"
+	file. On successful pseudo-locked region creation the mode will
+	automatically change to "pseudo-locked".
 
 When monitoring is enabled all MON groups will also contain:
 
@@ -410,6 +421,168 @@ L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Cache Pseudo-Locking
+--------------------
+CAT enables a user to specify the amount of cache space that an
+application can fill. Cache pseudo-locking builds on the fact that a
+CPU can still read and write data pre-allocated outside its current
+allocated area on a cache hit. With cache pseudo-locking, data can be
+preloaded into a reserved portion of cache that no application can
+fill, and from that point on will only serve cache hits. The cache
+pseudo-locked memory is made accessible to user space where an
+application can map it into its virtual address space and thus have
+a region of memory with reduced average read latency.
+
+The creation of a cache pseudo-locked region is triggered by a request
+from the user to do so that is accompanied by a schemata of the region
+to be pseudo-locked. The cache pseudo-locked region is created as follows:
+- Create a CAT allocation CLOSNEW with a CBM matching the schemata
+  from the user of the cache region that will contain the pseudo-locked
+  memory. This region must not overlap with any current CAT allocation/CLOS
+  on the system and no future overlap with this cache region is allowed
+  while the pseudo-locked region exists.
+- Create a contiguous region of memory of the same size as the cache
+  region.
+- Flush the cache, disable hardware prefetchers, disable preemption.
+- Make CLOSNEW the active CLOS and touch the allocated memory to load
+  it into the cache.
+- Set the previous CLOS as active.
+- At this point the closid CLOSNEW can be released - the cache
+  pseudo-locked region is protected as long as its CBM does not appear in
+  any CAT allocation. Even though the cache pseudo-locked region will from
+  this point on not appear in any CBM of any CLOS an application running with
+  any CLOS will be able to access the memory in the pseudo-locked region since
+  the region continues to serve cache hits.
+- The contiguous region of memory loaded into the cache is exposed to
+  user-space as a character device.
+
+Cache pseudo-locking increases the probability that data will remain
+in the cache by carefully configuring the CAT feature and controlling
+application behavior. There is no guarantee that data is placed in
+cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
+“locked” data from cache. Power management C-states may shrink or
+power off cache. It is thus recommended to limit the processor maximum
+C-state, for example, by setting the processor.max_cstate kernel parameter.
+
+It is required that an application using a pseudo-locked region runs
+with affinity to the cores (or a subset of the cores) associated
+with the cache on which the pseudo-locked region resides. A sanity check
+within the code will not allow an application to map pseudo-locked memory
+unless it runs with affinity to cores associated with the cache on which the
+pseudo-locked region resides. The sanity check is only done during the
+initial mmap() handling; there is no enforcement afterwards and the
+application itself needs to ensure it remains affine to the correct cores.
+
+Pseudo-locking is accomplished in two stages:
+1) During the first stage the system administrator allocates a portion
+   of cache that should be dedicated to pseudo-locking. At this time an
+   equivalent portion of memory is allocated, loaded into the allocated
+   cache portion, and exposed as a character device.
+2) During the second stage a user-space application maps (mmap()) the
+   pseudo-locked memory into its address space.
+
+Cache Pseudo-Locking Interface
+------------------------------
+A pseudo-locked region is created using the resctrl interface as follows:
+
+1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
+2) Change the new resource group's mode to "pseudo-locksetup" by writing
+   "pseudo-locksetup" to the "mode" file.
+3) Write the schemata of the pseudo-locked region to the "schemata" file. All
+   bits within the schemata should be "unused" according to the "bit_usage"
+   file.
+
+On successful pseudo-locked region creation the "mode" file will contain
+"pseudo-locked" and a new character device with the same name as the resource
+group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
+by user space in order to obtain access to the pseudo-locked memory region.
+
+An example of cache pseudo-locked region creation and usage can be found below.
+
+Cache Pseudo-Locking Debugging Interface
+----------------------------------------
+The pseudo-locking debugging interface is enabled with
+CONFIG_INTEL_RDT_DEBUGFS and can be found in /sys/kernel/debug/resctrl.
+
+There is no explicit way for the kernel to test if a provided memory
+location is present in the cache. The pseudo-locking debugging interface uses
+the tracing infrastructure to provide two ways to measure cache residency of
+the pseudo-locked region:
+1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
+   from these measurements are best visualized using a hist trigger (see
+   example below). In this test the pseudo-locked region is traversed at
+   a stride of 32 bytes while hardware prefetchers and preemption
+   are disabled. This also provides a substitute visualization of cache
+   hits and misses.
+2) Cache hit and miss measurements using model specific precision counters if
+   available. Depending on the levels of cache on the system the pseudo_lock_l2
+   and pseudo_lock_l3 tracepoints are available.
+   WARNING: triggering this measurement uses from two (for just L2
+   measurements) to four (for L2 and L3 measurements) precision counters on
+   the system. If any other measurements are in progress the counters and
+   their corresponding event registers will be clobbered.
+
+When a pseudo-locked region is created a new debugfs directory is created for
+it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
+write-only file, pseudo_lock_measure, is present in this directory. The
+measurement on the pseudo-locked region depends on the number, 1 or 2,
+written to this debugfs file. Since the measurements are recorded with the
+tracing infrastructure the relevant tracepoints need to be enabled before the
+measurement is triggered.
+
+Example of latency debugging interface:
+In this example a pseudo-locked region named "newlock" was created. Here is
+how we can measure the latency in cycles of reading from this region:
+# :> /sys/kernel/debug/tracing/trace
+# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
+
+# event histogram
+#
+# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
+#
+
+{ latency:        456 } hitcount:          1
+{ latency:         50 } hitcount:         83
+{ latency:         36 } hitcount:         96
+{ latency:         44 } hitcount:        174
+{ latency:         48 } hitcount:        195
+{ latency:         46 } hitcount:        262
+{ latency:         42 } hitcount:        693
+{ latency:         40 } hitcount:       3204
+{ latency:         38 } hitcount:       3484
+
+Totals:
+    Hits: 8192
+    Entries: 9
+   Dropped: 0
+
+Example of cache hits/misses debugging:
+In this example a pseudo-locked region named "newlock" was created on the L2
+cache of a platform. Here is how we can obtain details of the cache hits
+and misses using the platform's precision counters.
+
+# :> /sys/kernel/debug/tracing/trace
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# cat /sys/kernel/debug/tracing/trace
+
+# tracer: nop
+#
+#                              _-----=> irqs-off
+#                             / _----=> need-resched
+#                            | / _---=> hardirq/softirq
+#                            || / _--=> preempt-depth
+#                            ||| /     delay
+#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
+#              | |       |   ||||       |         |
+ pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
+
+
 Examples for RDT allocation usage:
 
 Example 1
@@ -596,6 +769,107 @@ A resource group cannot be forced to overlap with an exclusive resource group:
 # cat info/last_cmd_status
 overlaps with exclusive group
 
+Example of Cache Pseudo-Locking
+-------------------------------
+Lock a portion of the L2 cache from cache id 1 using CBM 0x3. The
+pseudo-locked region is exposed at /dev/pseudo_lock/newlock and can be
+provided to an application as the argument to mmap().
+
+# mount -t resctrl resctrl /sys/fs/resctrl/
+# cd /sys/fs/resctrl
+
+Ensure that there are bits available that can be pseudo-locked. Since only
+unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
+removed from the default resource group's schemata:
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSSS
+# echo 'L2:1=0xfc' > schemata
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSS00
+
+Create a new resource group that will be associated with the pseudo-locked
+region, indicate that it will be used for a pseudo-locked region, and
+configure the requested pseudo-locked region capacity bitmask:
+
+# mkdir newlock
+# echo pseudo-locksetup > newlock/mode
+# echo 'L2:1=0x3' > newlock/schemata
+
+On success the resource group's mode will change to pseudo-locked, the
+bit_usage will reflect the pseudo-locked region, and the character device
+exposing the pseudo-locked region will exist:
+
+# cat newlock/mode
+pseudo-locked
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSPP
+# ls -l /dev/pseudo_lock/newlock
+crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
+
+/*
+ * Example code to access one page of pseudo-locked cache region
+ * from user space.
+ */
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+/*
+ * It is required that the application runs with affinity to only
+ * cores associated with the pseudo-locked region. Here the cpu
+ * is hardcoded for convenience of example.
+ */
+static int cpuid = 2;
+
+int main(int argc, char *argv[])
+{
+	cpu_set_t cpuset;
+	long page_size;
+	void *mapping;
+	int dev_fd;
+	int ret;
+
+	page_size = sysconf(_SC_PAGESIZE);
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpuid, &cpuset);
+	ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
+	if (ret < 0) {
+		perror("sched_setaffinity");
+		exit(EXIT_FAILURE);
+	}
+
+	dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
+	if (dev_fd < 0) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+		       dev_fd, 0);
+	if (mapping == MAP_FAILED) {
+		perror("mmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Application interacts with pseudo-locked memory @mapping */
+
+	ret = munmap(mapping, page_size);
+	if (ret < 0) {
+		perror("munmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	close(dev_fd);
+	exit(EXIT_SUCCESS);
+}
+
 Locking between applications
 ----------------------------
 
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 16/38] x86/intel_rdt: Introduce the Cache Pseudo-Locking modes
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (14 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 15/38] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 17/38] x86/intel_rdt: Respect read and write access Reinette Chatre
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Introduce the two modes used to manage Cache Pseudo-Locked regions.
A resource group is assigned the "pseudo-locksetup" mode when the user
indicates that this resource group will be used for a Cache
Pseudo-Locked region. Once the Cache Pseudo-Locked region has been set
up successfully, after the user writes the requested schemata to the
"schemata" file, the mode automatically changes to "pseudo-locked".
The user is not able to switch the mode to "pseudo-locked" by writing
"pseudo-locked" to the "mode" file directly.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          | 10 ++++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 13 +++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 8bbb047bf37c..bda87f5ef7bc 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -84,15 +84,25 @@ enum rdt_group_type {
  * enum rdtgrp_mode - Mode of a RDT resource group
  * @RDT_MODE_SHAREABLE: This resource group allows sharing of its allocations
  * @RDT_MODE_EXCLUSIVE: No sharing of this resource group's allocations allowed
+ * @RDT_MODE_PSEUDO_LOCKSETUP: Resource group will be used for Pseudo-Locking
+ * @RDT_MODE_PSEUDO_LOCKED: No sharing of this resource group's allocations
+ *                          allowed AND the allocations are Cache Pseudo-Locked
  *
  * The mode of a resource group enables control over the allowed overlap
  * between allocations associated with different resource groups (classes
  * of service). User is able to modify the mode of a resource group by
  * writing to the "mode" resctrl file associated with the resource group.
+ *
+ * The "shareable", "exclusive", and "pseudo-locksetup" modes are set by
+ * writing the appropriate text to the "mode" file. A resource group enters
+ * "pseudo-locked" mode after the schemata is written while the resource
+ * group is in "pseudo-locksetup" mode.
  */
 enum rdtgrp_mode {
 	RDT_MODE_SHAREABLE = 0,
 	RDT_MODE_EXCLUSIVE,
+	RDT_MODE_PSEUDO_LOCKSETUP,
+	RDT_MODE_PSEUDO_LOCKED,
 
 	/* Must be last */
 	RDT_NUM_MODES,
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 2af99e03faae..e6069822f592 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -161,8 +161,10 @@ enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
 }
 
 static const char * const rdt_mode_str[] = {
-	[RDT_MODE_SHAREABLE]	= "shareable",
-	[RDT_MODE_EXCLUSIVE]	= "exclusive",
+	[RDT_MODE_SHAREABLE]		= "shareable",
+	[RDT_MODE_EXCLUSIVE]		= "exclusive",
+	[RDT_MODE_PSEUDO_LOCKSETUP]	= "pseudo-locksetup",
+	[RDT_MODE_PSEUDO_LOCKED]	= "pseudo-locked",
 };
 
 /**
@@ -759,6 +761,13 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 			case RDT_MODE_EXCLUSIVE:
 				exclusive |= *ctrl;
 				break;
+			/*
+			 * Temporarily handle pseudo-locking enums
+			 * to silence compile warnings until handling
+			 * added in later patches.
+			 */
+			case RDT_MODE_PSEUDO_LOCKSETUP:
+			case RDT_MODE_PSEUDO_LOCKED:
 			case RDT_NUM_MODES:
 				WARN(1,
 				     "invalid mode for closid %d\n", i);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 17/38] x86/intel_rdt: Respect read and write access
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (15 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 16/38] x86/intel_rdt: Introduce the Cache Pseudo-Locking modes Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 18/38] x86/intel_rdt: Add utility to test if tasks assigned to resource group Reinette Chatre
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

By default, if the opener has CAP_DAC_OVERRIDE, a kernfs file can be
opened regardless of RW permissions. Writing to a kernfs file will thus
succeed even if permissions are 0000.

We would like to restrict the actions that can be performed on a
resource group from userspace based on the mode of the resource group.
This restriction will be done through a modification of the file
permissions. That is, for example, if a resource group is locked then
the user cannot add tasks to the resource group.

For this restriction through file permissions to work we have to ensure
that the permissions are always respected. To do so the resctrl
filesystem is created with the KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK flag
that will result in open(2) failing with -EACCES regardless of
CAP_DAC_OVERRIDE if the file permissions do not grant the requested read
or write access.
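
From user space the effect is simply that open(2) fails even for a
privileged process. A hypothetical check (the path below is illustrative
only) could look like:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical path to a resctrl file whose permissions were restricted */
	int fd = open("/sys/fs/resctrl/locked_grp/tasks", O_WRONLY);

	if (fd < 0) {
		printf("open failed: %s\n", strerror(errno));	/* expect EACCES */
		return 1;
	}
	close(fd);
	return 0;
}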

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e6069822f592..7d623d940e49 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2491,7 +2491,8 @@ static int __init rdtgroup_setup_root(void)
 	int ret;
 
 	rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
-				      KERNFS_ROOT_CREATE_DEACTIVATED,
+				      KERNFS_ROOT_CREATE_DEACTIVATED |
+				      KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK,
 				      &rdtgroup_default);
 	if (IS_ERR(rdt_root))
 		return PTR_ERR(rdt_root);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 18/38] x86/intel_rdt: Add utility to test if tasks assigned to resource group
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (16 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 17/38] x86/intel_rdt: Respect read and write access Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 19/38] x86/intel_rdt: Add utility to restrict/restore access to resctrl files Reinette Chatre
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

In considering changes to a resource group it becomes necessary to know
whether tasks have been assigned to the resource group in question.

Introduce a new utility that can be used to check if any tasks have been
assigned to a particular resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          |  1 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index bda87f5ef7bc..10a1539cbec6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -480,6 +480,7 @@ bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
 unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
 				  u32 cbm);
 enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
+int rdtgroup_tasks_assigned(struct rdtgroup *r);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 7d623d940e49..f431ab493339 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -564,6 +564,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 	return ret;
 }
 
+/**
+ * rdtgroup_tasks_assigned - Test if tasks have been assigned to resource group
+ * @r: Resource group
+ *
+ * Return: 1 if tasks have been assigned to @r, 0 otherwise
+ */
+int rdtgroup_tasks_assigned(struct rdtgroup *r)
+{
+	struct task_struct *p, *t;
+	int ret = 0;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	rcu_read_lock();
+	for_each_process_thread(p, t) {
+		if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) ||
+		    (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid)) {
+			ret = 1;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static int rdtgroup_task_write_permission(struct task_struct *task,
 					  struct kernfs_open_file *of)
 {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 19/38] x86/intel_rdt: Add utility to restrict/restore access to resctrl files
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (17 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 18/38] x86/intel_rdt: Add utility to test if tasks assigned to resource group Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 20/38] x86/intel_rdt: Protect against resource group changes during locking Reinette Chatre
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

When a resource group is used for Cache Pseudo-Locking the region of
cache ends up orphaned with no class of service referring to it. The
resctrl files intended to manage how classes of service are utilized
thus become irrelevant.

The fact that a resctrl file is not relevant can be communicated to the
user by setting all of its permissions to zero. That is, its read,
write, and execute permissions are unset for all users.

Introduce two utilities, rdtgroup_kn_mode_restrict() and
rdtgroup_kn_mode_restore(), that can be used to restrict and restore
the permissions of a file or directory belonging to a resource group.
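
The restriction amounts to keeping only the file type bits of the mode and
clearing all permission bits. A loose user-space analogy (an assumption
about the intent, not the kernfs code) is:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	mode_t mode = S_IFREG | 0644;		/* a regular file, rw-r--r-- */
	mode_t restricted = mode & S_IFMT;	/* keep the type, drop all permissions */

	printf("before: %o  after: %o\n",
	       (unsigned int)mode, (unsigned int)restricted);
	return 0;
}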

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h          |  2 +
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 95 ++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 10a1539cbec6..c9b8d3d1d413 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -469,6 +469,8 @@ void rdt_last_cmd_printf(const char *fmt, ...);
 void rdt_ctrl_update(void *arg);
 struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
 void rdtgroup_kn_unlock(struct kernfs_node *kn);
+int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
+int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name);
 struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 				   struct list_head **pos);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index f431ab493339..3461fb25bb92 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1290,6 +1290,101 @@ static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
 	return ret;
 }
 
+/**
+ * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
+ * @r: The resource group with which the file is associated.
+ * @name: Name of the file
+ *
+ * The permissions of named resctrl file, directory, or link are modified
+ * to not allow read, write, or execute by any user.
+ *
+ * WARNING: This function is intended to communicate to the user that the
+ * resctrl file has been locked down - that it is not relevant to the
+ * particular state the system finds itself in. It should not be relied
+ * on to protect from user access because after the file's permissions
+ * are restricted the user can still change the permissions using chmod
+ * from the command line.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name)
+{
+	struct iattr iattr = {.ia_valid = ATTR_MODE,};
+	struct kernfs_node *kn;
+	int ret = 0;
+
+	kn = kernfs_find_and_get_ns(r->kn, name, NULL);
+	if (!kn)
+		return -ENOENT;
+
+	switch (kernfs_type(kn)) {
+	case KERNFS_DIR:
+		iattr.ia_mode = S_IFDIR;
+		break;
+	case KERNFS_FILE:
+		iattr.ia_mode = S_IFREG;
+		break;
+	case KERNFS_LINK:
+		iattr.ia_mode = S_IFLNK;
+		break;
+	}
+
+	ret = kernfs_setattr(kn, &iattr);
+	kernfs_put(kn);
+	return ret;
+}
+
+/**
+ * rdtgroup_kn_mode_restore - Restore user access to named resctrl file
+ * @r: The resource group with which the file is associated.
+ * @name: Name of the file
+ *
+ * Restore the permissions of the named file. If @name is a directory the
+ * permissions of its parent will be used.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name)
+{
+	struct iattr iattr = {.ia_valid = ATTR_MODE,};
+	struct kernfs_node *kn, *parent;
+	struct rftype *rfts, *rft;
+	int ret, len;
+
+	rfts = res_common_files;
+	len = ARRAY_SIZE(res_common_files);
+
+	for (rft = rfts; rft < rfts + len; rft++) {
+		if (!strcmp(rft->name, name))
+			iattr.ia_mode = rft->mode;
+	}
+
+	kn = kernfs_find_and_get_ns(r->kn, name, NULL);
+	if (!kn)
+		return -ENOENT;
+
+	switch (kernfs_type(kn)) {
+	case KERNFS_DIR:
+		parent = kernfs_get_parent(kn);
+		if (parent) {
+			iattr.ia_mode |= parent->mode;
+			kernfs_put(parent);
+		}
+		iattr.ia_mode |= S_IFDIR;
+		break;
+	case KERNFS_FILE:
+		iattr.ia_mode |= S_IFREG;
+		break;
+	case KERNFS_LINK:
+		iattr.ia_mode |= S_IFLNK;
+		break;
+	}
+
+	ret = kernfs_setattr(kn, &iattr);
+	kernfs_put(kn);
+	return ret;
+}
+
 static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name,
 				      unsigned long fflags)
 {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 20/38] x86/intel_rdt: Protect against resource group changes during locking
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (18 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 19/38] x86/intel_rdt: Add utility to restrict/restore access to resctrl files Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 21/38] x86/intel_rdt: Utilities to restrict/restore access to specific files Reinette Chatre
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

We intend to modify file permissions to make the "tasks", "cpus", and
"cpus_list" files inaccessible to the user while cache pseudo-locking is
in progress. Even so, it is still possible for the user to force the file
permissions (using chmod) to make them writeable again. Similarly,
directory permissions will be modified to prevent future monitor group
creation, but the user can override these restrictions too.

Add additional checks to the files we intend to restrict to ensure that
no modifications from user space are attempted while pseudo-locking is
being set up or after a pseudo-locked region has been set up.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 10 +++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 32 +++++++++++++++++++++++++----
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 0e6210a043f0..bc79396c5dad 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -283,6 +283,16 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 	}
 	rdt_last_cmd_clear();
 
+	/*
+	 * No changes to pseudo-locked region allowed. It has to be removed
+	 * and re-created instead.
+	 */
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("resource group is pseudo-locked\n");
+		goto out;
+	}
+
 	for_each_alloc_enabled_rdt_resource(r) {
 		list_for_each_entry(dom, &r->domains, list)
 			dom->have_new_ctrl = false;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 3461fb25bb92..2df8bcabe085 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -449,6 +449,13 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
 		goto unlock;
 	}
 
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
+	    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("pseudo-locking in progress\n");
+		goto unlock;
+	}
+
 	if (is_cpu_list(of))
 		ret = cpulist_parse(buf, newmask);
 	else
@@ -651,13 +658,22 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
 	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
 		return -EINVAL;
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
+	if (!rdtgrp) {
+		rdtgroup_kn_unlock(of->kn);
+		return -ENOENT;
+	}
 	rdt_last_cmd_clear();
 
-	if (rdtgrp)
-		ret = rdtgroup_move_task(pid, rdtgrp, of);
-	else
-		ret = -ENOENT;
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
+	    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("pseudo-locking in progress\n");
+		goto unlock;
+	}
+
+	ret = rdtgroup_move_task(pid, rdtgrp, of);
 
+unlock:
 	rdtgroup_kn_unlock(of->kn);
 
 	return ret ?: nbytes;
@@ -2261,6 +2277,14 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
 		goto out_unlock;
 	}
 
+	if (rtype == RDTMON_GROUP &&
+	    (prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+	     prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {
+		ret = -EINVAL;
+		rdt_last_cmd_puts("pseudo-locking in progress\n");
+		goto out_unlock;
+	}
+
 	/* allocate the rdtgroup. */
 	rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
 	if (!rdtgrp) {
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 21/38] x86/intel_rdt: Utilities to restrict/restore access to specific files
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (19 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 20/38] x86/intel_rdt: Protect against resource group changes during locking Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 22/38] x86/intel_rdt: Add check to determine if monitoring in progress Reinette Chatre
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

In support of Cache Pseudo-Locking we need to restrict access to
specific resctrl files to protect the state of a resource group used for
pseudo-locking from being changed in unsupported ways.

Introduce two utilities that can be used to either restrict or restore
access to all files irrelevant to cache pseudo-locking while
pseudo-locking is in progress for the resource group.

At this time introduce a new source file, intel_rdt_pseudo_lock.c, that
will contain most of the code related to cache pseudo-locking.

Temporarily mark these new functions as unused to silence compile
warnings until they are used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/Makefile                |   3 +-
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 113 ++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index a66229f51b12..071f50162727 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -35,7 +35,8 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
-obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o intel_rdt_ctrlmondata.o
+obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
+obj-$(CONFIG_INTEL_RDT)	+= intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
new file mode 100644
index 000000000000..dc79b3090ac5
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -0,0 +1,113 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Resource Director Technology (RDT)
+ *
+ * Pseudo-locking support built on top of Cache Allocation Technology (CAT)
+ *
+ * Copyright (C) 2018 Intel Corporation
+ *
+ * Author: Reinette Chatre <reinette.chatre@intel.com>
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include "intel_rdt.h"
+
+/**
+ * rdtgroup_locksetup_user_restrict - Restrict user access to group
+ * @rdtgrp: resource group needing access restricted
+ *
+ * A resource group used for cache pseudo-locking cannot have cpus or tasks
+ * assigned to it. This is communicated to the user by restricting access
+ * to all the files that can be used to make such changes.
+ *
+ * Permissions restored with rdtgroup_locksetup_user_restore()
+ *
+ * Return: 0 on success, <0 on failure. If a failure occurs during the
+ * restriction of access an attempt will be made to restore permissions but
+ * the state of the mode of these files will be uncertain when a failure
+ * occurs.
+ */
+static int __attribute__ ((unused))
+rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
+{
+	int ret;
+
+	ret = rdtgroup_kn_mode_restrict(rdtgrp, "tasks");
+	if (ret)
+		return ret;
+
+	ret = rdtgroup_kn_mode_restrict(rdtgrp, "cpus");
+	if (ret)
+		goto err_tasks;
+
+	ret = rdtgroup_kn_mode_restrict(rdtgrp, "cpus_list");
+	if (ret)
+		goto err_cpus;
+
+	if (rdt_mon_capable) {
+		ret = rdtgroup_kn_mode_restrict(rdtgrp, "mon_groups");
+		if (ret)
+			goto err_cpus_list;
+	}
+
+	ret = 0;
+	goto out;
+
+err_cpus_list:
+	rdtgroup_kn_mode_restore(rdtgrp, "cpus_list");
+err_cpus:
+	rdtgroup_kn_mode_restore(rdtgrp, "cpus");
+err_tasks:
+	rdtgroup_kn_mode_restore(rdtgrp, "tasks");
+out:
+	return ret;
+}
+
+/**
+ * rdtgroup_locksetup_user_restore - Restore user access to group
+ * @rdtgrp: resource group needing access restored
+ *
+ * Restore all file access previously removed using
+ * rdtgroup_locksetup_user_restrict()
+ *
+ * Return: 0 on success, <0 on failure.  If a failure occurs during the
+ * restoration of access an attempt will be made to restrict permissions
+ * again but the state of the mode of these files will be uncertain when
+ * a failure occurs.
+ */
+static int __attribute__ ((unused))
+rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
+{
+	int ret;
+
+	ret = rdtgroup_kn_mode_restore(rdtgrp, "tasks");
+	if (ret)
+		return ret;
+
+	ret = rdtgroup_kn_mode_restore(rdtgrp, "cpus");
+	if (ret)
+		goto err_tasks;
+
+	ret = rdtgroup_kn_mode_restore(rdtgrp, "cpus_list");
+	if (ret)
+		goto err_cpus;
+
+	if (rdt_mon_capable) {
+		ret = rdtgroup_kn_mode_restore(rdtgrp, "mon_groups");
+		if (ret)
+			goto err_cpus_list;
+	}
+
+	ret = 0;
+	goto out;
+
+err_cpus_list:
+	rdtgroup_kn_mode_restrict(rdtgrp, "cpus_list");
+err_cpus:
+	rdtgroup_kn_mode_restrict(rdtgrp, "cpus");
+err_tasks:
+	rdtgroup_kn_mode_restrict(rdtgrp, "tasks");
+out:
+	return ret;
+}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 22/38] x86/intel_rdt: Add check to determine if monitoring in progress
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (20 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 21/38] x86/intel_rdt: Utilities to restrict/restore access to specific files Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 23/38] x86/intel_rdt: Introduce pseudo-locked region Reinette Chatre
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

When a resource group is pseudo-locked it is orphaned without a class of
service associated with it. We thus do not want any monitoring in
progress on a resource group that will be used for pseudo-locking.

Introduce a test that can be used to determine if monitoring is in
progress on a resource group. Temporarily mark it as unused to avoid
compile warnings until it is used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index dc79b3090ac5..8693dbe602a2 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -14,6 +14,19 @@
 #include "intel_rdt.h"
 
 /**
+ * rdtgroup_monitor_in_progress - Test if monitoring in progress
+ * @rdtgrp: resource group being queried
+ *
+ * Return: 1 if monitor groups have been created for this resource
+ * group, 0 otherwise.
+ */
+static int __attribute__ ((unused))
+rdtgroup_monitor_in_progress(struct rdtgroup *rdtgrp)
+{
+	return !list_empty(&rdtgrp->mon.crdtgrp_list);
+}
+
+/**
  * rdtgroup_locksetup_user_restrict - Restrict user access to group
  * @rdtgrp: resource group needing access restricted
  *
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 23/38] x86/intel_rdt: Introduce pseudo-locked region
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (21 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 22/38] x86/intel_rdt: Add check to determine if monitoring in progress Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 24/38] x86/intel_rdt: Support enter/exit of locksetup mode Reinette Chatre
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Introduce a pseudo-locked region structure representing an instance of a
pseudo-locked cache region. Each cache instance (domain) can support
one pseudo-locked region. Similarly, a resource group can be used for
one pseudo-locked region.

Include a pointer to a pseudo-locked region from the domain and resource
group structures.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h | 64 ++++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c9b8d3d1d413..02ae088fd745 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -123,6 +123,20 @@ struct mongroup {
 };
 
 /**
+ * struct pseudo_lock_region - pseudo-lock region information
+ * @r:			RDT resource to which this pseudo-locked region
+ *			belongs
+ * @d:			RDT domain to which this pseudo-locked region
+ *			belongs
+ * @cbm:		bitmask of the pseudo-locked region
+ */
+struct pseudo_lock_region {
+	struct rdt_resource	*r;
+	struct rdt_domain	*d;
+	u32			cbm;
+};
+
+/**
  * struct rdtgroup - store rdtgroup's data in resctrl file system.
  * @kn:				kernfs node
  * @rdtgroup_list:		linked list for all rdtgroups
@@ -135,17 +149,19 @@ struct mongroup {
  *				monitor only or ctrl_mon group
  * @mon:			mongroup related data
  * @mode:			mode of resource group
+ * @plr:			pseudo-locked region
  */
 struct rdtgroup {
-	struct kernfs_node	*kn;
-	struct list_head	rdtgroup_list;
-	u32			closid;
-	struct cpumask		cpu_mask;
-	int			flags;
-	atomic_t		waitcount;
-	enum rdt_group_type	type;
-	struct mongroup		mon;
-	enum rdtgrp_mode	mode;
+	struct kernfs_node		*kn;
+	struct list_head		rdtgroup_list;
+	u32				closid;
+	struct cpumask			cpu_mask;
+	int				flags;
+	atomic_t			waitcount;
+	enum rdt_group_type		type;
+	struct mongroup			mon;
+	enum rdtgrp_mode		mode;
+	struct pseudo_lock_region	*plr;
 };
 
 /* rdtgroup.flags */
@@ -246,22 +262,24 @@ struct mbm_state {
  * @mbps_val:	When mba_sc is enabled, this holds the bandwidth in MBps
  * @new_ctrl:	new ctrl value to be loaded
  * @have_new_ctrl: did user provide new_ctrl for this domain
+ * @plr:	pseudo-locked region (if any) associated with domain
  */
 struct rdt_domain {
-	struct list_head	list;
-	int			id;
-	struct cpumask		cpu_mask;
-	unsigned long		*rmid_busy_llc;
-	struct mbm_state	*mbm_total;
-	struct mbm_state	*mbm_local;
-	struct delayed_work	mbm_over;
-	struct delayed_work	cqm_limbo;
-	int			mbm_work_cpu;
-	int			cqm_work_cpu;
-	u32			*ctrl_val;
-	u32			*mbps_val;
-	u32			new_ctrl;
-	bool			have_new_ctrl;
+	struct list_head		list;
+	int				id;
+	struct cpumask			cpu_mask;
+	unsigned long			*rmid_busy_llc;
+	struct mbm_state		*mbm_total;
+	struct mbm_state		*mbm_local;
+	struct delayed_work		mbm_over;
+	struct delayed_work		cqm_limbo;
+	int				mbm_work_cpu;
+	int				cqm_work_cpu;
+	u32				*ctrl_val;
+	u32				*mbps_val;
+	u32				new_ctrl;
+	bool				have_new_ctrl;
+	struct pseudo_lock_region	*plr;
 };
 
 /**
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 24/38] x86/intel_rdt: Support enter/exit of locksetup mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (22 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 23/38] x86/intel_rdt: Introduce pseudo-locked region Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 25/38] x86/intel_rdt: Enable entering of pseudo-locksetup mode Reinette Chatre
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The locksetup mode is the way in which the user communicates that the
resource group will be used for a pseudo-locked region. Locksetup mode
should thus ensure that all restrictions on a resource group are met
before locksetup mode can be entered. The resource group should also be
configured to ensure that it cannot be modified in unsupported ways while
it is used for a pseudo-locked region.

Introduce support for validating a request to enter locksetup mode. This
validation includes: CDP is not active, no cpus or tasks are
assigned to the resource group, monitoring is not in progress on the
resource group. Once the resource group is determined ready for a
pseudo-locked region it is configured to not allow future changes to
these properties.
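
To make the flow concrete, a minimal user-space sketch (group name and
paths hypothetical, not part of this patch): the administrator requests
locksetup mode by writing to the group's mode file and, if one of the
checks below rejects the request, the recorded reason can be read back
from info/last_cmd_status.

	/* Hypothetical user-space sketch: request locksetup mode for group "p0". */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char *req = "pseudo-locksetup";
		char reason[256];
		int fd;

		fd = open("/sys/fs/resctrl/p0/mode", O_WRONLY);
		if (fd < 0) {
			perror("open mode");
			return 1;
		}
		if (write(fd, req, strlen(req)) < 0) {
			/* One of the checks failed; read back the reason. */
			int info = open("/sys/fs/resctrl/info/last_cmd_status",
					O_RDONLY);

			if (info >= 0) {
				ssize_t n = read(info, reason, sizeof(reason) - 1);

				if (n > 0)
					fprintf(stderr, "refused: %.*s",
						(int)n, reason);
				close(info);
			}
			close(fd);
			return 1;
		}
		close(fd);
		return 0;
	}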

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |   2 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 187 +++++++++++++++++++++++++++-
 2 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 02ae088fd745..12b006178d3a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -501,6 +501,8 @@ unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
 				  u32 cbm);
 enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 int rdtgroup_tasks_assigned(struct rdtgroup *r);
+int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
+int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 8693dbe602a2..ce8243c87877 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -11,17 +11,56 @@
 
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
+#include <linux/slab.h>
 #include "intel_rdt.h"
 
 /**
+ * pseudo_lock_init - Initialize a pseudo-lock region
+ * @rdtgrp: resource group to which new pseudo-locked region will belong
+ *
+ * A pseudo-locked region is associated with a resource group. When this
+ * association is created the pseudo-locked region is initialized. The
+ * details of the pseudo-locked region are not known at this time so only
+ * allocation is done and association established.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int pseudo_lock_init(struct rdtgroup *rdtgrp)
+{
+	struct pseudo_lock_region *plr;
+
+	plr = kzalloc(sizeof(*plr), GFP_KERNEL);
+	if (!plr)
+		return -ENOMEM;
+
+	rdtgrp->plr = plr;
+	return 0;
+}
+
+/**
+ * pseudo_lock_free - Free a pseudo-locked region
+ * @rdtgrp: resource group to which pseudo-locked region belonged
+ *
+ * The pseudo-locked region's resources have already been released, or not
+ * yet created at this point. Now it can be freed and disassociated from the
+ * resource group.
+ *
+ * Return: void
+ */
+static void pseudo_lock_free(struct rdtgroup *rdtgrp)
+{
+	kfree(rdtgrp->plr);
+	rdtgrp->plr = NULL;
+}
+
+/**
  * rdtgroup_monitor_in_progress - Test if monitoring in progress
  * @rdtgrp: resource group being queried
  *
  * Return: 1 if monitor groups have been created for this resource
  * group, 0 otherwise.
  */
-static int __attribute__ ((unused))
-rdtgroup_monitor_in_progress(struct rdtgroup *rdtgrp)
+static int rdtgroup_monitor_in_progress(struct rdtgroup *rdtgrp)
 {
 	return !list_empty(&rdtgrp->mon.crdtgrp_list);
 }
@@ -41,8 +80,7 @@ rdtgroup_monitor_in_progress(struct rdtgroup *rdtgrp)
  * the state of the mode of these files will be uncertain when a failure
  * occurs.
  */
-static int __attribute__ ((unused))
-rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
+static int rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
 {
 	int ret;
 
@@ -89,8 +127,7 @@ rdtgroup_locksetup_user_restrict(struct rdtgroup *rdtgrp)
  * again but the state of the mode of these files will be uncertain when
  * a failure occurs.
  */
-static int __attribute__ ((unused))
-rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
+static int rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
 {
 	int ret;
 
@@ -124,3 +161,141 @@ rdtgroup_locksetup_user_restore(struct rdtgroup *rdtgrp)
 out:
 	return ret;
 }
+
+/**
+ * rdtgroup_locksetup_enter - Resource group enters locksetup mode
+ * @rdtgrp: resource group requested to enter locksetup mode
+ *
+ * A resource group enters locksetup mode to reflect that it would be used
+ * to represent a pseudo-locked region and is in the process of being set
+ * up to do so. A resource group used for a pseudo-locked region would
+ * lose the closid associated with it so we cannot allow it to have any
+ * tasks or cpus assigned nor permit tasks or cpus to be assigned in the
+ * future. Monitoring of a pseudo-locked region is not allowed either.
+ *
+ * The above and more restrictions on a pseudo-locked region are checked
+ * for and enforced before the resource group enters the locksetup mode.
+ *
+ * Returns: 0 if the resource group successfully entered locksetup mode, <0
+ * on failure. On failure the last_cmd_status buffer is updated with text to
+ * communicate details of failure to the user.
+ */
+int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
+{
+	int ret;
+
+	/*
+	 * The default resource group can neither be removed nor lose the
+	 * default closid associated with it.
+	 */
+	if (rdtgrp == &rdtgroup_default) {
+		rdt_last_cmd_puts("cannot pseudo-lock default group\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Cache Pseudo-locking not supported when CDP is enabled.
+	 *
+	 * Some things to consider if you would like to enable this
+	 * support (using L3 CDP as example):
+	 * - When CDP is enabled two separate resources are exposed,
+	 *   L3DATA and L3CODE, but they are actually on the same cache.
+	 *   The implication for pseudo-locking is that if a
+	 *   pseudo-locked region is created on a domain of one
+	 *   resource (eg. L3CODE), then a pseudo-locked region cannot
+	 *   be created on that same domain of the other resource
+	 *   (eg. L3DATA). This is because the creation of a
+	 *   pseudo-locked region involves a call to wbinvd that will
+	 *   affect all cache allocations on a particular domain.
+	 * - Considering the previous, it may be possible to only
+	 *   expose one of the CDP resources to pseudo-locking and
+	 *   hide the other. For example, we could consider to only
+	 *   expose L3DATA and, since the L3 cache is unified, it is
+	 *   still possible to place instructions there and execute them.
+	 * - If only one region is exposed to pseudo-locking we should
+	 *   still keep in mind that availability of a portion of cache
+	 *   for pseudo-locking should take into account both resources.
+	 *   Similarly, if a pseudo-locked region is created in one
+	 *   resource, the portion of cache used by it should be made
+	 *   unavailable to all future allocations from both resources.
+	 */
+	if (rdt_resources_all[RDT_RESOURCE_L3DATA].alloc_enabled ||
+	    rdt_resources_all[RDT_RESOURCE_L2DATA].alloc_enabled) {
+		rdt_last_cmd_puts("CDP enabled\n");
+		return -EINVAL;
+	}
+
+	if (rdtgroup_monitor_in_progress(rdtgrp)) {
+		rdt_last_cmd_puts("monitoring in progress\n");
+		return -EINVAL;
+	}
+
+	if (rdtgroup_tasks_assigned(rdtgrp)) {
+		rdt_last_cmd_puts("tasks assigned to resource group\n");
+		return -EINVAL;
+	}
+
+	if (!cpumask_empty(&rdtgrp->cpu_mask)) {
+		rdt_last_cmd_puts("CPUs assigned to resource group\n");
+		return -EINVAL;
+	}
+
+	if (rdtgroup_locksetup_user_restrict(rdtgrp)) {
+		rdt_last_cmd_puts("unable to modify resctrl permissions\n");
+		return -EIO;
+	}
+
+	ret = pseudo_lock_init(rdtgrp);
+	if (ret) {
+		rdt_last_cmd_puts("unable to init pseudo-lock region\n");
+		goto out_release;
+	}
+
+	/*
+	 * If this system is capable of monitoring, an rmid would have been
+	 * allocated when the control group was created. It is no longer
+	 * needed now that this group will be used for pseudo-locking. This
+	 * is safe to call on platforms not capable of monitoring.
+	 */
+	free_rmid(rdtgrp->mon.rmid);
+
+	ret = 0;
+	goto out;
+
+out_release:
+	rdtgroup_locksetup_user_restore(rdtgrp);
+out:
+	return ret;
+}
+
+/**
+ * rdtgroup_locksetup_exit - Resource group exits locksetup mode
+ * @rdtgrp: resource group
+ *
+ * When a resource group exits locksetup mode the earlier restrictions are
+ * lifted.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
+{
+	int ret;
+
+	if (rdt_mon_capable) {
+		ret = alloc_rmid();
+		if (ret < 0) {
+			rdt_last_cmd_puts("out of RMIDs\n");
+			return ret;
+		}
+		rdtgrp->mon.rmid = ret;
+	}
+
+	ret = rdtgroup_locksetup_user_restore(rdtgrp);
+	if (ret) {
+		free_rmid(rdtgrp->mon.rmid);
+		return ret;
+	}
+
+	pseudo_lock_free(rdtgrp);
+	return 0;
+}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 25/38] x86/intel_rdt: Enable entering of pseudo-locksetup mode
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (23 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 24/38] x86/intel_rdt: Support enter/exit of locksetup mode Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 26/38] x86/intel_rdt: Split resource group removal in two Reinette Chatre
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The user can request entering pseudo-locksetup mode by writing
"pseudo-locksetup" to the mode file. Act on this request and also
support switching away from pseudo-locksetup mode (before pseudo-locked
mode has been entered). Modifying the mode is not supported once
pseudo-locked mode has been entered.

The schemata file reflects the new mode by reporting "uninitialized" for
all resources. The size resctrl file reports zero for all cache domains
to reflect that the region is not yet initialized. Since there are no
users of this class of service, its allocations can be ignored when
searching for
appropriate default allocations for new resource groups.
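
For illustration only (a hypothetical two-domain L3 system with a
resource group named "p0" in pseudo-locksetup mode), the group's files
would then be expected to read along these lines:

	mode:     pseudo-locksetup
	schemata: L3:uninitialized
	size:     L3:0=0;1=0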

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 16 +++++++++----
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 36 ++++++++++++++++++++++++++---
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index bc79396c5dad..1ed273220ffa 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -156,7 +156,8 @@ int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
 	}
 
 	if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
-		if (rdtgrp->mode == RDT_MODE_EXCLUSIVE) {
+		if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
 			rdt_last_cmd_printf("overlaps with other group\n");
 			return -EINVAL;
 		}
@@ -356,10 +357,15 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
 	if (rdtgrp) {
-		closid = rdtgrp->closid;
-		for_each_alloc_enabled_rdt_resource(r) {
-			if (closid < r->num_closid)
-				show_doms(s, r, closid);
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+			for_each_alloc_enabled_rdt_resource(r)
+				seq_printf(s, "%s:uninitialized\n", r->name);
+		} else {
+			closid = rdtgrp->closid;
+			for_each_alloc_enabled_rdt_resource(r) {
+				if (closid < r->num_closid)
+					show_doms(s, r, closid);
+			}
 		}
 	} else {
 		ret = -ENOENT;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 2df8bcabe085..31dd9a2cc394 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1046,10 +1046,24 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
 	mode = rdtgrp->mode;
 
 	if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE) ||
-	    (!strcmp(buf, "exclusive") && mode == RDT_MODE_EXCLUSIVE))
+	    (!strcmp(buf, "exclusive") && mode == RDT_MODE_EXCLUSIVE) ||
+	    (!strcmp(buf, "pseudo-locksetup") &&
+	     mode == RDT_MODE_PSEUDO_LOCKSETUP) ||
+	    (!strcmp(buf, "pseudo-locked") && mode == RDT_MODE_PSEUDO_LOCKED))
 		goto out;
 
+	if (mode == RDT_MODE_PSEUDO_LOCKED) {
+		rdt_last_cmd_printf("cannot change pseudo-locked group\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
 	if (!strcmp(buf, "shareable")) {
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+			ret = rdtgroup_locksetup_exit(rdtgrp);
+			if (ret)
+				goto out;
+		}
 		rdtgrp->mode = RDT_MODE_SHAREABLE;
 	} else if (!strcmp(buf, "exclusive")) {
 		if (!rdtgroup_mode_test_exclusive(rdtgrp)) {
@@ -1057,7 +1071,17 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
 			ret = -EINVAL;
 			goto out;
 		}
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+			ret = rdtgroup_locksetup_exit(rdtgrp);
+			if (ret)
+				goto out;
+		}
 		rdtgrp->mode = RDT_MODE_EXCLUSIVE;
+	} else if (!strcmp(buf, "pseudo-locksetup")) {
+		ret = rdtgroup_locksetup_enter(rdtgrp);
+		if (ret)
+			goto out;
+		rdtgrp->mode = RDT_MODE_PSEUDO_LOCKSETUP;
 	} else {
 		rdt_last_cmd_printf("unknown/unsupported mode\n");
 		ret = -EINVAL;
@@ -1127,8 +1151,12 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 		list_for_each_entry(d, &r->domains, list) {
 			if (sep)
 				seq_puts(s, ";");
-			cbm = d->ctrl_val[rdtgrp->closid];
-			size = rdtgroup_cbm_to_size(r, d, cbm);
+			if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+				size = 0;
+			} else {
+				cbm = d->ctrl_val[rdtgrp->closid];
+				size = rdtgroup_cbm_to_size(r, d, cbm);
+			}
 			seq_printf(s, "%d=%u", d->id, size);
 			sep = true;
 		}
@@ -2230,6 +2258,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 			for (i = 0; i < r->num_closid; i++, ctrl++) {
 				if (closid_allocated(i) && i != closid) {
 					mode = rdtgroup_mode_by_closid(i);
+					if (mode == RDT_MODE_PSEUDO_LOCKSETUP)
+						break;
 					used_b |= *ctrl;
 					if (mode == RDT_MODE_SHAREABLE)
 						d->new_ctrl |= *ctrl;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 26/38] x86/intel_rdt: Split resource group removal in two
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (24 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 25/38] x86/intel_rdt: Enable entering of pseudo-locksetup mode Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 27/38] x86/intel_rdt: Add utilities to test pseudo-locked region possibility Reinette Chatre
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Resource groups used for pseudo-locking do not require the same work on
removal as the other resource groups.

The resource group removal is split in two in preparation for support of
pseudo-locking resource groups. A single re-ordering occurs - the
setting of the rdtgrp flag is moved later. This flag is not used by
any of the code between its original and new location.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 31dd9a2cc394..b7509ec336df 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2565,6 +2565,21 @@ static int rdtgroup_rmdir_mon(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
 	return 0;
 }
 
+static int rdtgroup_ctrl_remove(struct kernfs_node *kn,
+				struct rdtgroup *rdtgrp)
+{
+	rdtgrp->flags = RDT_DELETED;
+	list_del(&rdtgrp->rdtgroup_list);
+
+	/*
+	 * one extra hold on this, will drop when we kfree(rdtgrp)
+	 * in rdtgroup_kn_unlock()
+	 */
+	kernfs_get(kn);
+	kernfs_remove(rdtgrp->kn);
+	return 0;
+}
+
 static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
 			       cpumask_var_t tmpmask)
 {
@@ -2590,7 +2605,6 @@ static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
 	cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
 	update_closid_rmid(tmpmask, NULL);
 
-	rdtgrp->flags = RDT_DELETED;
 	closid_free(rdtgrp->closid);
 	free_rmid(rdtgrp->mon.rmid);
 
@@ -2599,14 +2613,7 @@ static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
 	 */
 	free_all_child_rdtgrp(rdtgrp);
 
-	list_del(&rdtgrp->rdtgroup_list);
-
-	/*
-	 * one extra hold on this, will drop when we kfree(rdtgrp)
-	 * in rdtgroup_kn_unlock()
-	 */
-	kernfs_get(kn);
-	kernfs_remove(rdtgrp->kn);
+	rdtgroup_ctrl_remove(kn, rdtgrp);
 
 	return 0;
 }
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 27/38] x86/intel_rdt: Add utilities to test pseudo-locked region possibility
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (25 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 26/38] x86/intel_rdt: Split resource group removal in two Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 28/38] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

A pseudo-locked region does not have a class of service associated with
it and is thus not tracked in the array of control values maintained as
part of the domain. Even so, when the user provides a new bitmask for
another resource group it needs to be checked for interference with
existing pseudo-locked regions.

Additionally only one pseudo-locked region can be created in any cache
hierarchy.

Introduce two utilities in support of above scenarios: (1) a utility
that can be used to test if a given capacity bitmask overlaps with any
pseudo-locked regions associated with a particular cache instance, (2) a
utility that can be used to test if a pseudo-locked region exists within
a particular cache hierarchy.
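
As a minimal worked example of the first utility (values hypothetical),
two capacity bitmasks conflict exactly when they share a set bit, so the
check reduces to a bitmap intersection:

	/* Illustrative only: candidate CBM versus a pseudo-locked CBM. */
	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	static bool cbm_overlaps(uint32_t cbm, uint32_t locked_cbm)
	{
		return (cbm & locked_cbm) != 0;
	}

	int main(void)
	{
		uint32_t locked = 0x00f0;	/* example: bits 4-7 pseudo-locked */

		/* bits 8-9 do not intersect the locked bits: allowed */
		printf("%d\n", cbm_overlaps(0x0300, locked));	/* prints 0 */
		/* bits 5-8 intersect the locked bits: must be rejected */
		printf("%d\n", cbm_overlaps(0x01e0, locked));	/* prints 1 */
		return 0;
	}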

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  2 ++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 55 +++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 12b006178d3a..119645c83e27 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -503,6 +503,8 @@ enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
 int rdtgroup_tasks_assigned(struct rdtgroup *r);
 int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index ce8243c87877..0d2177a0bc60 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -299,3 +299,58 @@ int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp)
 	pseudo_lock_free(rdtgrp);
 	return 0;
 }
+
+/**
+ * rdtgroup_cbm_overlaps_pseudo_locked - Test if CBM or portion is pseudo-locked
+ * @d: RDT domain
+ * @_cbm: CBM to test
+ *
+ * @d represents a cache instance and @_cbm a capacity bitmask that is
+ * considered for it. Determine if @_cbm overlaps with any existing
+ * pseudo-locked region on @d.
+ *
+ * Return: true if @_cbm overlaps with pseudo-locked region on @d, false
+ * otherwise.
+ */
+bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm)
+{
+	unsigned long *cbm = (unsigned long *)&_cbm;
+	unsigned long *cbm_b;
+	unsigned int cbm_len;
+
+	if (d->plr) {
+		cbm_len = d->plr->r->cache.cbm_len;
+		cbm_b = (unsigned long *)&d->plr->cbm;
+		if (bitmap_intersects(cbm, cbm_b, cbm_len))
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * rdtgroup_pseudo_locked_in_hierarchy - Pseudo-locked region in cache hierarchy
+ * @d: RDT domain under test
+ *
+ * The setup of a pseudo-locked region affects all cache instances within
+ * the hierarchy of the region. It is thus essential to know if any
+ * pseudo-locked regions exist within a cache hierarchy to prevent any
+ * attempts to create new pseudo-locked regions in the same hierarchy.
+ *
+ * Return: true if a pseudo-locked region exists in the hierarchy of @d,
+ * false otherwise.
+ */
+bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
+{
+	struct rdt_resource *r;
+	struct rdt_domain *d_i;
+
+	for_each_alloc_enabled_rdt_resource(r) {
+		list_for_each_entry(d_i, &r->domains, list) {
+			if (d_i != d &&
+			    cpumask_intersects(&d_i->cpu_mask, &d->cpu_mask))
+				return true;
+		}
+	}
+	return false;
+}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 28/38] x86/intel_rdt: Discover supported platforms via prefetch disable bits
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (26 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 27/38] x86/intel_rdt: Add utilities to test pseudo-locked region possibility Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core Reinette Chatre
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Knowing the model-specific prefetch disable bits is required to support
cache pseudo-locking because the hardware prefetchers need to be disabled
when the kernel memory is pseudo-locked to cache. We add these bits only
for platforms known to support cache pseudo-locking.

When the user requests locksetup mode to be entered, the request will fail
if the prefetch disable bits are not known for the platform.
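
For the curious, a minimal user-space sketch (illustrative only, not part
of this patch) of peeking at the current prefetcher control state via the
msr driver; it assumes CONFIG_X86_MSR, root privileges, and that the
platform implements MSR 0x1a4 as described in the code below:

	/* Hypothetical sketch: read MSR_MISC_FEATURE_CONTROL (0x1a4) on cpu 0. */
	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		uint64_t val;
		int fd = open("/dev/cpu/0/msr", O_RDONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (pread(fd, &val, sizeof(val), 0x1a4) != (ssize_t)sizeof(val)) {
			perror("pread");
			close(fd);
			return 1;
		}
		printf("MISC_FEATURE_CONTROL: %#llx\n", (unsigned long long)val);
		close(fd);
		return 0;
	}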

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 0d2177a0bc60..310c67b12a63 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -12,8 +12,73 @@
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
 #include <linux/slab.h>
+#include <asm/intel-family.h>
 #include "intel_rdt.h"
 
+/*
+ * MSR_MISC_FEATURE_CONTROL register enables the modification of hardware
+ * prefetcher state. Details about this register can be found in the MSR
+ * tables for specific platforms found in Intel's SDM.
+ */
+#define MSR_MISC_FEATURE_CONTROL	0x000001a4
+
+/*
+ * The bits needed to disable hardware prefetching vary based on the
+ * platform. During initialization we will discover which bits to use.
+ */
+static u64 prefetch_disable_bits;
+
+/**
+ * get_prefetch_disable_bits - prefetch disable bits of supported platforms
+ *
+ * Capture the list of platforms that have been validated to support
+ * pseudo-locking. This includes testing to ensure pseudo-locked regions
+ * with low cache miss rates can be created under variety of load conditions
+ * as well as that these pseudo-locked regions can maintain their low cache
+ * miss rates under variety of load conditions for significant lengths of time.
+ *
+ * After a platform has been validated to support pseudo-locking its
+ * hardware prefetch disable bits are included here as they are documented
+ * in the SDM.
+ *
+ * Return:
+ * If platform is supported, the bits to disable hardware prefetchers, 0
+ * if platform is not supported.
+ */
+static u64 get_prefetch_disable_bits(void)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+	    boot_cpu_data.x86 != 6)
+		return 0;
+
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_BROADWELL_X:
+		/*
+		 * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+		 * as:
+		 * 0    L2 Hardware Prefetcher Disable (R/W)
+		 * 1    L2 Adjacent Cache Line Prefetcher Disable (R/W)
+		 * 2    DCU Hardware Prefetcher Disable (R/W)
+		 * 3    DCU IP Prefetcher Disable (R/W)
+		 * 63:4 Reserved
+		 */
+		return 0xF;
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		/*
+		 * SDM defines bits of MSR_MISC_FEATURE_CONTROL register
+		 * as:
+		 * 0     L2 Hardware Prefetcher Disable (R/W)
+		 * 1     Reserved
+		 * 2     DCU Hardware Prefetcher Disable (R/W)
+		 * 63:3  Reserved
+		 */
+		return 0x5;
+	}
+
+	return 0;
+}
+
 /**
  * pseudo_lock_init - Initialize a pseudo-lock region
  * @rdtgrp: resource group to which new pseudo-locked region will belong
@@ -225,6 +290,16 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
 		return -EINVAL;
 	}
 
+	/*
+	 * Not knowing the bits to disable prefetching implies that this
+	 * platform does not support Cache Pseudo-Locking.
+	 */
+	prefetch_disable_bits = get_prefetch_disable_bits();
+	if (prefetch_disable_bits == 0) {
+		rdt_last_cmd_puts("pseudo-locking not supported\n");
+		return -EINVAL;
+	}
+
 	if (rdtgroup_monitor_in_progress(rdtgrp)) {
 		rdt_last_cmd_puts("monitoring in progress\n");
 		return -EINVAL;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (27 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 28/38] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 30/38] x86/intel_rdt: Support creation/removal of pseudo-locked region Reinette Chatre
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The user requests a pseudo-locked region by providing a schemata to a
resource group that is in the pseudo-locksetup mode. This is the
functionality that consumes the parsed user data and creates the
pseudo-locked region.

First, required information is deduced from the user-provided data.
This includes how much memory the requested bitmask represents, which
CPU the requested region is associated with, and the cache line size of
that cache (to learn the stride needed for locking).
Second, a contiguous block of memory matching the requested bitmask is
allocated.

Finally, pseudo-locking is performed. The resource group already has the
allocation that reflects the requested bitmask. With this class of service
active and interference minimized, the allocated memory is loaded into the
cache.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |  17 ++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 334 ++++++++++++++++++++++++++++
 2 files changed, 351 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 119645c83e27..886cd28b305f 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -129,11 +129,26 @@ struct mongroup {
  * @d:			RDT domain to which this pseudo-locked region
  *			belongs
  * @cbm:		bitmask of the pseudo-locked region
+ * @lock_thread_wq:	waitqueue used to wait on the pseudo-locking thread
+ *			completion
+ * @thread_done:	variable used by waitqueue to test if pseudo-locking
+ *			thread completed
+ * @cpu:		core associated with the cache on which the setup code
+ *			will be run
+ * @line_size:		size of the cache lines
+ * @size:		size of pseudo-locked region in bytes
+ * @kmem:		the kernel memory associated with pseudo-locked region
  */
 struct pseudo_lock_region {
 	struct rdt_resource	*r;
 	struct rdt_domain	*d;
 	u32			cbm;
+	wait_queue_head_t	lock_thread_wq;
+	int			thread_done;
+	int			cpu;
+	unsigned int		line_size;
+	unsigned int		size;
+	void			*kmem;
 };
 
 /**
@@ -505,6 +520,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
 bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
 bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
 int update_domains(struct rdt_resource *r, int closid);
 void closid_free(int closid);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 310c67b12a63..bced04dd90b6 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -11,8 +11,14 @@
 
 #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
 
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/kthread.h>
 #include <linux/slab.h>
+#include <asm/cacheflush.h>
 #include <asm/intel-family.h>
+#include <asm/intel_rdt_sched.h>
 #include "intel_rdt.h"
 
 /*
@@ -80,6 +86,51 @@ static u64 get_prefetch_disable_bits(void)
 }
 
 /**
+ * pseudo_lock_region_init - Initialize pseudo-lock region information
+ * @plr: pseudo-lock region
+ *
+ * Called after user provided a schemata to be pseudo-locked. From the
+ * schemata the &struct pseudo_lock_region is on entry already initialized
+ * with the resource, domain, and capacity bitmask. Here the information
+ * required for pseudo-locking is deduced from this data and &struct
+ * pseudo_lock_region initialized further. This information includes:
+ * - size in bytes of the region to be pseudo-locked
+ * - cache line size to know the stride with which data needs to be accessed
+ *   to be pseudo-locked
+ * - a cpu associated with the cache instance on which the pseudo-locking
+ *   flow can be executed
+ *
+ * Return: 0 on success, <0 on failure. Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_init(struct pseudo_lock_region *plr)
+{
+	struct cpu_cacheinfo *ci;
+	int i;
+
+	/* Pick the first cpu we find that is associated with the cache. */
+	plr->cpu = cpumask_first(&plr->d->cpu_mask);
+
+	if (!cpu_online(plr->cpu)) {
+		rdt_last_cmd_printf("cpu %u associated with cache not online\n",
+				    plr->cpu);
+		return -ENODEV;
+	}
+
+	/* Only query the cache info once a valid, online cpu is known. */
+	ci = get_cpu_cacheinfo(plr->cpu);
+
+	plr->size = rdtgroup_cbm_to_size(plr->r, plr->d, plr->cbm);
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == plr->r->cache_level) {
+			plr->line_size = ci->info_list[i].coherency_line_size;
+			return 0;
+		}
+	}
+
+	rdt_last_cmd_puts("unable to determine cache line size\n");
+	return -1;
+}
+
+/**
  * pseudo_lock_init - Initialize a pseudo-lock region
  * @rdtgrp: resource group to which new pseudo-locked region will belong
  *
@@ -98,11 +149,70 @@ static int pseudo_lock_init(struct rdtgroup *rdtgrp)
 	if (!plr)
 		return -ENOMEM;
 
+	init_waitqueue_head(&plr->lock_thread_wq);
 	rdtgrp->plr = plr;
 	return 0;
 }
 
 /**
+ * pseudo_lock_region_clear - Reset pseudo-lock region data
+ * @plr: pseudo-lock region
+ *
+ * All content of the pseudo-locked region is reset - any memory allocated
+ * freed.
+ *
+ * Return: void
+ */
+static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
+{
+	plr->size = 0;
+	plr->line_size = 0;
+	kfree(plr->kmem);
+	plr->kmem = NULL;
+	plr->r = NULL;
+	if (plr->d)
+		plr->d->plr = NULL;
+	plr->d = NULL;
+	plr->cbm = 0;
+}
+
+/**
+ * pseudo_lock_region_alloc - Allocate kernel memory that will be pseudo-locked
+ * @plr: pseudo-lock region
+ *
+ * Initialize the details required to set up the pseudo-locked region and
+ * allocate the contiguous memory that will be pseudo-locked to the cache.
+ *
+ * Return: 0 on success, <0 on failure.  Descriptive error will be written
+ * to last_cmd_status buffer.
+ */
+static int pseudo_lock_region_alloc(struct pseudo_lock_region *plr)
+{
+	int ret;
+
+	ret = pseudo_lock_region_init(plr);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * We do not yet support contiguous regions larger than
+	 * KMALLOC_MAX_SIZE.
+	 */
+	if (plr->size > KMALLOC_MAX_SIZE) {
+		rdt_last_cmd_puts("requested region exceeds maximum size\n");
+		return -E2BIG;
+	}
+
+	plr->kmem = kzalloc(plr->size, GFP_KERNEL);
+	if (!plr->kmem) {
+		rdt_last_cmd_puts("unable to allocate memory\n");
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/**
  * pseudo_lock_free - Free a pseudo-locked region
  * @rdtgrp: resource group to which pseudo-locked region belonged
  *
@@ -114,11 +224,143 @@ static int pseudo_lock_init(struct rdtgroup *rdtgrp)
  */
 static void pseudo_lock_free(struct rdtgroup *rdtgrp)
 {
+	pseudo_lock_region_clear(rdtgrp->plr);
 	kfree(rdtgrp->plr);
 	rdtgrp->plr = NULL;
 }
 
 /**
+ * pseudo_lock_fn - Load kernel memory into cache
+ * @_rdtgrp: resource group to which pseudo-lock region belongs
+ *
+ * This is the core pseudo-locking flow.
+ *
+ * First we ensure that the kernel memory cannot be found in the cache.
+ * Then, while taking care that there will be as little interference as
+ * possible, the memory to be loaded is accessed while core is running
+ * with class of service set to the bitmask of the pseudo-locked region.
+ * After this is complete no future CAT allocations will be allowed to
+ * overlap with this bitmask.
+ *
+ * Local register variables are utilized to ensure that the memory region
+ * to be locked is the only memory access made during the critical locking
+ * loop.
+ *
+ * Return: 0. Waiter on waitqueue will be woken on completion.
+ */
+static int pseudo_lock_fn(void *_rdtgrp)
+{
+	struct rdtgroup *rdtgrp = _rdtgrp;
+	struct pseudo_lock_region *plr = rdtgrp->plr;
+	u32 rmid_p, closid_p;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use a regular
+	 * variable to ensure we always use a valid pointer, but the cost
+	 * is that this variable will enter the cache through evicting the
+	 * memory we are trying to lock into the cache. Thus expect lower
+	 * pseudo-locking success rate when KASAN is active.
+	 */
+	unsigned int line_size;
+	unsigned int size;
+	void *mem_r;
+#else
+	register unsigned int line_size asm("esi");
+	register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	/*
+	 * Make sure none of the allocated memory is cached. If it is we
+	 * will get a cache hit in the loop below from outside of the pseudo-locked
+	 * region.
+	 * wbinvd (as opposed to clflush/clflushopt) is required to
+	 * increase likelihood that allocated cache portion will be filled
+	 * with associated memory.
+	 */
+	native_wbinvd();
+
+	/*
+	 * Always called with interrupts enabled. By disabling interrupts
+	 * ensure that we will not be preempted during this critical section.
+	 */
+	local_irq_disable();
+
+	/*
+	 * Call wrmsr and rdmsr as directly as possible to avoid tracing
+	 * clobbering local register variables or affecting cache accesses.
+	 *
+	 * Disable the hardware prefetcher so that when the end of the memory
+	 * being pseudo-locked is reached the hardware will not read beyond
+	 * the buffer and evict pseudo-locked memory read earlier from the
+	 * cache.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	closid_p = this_cpu_read(pqr_state.cur_closid);
+	rmid_p = this_cpu_read(pqr_state.cur_rmid);
+	mem_r = plr->kmem;
+	size = plr->size;
+	line_size = plr->line_size;
+	/*
+	 * Critical section begin: start by writing the closid associated
+	 * with the capacity bitmask of the cache region being
+	 * pseudo-locked followed by reading of kernel memory to load it
+	 * into the cache.
+	 */
+	__wrmsr(IA32_PQR_ASSOC, rmid_p, rdtgrp->closid);
+	/*
+	 * Cache was flushed earlier. Now access kernel memory to read it
+	 * into cache region associated with just activated plr->closid.
+	 * Loop over data twice:
+	 * - In first loop the cache region is shared with the page walker
+	 *   as it populates the paging structure caches (including TLB).
+	 * - In the second loop the paging structure caches are used and
+	 *   cache region is populated with the memory being referenced.
+	 */
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		/*
+		 * Add a barrier to prevent speculative execution of this
+		 * loop reading beyond the end of the buffer.
+		 */
+		rmb();
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			:
+			: "r" (mem_r), "r" (i)
+			: "%eax", "memory");
+	}
+	for (i = 0; i < size; i += line_size) {
+		/*
+		 * Add a barrier to prevent speculative execution of this
+		 * loop reading beyond the end of the buffer.
+		 */
+		rmb();
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			:
+			: "r" (mem_r), "r" (i)
+			: "%eax", "memory");
+	}
+	/*
+	 * Critical section end: restore closid with capacity bitmask that
+	 * does not overlap with pseudo-locked region.
+	 */
+	__wrmsr(IA32_PQR_ASSOC, rmid_p, closid_p);
+
+	/* Re-enable the hardware prefetcher(s) */
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_enable();
+
+	plr->thread_done = 1;
+	wake_up_interruptible(&plr->lock_thread_wq);
+	return 0;
+}
+
+/**
  * rdtgroup_monitor_in_progress - Test if monitoring in progress
  * @r: resource group being queried
  *
@@ -429,3 +671,95 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
 	}
 	return false;
 }
+
+/**
+ * rdtgroup_pseudo_lock_create - Create a pseudo-locked region
+ * @rdtgrp: resource group to which pseudo-lock region belongs
+ *
+ * Called when a resource group in the pseudo-locksetup mode receives a
+ * valid schemata that should be pseudo-locked. Since the resource group is
+ * in pseudo-locksetup mode the &struct pseudo_lock_region has already been
+ * allocated and initialized with the essential information. If a failure
+ * occurs the resource group remains in the pseudo-locksetup mode with the
+ * &struct pseudo_lock_region still associated with it, but with all of its
+ * information cleared, ready for the user to re-attempt pseudo-locking by
+ * writing the schemata again.
+ *
+ * Return: 0 if the pseudo-locked region was successfully pseudo-locked, <0
+ * on failure. Descriptive error will be written to last_cmd_status buffer.
+ */
+int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
+{
+	struct pseudo_lock_region *plr = rdtgrp->plr;
+	struct task_struct *thread;
+	int ret;
+
+	ret = pseudo_lock_region_alloc(plr);
+	if (ret < 0)
+		return ret;
+
+	plr->thread_done = 0;
+
+	thread = kthread_create_on_node(pseudo_lock_fn, rdtgrp,
+					cpu_to_node(plr->cpu),
+					"pseudo_lock/%u", plr->cpu);
+	if (IS_ERR(thread)) {
+		ret = PTR_ERR(thread);
+		rdt_last_cmd_printf("locking thread returned error %d\n", ret);
+		goto out_region;
+	}
+
+	kthread_bind(thread, plr->cpu);
+	wake_up_process(thread);
+
+	ret = wait_event_interruptible(plr->lock_thread_wq,
+				       plr->thread_done == 1);
+	if (ret < 0) {
+		/*
+		 * If the thread does not get on the CPU for whatever
+		 * reason and the process which sets up the region is
+		 * interrupted then this will leave the thread in runnable
+		 * state and once it gets on the CPU it will dereference
+		 * the cleared, but not freed, plr struct resulting in an
+		 * empty pseudo-locking loop.
+		 */
+		rdt_last_cmd_puts("locking thread interrupted\n");
+		goto out_region;
+	}
+
+	rdtgrp->mode = RDT_MODE_PSEUDO_LOCKED;
+	closid_free(rdtgrp->closid);
+	ret = 0;
+	goto out;
+
+out_region:
+	pseudo_lock_region_clear(plr);
+out:
+	return ret;
+}
+
+/**
+ * rdtgroup_pseudo_lock_remove - Remove a pseudo-locked region
+ * @rdtgrp: resource group to which the pseudo-locked region belongs
+ *
+ * The removal of a pseudo-locked region can be initiated when the resource
+ * group is removed from user space via a "rmdir" from userspace or the
+ * unmount of the resctrl filesystem. On removal the resource group does
+ * not go back to pseudo-locksetup mode before it is removed; instead it is
+ * removed directly. There is thus asymmetry with the creation in that the
+ * &struct pseudo_lock_region is freed here although it was not created in
+ * rdtgroup_pseudo_lock_create().
+ *
+ * Return: void
+ */
+void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
+{
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP)
+		/*
+		 * Default group cannot be a pseudo-locked region so we can
+		 * free closid here.
+		 */
+		closid_free(rdtgrp->closid);
+
+	pseudo_lock_free(rdtgrp);
+}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 30/38] x86/intel_rdt: Support creation/removal of pseudo-locked region
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (28 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 31/38] x86/intel_rdt: resctrl files reflect pseudo-locked information Reinette Chatre
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The user triggers the creation of a pseudo-locked region when writing a
valid schemata to the schemata file of a resource group in the
pseudo-locksetup mode.

A valid schemata is one that: (1) does not overlap with any other resource
group, (2) does not involve a cache that already contains a
pseudo-locked region within its hierarchy.

After a valid schemata is parsed, the system is programmed to associate
the bitmask that is to be pseudo-locked with the closid of the
resource group. With the system set up, the pseudo-locked region can be
created.
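
A minimal user-space sketch of that trigger (group name, resource, domain
and bitmask are all hypothetical, not part of this patch): writing the
schemata to a group already in pseudo-locksetup mode is what starts the
pseudo-lock creation; on failure the reason is again reported through
info/last_cmd_status.

	/* Hypothetical sketch: request pseudo-locking of an example region. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		const char *schema = "L2:0=0x3\n";	/* example resource/domain/CBM */
		int fd = open("/sys/fs/resctrl/p0/schemata", O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (write(fd, schema, strlen(schema)) < 0)
			perror("write");
		close(fd);
		return 0;
	}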

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c | 42 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 25 +++++++++++++----
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 1ed273220ffa..6f4c0002b2c1 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -143,9 +143,26 @@ int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
 		return -EINVAL;
 	}
 
+	/*
+	 * Cannot set up more than one pseudo-locked region in a cache
+	 * hierarchy.
+	 */
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
+	    rdtgroup_pseudo_locked_in_hierarchy(d)) {
+		rdt_last_cmd_printf("pseudo-locked region in hierarchy\n");
+		return -EINVAL;
+	}
+
 	if (!cbm_validate(data->buf, &cbm_val, r))
 		return -EINVAL;
 
+	if ((rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
+	     rdtgrp->mode == RDT_MODE_SHAREABLE) &&
+	    rdtgroup_cbm_overlaps_pseudo_locked(d, cbm_val)) {
+		rdt_last_cmd_printf("CBM overlaps with pseudo-locked region\n");
+		return -EINVAL;
+	}
+
 	/*
 	 * The CBM may not overlap with the CBM of another closid if
 	 * either is exclusive.
@@ -199,6 +216,21 @@ static int parse_line(char *line, struct rdt_resource *r,
 			data.rdtgrp = rdtgrp;
 			if (r->parse_ctrlval(&data, r, d))
 				return -EINVAL;
+			if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+				/*
+				 * In pseudo-locking setup mode and just
+				 * parsed a valid CBM that should be
+				 * pseudo-locked. Only one locked region per
+				 * resource group and domain so just do
+				 * the required initialization for single
+				 * region and return.
+				 */
+				rdtgrp->plr->r = r;
+				rdtgrp->plr->d = d;
+				rdtgrp->plr->cbm = d->new_ctrl;
+				d->plr = rdtgrp->plr;
+				return 0;
+			}
 			goto next;
 		}
 	}
@@ -322,6 +354,16 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 			goto out;
 	}
 
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
+		/*
+		 * If pseudo-locking fails we keep the resource group in
+		 * mode RDT_MODE_PSEUDO_LOCKSETUP with its class of service
+		 * active and updated for just the domain the pseudo-locked
+		 * region was requested for.
+		 */
+		ret = rdtgroup_pseudo_lock_create(rdtgrp);
+	}
+
 out:
 	rdtgroup_kn_unlock(of->kn);
 	return ret ?: nbytes;
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index b7509ec336df..88ffd1311c40 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1770,6 +1770,9 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)
 
 	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
 	    (rdtgrp->flags & RDT_DELETED)) {
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
 		kernfs_unbreak_active_protection(kn);
 		kernfs_put(rdtgrp->kn);
 		kfree(rdtgrp);
@@ -1993,6 +1996,10 @@ static void rmdir_all_sub(void)
 		if (rdtgrp == &rdtgroup_default)
 			continue;
 
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+
 		/*
 		 * Give any CPUs back to the default group. We cannot copy
 		 * cpu_online_mask because a CPU might have executed the
@@ -2265,6 +2272,8 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
 						d->new_ctrl |= *ctrl;
 				}
 			}
+			if (d->plr && d->plr->cbm > 0)
+				used_b |= d->plr->cbm;
 			unused_b = used_b ^ (BIT_MASK(r->cache.cbm_len) - 1);
 			unused_b &= BIT_MASK(r->cache.cbm_len) - 1;
 			d->new_ctrl |= unused_b;
@@ -2641,13 +2650,19 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
 	 * If the rdtgroup is a mon group and parent directory
 	 * is a valid "mon_groups" directory, remove the mon group.
 	 */
-	if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn)
-		ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
-	else if (rdtgrp->type == RDTMON_GROUP &&
-		 is_mon_groups(parent_kn, kn->name))
+	if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn) {
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+			ret = rdtgroup_ctrl_remove(kn, rdtgrp);
+		} else {
+			ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
+		}
+	} else if (rdtgrp->type == RDTMON_GROUP &&
+		 is_mon_groups(parent_kn, kn->name)) {
 		ret = rdtgroup_rmdir_mon(kn, rdtgrp, tmpmask);
-	else
+	} else {
 		ret = -EPERM;
+	}
 
 out:
 	rdtgroup_kn_unlock(kn);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 31/38] x86/intel_rdt: resctrl files reflect pseudo-locked information
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (29 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 30/38] x86/intel_rdt: Support creation/removal of pseudo-locked region Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 32/38] x86/intel_rdt: Ensure RDT cleanup on exit Reinette Chatre
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Information about resources as well as resource groups is contained in
a variety of resctrl files. Now that pseudo-locked regions can be
created, these files can be updated to present the appropriate
information to the user.

Update the resource group's schemata file to show only the information
of the pseudo-locked region.

Update the resource group's size file to show the size in bytes of only
the pseudo-locked region.

Update the bit_usage file to use the letter 'P' for all pseudo-locked
regions.
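
As a purely hypothetical illustration (the resource name, cache instance
id, CBM, and resulting size below are made up), a resource group
"newlock" that pseudo-locked part of an L2 cache instance would then
show only the locked region in its schemata and size files, while the
corresponding bits are reported as 'P' in bit_usage:

# cat newlock/schemata
L2:1=3f
# cat newlock/size
L2:1=262144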

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c |  3 +++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    | 31 +++++++++++++++++++++++------
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
index 6f4c0002b2c1..af358ca05160 100644
--- a/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
+++ b/arch/x86/kernel/cpu/intel_rdt_ctrlmondata.c
@@ -402,6 +402,9 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
 			for_each_alloc_enabled_rdt_resource(r)
 				seq_printf(s, "%s:uninitialized\n", r->name);
+		} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+			seq_printf(s, "%s:%d=%x\n", rdtgrp->plr->r->name,
+				   rdtgrp->plr->d->id, rdtgrp->plr->cbm);
 		} else {
 			closid = rdtgrp->closid;
 			for_each_alloc_enabled_rdt_resource(r) {
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 88ffd1311c40..d08670ea1a3d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -771,14 +771,16 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
  *   H - currently used by hardware only but available for software use
  *   S - currently used and shareable by software only
  *   E - currently used exclusively by one resource group
+ *   P - currently pseudo-locked by one resource group
  */
 static int rdt_bit_usage_show(struct kernfs_open_file *of,
 			      struct seq_file *seq, void *v)
 {
 	struct rdt_resource *r = of->kn->parent->priv;
-	u32 sw_shareable, hw_shareable, exclusive;
+	u32 sw_shareable = 0, hw_shareable = 0;
+	u32 exclusive = 0, pseudo_locked = 0;
 	struct rdt_domain *dom;
-	int i, hwb, swb, excl;
+	int i, hwb, swb, excl, psl;
 	enum rdtgrp_mode mode;
 	bool sep = false;
 	u32 *ctrl;
@@ -803,12 +805,15 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 			case RDT_MODE_EXCLUSIVE:
 				exclusive |= *ctrl;
 				break;
+			case RDT_MODE_PSEUDO_LOCKSETUP:
 			/*
-			 * Temporarily handle pseudo-locking enums
-			 * to silence compile warnings until handling
-			 * added in later patches.
+			 * RDT_MODE_PSEUDO_LOCKSETUP is possible
+			 * here but not included since the CBM
+			 * associated with this CLOSID in this mode
+			 * is not initialized and no task or cpu can be
+			 * assigned this CLOSID.
 			 */
-			case RDT_MODE_PSEUDO_LOCKSETUP:
+				break;
 			case RDT_MODE_PSEUDO_LOCKED:
 			case RDT_NUM_MODES:
 				WARN(1,
@@ -817,9 +822,11 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 			}
 		}
 		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
+			pseudo_locked = dom->plr ? dom->plr->cbm : 0;
 			hwb = test_bit(i, (unsigned long *)&hw_shareable);
 			swb = test_bit(i, (unsigned long *)&sw_shareable);
 			excl = test_bit(i, (unsigned long *)&exclusive);
+			psl = test_bit(i, (unsigned long *)&pseudo_locked);
 			if (hwb && swb)
 				seq_puts(seq, "X");
 			else if (hwb && !swb)
@@ -828,6 +835,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 				seq_puts(seq, "S");
 			else if (excl)
 				seq_puts(seq, "E");
+			else if (psl)
+				seq_puts(seq, "P");
 			else /* Unused bits remain */
 				seq_puts(seq, "0");
 		}
@@ -1146,6 +1155,15 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 		return -ENOENT;
 	}
 
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
+		seq_printf(s, "%*s:", max_name_width, rdtgrp->plr->r->name);
+		size = rdtgroup_cbm_to_size(rdtgrp->plr->r,
+					    rdtgrp->plr->d,
+					    rdtgrp->plr->cbm);
+		seq_printf(s, "%d=%u\n", rdtgrp->plr->d->id, size);
+		goto out;
+	}
+
 	for_each_alloc_enabled_rdt_resource(r) {
 		seq_printf(s, "%*s:", max_name_width, r->name);
 		list_for_each_entry(d, &r->domains, list) {
@@ -1163,6 +1181,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
 		seq_puts(s, "\n");
 	}
 
+out:
 	rdtgroup_kn_unlock(of->kn);
 
 	return 0;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 32/38] x86/intel_rdt: Ensure RDT cleanup on exit
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (30 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 31/38] x86/intel_rdt: resctrl files reflect pseudo-locked information Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 33/38] x86/intel_rdt: Create resctrl debug area Reinette Chatre
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The RDT system's initialization does not have the corresponding exit
handling to ensure that everything initialized on load is also cleaned up.

Introduce the cleanup routines that complement all initialization. This
includes the removal of a duplicate rdtgroup_init() declaration.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.c          | 11 +++++++++++
 arch/x86/kernel/cpu/intel_rdt.h          |  3 +--
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  7 +++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 24bfa63e86cf..35dca2187cdc 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -857,6 +857,8 @@ static __init bool get_rdt_resources(void)
 	return (rdt_mon_capable || rdt_alloc_capable);
 }
 
+static enum cpuhp_state rdt_online;
+
 static int __init intel_rdt_late_init(void)
 {
 	struct rdt_resource *r;
@@ -878,6 +880,7 @@ static int __init intel_rdt_late_init(void)
 		cpuhp_remove_state(state);
 		return ret;
 	}
+	rdt_online = state;
 
 	for_each_alloc_capable_rdt_resource(r)
 		pr_info("Intel RDT %s allocation detected\n", r->name);
@@ -889,3 +892,11 @@ static int __init intel_rdt_late_init(void)
 }
 
 late_initcall(intel_rdt_late_init);
+
+static void __exit intel_rdt_exit(void)
+{
+	cpuhp_remove_state(rdt_online);
+	rdtgroup_exit();
+}
+
+__exitcall(intel_rdt_exit);
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 886cd28b305f..c948266d59c8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -209,6 +209,7 @@ extern struct list_head rdt_all_groups;
 extern int max_name_width, max_data_width;
 
 int __init rdtgroup_init(void);
+void __exit rdtgroup_exit(void);
 
 /**
  * struct rftype - describe each file in the resctrl file system
@@ -431,8 +432,6 @@ extern struct rdt_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
-int __init rdtgroup_init(void);
-
 enum {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L3DATA,
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index d08670ea1a3d..e3380cbca914 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -2773,3 +2773,10 @@ int __init rdtgroup_init(void)
 
 	return ret;
 }
+
+void __exit rdtgroup_exit(void)
+{
+	unregister_filesystem(&rdt_fs_type);
+	sysfs_remove_mount_point(fs_kobj, "resctrl");
+	kernfs_destroy_root(rdt_root);
+}
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 33/38] x86/intel_rdt: Create resctrl debug area
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (31 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 32/38] x86/intel_rdt: Ensure RDT cleanup on exit Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

In preparation for debugging support of RDT sub-features, the user can
now enable an RDT debugfs area.

Introduce the Kconfig option to enable this debug area and the subsequent
creation of the resctrl debugfs directory.
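
With CONFIG_INTEL_RDT_DEBUGFS enabled the new directory can be inspected
as soon as debugfs is mounted, for example (paths assume the
conventional debugfs mount point):

mount -t debugfs none /sys/kernel/debug
ls -d /sys/kernel/debug/resctrl

Later patches populate this directory with a sub-directory per
pseudo-locked region.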

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/Kconfig                         | 10 ++++++++
 arch/x86/kernel/cpu/intel_rdt.h          |  4 ++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 41 ++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492b871a..4fa24d0cce5a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -448,6 +448,16 @@ config INTEL_RDT
 
 	  Say N if unsure.
 
+config INTEL_RDT_DEBUGFS
+	bool "Intel RDT debugfs interface"
+	depends on INTEL_RDT
+	select DEBUG_FS
+	help
+	  Enable the creation of Intel RDT debugfs files. In support of
+	  debugging and validation of Intel RDT sub-features that use it.
+
+	  Say N if unsure.
+
 if X86_32
 config X86_BIGSMP
 	bool "Support for big SMP systems with more than 8 CPUs"
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c948266d59c8..c4ff638e3bc6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -432,6 +432,10 @@ extern struct rdt_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+extern struct dentry *debugfs_resctrl;
+#endif
+
 enum {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L3DATA,
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index e3380cbca914..da9d470294aa 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -22,6 +22,7 @@
 
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 #include <linux/fs.h>
 #include <linux/sysfs.h>
 #include <linux/kernfs.h>
@@ -56,6 +57,10 @@ static struct kernfs_node *kn_mondata;
 static struct seq_buf last_cmd_status;
 static char last_cmd_status_buf[512];
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+struct dentry *debugfs_resctrl;
+#endif
+
 void rdt_last_cmd_clear(void)
 {
 	lockdep_assert_held(&rdtgroup_mutex);
@@ -2764,8 +2769,41 @@ int __init rdtgroup_init(void)
 	if (ret)
 		goto cleanup_mountpoint;
 
+	/*
+	 * Adding the resctrl debugfs directory here may not be ideal since
+	 * it would let the resctrl debugfs directory appear on the debugfs
+	 * filesystem before the resctrl filesystem is mounted.
+	 * It may also be ok since that would enable debugging of RDT before
+	 * resctrl is mounted.
+	 * The reason why the debugfs directory is created here and not in
+	 * rdt_mount() is because rdt_mount() takes rdtgroup_mutex and
+	 * during the debugfs directory creation also &sb->s_type->i_mutex_key
+	 * (the lockdep class of inode->i_rwsem). Other filesystem
+	 * interactions (eg. SyS_getdents) have the lock ordering:
+	 * &sb->s_type->i_mutex_key --> &mm->mmap_sem
+	 * During mmap(), called with &mm->mmap_sem, the rdtgroup_mutex
+	 * is taken, thus creating dependency:
+	 * &mm->mmap_sem --> rdtgroup_mutex for the latter that can cause
+	 * issues considering the other two lock dependencies.
+	 * By creating the debugfs directory here we avoid a dependency
+	 * that may cause deadlock (even though file operations cannot
+	 * occur until the filesystem is mounted, but I do not know how to
+	 * tell lockdep that).
+	 */
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	debugfs_resctrl = debugfs_create_dir("resctrl", NULL);
+	if (IS_ERR(debugfs_resctrl)) {
+		ret = PTR_ERR(debugfs_resctrl);
+		goto cleanup_register;
+	}
+#endif
+
 	return 0;
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+cleanup_register:
+	unregister_filesystem(&rdt_fs_type);
+#endif
 cleanup_mountpoint:
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
 cleanup_root:
@@ -2776,6 +2814,9 @@ int __init rdtgroup_init(void)
 
 void __exit rdtgroup_exit(void)
 {
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	debugfs_remove_recursive(debugfs_resctrl);
+#endif
 	unregister_filesystem(&rdt_fs_type);
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
 	kernfs_destroy_root(rdt_root);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (32 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 33/38] x86/intel_rdt: Create resctrl debug area Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 19:43   ` Greg KH
  2018-05-22 11:29 ` [PATCH V4 35/38] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

There is no simple yes/no test to determine if pseudo-locking was
successful. In order to test pseudo-locking we expose a debugfs file for
each pseudo-locked region that will record the latency of reading the
pseudo-locked memory at a stride of 32 bytes (hardcoded). These numbers
will give us an idea of whether locking was successful or not since they
will reflect cache hits and cache misses (hardware prefetching is disabled
during the test).

The new debugfs file "pseudo_lock_measure" will, when the
pseudo_lock_mem_latency tracepoint is enabled, record the latency of
accessing each cache line twice.

Kernel tracepoints offer us histogram triggers, which are a simple way
to visualize the memory access latency and immediately see any cache
misses. For example, issuing the hist trigger below before triggering
the measurement will display the memory access latency and the number
of instances observed at each latency:
echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/\
                           pseudo_lock_mem_latency/trigger
echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
echo 1 > /sys/kernel/debug/resctrl/<newlock>/pseudo_lock_measure
echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/Kconfig                                  |   1 +
 arch/x86/kernel/cpu/Makefile                      |   1 +
 arch/x86/kernel/cpu/intel_rdt.h                   |   5 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 200 +++++++++++++++++++++-
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |  23 +++
 5 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4fa24d0cce5a..5c872580716e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -451,6 +451,7 @@ config INTEL_RDT
 config INTEL_RDT_DEBUGFS
 	bool "Intel RDT debugfs interface"
 	depends on INTEL_RDT
+	select HIST_TRIGGERS
 	select DEBUG_FS
 	help
 	  Enable the creation of Intel RDT debugfs files. In support of
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 071f50162727..88b87fb0d8e0 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
 obj-$(CONFIG_INTEL_RDT)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
 obj-$(CONFIG_INTEL_RDT)	+= intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
+CFLAGS_intel_rdt_pseudo_lock.o = -I$(src)
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
 obj-$(CONFIG_MTRR)			+= mtrr/
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c4ff638e3bc6..c8712446f185 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -138,6 +138,8 @@ struct mongroup {
  * @line_size:		size of the cache lines
  * @size:		size of pseudo-locked region in bytes
  * @kmem:		the kernel memory associated with pseudo-locked region
+ * @debugfs_dir:	pointer to this region's directory in the debugfs
+ *			filesystem
  */
 struct pseudo_lock_region {
 	struct rdt_resource	*r;
@@ -149,6 +151,9 @@ struct pseudo_lock_region {
 	unsigned int		line_size;
 	unsigned int		size;
 	void			*kmem;
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	struct dentry		*debugfs_dir;
+#endif
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index bced04dd90b6..0a6785f1a67b 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -14,6 +14,7 @@
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/debugfs.h>
 #include <linux/kthread.h>
 #include <linux/slab.h>
 #include <asm/cacheflush.h>
@@ -21,6 +22,11 @@
 #include <asm/intel_rdt_sched.h>
 #include "intel_rdt.h"
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+#define CREATE_TRACE_POINTS
+#include "intel_rdt_pseudo_lock_event.h"
+#endif
+
 /*
  * MSR_MISC_FEATURE_CONTROL register enables the modification of hardware
  * prefetcher state. Details about this register can be found in the MSR
@@ -174,6 +180,9 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
 		plr->d->plr = NULL;
 	plr->d = NULL;
 	plr->cbm = 0;
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	plr->debugfs_dir = NULL;
+#endif
 }
 
 /**
@@ -672,6 +681,163 @@ bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d)
 	return false;
 }
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+/**
+ * measure_cycles_lat_fn - Measure cycle latency to read pseudo-locked memory
+ * @_plr: pseudo-lock region to measure
+ *
+ * There is no deterministic way to test if a memory region is cached. One
+ * way is to measure how long it takes to read the memory; the speed of
+ * access is a good way to learn how close to the cpu the data was. Even
+ * more, if the prefetcher is disabled and the memory is read at a stride
+ * of half the cache line, then a cache miss will be easy to spot since the
+ * read of the first half would be significantly slower than the read of
+ * the second half.
+ *
+ * Return: 0. Waiter on waitqueue will be woken on completion.
+ */
+static int measure_cycles_lat_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	u64 start, end;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use a regular
+	 * variable to ensure we always use a valid pointer to access memory.
+	 * The cost is that accessing this pointer, which could be in
+	 * cache, will be included in the measurement of memory read latency.
+	 */
+	void *mem_r;
+#else
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	local_irq_disable();
+	/*
+	 * The wrmsr call may be reordered with the assignment below it.
+	 * Call wrmsr as directly as possible to avoid tracing clobbering
+	 * local register variable used for memory pointer.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	mem_r = plr->kmem;
+	/*
+	 * Dummy execute of the time measurement to load the needed
+	 * instructions into the L1 instruction cache.
+	 */
+	start = rdtsc_ordered();
+	for (i = 0; i < plr->size; i += 32) {
+		start = rdtsc_ordered();
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			     :
+			     : "r" (mem_r), "r" (i)
+			     : "%eax", "memory");
+		end = rdtsc_ordered();
+		trace_pseudo_lock_mem_latency((u32)(end - start));
+	}
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_enable();
+	plr->thread_done = 1;
+	wake_up_interruptible(&plr->lock_thread_wq);
+	return 0;
+}
+
+/**
+ * pseudo_lock_measure_cycles - Trigger latency measure to pseudo-locked region
+ *
+ * The measurement of latency to access a pseudo-locked region should be
+ * done from a cpu that is associated with that pseudo-locked region.
+ * Determine which cpu is associated with this region and start a thread on
+ * that cpu to perform the measurement, wait for that thread to complete.
+ *
+ * Return: 0 on success, <0 on failure
+ */
+static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp)
+{
+	struct pseudo_lock_region *plr = rdtgrp->plr;
+	struct task_struct *thread;
+	unsigned int cpu;
+	int ret;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	if (rdtgrp->flags & RDT_DELETED) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	plr->thread_done = 0;
+	cpu = cpumask_first(&plr->d->cpu_mask);
+	if (!cpu_online(cpu)) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
+					cpu_to_node(cpu),
+					"pseudo_lock_measure/%u", cpu);
+	if (IS_ERR(thread)) {
+		ret = PTR_ERR(thread);
+		goto out;
+	}
+	kthread_bind(thread, cpu);
+	wake_up_process(thread);
+
+	ret = wait_event_interruptible(plr->lock_thread_wq,
+				       plr->thread_done == 1);
+	if (ret < 0)
+		goto out;
+
+	ret = 0;
+
+out:
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+	return ret;
+}
+
+static ssize_t pseudo_lock_measure_trigger(struct file *file,
+					   const char __user *user_buf,
+					   size_t count, loff_t *ppos)
+{
+	struct rdtgroup *rdtgrp = file->private_data;
+	size_t buf_size;
+	char buf[32];
+	int ret;
+	bool bv;
+
+	buf_size = min(count, (sizeof(buf) - 1));
+	if (copy_from_user(buf, user_buf, buf_size))
+		return -EFAULT;
+
+	buf[buf_size] = '\0';
+	ret = strtobool(buf, &bv);
+	if (ret == 0 && bv) {
+		ret = debugfs_file_get(file->f_path.dentry);
+		if (unlikely(ret))
+			return ret;
+		ret = pseudo_lock_measure_cycles(rdtgrp);
+		if (ret == 0)
+			ret = count;
+		debugfs_file_put(file->f_path.dentry);
+	}
+
+	return ret;
+}
+
+static const struct file_operations pseudo_measure_fops = {
+	.write = pseudo_lock_measure_trigger,
+	.open = simple_open,
+	.llseek = default_llseek,
+};
+#endif /* CONFIG_INTEL_RDT_DEBUGFS */
+
 /**
  * rdtgroup_pseudo_lock_create - Create a pseudo-locked region
  * @rdtgrp: resource group to which pseudo-lock region belongs
@@ -692,6 +858,9 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 {
 	struct pseudo_lock_region *plr = rdtgrp->plr;
 	struct task_struct *thread;
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	struct dentry *entry;
+#endif
 	int ret;
 
 	ret = pseudo_lock_region_alloc(plr);
@@ -727,11 +896,33 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 		goto out_region;
 	}
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
+					      debugfs_resctrl);
+	if (IS_ERR(plr->debugfs_dir)) {
+		ret = PTR_ERR(plr->debugfs_dir);
+		plr->debugfs_dir = NULL;
+		goto out_region;
+	}
+
+	entry = debugfs_create_file("pseudo_lock_measure", 0200,
+				    plr->debugfs_dir, rdtgrp,
+				    &pseudo_measure_fops);
+	if (IS_ERR(entry)) {
+		ret = PTR_ERR(entry);
+		goto out_debugfs;
+	}
+#endif
+
 	rdtgrp->mode = RDT_MODE_PSEUDO_LOCKED;
 	closid_free(rdtgrp->closid);
 	ret = 0;
 	goto out;
 
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+out_debugfs:
+	debugfs_remove_recursive(plr->debugfs_dir);
+#endif
 out_region:
 	pseudo_lock_region_clear(plr);
 out:
@@ -754,12 +945,19 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
  */
 void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
 {
-	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP)
+	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
 		/*
 		 * Default group cannot be a pseudo-locked region so we can
 		 * free closid here.
 		 */
 		closid_free(rdtgrp->closid);
+		goto free;
+	}
+
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
+	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
+#endif
 
+free:
 	pseudo_lock_free(rdtgrp);
 }
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
new file mode 100644
index 000000000000..3cd0fa27d5fe
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM resctrl
+
+#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PSEUDO_LOCK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(pseudo_lock_mem_latency,
+	    TP_PROTO(u32 latency),
+	    TP_ARGS(latency),
+	    TP_STRUCT__entry(__field(u32, latency)),
+	    TP_fast_assign(__entry->latency = latency),
+	    TP_printk("latency=%u", __entry->latency)
+	   );
+
+#endif /* _TRACE_PSEUDO_LOCK_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE intel_rdt_pseudo_lock_event
+#include <trace/define_trace.h>
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 35/38] x86/intel_rdt: Create character device exposing pseudo-locked region
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (33 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 36/38] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

After a pseudo-locked region is created it needs to be made
available to user space for use.

A character device supporting mmap() is created for each pseudo-locked
region. A user space application can now use the mmap() system call to
map the pseudo-locked region into its virtual address space.
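
A minimal user space sketch is shown below. It assumes a pseudo-locked
region named "newlock" whose cache instance covers CPU 2; both are
illustrative. The application must run with affinity to the CPUs
associated with the region's cache instance, the mapping must use
MAP_SHARED, and its size may not exceed the size of the region:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t size = 4 * 1024 * 1024;	/* must not exceed the region size */
	cpu_set_t cpus;
	void *mem;
	int fd;

	/* Run on a CPU sharing the cache instance backing the region */
	CPU_ZERO(&cpus);
	CPU_SET(2, &cpus);
	if (sched_setaffinity(0, sizeof(cpus), &cpus)) {
		perror("sched_setaffinity");
		return EXIT_FAILURE;
	}

	fd = open("/dev/pseudo_lock/newlock", O_RDWR);
	if (fd < 0) {
		perror("open");
		return EXIT_FAILURE;
	}

	/* Copy-on-write mappings are rejected: MAP_SHARED is required */
	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return EXIT_FAILURE;
	}

	/* ... place latency sensitive data in mem ... */

	munmap(mem, size);
	close(fd);
	return EXIT_SUCCESS;
}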

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt.h             |   5 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 282 +++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c    |  11 +-
 3 files changed, 296 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index c8712446f185..54fdd1ae62b9 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -138,6 +138,8 @@ struct mongroup {
  * @line_size:		size of the cache lines
  * @size:		size of pseudo-locked region in bytes
  * @kmem:		the kernel memory associated with pseudo-locked region
+ * @minor:		minor number of character device associated with this
+ *			region
  * @debugfs_dir:	pointer to this region's directory in the debugfs
  *			filesystem
  */
@@ -151,6 +153,7 @@ struct pseudo_lock_region {
 	unsigned int		line_size;
 	unsigned int		size;
 	void			*kmem;
+	unsigned int		minor;
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	struct dentry		*debugfs_dir;
 #endif
@@ -528,6 +531,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
 int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
 bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
 bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
+int rdt_pseudo_lock_init(void);
+void rdt_pseudo_lock_release(void);
 int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
 void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 0a6785f1a67b..359ff36d6f00 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -16,6 +16,7 @@
 #include <linux/cpumask.h>
 #include <linux/debugfs.h>
 #include <linux/kthread.h>
+#include <linux/mman.h>
 #include <linux/slab.h>
 #include <asm/cacheflush.h>
 #include <asm/intel-family.h>
@@ -40,6 +41,14 @@
  */
 static u64 prefetch_disable_bits;
 
+/*
+ * Major number assigned to and shared by all devices exposing
+ * pseudo-locked regions.
+ */
+static unsigned int pseudo_lock_major;
+static unsigned long pseudo_lock_minor_avail = GENMASK(MINORBITS, 0);
+static struct class *pseudo_lock_class;
+
 /**
  * get_prefetch_disable_bits - prefetch disable bits of supported platforms
  *
@@ -92,6 +101,66 @@ static u64 get_prefetch_disable_bits(void)
 }
 
 /**
+ * pseudo_lock_minor_get - Obtain available minor number
+ * @minor: Pointer to where new minor number will be stored
+ *
+ * A bitmask is used to track available minor numbers. Here the next free
+ * minor number is marked as unavailable and returned.
+ *
+ * Return: 0 on success, <0 on failure.
+ */
+static int pseudo_lock_minor_get(unsigned int *minor)
+{
+	unsigned long first_bit;
+
+	first_bit = find_first_bit(&pseudo_lock_minor_avail, MINORBITS);
+
+	if (first_bit == MINORBITS)
+		return -ENOSPC;
+
+	__clear_bit(first_bit, &pseudo_lock_minor_avail);
+	*minor = first_bit;
+
+	return 0;
+}
+
+/**
+ * pseudo_lock_minor_release - Return minor number to available
+ * @minor: The minor number made available
+ */
+static void pseudo_lock_minor_release(unsigned int minor)
+{
+	__set_bit(minor, &pseudo_lock_minor_avail);
+}
+
+/**
+ * region_find_by_minor - Locate a pseudo-lock region by inode minor number
+ * @minor: The minor number of the device representing pseudo-locked region
+ *
+ * When the character device is accessed we need to determine which
+ * pseudo-locked region it belongs to. This is done by matching the minor
+ * number of the device to the pseudo-locked region it belongs to.
+ *
+ * Minor numbers are assigned at the time a pseudo-locked region is associated
+ * with a cache instance.
+ *
+ * Return: On success return pointer to resource group owning the pseudo-locked
+ *         region, NULL on failure.
+ */
+static struct rdtgroup *region_find_by_minor(unsigned int minor)
+{
+	struct rdtgroup *rdtgrp, *rdtgrp_match = NULL;
+
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (rdtgrp->plr && rdtgrp->plr->minor == minor) {
+			rdtgrp_match = rdtgrp;
+			break;
+		}
+	}
+	return rdtgrp_match;
+}
+
+/**
  * pseudo_lock_region_init - Initialize pseudo-lock region information
  * @plr: pseudo-lock region
  *
@@ -858,9 +927,11 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 {
 	struct pseudo_lock_region *plr = rdtgrp->plr;
 	struct task_struct *thread;
+	unsigned int new_minor;
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	struct dentry *entry;
 #endif
+	struct device *dev;
 	int ret;
 
 	ret = pseudo_lock_region_alloc(plr);
@@ -914,13 +985,55 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	}
 #endif
 
+	ret = pseudo_lock_minor_get(&new_minor);
+	if (ret < 0) {
+		rdt_last_cmd_puts("unable to obtain a new minor number\n");
+		goto out_debugfs;
+	}
+
+	/*
+	 * Unlock access but do not release the reference. The
+	 * pseudo-locked region will still be here on return.
+	 *
+	 * The mutex has to be released temporarily to avoid a potential
+	 * deadlock with the mm->mmap_sem semaphore which is obtained in
+	 * the device_create() callpath below as well as before the mmap()
+	 * callback is called.
+	 */
+	mutex_unlock(&rdtgroup_mutex);
+
+	dev = device_create(pseudo_lock_class, NULL,
+			    MKDEV(pseudo_lock_major, new_minor),
+			    rdtgrp, "%s", rdtgrp->kn->name);
+
+	mutex_lock(&rdtgroup_mutex);
+
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		rdt_last_cmd_printf("failed to create character device: %d\n",
+				    ret);
+		goto out_minor;
+	}
+
+	/* We released the mutex - check if group was removed while we did so */
+	if (rdtgrp->flags & RDT_DELETED) {
+		ret = -ENODEV;
+		goto out_device;
+	}
+
+	plr->minor = new_minor;
+
 	rdtgrp->mode = RDT_MODE_PSEUDO_LOCKED;
 	closid_free(rdtgrp->closid);
 	ret = 0;
 	goto out;
 
-#ifdef CONFIG_INTEL_RDT_DEBUGFS
+out_device:
+	device_destroy(pseudo_lock_class, MKDEV(pseudo_lock_major, new_minor));
+out_minor:
+	pseudo_lock_minor_release(new_minor);
 out_debugfs:
+#ifdef CONFIG_INTEL_RDT_DEBUGFS
 	debugfs_remove_recursive(plr->debugfs_dir);
 #endif
 out_region:
@@ -945,6 +1058,8 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
  */
 void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
 {
+	struct pseudo_lock_region *plr = rdtgrp->plr;
+
 	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
 		/*
 		 * Default group cannot be a pseudo-locked region so we can
@@ -957,7 +1072,172 @@ void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
 #endif
+	device_destroy(pseudo_lock_class, MKDEV(pseudo_lock_major, plr->minor));
+	pseudo_lock_minor_release(plr->minor);
 
 free:
 	pseudo_lock_free(rdtgrp);
 }
+
+static int pseudo_lock_dev_open(struct inode *inode, struct file *filp)
+{
+	struct rdtgroup *rdtgrp;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	rdtgrp = region_find_by_minor(iminor(inode));
+	if (!rdtgrp) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -ENODEV;
+	}
+
+	filp->private_data = rdtgrp;
+	atomic_inc(&rdtgrp->waitcount);
+	/* Perform a non-seekable open - llseek is not supported */
+	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+
+	mutex_unlock(&rdtgroup_mutex);
+
+	return 0;
+}
+
+static int pseudo_lock_dev_release(struct inode *inode, struct file *filp)
+{
+	struct rdtgroup *rdtgrp;
+
+	mutex_lock(&rdtgroup_mutex);
+	rdtgrp = filp->private_data;
+	WARN_ON(!rdtgrp);
+	if (!rdtgrp) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -ENODEV;
+	}
+	filp->private_data = NULL;
+	atomic_dec(&rdtgrp->waitcount);
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
+static int pseudo_lock_dev_mremap(struct vm_area_struct *area)
+{
+	/* Not supported */
+	return -EINVAL;
+}
+
+static const struct vm_operations_struct pseudo_mmap_ops = {
+	.mremap = pseudo_lock_dev_mremap,
+};
+
+static int pseudo_lock_dev_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	unsigned long vsize = vma->vm_end - vma->vm_start;
+	unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
+	struct pseudo_lock_region *plr;
+	struct rdtgroup *rdtgrp;
+	unsigned long physical;
+	unsigned long psize;
+
+	mutex_lock(&rdtgroup_mutex);
+
+	rdtgrp = filp->private_data;
+	WARN_ON(!rdtgrp);
+	if (!rdtgrp) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -ENODEV;
+	}
+
+	plr = rdtgrp->plr;
+
+	/*
+	 * Task is required to run with affinity to the cpus associated
+	 * with the pseudo-locked region. If this is not the case the task
+	 * may be scheduled elsewhere and invalidate entries in the
+	 * pseudo-locked region.
+	 */
+	if (!cpumask_subset(&current->cpus_allowed, &plr->d->cpu_mask)) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -EINVAL;
+	}
+
+	physical = __pa(plr->kmem) >> PAGE_SHIFT;
+	psize = plr->size - off;
+
+	if (off > plr->size) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -ENOSPC;
+	}
+
+	/*
+	 * Ensure changes are carried directly to the memory being mapped,
+	 * do not allow copy-on-write mapping.
+	 */
+	if (!(vma->vm_flags & VM_SHARED)) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -EINVAL;
+	}
+
+	if (vsize > psize) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -ENOSPC;
+	}
+
+	memset(plr->kmem + off, 0, vsize);
+
+	if (remap_pfn_range(vma, vma->vm_start, physical + vma->vm_pgoff,
+			    vsize, vma->vm_page_prot)) {
+		mutex_unlock(&rdtgroup_mutex);
+		return -EAGAIN;
+	}
+	vma->vm_ops = &pseudo_mmap_ops;
+	mutex_unlock(&rdtgroup_mutex);
+	return 0;
+}
+
+static const struct file_operations pseudo_lock_dev_fops = {
+	.owner =	THIS_MODULE,
+	.llseek =	no_llseek,
+	.read =		NULL,
+	.write =	NULL,
+	.open =		pseudo_lock_dev_open,
+	.release =	pseudo_lock_dev_release,
+	.mmap =		pseudo_lock_dev_mmap,
+};
+
+static char *pseudo_lock_devnode(struct device *dev, umode_t *mode)
+{
+	struct rdtgroup *rdtgrp;
+
+	rdtgrp = dev_get_drvdata(dev);
+	if (mode)
+		*mode = 0600;
+	return kasprintf(GFP_KERNEL, "pseudo_lock/%s", rdtgrp->kn->name);
+}
+
+int rdt_pseudo_lock_init(void)
+{
+	int ret;
+
+	ret = register_chrdev(0, "pseudo_lock", &pseudo_lock_dev_fops);
+	if (ret < 0)
+		return ret;
+
+	pseudo_lock_major = ret;
+
+	pseudo_lock_class = class_create(THIS_MODULE, "pseudo_lock");
+	if (IS_ERR(pseudo_lock_class)) {
+		ret = PTR_ERR(pseudo_lock_class);
+		unregister_chrdev(pseudo_lock_major, "pseudo_lock");
+		return ret;
+	}
+
+	pseudo_lock_class->devnode = pseudo_lock_devnode;
+	return 0;
+}
+
+void rdt_pseudo_lock_release(void)
+{
+	class_destroy(pseudo_lock_class);
+	pseudo_lock_class = NULL;
+	unregister_chrdev(pseudo_lock_major, "pseudo_lock");
+	pseudo_lock_major = 0;
+}
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index da9d470294aa..4311a2859952 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -1862,10 +1862,16 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 		rdtgroup_default.mon.mon_data_kn = kn_mondata;
 	}
 
+	ret = rdt_pseudo_lock_init();
+	if (ret) {
+		dentry = ERR_PTR(ret);
+		goto out_mondata;
+	}
+
 	dentry = kernfs_mount(fs_type, flags, rdt_root,
 			      RDTGROUP_SUPER_MAGIC, NULL);
 	if (IS_ERR(dentry))
-		goto out_mondata;
+		goto out_psl;
 
 	if (rdt_alloc_capable)
 		static_branch_enable_cpuslocked(&rdt_alloc_enable_key);
@@ -1883,6 +1889,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
 
 	goto out;
 
+out_psl:
+	rdt_pseudo_lock_release();
 out_mondata:
 	if (rdt_mon_capable)
 		kernfs_remove(kn_mondata);
@@ -2060,6 +2068,7 @@ static void rdt_kill_sb(struct super_block *sb)
 		reset_all_ctrls(r);
 	cdp_disable_all();
 	rmdir_all_sub();
+	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
 	static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
 	static_branch_disable_cpuslocked(&rdt_mon_enable_key);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 36/38] x86/intel_rdt: More precise L2 hit/miss measurements
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (34 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 35/38] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 37/38] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 38/38] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Intel Goldmont processors support non-architectural precise events that
can be used to give us more insight into the success of L2 cache
pseudo-locking on these platforms.

Introduce a new measurement trigger that will enable two precise events,
MEM_LOAD_UOPS_RETIRED.L2_HIT and MEM_LOAD_UOPS_RETIRED.L2_MISS, while
accessing pseudo-locked data. A new tracepoint, pseudo_lock_l2, is
created to make these results visible to the user.
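
A possible usage flow, mirroring the latency measurement example of the
earlier debugfs patch (the <newlock> directory name is illustrative):
writing '1' to pseudo_lock_measure still triggers the latency
measurement while writing '2' triggers the precise event measurement:

echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
echo 2 > /sys/kernel/debug/resctrl/<newlock>/pseudo_lock_measure
echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
cat /sys/kernel/debug/tracing/trace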

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 145 ++++++++++++++++++++--
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h |  10 ++
 2 files changed, 146 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 359ff36d6f00..5cad8986923d 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -24,6 +24,7 @@
 #include "intel_rdt.h"
 
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
+#include <asm/perf_event.h>
 #define CREATE_TRACE_POINTS
 #include "intel_rdt_pseudo_lock_event.h"
 #endif
@@ -62,6 +63,9 @@ static struct class *pseudo_lock_class;
  * hardware prefetch disable bits are included here as they are documented
  * in the SDM.
  *
+ * When adding a platform here also add support for its cache events to
+ * measure_cycles_perf_fn()
+ *
  * Return:
  * If platform is supported, the bits to disable hardware prefetchers, 0
  * if platform is not supported.
@@ -100,6 +104,16 @@ static u64 get_prefetch_disable_bits(void)
 	return 0;
 }
 
+/*
+ * Helper to write 64bit value to MSR without tracing. Used when
+ * use of the cache should be restricted and use of registers used
+ * for local variables avoided.
+ */
+static inline void pseudo_wrmsrl_notrace(unsigned int msr, u64 val)
+{
+	__wrmsr(msr, (u32)(val & 0xffffffffULL), (u32)(val >> 32));
+}
+
 /**
  * pseudo_lock_minor_get - Obtain available minor number
  * @minor: Pointer to where new minor number will be stored
@@ -816,6 +830,107 @@ static int measure_cycles_lat_fn(void *_plr)
 	return 0;
 }
 
+static int measure_cycles_perf_fn(void *_plr)
+{
+	struct pseudo_lock_region *plr = _plr;
+	unsigned long long l2_hits, l2_miss;
+	u64 l2_hit_bits, l2_miss_bits;
+	u64 i;
+#ifdef CONFIG_KASAN
+	/*
+	 * The registers used for local register variables are also used
+	 * when KASAN is active. When KASAN is active we use regular variables
+	 * at the cost of including cache access latency to these variables
+	 * in the measurements.
+	 */
+	unsigned int line_size;
+	unsigned int size;
+	void *mem_r;
+#else
+	register unsigned int line_size asm("esi");
+	register unsigned int size asm("edi");
+#ifdef CONFIG_X86_64
+	register void *mem_r asm("rbx");
+#else
+	register void *mem_r asm("ebx");
+#endif /* CONFIG_X86_64 */
+#endif /* CONFIG_KASAN */
+
+	/*
+	 * Non-architectural event for the Goldmont Microarchitecture
+	 * from Intel x86 Architecture Software Developer Manual (SDM):
+	 * MEM_LOAD_UOPS_RETIRED D1H (event number)
+	 * Umask values:
+	 *     L1_HIT   01H
+	 *     L2_HIT   02H
+	 *     L1_MISS  08H
+	 *     L2_MISS  10H
+	 */
+
+	/*
+	 * Start by setting flags for IA32_PERFEVTSELx:
+	 *     OS  (Operating system mode)  0x2
+	 *     INT (APIC interrupt enable)  0x10
+	 *     EN  (Enable counter)         0x40
+	 *
+	 * Then add the Umask value and event number to select performance
+	 * event.
+	 */
+
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
+		l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
+		break;
+	default:
+		goto out;
+	}
+
+	local_irq_disable();
+	/*
+	 * Call wrmsr directly to prevent the local register variables from
+	 * being overwritten due to reordering of their assignment with
+	 * the wrmsr calls.
+	 */
+	__wrmsr(MSR_MISC_FEATURE_CONTROL, prefetch_disable_bits, 0x0);
+	/* Disable events and reset counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+	/* Set and enable the L2 counters */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+	mem_r = plr->kmem;
+	size = plr->size;
+	line_size = plr->line_size;
+	for (i = 0; i < size; i += line_size) {
+		asm volatile("mov (%0,%1,1), %%eax\n\t"
+			     :
+			     : "r" (mem_r), "r" (i)
+			     : "%eax", "memory");
+	}
+	/*
+	 * Call wrmsr directly (no tracing) to not influence
+	 * the cache access counters as they are disabled.
+	 */
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0,
+			      l2_hit_bits & ~(0x40ULL << 16));
+	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
+			      l2_miss_bits & ~(0x40ULL << 16));
+	l2_hits = native_read_pmc(0);
+	l2_miss = native_read_pmc(1);
+	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
+	local_irq_enable();
+	trace_pseudo_lock_l2(l2_hits, l2_miss);
+
+out:
+	plr->thread_done = 1;
+	wake_up_interruptible(&plr->lock_thread_wq);
+	return 0;
+}
+
 /**
  * pseudo_lock_measure_cycles - Trigger latency measure to pseudo-locked region
  *
@@ -826,12 +941,12 @@ static int measure_cycles_lat_fn(void *_plr)
  *
  * Return: 0 on success, <0 on failure
  */
-static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp)
+static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp, int sel)
 {
 	struct pseudo_lock_region *plr = rdtgrp->plr;
 	struct task_struct *thread;
 	unsigned int cpu;
-	int ret;
+	int ret = -1;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -848,9 +963,19 @@ static int pseudo_lock_measure_cycles(struct rdtgroup *rdtgrp)
 		goto out;
 	}
 
-	thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
-					cpu_to_node(cpu),
-					"pseudo_lock_measure/%u", cpu);
+	if (sel == 1)
+		thread = kthread_create_on_node(measure_cycles_lat_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else if (sel == 2)
+		thread = kthread_create_on_node(measure_cycles_perf_fn, plr,
+						cpu_to_node(cpu),
+						"pseudo_lock_measure/%u",
+						cpu);
+	else
+		goto out;
+
 	if (IS_ERR(thread)) {
 		ret = PTR_ERR(thread);
 		goto out;
@@ -879,19 +1004,21 @@ static ssize_t pseudo_lock_measure_trigger(struct file *file,
 	size_t buf_size;
 	char buf[32];
 	int ret;
-	bool bv;
+	int sel;
 
 	buf_size = min(count, (sizeof(buf) - 1));
 	if (copy_from_user(buf, user_buf, buf_size))
 		return -EFAULT;
 
 	buf[buf_size] = '\0';
-	ret = strtobool(buf, &bv);
-	if (ret == 0 && bv) {
+	ret = kstrtoint(buf, 10, &sel);
+	if (ret == 0) {
+		if (sel != 1 && sel != 2)
+			return -EINVAL;
 		ret = debugfs_file_get(file->f_path.dentry);
 		if (unlikely(ret))
 			return ret;
-		ret = pseudo_lock_measure_cycles(rdtgrp);
+		ret = pseudo_lock_measure_cycles(rdtgrp, sel);
 		if (ret == 0)
 			ret = count;
 		debugfs_file_put(file->f_path.dentry);
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
index 3cd0fa27d5fe..efad50d2ee2f 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -15,6 +15,16 @@ TRACE_EVENT(pseudo_lock_mem_latency,
 	    TP_printk("latency=%u", __entry->latency)
 	   );
 
+TRACE_EVENT(pseudo_lock_l2,
+	    TP_PROTO(u64 l2_hits, u64 l2_miss),
+	    TP_ARGS(l2_hits, l2_miss),
+	    TP_STRUCT__entry(__field(u64, l2_hits)
+			     __field(u64, l2_miss)),
+	    TP_fast_assign(__entry->l2_hits = l2_hits;
+			   __entry->l2_miss = l2_miss;),
+	    TP_printk("hits=%llu miss=%llu",
+		      __entry->l2_hits, __entry->l2_miss));
+
 #endif /* _TRACE_PSEUDO_LOCK_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 37/38] x86/intel_rdt: Support L3 cache performance event of Broadwell
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (35 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 36/38] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  2018-05-22 11:29 ` [PATCH V4 38/38] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

The Broadwell microarchitecture supports pseudo-locking. Add support for
the L3 cache related performance events of these systems so that
the success of pseudo-locking can be measured more accurately on these
platforms.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c       | 56 +++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h | 10 ++++
 2 files changed, 66 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 5cad8986923d..3a2772853f4a 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -832,6 +832,8 @@ static int measure_cycles_lat_fn(void *_plr)
 
 static int measure_cycles_perf_fn(void *_plr)
 {
+	unsigned long long l3_hits = 0, l3_miss = 0;
+	u64 l3_hit_bits = 0, l3_miss_bits = 0;
 	struct pseudo_lock_region *plr = _plr;
 	unsigned long long l2_hits, l2_miss;
 	u64 l2_hit_bits, l2_miss_bits;
@@ -865,6 +867,16 @@ static int measure_cycles_perf_fn(void *_plr)
 	 *     L2_HIT   02H
 	 *     L1_MISS  08H
 	 *     L2_MISS  10H
+	 *
+	 * On Broadwell Microarchitecture the MEM_LOAD_UOPS_RETIRED event
+	 * has two "no fix" errata associated with it: BDM35 and BDM100. On
+	 * this platform we use the following events instead:
+	 *  L2_RQSTS 24H (Documented in https://download.01.org/perfmon/BDW/)
+	 *       REFERENCES FFH
+	 *       MISS       3FH
+	 *  LONGEST_LAT_CACHE 2EH (Documented in SDM)
+	 *       REFERENCE 4FH
+	 *       MISS      41H
 	 */
 
 	/*
@@ -883,6 +895,14 @@ static int measure_cycles_perf_fn(void *_plr)
 		l2_hit_bits = (0x52ULL << 16) | (0x2 << 8) | 0xd1;
 		l2_miss_bits = (0x52ULL << 16) | (0x10 << 8) | 0xd1;
 		break;
+	case INTEL_FAM6_BROADWELL_X:
+		/* On BDW the l2_hit_bits count references, not hits */
+		l2_hit_bits = (0x52ULL << 16) | (0xff << 8) | 0x24;
+		l2_miss_bits = (0x52ULL << 16) | (0x3f << 8) | 0x24;
+		/* On BDW the l3_hit_bits count references, not hits */
+		l3_hit_bits = (0x52ULL << 16) | (0x4f << 8) | 0x2e;
+		l3_miss_bits = (0x52ULL << 16) | (0x41 << 8) | 0x2e;
+		break;
 	default:
 		goto out;
 	}
@@ -899,9 +919,21 @@ static int measure_cycles_perf_fn(void *_plr)
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, 0x0);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0, 0x0);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 1, 0x0);
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 2, 0x0);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_PERFCTR0 + 3, 0x0);
+	}
 	/* Set and enable the L2 counters */
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0, l2_hit_bits);
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1, l2_miss_bits);
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+				      l3_hit_bits);
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+				      l3_miss_bits);
+	}
 	mem_r = plr->kmem;
 	size = plr->size;
 	line_size = plr->line_size;
@@ -919,11 +951,35 @@ static int measure_cycles_perf_fn(void *_plr)
 			      l2_hit_bits & ~(0x40ULL << 16));
 	pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 1,
 			      l2_miss_bits & ~(0x40ULL << 16));
+	if (l3_hit_bits > 0) {
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 2,
+				      l3_hit_bits & ~(0x40ULL << 16));
+		pseudo_wrmsrl_notrace(MSR_ARCH_PERFMON_EVENTSEL0 + 3,
+				      l3_miss_bits & ~(0x40ULL << 16));
+	}
 	l2_hits = native_read_pmc(0);
 	l2_miss = native_read_pmc(1);
+	if (l3_hit_bits > 0) {
+		l3_hits = native_read_pmc(2);
+		l3_miss = native_read_pmc(3);
+	}
 	wrmsr(MSR_MISC_FEATURE_CONTROL, 0x0, 0x0);
 	local_irq_enable();
+	/*
+	 * On BDW we count references and misses, need to adjust. Sometimes
+	 * the "hits" counter is a bit more than the references, for
+	 * example, x references but x + 1 hits. To not report invalid
+	 * hit values in this case we treat that as misses equal to
+	 * references.
+	 */
+	if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+		l2_hits -= (l2_miss > l2_hits ? l2_hits : l2_miss);
 	trace_pseudo_lock_l2(l2_hits, l2_miss);
+	if (l3_hit_bits > 0) {
+		if (boot_cpu_data.x86_model == INTEL_FAM6_BROADWELL_X)
+			l3_hits -= (l3_miss > l3_hits ? l3_hits : l3_miss);
+		trace_pseudo_lock_l3(l3_hits, l3_miss);
+	}
 
 out:
 	plr->thread_done = 1;
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
index efad50d2ee2f..2c041e6d9f05 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock_event.h
@@ -25,6 +25,16 @@ TRACE_EVENT(pseudo_lock_l2,
 	    TP_printk("hits=%llu miss=%llu",
 		      __entry->l2_hits, __entry->l2_miss));
 
+TRACE_EVENT(pseudo_lock_l3,
+	    TP_PROTO(u64 l3_hits, u64 l3_miss),
+	    TP_ARGS(l3_hits, l3_miss),
+	    TP_STRUCT__entry(__field(u64, l3_hits)
+			     __field(u64, l3_miss)),
+	    TP_fast_assign(__entry->l3_hits = l3_hits;
+			   __entry->l3_miss = l3_miss;),
+	    TP_printk("hits=%llu miss=%llu",
+		      __entry->l3_hits, __entry->l3_miss));
+
 #endif /* _TRACE_PSEUDO_LOCK_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH V4 38/38] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
  2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
                   ` (36 preceding siblings ...)
  2018-05-22 11:29 ` [PATCH V4 37/38] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
@ 2018-05-22 11:29 ` Reinette Chatre
  37 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 11:29 UTC (permalink / raw)
  To: tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86,
	linux-kernel, Reinette Chatre

Deeper C-states impact cache content through shrinking of the cache or
flushing the entire cache to memory before reducing power to the cache.
Deeper C-states will thus negatively impact the pseudo-locked regions.

To avoid impacting pseudo-locked regions, C-states are limited on
pseudo-locked region creation so that cores associated with the
pseudo-locked region are prevented from entering deeper C-states.
This is accomplished by requesting a CPU latency target which will
prevent the core from entering C6 across all supported platforms.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
 Documentation/x86/intel_rdt_ui.txt          |  4 +-
 arch/x86/kernel/cpu/intel_rdt.h             |  2 +
 arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c | 87 ++++++++++++++++++++++++++++-
 3 files changed, 88 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 70ec9c44b1ee..e77f75350f72 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -461,8 +461,8 @@ in the cache via carefully configuring the CAT feature and controlling
 application behavior. There is no guarantee that data is placed in
 cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
 “locked” data from cache. Power management C-states may shrink or
-power off cache. It is thus recommended to limit the processor maximum
-C-state, for example, by setting the processor.max_cstate kernel parameter.
+power off cache. Deeper C-states will automatically be restricted on
+pseudo-locked region creation.
 
 It is required that an application using a pseudo-locked region runs
 with affinity to the cores (or a subset of the cores) associated
diff --git a/arch/x86/kernel/cpu/intel_rdt.h b/arch/x86/kernel/cpu/intel_rdt.h
index 54fdd1ae62b9..533d2b42c795 100644
--- a/arch/x86/kernel/cpu/intel_rdt.h
+++ b/arch/x86/kernel/cpu/intel_rdt.h
@@ -142,6 +142,7 @@ struct mongroup {
  *			region
  * @debugfs_dir:	pointer to this region's directory in the debugfs
  *			filesystem
+ * @pm_reqs:		Power management QoS requests related to this region
  */
 struct pseudo_lock_region {
 	struct rdt_resource	*r;
@@ -157,6 +158,7 @@ struct pseudo_lock_region {
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	struct dentry		*debugfs_dir;
 #endif
+	struct list_head	pm_reqs;
 };
 
 /**
diff --git a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
index 3a2772853f4a..cd3890a926e0 100644
--- a/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
+++ b/arch/x86/kernel/cpu/intel_rdt_pseudo_lock.c
@@ -17,6 +17,7 @@
 #include <linux/debugfs.h>
 #include <linux/kthread.h>
 #include <linux/mman.h>
+#include <linux/pm_qos.h>
 #include <linux/slab.h>
 #include <asm/cacheflush.h>
 #include <asm/intel-family.h>
@@ -175,6 +176,76 @@ static struct rdtgroup *region_find_by_minor(unsigned int minor)
 }
 
 /**
+ * pseudo_lock_pm_req - A power management QoS request list entry
+ * @list:	Entry within the @pm_reqs list for a pseudo-locked region
+ * @req:	PM QoS request
+ */
+struct pseudo_lock_pm_req {
+	struct list_head list;
+	struct dev_pm_qos_request req;
+};
+
+static void pseudo_lock_cstates_relax(struct pseudo_lock_region *plr)
+{
+	struct pseudo_lock_pm_req *pm_req, *next;
+
+	list_for_each_entry_safe(pm_req, next, &plr->pm_reqs, list) {
+		dev_pm_qos_remove_request(&pm_req->req);
+		list_del(&pm_req->list);
+		kfree(pm_req);
+	}
+}
+
+/**
+ * pseudo_lock_cstates_constrain - Restrict cores from entering C6
+ *
+ * To prevent the cache from being affected by power management entering
+ * C6 has to be avoided. This is accomplished by requesting a latency
+ * requirement lower than lowest C6 exit latency of all supported
+ * platforms as found in the cpuidle state tables in the intel_idle driver.
+ * At this time it is possible to do so with a single latency requirement
+ * for all supported platforms.
+ *
+ * Since Goldmont is supported, which is affected by X86_BUG_MONITOR,
+ * the ACPI latencies need to be considered while keeping in mind that C2
+ * may be set to map to deeper sleep states. In this case the latency
+ * requirement needs to prevent entering C2 also.
+ */
+static int pseudo_lock_cstates_constrain(struct pseudo_lock_region *plr)
+{
+	struct pseudo_lock_pm_req *pm_req;
+	int cpu;
+	int ret;
+
+	for_each_cpu(cpu, &plr->d->cpu_mask) {
+		pm_req = kzalloc(sizeof(*pm_req), GFP_KERNEL);
+		if (!pm_req) {
+			rdt_last_cmd_puts("fail allocating mem for PM QoS\n");
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		ret = dev_pm_qos_add_request(get_cpu_device(cpu),
+					     &pm_req->req,
+					     DEV_PM_QOS_RESUME_LATENCY,
+					     30);
+		if (ret < 0) {
+			rdt_last_cmd_printf("fail to add latency req cpu%d\n",
+					    cpu);
+			kfree(pm_req);
+			ret = -1;
+			goto out_err;
+		}
+		list_add(&pm_req->list, &plr->pm_reqs);
+	}
+
+	return 0;
+
+out_err:
+	pseudo_lock_cstates_relax(plr);
+	return ret;
+}
+
+/**
  * pseudo_lock_region_init - Initialize pseudo-lock region information
  * @plr: pseudo-lock region
  *
@@ -239,6 +310,7 @@ static int pseudo_lock_init(struct rdtgroup *rdtgrp)
 		return -ENOMEM;
 
 	init_waitqueue_head(&plr->lock_thread_wq);
+	INIT_LIST_HEAD(&plr->pm_reqs);
 	rdtgrp->plr = plr;
 	return 0;
 }
@@ -1121,6 +1193,12 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	if (ret < 0)
 		return ret;
 
+	ret = pseudo_lock_cstates_constrain(plr);
+	if (ret < 0) {
+		ret = -EINVAL;
+		goto out_region;
+	}
+
 	plr->thread_done = 0;
 
 	thread = kthread_create_on_node(pseudo_lock_fn, rdtgrp,
@@ -1129,7 +1207,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	if (IS_ERR(thread)) {
 		ret = PTR_ERR(thread);
 		rdt_last_cmd_printf("locking thread returned error %d\n", ret);
-		goto out_region;
+		goto out_cstates;
 	}
 
 	kthread_bind(thread, plr->cpu);
@@ -1147,7 +1225,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 		 * empty pseudo-locking loop.
 		 */
 		rdt_last_cmd_puts("locking thread interrupted\n");
-		goto out_region;
+		goto out_cstates;
 	}
 
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
@@ -1156,7 +1234,7 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 	if (IS_ERR(plr->debugfs_dir)) {
 		ret = PTR_ERR(plr->debugfs_dir);
 		plr->debugfs_dir = NULL;
-		goto out_region;
+		goto out_cstates;
 	}
 
 	entry = debugfs_create_file("pseudo_lock_measure", 0200,
@@ -1219,6 +1297,8 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	debugfs_remove_recursive(plr->debugfs_dir);
 #endif
+out_cstates:
+	pseudo_lock_cstates_relax(plr);
 out_region:
 	pseudo_lock_region_clear(plr);
 out:
@@ -1252,6 +1332,7 @@ void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp)
 		goto free;
 	}
 
+	pseudo_lock_cstates_relax(plr);
 #ifdef CONFIG_INTEL_RDT_DEBUGFS
 	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
 #endif
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-22 11:29 ` [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
@ 2018-05-22 19:43   ` Greg KH
  2018-05-22 21:02     ` Reinette Chatre
  0 siblings, 1 reply; 46+ messages in thread
From: Greg KH @ 2018-05-22 19:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, tony.luck, vikas.shivappa, gavin.hindman,
	jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
> @@ -149,6 +151,9 @@ struct pseudo_lock_region {
>  	unsigned int		line_size;
>  	unsigned int		size;
>  	void			*kmem;
> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> +	struct dentry		*debugfs_dir;
> +#endif

Who cares, just always have this here, it's not going to save you
anything to #ifdef the .c code everywhere just for this one pointer.

> @@ -174,6 +180,9 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
>  		plr->d->plr = NULL;
>  	plr->d = NULL;
>  	plr->cbm = 0;
> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> +	plr->debugfs_dir = NULL;
> +#endif

See?  Ick.

> +	ret = strtobool(buf, &bv);
> +	if (ret == 0 && bv) {
> +		ret = debugfs_file_get(file->f_path.dentry);
> +		if (unlikely(ret))
> +			return ret;

Only ever use unlikely/likely if you can measure the performance
difference.  Hint, you can't do that here, it's not needed at all.
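
Concretely, the quoted check would then just become something like this
(a sketch, not an actual hunk):

-		if (unlikely(ret))
+		if (ret)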

> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> +	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
> +					      debugfs_resctrl);
> +	if (IS_ERR(plr->debugfs_dir)) {
> +		ret = PTR_ERR(plr->debugfs_dir);
> +		plr->debugfs_dir = NULL;
> +		goto out_region;

Ick no, you never need to care about the return value of a debugfs call.
Your code should never do something different if a debugfs call succeeds
or fails.  And you are checking it wrong, even if you did want to do
this :)

> +	}
> +
> +	entry = debugfs_create_file("pseudo_lock_measure", 0200,
> +				    plr->debugfs_dir, rdtgrp,
> +				    &pseudo_measure_fops);
> +	if (IS_ERR(entry)) {
> +		ret = PTR_ERR(entry);
> +		goto out_debugfs;
> +	}

Again, you don't care, don't do this.

> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> +	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
> +#endif

Don't put ifdefs in .c files, it's not the Linux way at all.  You can
make this a lot simpler/easier to maintain over time if you do not.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-22 19:43   ` Greg KH
@ 2018-05-22 21:02     ` Reinette Chatre
  2018-05-23  8:05       ` Greg KH
  0 siblings, 1 reply; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 21:02 UTC (permalink / raw)
  To: Greg KH
  Cc: tglx, fenghua.yu, tony.luck, vikas.shivappa, gavin.hindman,
	jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

Hi Greg,

Thank you very much for taking a look.

On 5/22/2018 12:43 PM, Greg KH wrote:
> On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
>> @@ -149,6 +151,9 @@ struct pseudo_lock_region {
>>  	unsigned int		line_size;
>>  	unsigned int		size;
>>  	void			*kmem;
>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>> +	struct dentry		*debugfs_dir;
>> +#endif
> 
> Who cares, just always have this here, it's not going to save you
> anything to #ifdef the .c code everywhere just for this one pointer.

ok

> 
>> @@ -174,6 +180,9 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
>>  		plr->d->plr = NULL;
>>  	plr->d = NULL;
>>  	plr->cbm = 0;
>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>> +	plr->debugfs_dir = NULL;
>> +#endif
> 
> See?  Ick.
> 
>> +	ret = strtobool(buf, &bv);
>> +	if (ret == 0 && bv) {
>> +		ret = debugfs_file_get(file->f_path.dentry);
>> +		if (unlikely(ret))
>> +			return ret;
> 
> Only ever use unlikely/likely if you can measure the performance
> difference.  Hint, you can't do that here, it's not needed at all.

Here my intention was to follow the current best practices and in the
kernel source I am working with eight of the ten usages of
debugfs_file_get() are followed by an unlikely(). My assumption was thus
that this is a best practice. Thanks for catching this - I'll change it.

>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>> +	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
>> +					      debugfs_resctrl);
>> +	if (IS_ERR(plr->debugfs_dir)) {
>> +		ret = PTR_ERR(plr->debugfs_dir);
>> +		plr->debugfs_dir = NULL;
>> +		goto out_region;
> 
> Ick no, you never need to care about the return value of a debugfs call.
> Your code should never do something different if a debugfs call succeeds
> or fails.  And you are checking it wrong, even if you did want to do
> this :)

Ah - I see I need to be using IS_ERR_OR_NULL() instead of IS_ERR()? If
this is the case, then please note that there seem to be quite a few
debugfs_create_dir() calls within the kernel that have the same issue.

>> +	}
>> +
>> +	entry = debugfs_create_file("pseudo_lock_measure", 0200,
>> +				    plr->debugfs_dir, rdtgrp,
>> +				    &pseudo_measure_fops);
>> +	if (IS_ERR(entry)) {
>> +		ret = PTR_ERR(entry);
>> +		goto out_debugfs;
>> +	}
> 
> Again, you don't care, don't do this.
> 
>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>> +	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
>> +#endif
> 
> Don't put ifdefs in .c files, it's not the Linux way at all.  You can
> make this a lot simpler/easier to maintain over time if you do not.

My mistake - I assumed this would be ok based on my interpretation of
how CONFIG_GENERIC_IRQ_DEBUGFS is used.

I could rework the debugfs code to be contained in a new debugfs
specific .c file that is only compiled if the configuration is set. The
ifdefs will then be restricted to a .h file that contains the
declarations of these debugfs functions with empty variants when the
user did not select the debugfs config option.
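
Roughly, such a header could look like the sketch below (the function
names here are made up, only to illustrate the shape of the proposal):

#ifdef CONFIG_INTEL_RDT_DEBUGFS
int rdt_pseudo_lock_debugfs_create(struct rdtgroup *rdtgrp);
void rdt_pseudo_lock_debugfs_remove(struct rdtgroup *rdtgrp);
#else
static inline int rdt_pseudo_lock_debugfs_create(struct rdtgroup *rdtgrp)
{
	return 0;
}
static inline void rdt_pseudo_lock_debugfs_remove(struct rdtgroup *rdtgrp)
{
}
#endif /* CONFIG_INTEL_RDT_DEBUGFS */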

Would that be acceptable to you?

Reinette

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details
  2018-05-22 11:29 ` [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details Reinette Chatre
@ 2018-05-22 21:03   ` Randy Dunlap
  2018-05-22 21:09     ` Reinette Chatre
  0 siblings, 1 reply; 46+ messages in thread
From: Randy Dunlap @ 2018-05-22 21:03 UTC (permalink / raw)
  To: Reinette Chatre, tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

On 05/22/2018 04:29 AM, Reinette Chatre wrote:
> With cache regions now explicitly marked as "shareable" or "exclusive"
> we would like to communicate to the user how portions of the cache
> are used.
> 
> Introduce "bit_usage" that indicates for each resource
> how portions of the cache are configured to be used.
> 
> To assist the user to distinguish whether the sharing is from software or
> hardware we add the following annotation:
> 
> 0 - currently unused
> X - currently available for sharing and used by software and hardware
> H - currently used by hardware only but available for software use
> S - currently used and shareable by software only
> E - currently used exclusively by one resource group
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>  arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 79 ++++++++++++++++++++++++++++++++
>  1 file changed, 79 insertions(+)
> 

Hi,

All occurrences of seq_puts(f, one_char_string); should instead be
seq_putc(f, one_char);

Same for patch 14.
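
For example, the kind of change meant here (a sketch only, not the
actual hunks):

-			seq_puts(seq, "X");
+			seq_putc(seq, 'X');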

> diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> index 5b78b4aa1a50..d0040f83532d 100644
> --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
> @@ -714,6 +714,78 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +/**
> + * rdt_bit_usage_show - Display current usage of resources
> + *
> + * A domain is a shared resource that can now be allocated differently. Here
> + * we display the current regions of the domain as an annotated bitmask.
> + * For each domain of this resource its allocation bitmask
> + * is annotated as below to indicate the current usage of the corresponding bit:
> + *   0 - currently unused
> + *   X - currently available for sharing and used by software and hardware
> + *   H - currently used by hardware only but available for software use
> + *   S - currently used and shareable by software only
> + *   E - currently used exclusively by one resource group
> + */
> +static int rdt_bit_usage_show(struct kernfs_open_file *of,
> +			      struct seq_file *seq, void *v)
> +{
> +	struct rdt_resource *r = of->kn->parent->priv;
> +	u32 sw_shareable, hw_shareable, exclusive;
> +	struct rdt_domain *dom;
> +	int i, hwb, swb, excl;
> +	enum rdtgrp_mode mode;
> +	bool sep = false;
> +	u32 *ctrl;
> +
> +	mutex_lock(&rdtgroup_mutex);
> +	hw_shareable = r->cache.shareable_bits;
> +	list_for_each_entry(dom, &r->domains, list) {
> +		if (sep)
> +			seq_puts(seq, ";");
> +		ctrl = dom->ctrl_val;
> +		sw_shareable = 0;
> +		exclusive = 0;
> +		seq_printf(seq, "%d=", dom->id);
> +		for (i = 0; i < r->num_closid; i++, ctrl++) {
> +			if (!closid_allocated(i))
> +				continue;
> +			mode = rdtgroup_mode_by_closid(i);
> +			switch (mode) {
> +			case RDT_MODE_SHAREABLE:
> +				sw_shareable |= *ctrl;
> +				break;
> +			case RDT_MODE_EXCLUSIVE:
> +				exclusive |= *ctrl;
> +				break;
> +			case RDT_NUM_MODES:
> +				WARN(1,
> +				     "invalid mode for closid %d\n", i);
> +				break;
> +			}
> +		}
> +		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
> +			hwb = test_bit(i, (unsigned long *)&hw_shareable);
> +			swb = test_bit(i, (unsigned long *)&sw_shareable);
> +			excl = test_bit(i, (unsigned long *)&exclusive);
> +			if (hwb && swb)
> +				seq_puts(seq, "X");
> +			else if (hwb && !swb)
> +				seq_puts(seq, "H");
> +			else if (!hwb && swb)
> +				seq_puts(seq, "S");
> +			else if (excl)
> +				seq_puts(seq, "E");
> +			else /* Unused bits remain */
> +				seq_puts(seq, "0");
> +		}
> +		sep = true;
> +	}
> +	seq_puts(seq, "\n");
> +	mutex_unlock(&rdtgroup_mutex);
> +	return 0;
> +}


-- 
~Randy

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details
  2018-05-22 21:03   ` Randy Dunlap
@ 2018-05-22 21:09     ` Reinette Chatre
  0 siblings, 0 replies; 46+ messages in thread
From: Reinette Chatre @ 2018-05-22 21:09 UTC (permalink / raw)
  To: Randy Dunlap, tglx, fenghua.yu, tony.luck, vikas.shivappa
  Cc: gavin.hindman, jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

Hi Randy,

On 5/22/2018 2:03 PM, Randy Dunlap wrote:
> On 05/22/2018 04:29 AM, Reinette Chatre wrote:
>> With cache regions now explicitly marked as "shareable" or "exclusive"
>> we would like to communicate to the user how portions of the cache
>> are used.
>>
>> Introduce "bit_usage" that indicates for each resource
>> how portions of the cache are configured to be used.
>>
>> To assist the user to distinguish whether the sharing is from software or
>> hardware we add the following annotation:
>>
>> 0 - currently unused
>> X - currently available for sharing and used by software and hardware
>> H - currently used by hardware only but available for software use
>> S - currently used and shareable by software only
>> E - currently used exclusively by one resource group
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>>  arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 79 ++++++++++++++++++++++++++++++++
>>  1 file changed, 79 insertions(+)
>>
> 
> Hi,
> 
> All occurrences of seq_puts(f, one_char_string); should instead be
> seq_putc(f, one_char);
> 
> Same for patch 14.

Thank you very much for catching this. Will do.

Reinette

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-22 21:02     ` Reinette Chatre
@ 2018-05-23  8:05       ` Greg KH
  2018-05-23 17:19         ` Reinette Chatre
  0 siblings, 1 reply; 46+ messages in thread
From: Greg KH @ 2018-05-23  8:05 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, tony.luck, vikas.shivappa, gavin.hindman,
	jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

On Tue, May 22, 2018 at 02:02:37PM -0700, Reinette Chatre wrote:
> Hi Greg,
> 
> Thank you very much for taking a look.
> 
> On 5/22/2018 12:43 PM, Greg KH wrote:
> > On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
> >> @@ -149,6 +151,9 @@ struct pseudo_lock_region {
> >>  	unsigned int		line_size;
> >>  	unsigned int		size;
> >>  	void			*kmem;
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +	struct dentry		*debugfs_dir;
> >> +#endif
> > 
> > Who cares, just always have this here, it's not going to save you
> > anything to #ifdef the .c code everywhere just for this one pointer.
> 
> ok
> 
> > 
> >> @@ -174,6 +180,9 @@ static void pseudo_lock_region_clear(struct pseudo_lock_region *plr)
> >>  		plr->d->plr = NULL;
> >>  	plr->d = NULL;
> >>  	plr->cbm = 0;
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +	plr->debugfs_dir = NULL;
> >> +#endif
> > 
> > See?  Ick.
> > 
> >> +	ret = strtobool(buf, &bv);
> >> +	if (ret == 0 && bv) {
> >> +		ret = debugfs_file_get(file->f_path.dentry);
> >> +		if (unlikely(ret))
> >> +			return ret;
> > 
> > Only ever use unlikely/likely if you can measure the performance
> > difference.  Hint, you can't do that here, it's not needed at all.
> 
> Here my intention was to follow the current best practices and in the
> kernel source I am working with eight of the ten usages of
> debugfs_file_get() are followed by an unlikely(). My assumption was thus
> that this is a best practice. Thanks for catching this - I'll change it.

Really?  That's some horrible examples, any pointers to them?  I think I
need to do a massive sweep of the kernel tree and fix up all of this
crud so that people don't keep cut/paste the same bad code everywhere.

> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
> >> +					      debugfs_resctrl);
> >> +	if (IS_ERR(plr->debugfs_dir)) {
> >> +		ret = PTR_ERR(plr->debugfs_dir);
> >> +		plr->debugfs_dir = NULL;
> >> +		goto out_region;
> > 
> > Ick no, you never need to care about the return value of a debugfs call.
> > Your code should never do something different if a debugfs call succeeds
> > or fails.  And you are checking it wrong, even if you did want to do
> > this :)
> 
> Ah - I see I need to be using IS_ERR_OR_NULL() instead of IS_ERR()? If
> this is the case, then please note that there seem to be quite a few
> debugfs_create_dir() calls within the kernel that have the same issue.

Again, they are all wrong :)

Just ignore the return value, unless it is a directory, and then just
save it like you are here.  Don't check the value, you can always pass
it into a future debugfs call with no problems.
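
For illustration, the pattern being described would look roughly like
this (names taken from the quoted patch, all error handling dropped):

	/* Save the directory dentry without inspecting it for errors. */
	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
					      debugfs_resctrl);

	/*
	 * An ERR_PTR or NULL dentry can be passed to later debugfs
	 * calls without any special handling.
	 */
	debugfs_create_file("pseudo_lock_measure", 0200, plr->debugfs_dir,
			    rdtgrp, &pseudo_measure_fops);

	/* Teardown is likewise unconditional. */
	debugfs_remove_recursive(plr->debugfs_dir);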

> >> +	}
> >> +
> >> +	entry = debugfs_create_file("pseudo_lock_measure", 0200,
> >> +				    plr->debugfs_dir, rdtgrp,
> >> +				    &pseudo_measure_fops);
> >> +	if (IS_ERR(entry)) {
> >> +		ret = PTR_ERR(entry);
> >> +		goto out_debugfs;
> >> +	}
> > 
> > Again, you don't care, don't do this.
> > 
> >> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
> >> +	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
> >> +#endif
> > 
> > Don't put ifdefs in .c files, it's not the Linux way at all.  You can
> > make this a lot simpler/easier to maintain over time if you do not.
> 
> My mistake - I assumed this would be ok based on my interpretation of
> how CONFIG_GENERIC_IRQ_DEBUGFS is used.
> 
> I could rework the debugfs code to be contained in a new debugfs
> specific .c file that is only compiled if the configuration is set. The
> ifdefs will then be restricted to a .h file that contains the
> declarations of these debugfs functions with empty variants when the
> user did not select the debugfs config option.
> 
> Would that be acceptable to you?

Yes, that is the correct way to do this.

But why would someone _not_ want this option?  Why not always just
include the functionality, that way you don't have to ask someone to
rebuild a kernel if you need that debug information.  And distros will
always enable the option anyway, so it's not like you are keeping things
"smaller", if you disable debugfs, all of that code should just compile
away to almost nothing anyway.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-23  8:05       ` Greg KH
@ 2018-05-23 17:19         ` Reinette Chatre
  2018-05-23 17:27           ` Greg KH
  0 siblings, 1 reply; 46+ messages in thread
From: Reinette Chatre @ 2018-05-23 17:19 UTC (permalink / raw)
  To: Greg KH
  Cc: tglx, fenghua.yu, tony.luck, vikas.shivappa, gavin.hindman,
	jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

Hi Greg,

On 5/23/2018 1:05 AM, Greg KH wrote:
> On Tue, May 22, 2018 at 02:02:37PM -0700, Reinette Chatre wrote:
>> On 5/22/2018 12:43 PM, Greg KH wrote:
>>> On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
>>>> +	ret = strtobool(buf, &bv);
>>>> +	if (ret == 0 && bv) {
>>>> +		ret = debugfs_file_get(file->f_path.dentry);
>>>> +		if (unlikely(ret))
>>>> +			return ret;
>>>
>>> Only ever use unlikely/likely if you can measure the performance
>>> difference.  Hint, you can't do that here, it's not needed at all.
>>
>> Here my intention was to follow the current best practices and in the
>> kernel source I am working with eight of the ten usages of
>> debugfs_file_get() is followed by an unlikely(). My assumption was thus
>> that this is a best practice. Thanks for catching this - I'll change it.
> 
> Really?  That's some horrible examples, any pointers to them?  I think I
> need to do a massive sweep of the kernel tree and fix up all of this
> crud so that people don't keep cut/paste the same bad code everywhere.

As you know debugfs_file_get() is a recent addition to the kernel,
introduced in:
commit e9117a5a4bf65d8e99f060d356a04d27a60b436d
Author: Nicolai Stange <nicstange@gmail.com>
Date:   Tue Oct 31 00:15:48 2017 +0100

    debugfs: implement per-file removal protection

Following this introduction, the same author converted the now obsolete
debugfs_use_file_start() calls to debugfs_file_get() in commits:

commit 7cda7b8f97da9382bb945d541a85cde58d5dac27
Author: Nicolai Stange <nicstange@gmail.com>
Date:   Tue Oct 31 00:15:51 2017 +0100

    IB/hfi1: convert to debugfs_file_get() and -put()


commit 69d29f9e6a53559895e6f785f6cf72daa738f132
Author: Nicolai Stange <nicstange@gmail.com>
Date:   Tue Oct 31 00:15:50 2017 +0100

    debugfs: convert to debugfs_file_get() and -put()


In the above two commits the usage of the new debugfs_file_get()
primarily follows the pattern of:
r = debugfs_file_get(d);
if (unlikely(r))

Since the author of the new interface used the pattern above in the
conversions, I do not think it is unreasonable to find other developers
following suit, believing that it is a best practice.

This pattern remains in the majority when looking at the output of (on
v4.17-rc5):
git grep -A 1 ' = debugfs_file_get'

>>>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>>>> +	plr->debugfs_dir = debugfs_create_dir(rdtgrp->kn->name,
>>>> +					      debugfs_resctrl);
>>>> +	if (IS_ERR(plr->debugfs_dir)) {
>>>> +		ret = PTR_ERR(plr->debugfs_dir);
>>>> +		plr->debugfs_dir = NULL;
>>>> +		goto out_region;
>>>
>>> Ick no, you never need to care about the return value of a debugfs call.
>>> Your code should never do something different if a debugfs call succeeds
>>> or fails.  And you are checking it wrong, even if you did want to do
>>> this :)
>>
>> Ah - I see I need to be using IS_ERR_OR_NULL() instead of IS_ERR()? If
>> this is the case, then please note that there seem to be quite a few
>> debugfs_create_dir() calls within the kernel that have the same issue.
> 
> Again, they are all wrong :)
> 
> Just ignore the return value, unless it is a directory, and then just
> save it like you are here.  Don't check the value, you can always pass
> it into a future debugfs call with no problems.

Will do. Thank you very much for the advice.

>>>> +	}
>>>> +
>>>> +	entry = debugfs_create_file("pseudo_lock_measure", 0200,
>>>> +				    plr->debugfs_dir, rdtgrp,
>>>> +				    &pseudo_measure_fops);
>>>> +	if (IS_ERR(entry)) {
>>>> +		ret = PTR_ERR(entry);
>>>> +		goto out_debugfs;
>>>> +	}
>>>
>>> Again, you don't care, don't do this.
>>>
>>>> +#ifdef CONFIG_INTEL_RDT_DEBUGFS
>>>> +	debugfs_remove_recursive(rdtgrp->plr->debugfs_dir);
>>>> +#endif
>>>
>>> Don't put ifdefs in .c files, it's not the Linux way at all.  You can
>>> make this a lot simpler/easier to maintain over time if you do not.
>>
>> My mistake - I assumed this would be ok based on my interpretation of
>> how CONFIG_GENERIC_IRQ_DEBUGFS is used.
>>
>> I could rework the debugfs code to be contained in a new debugfs
>> specific .c file that is only compiled if the configuration is set. The
>> ifdefs will then be restricted to a .h file that contains the
>> declarations of these debugfs functions with empty variants when the
>> user did not select the debugfs config option.
>>
>> Would that be acceptable to you?
> 
> Yes, that is the correct way to do this.
> 
> But why would someone _not_ want this option?  Why not always just
> include the functionality, that way you don't have to ask someone to
> rebuild a kernel if you need that debug information.  And distros will
> always enable the option anyway, so it's not like you are keeping things
> "smaller", if you disable debugfs, all of that code should just compile
> away to almost nothing anyway.

Will do.

Thank you very much for taking the time to review and provide
constructive feedback.

Reinette

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing
  2018-05-23 17:19         ` Reinette Chatre
@ 2018-05-23 17:27           ` Greg KH
  0 siblings, 0 replies; 46+ messages in thread
From: Greg KH @ 2018-05-23 17:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: tglx, fenghua.yu, tony.luck, vikas.shivappa, gavin.hindman,
	jithu.joseph, dave.hansen, mingo, hpa, x86, linux-kernel

On Wed, May 23, 2018 at 10:19:41AM -0700, Reinette Chatre wrote:
> Hi Greg,
> 
> On 5/23/2018 1:05 AM, Greg KH wrote:
> > On Tue, May 22, 2018 at 02:02:37PM -0700, Reinette Chatre wrote:
> >> On 5/22/2018 12:43 PM, Greg KH wrote:
> >>> On Tue, May 22, 2018 at 04:29:22AM -0700, Reinette Chatre wrote:
> >>>> +	ret = strtobool(buf, &bv);
> >>>> +	if (ret == 0 && bv) {
> >>>> +		ret = debugfs_file_get(file->f_path.dentry);
> >>>> +		if (unlikely(ret))
> >>>> +			return ret;
> >>>
> >>> Only ever use unlikely/likely if you can measure the performance
> >>> difference.  Hint, you can't do that here, it's not needed at all.
> >>
> >> Here my intention was to follow the current best practices and in the
> >> kernel source I am working with eight of the ten usages of
> >> debugfs_file_get() are followed by an unlikely(). My assumption was thus
> >> that this is a best practice. Thanks for catching this - I'll change it.
> > 
> > Really?  That's some horrible examples, any pointers to them?  I think I
> > need to do a massive sweep of the kernel tree and fix up all of this
> > crud so that people don't keep cut/paste the same bad code everywhere.
> 
> As you know debugfs_file_get() is a recent addition to the kernel,
> introduced in:
> commit e9117a5a4bf65d8e99f060d356a04d27a60b436d
> Author: Nicolai Stange <nicstange@gmail.com>
> Date:   Tue Oct 31 00:15:48 2017 +0100
> 
>     debugfs: implement per-file removal protection
> 
> Following this introduction, the same author modified the now obsolete
> calls of debugfs_use_file_start() to debugfs_file_get() in commits:
> 
> commit 7cda7b8f97da9382bb945d541a85cde58d5dac27
> Author: Nicolai Stange <nicstange@gmail.com>
> Date:   Tue Oct 31 00:15:51 2017 +0100
> 
>     IB/hfi1: convert to debugfs_file_get() and -put()
> 
> 
> commit 69d29f9e6a53559895e6f785f6cf72daa738f132
> Author: Nicolai Stange <nicstange@gmail.com>
> Date:   Tue Oct 31 00:15:50 2017 +0100
> 
>     debugfs: convert to debugfs_file_get() and -put()
> 
> 
> In the above two commits the usage of the new debugfs_file_get()
> primarily follows the pattern of:
> r = debugfs_file_get(d);
> if (unlikely(r))
> 
> Since the author of the new interface used the pattern above in the
> conversions I do not think it is unreasonable to find other developers
> following suit believing that it is a best practice.

Ah, that's where that pattern came from, thanks for finding it.  It was
a conversion of the "old" api in the IB code that was using likely(),
which, in a way, did make sense to use (due to the way processors assume
0 is "true").

I'll work on cleaning all of these up on my next long plane ride, should
give me something to do :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2018-05-23 17:27 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 11:28 [PATCH V4 00/38] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 01/38] x86/intel_rdt: Document new mode, size, and bit_usage Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 02/38] x86/intel_rdt: Introduce RDT resource group mode Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 03/38] x86/intel_rdt: Associate mode with each RDT resource group Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 04/38] x86/intel_rdt: Introduce resource group's mode resctrl file Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 05/38] x86/intel_rdt: Introduce test to determine if closid is in use Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 06/38] x86/intel_rdt: Make useful functions available internally Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 07/38] x86/intel_rdt: Initialize new resource group with sane defaults Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 08/38] x86/intel_rdt: Introduce new "exclusive" mode Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 09/38] x86/intel_rdt: Enable setting of exclusive mode Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 10/38] x86/intel_rdt: Making CBM name and type more explicit Reinette Chatre
2018-05-22 11:28 ` [PATCH V4 11/38] x86/intel_rdt: Support flexible data to parsing callbacks Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 12/38] x86/intel_rdt: Ensure requested schemata respects mode Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 13/38] x86/intel_rdt: Introduce "bit_usage" to display cache allocations details Reinette Chatre
2018-05-22 21:03   ` Randy Dunlap
2018-05-22 21:09     ` Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 14/38] x86/intel_rdt: Display resource groups' allocations' size in bytes Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 15/38] x86/intel_rdt: Documentation for Cache Pseudo-Locking Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 16/38] x86/intel_rdt: Introduce the Cache Pseudo-Locking modes Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 17/38] x86/intel_rdt: Respect read and write access Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 18/38] x86/intel_rdt: Add utility to test if tasks assigned to resource group Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 19/38] x86/intel_rdt: Add utility to restrict/restore access to resctrl files Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 20/38] x86/intel_rdt: Protect against resource group changes during locking Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 21/38] x86/intel_rdt: Utilities to restrict/restore access to specific files Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 22/38] x86/intel_rdt: Add check to determine if monitoring in progress Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 23/38] x86/intel_rdt: Introduce pseudo-locked region Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 24/38] x86/intel_rdt: Support enter/exit of locksetup mode Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 25/38] x86/intel_rdt: Enable entering of pseudo-locksetup mode Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 26/38] x86/intel_rdt: Split resource group removal in two Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 27/38] x86/intel_rdt: Add utilities to test pseudo-locked region possibility Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 28/38] x86/intel_rdt: Discover supported platforms via prefetch disable bits Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 29/38] x86/intel_rdt: Pseudo-lock region creation/removal core Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 30/38] x86/intel_rdt: Support creation/removal of pseudo-locked region Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 31/38] x86/intel_rdt: resctrl files reflect pseudo-locked information Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 32/38] x86/intel_rdt: Ensure RDT cleanup on exit Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 33/38] x86/intel_rdt: Create resctrl debug area Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 34/38] x86/intel_rdt: Create debugfs files for pseudo-locking testing Reinette Chatre
2018-05-22 19:43   ` Greg KH
2018-05-22 21:02     ` Reinette Chatre
2018-05-23  8:05       ` Greg KH
2018-05-23 17:19         ` Reinette Chatre
2018-05-23 17:27           ` Greg KH
2018-05-22 11:29 ` [PATCH V4 35/38] x86/intel_rdt: Create character device exposing pseudo-locked region Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 36/38] x86/intel_rdt: More precise L2 hit/miss measurements Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 37/38] x86/intel_rdt: Support L3 cache performance event of Broadwell Reinette Chatre
2018-05-22 11:29 ` [PATCH V4 38/38] x86/intel_rdt: Limit C-states dynamically when pseudo-locking active Reinette Chatre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).