linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Fenghua Yu" <fenghua.yu@intel.com>
To: "Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@elte.hu>,
	"H. Peter Anvin" <h.peter.anvin@intel.com>,
	"Tony Luck" <tony.luck@intel.com>, "Tejun Heo" <tj@kernel.org>,
	"Borislav Petkov" <bp@suse.de>,
	"Stephane Eranian" <eranian@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"David Carrillo-Cisneros" <davidcc@google.com>,
	"Ravi V Shankar" <ravi.v.shankar@intel.com>,
	"Vikas Shivappa" <vikas.shivappa@linux.intel.com>,
	"Sai Prakhya" <sai.praneeth.prakhya@intel.com>
Cc: "linux-kernel" <linux-kernel@vger.kernel.org>,
	"x86" <x86@kernel.org>, "Fenghua Yu" <fenghua.yu@intel.com>
Subject: [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT
Date: Tue, 12 Jul 2016 18:02:38 -0700	[thread overview]
Message-ID: <1468371785-53231-6-git-send-email-fenghua.yu@intel.com> (raw)
In-Reply-To: <1468371785-53231-1-git-send-email-fenghua.yu@intel.com>

From: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Adds support for IA32_PQR_ASSOC MSR writes during task scheduling. For
Cache Allocation, MSR write would let the task fill in the cache
'subset' represented by the task's capacity bit mask.

The high 32 bits in the per processor MSR IA32_PQR_ASSOC represents the
CLOSid. During context switch kernel implements this by writing the
CLOSid of the task belongs to the CPU's IA32_PQR_ASSOC MSR.

This patch also implements a common software cache for IA32_PQR_MSR
(RMID 0:9, CLOSId 32:63) to be used by both Cache monitoring (CMT) and
Cache allocation. CMT updates the RMID where as cache_alloc updates the
CLOSid in the software cache. During scheduling when the new RMID/CLOSid
value is different from the cached values, IA32_PQR_MSR is updated.
Since the measured rdmsr latency for IA32_PQR_MSR is very high (~250
 cycles) this software cache is necessary to avoid reading the MSR to
compare the current CLOSid value.

The following considerations are done for the PQR MSR write so that it
minimally impacts scheduler hot path:
 - This path does not exist on any non-intel platforms.
 - On Intel platforms, this would not exist by default unless INTEL_RDT
 is enabled.
 - remains a no-op when INTEL_RDT is enabled and intel SKU does not
 support the feature.
 - When feature is available and enabled, never does MSR write till the
 user manually starts using one of the capacity bit masks.
 - MSR write is only done when there is a task with different Closid is
 scheduled on the CPU. Typically if the task groups are bound to be
 scheduled on a set of CPUs, the number of MSR writes is greatly
 reduced.
 - A per CPU cache of CLOSids is maintained to do the check so that we
 don't have to do a rdmsr which actually costs a lot of cycles.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/events/intel/cqm.c       | 26 +++-----------------------
 arch/x86/include/asm/intel_rdt.h  | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/pqr_common.h | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt.c   | 25 +++++++++++++++++++++++++
 arch/x86/kernel/process_64.c      |  6 ++++++
 5 files changed, 89 insertions(+), 23 deletions(-)
 create mode 100644 arch/x86/include/asm/pqr_common.h

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 7b5fd81..ac76e43 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -7,9 +7,9 @@
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
+#include <asm/pqr_common.h>
 #include "../perf_event.h"
 
-#define MSR_IA32_PQR_ASSOC	0x0c8f
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
 
@@ -24,32 +24,13 @@ static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
 unsigned int mbm_socket_max;
 
-/**
- * struct intel_pqr_state - State cache for the PQR MSR
- * @rmid:		The cached Resource Monitoring ID
- * @closid:		The cached Class Of Service ID
- * @rmid_usecnt:	The usage counter for rmid
- *
- * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
- * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
- * contains both parts, so we need to cache them.
- *
- * The cache also helps to avoid pointless updates if the value does
- * not change.
- */
-struct intel_pqr_state {
-	u32			rmid;
-	u32			closid;
-	int			rmid_usecnt;
-};
-
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
  * updated from a remote CPU. Both functions which modify the state
  * (intel_cqm_event_start and intel_cqm_event_stop) are called with
  * interrupts disabled, which is sufficient for the protection.
  */
-static DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
@@ -476,9 +457,9 @@ static void cqm_mask_call(struct rmid_read *rr)
  */
 static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
 {
-	struct perf_event *event;
 	struct list_head *head = &group->hw.cqm_group_entry;
 	u32 old_rmid = group->hw.cqm_rmid;
+	struct perf_event *event;
 
 	lockdep_assert_held(&cache_mutex);
 
@@ -1583,7 +1564,6 @@ static void intel_cqm_cpu_starting(unsigned int cpu)
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 	state->rmid = 0;
-	state->closid = 0;
 	state->rmid_usecnt = 0;
 
 	WARN_ON(c->x86_cache_max_rmid != cqm_max_rmid);
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 4f45dc8..afb6da3 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -3,14 +3,42 @@
 
 #ifdef CONFIG_INTEL_RDT
 
+#include <linux/jump_label.h>
+
 #define MAX_CBM_LENGTH			32
 #define IA32_L3_CBM_BASE		0xc90
 #define CBM_FROM_INDEX(x)		(IA32_L3_CBM_BASE + x)
 
+extern struct static_key rdt_enable_key;
+void __intel_rdt_sched_in(void *dummy);
+
 struct clos_cbm_table {
 	unsigned long l3_cbm;
 	unsigned int clos_refcnt;
 };
 
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
+ *
+ * Following considerations are made so that this has minimal impact
+ * on scheduler hot path:
+ * - This will stay as no-op unless we are running on an Intel SKU
+ * which supports L3 cache allocation.
+ * - Caches the per cpu CLOSid values and does the MSR write only
+ * when a task with a different CLOSid is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+	/*
+	 * Call the schedule in code only when RDT is enabled.
+	 */
+	if (static_key_false(&rdt_enable_key))
+		__intel_rdt_sched_in(NULL);
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
 #endif
 #endif
diff --git a/arch/x86/include/asm/pqr_common.h b/arch/x86/include/asm/pqr_common.h
new file mode 100644
index 0000000..11e985c
--- /dev/null
+++ b/arch/x86/include/asm/pqr_common.h
@@ -0,0 +1,27 @@
+#ifndef _X86_RDT_H_
+#define _X86_RDT_H_
+
+#define MSR_IA32_PQR_ASSOC	0x0c8f
+
+/**
+ * struct intel_pqr_state - State cache for the PQR MSR
+ * @rmid:		The cached Resource Monitoring ID
+ * @closid:		The cached Class Of Service ID
+ * @rmid_usecnt:	The usage counter for rmid
+ *
+ * The upper 32 bits of MSR_IA32_PQR_ASSOC contain closid and the
+ * lower 10 bits rmid. The update to MSR_IA32_PQR_ASSOC always
+ * contains both parts, so we need to cache them.
+ *
+ * The cache also helps to avoid pointless updates if the value does
+ * not change.
+ */
+struct intel_pqr_state {
+	u32			rmid;
+	u32			closid;
+	int			rmid_usecnt;
+};
+
+DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
+
+#endif
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6ad5b48..8379df8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,6 +24,8 @@
 
 #include <linux/slab.h>
 #include <linux/err.h>
+#include <linux/sched.h>
+#include <asm/pqr_common.h>
 #include <asm/intel_rdt.h>
 
 /*
@@ -44,12 +46,33 @@ static cpumask_t rdt_cpumask;
  */
 static cpumask_t tmp_cpumask;
 static DEFINE_MUTEX(rdt_group_mutex);
+struct static_key __read_mostly rdt_enable_key = STATIC_KEY_INIT_FALSE;
 
 struct rdt_remote_data {
 	int msr;
 	u64 val;
 };
 
+void __intel_rdt_sched_in(void *dummy)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	u32 closid = current->closid;
+
+	if (closid == state->closid)
+		return;
+
+	wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
+	state->closid = closid;
+}
+
+/*
+ * Synchronize the IA32_PQR_ASSOC MSR of all currently running tasks.
+ */
+static inline void closid_tasks_sync(void)
+{
+	on_each_cpu_mask(cpu_online_mask, __intel_rdt_sched_in, NULL, 1);
+}
+
 static inline void closid_get(u32 closid)
 {
 	struct clos_cbm_table *cct = &cctable[closid];
@@ -240,6 +263,8 @@ static int __init intel_rdt_late_init(void)
 
 	for_each_online_cpu(i)
 		rdt_cpumask_update(i);
+
+	static_key_slow_inc(&rdt_enable_key);
 	pr_info("Intel cache allocation enabled\n");
 out_err:
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6e789ca..71fc41a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -48,6 +48,7 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
+#include <asm/intel_rdt.h>
 #include <asm/xen/hypervisor.h>
 
 asmlinkage extern void ret_from_fork(void);
@@ -472,6 +473,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 			loadsegment(ss, __KERNEL_DS);
 	}
 
+	/*
+	 * Load the Intel cache allocation PQR MSR.
+	 */
+	intel_rdt_sched_in();
+
 	return prev_p;
 }
 
-- 
2.5.0

  parent reply	other threads:[~2016-07-12 22:09 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-13  1:02 [PATCH 00/32] Enable Intel Resource Allocation in Resource Director Technology Fenghua Yu
2016-07-13  1:02 ` [PATCH 01/32] x86/intel_rdt: Cache Allocation documentation Fenghua Yu
2016-07-13  1:02 ` [PATCH 02/32] x86/intel_rdt: Add support for Cache Allocation detection Fenghua Yu
2016-07-26 19:00   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 03/32] x86/intel_rdt: Add Class of service management Fenghua Yu
2016-07-13  1:02 ` [PATCH 04/32] x86/intel_rdt: Add L3 cache capacity bitmask management Fenghua Yu
2016-07-22  7:12   ` Marcelo Tosatti
2016-07-22 21:43     ` Luck, Tony
2016-07-23  4:31       ` Marcelo Tosatti
2016-07-26  3:18         ` Luck, Tony
2016-07-26 17:10         ` Shivappa Vikas
2016-07-13  1:02 ` Fenghua Yu [this message]
2016-07-25 16:25   ` [PATCH 05/32] x86/intel_rdt: Implement scheduling support for Intel RDT Nilay Vaish
2016-07-25 16:31   ` Nilay Vaish
2016-07-25 18:05     ` Luck, Tony
2016-07-25 22:47       ` David Carrillo-Cisneros
2016-07-13  1:02 ` [PATCH 06/32] x86/intel_rdt: Hot cpu support for Cache Allocation Fenghua Yu
2016-07-13  9:19   ` Thomas Gleixner
2016-07-21 19:46     ` Shivappa Vikas
2016-07-14  0:40   ` David Carrillo-Cisneros
2016-07-14 22:58     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 07/32] x86/intel_rdt: Intel haswell Cache Allocation enumeration Fenghua Yu
2016-07-13  1:02 ` [PATCH 08/32] Define CONFIG_INTEL_RDT Fenghua Yu
2016-07-13 10:25   ` Thomas Gleixner
2016-07-13 18:05     ` Yu, Fenghua
2016-07-13 21:09       ` Thomas Gleixner
2016-07-13 21:18         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 09/32] x86/intel_rdt: Intel Code Data Prioritization detection Fenghua Yu
2016-07-13  1:02 ` [PATCH 10/32] x86/intel_rdt: Adds support to enable Code Data Prioritization Fenghua Yu
2016-07-26 19:23   ` Nilay Vaish
2016-07-26 20:32     ` Shivappa Vikas
2016-07-13  1:02 ` [PATCH 11/32] x86/intel_rdt: Class of service and capacity bitmask management for CDP Fenghua Yu
2016-07-13  1:02 ` [PATCH 12/32] x86/intel_rdt: Hot cpu update for code data prioritization Fenghua Yu
2016-07-13  1:02 ` [PATCH 13/32] Documentation, x86: Documentation for Intel resource allocation user interface Fenghua Yu
2016-07-13 12:47   ` Thomas Gleixner
2016-07-13 17:13     ` Luck, Tony
2016-07-14  6:53       ` Thomas Gleixner
2016-07-14 17:16         ` Luck, Tony
2016-07-19 12:32           ` Thomas Gleixner
2016-08-04 23:38             ` Yu, Fenghua
2016-07-27 16:20   ` Nilay Vaish
2016-07-27 16:57     ` Luck, Tony
2016-08-03 22:15   ` Marcelo Tosatti
2016-07-13  1:02 ` [PATCH 14/32] x86/cpufeatures: Get max closid and max cbm len and clean feature comments and code Fenghua Yu
2016-07-27 16:49   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 15/32] cacheinfo: Introduce cache id Fenghua Yu
2016-07-27 17:04   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 16/32] Documentation, ABI: Add a document entry for " Fenghua Yu
2016-07-13  1:02 ` [PATCH 17/32] x86, intel_cacheinfo: Enable cache id in x86 Fenghua Yu
2016-07-28  5:41   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 18/32] drivers/base/cacheinfo.c: Export some cacheinfo functions for others to use Fenghua Yu
2016-07-13  1:02 ` [PATCH 19/32] sched.h: Add rg_list and rdtgroup in task_struct Fenghua Yu
2016-07-13 12:56   ` Thomas Gleixner
2016-07-13 17:50     ` Yu, Fenghua
2016-07-28  5:53   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 20/32] magic number for rscctrl file system Fenghua Yu
2016-07-28  5:57   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 21/32] x86/intel_rdt.h: Header for inter_rdt.c Fenghua Yu
2016-07-28 14:07   ` Nilay Vaish
2016-07-13  1:02 ` [PATCH 22/32] x86/intel_rdt_rdtgroup.h: Header for user interface Fenghua Yu
2016-07-13  1:02 ` [PATCH 23/32] x86/intel_rdt.c: Extend RDT to per cache and per resources Fenghua Yu
2016-07-13 13:07   ` Thomas Gleixner
2016-07-13 17:40     ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 24/32] Task fork and exit for rdtgroup Fenghua Yu
2016-07-13 13:14   ` Thomas Gleixner
2016-07-13 17:32     ` Yu, Fenghua
2016-07-13 21:02       ` Thomas Gleixner
2016-07-13 21:22         ` Yu, Fenghua
2016-07-13  1:02 ` [PATCH 25/32] x86/intel_rdt_rdtgroup.c: User interface for RDT Fenghua Yu
2016-07-14 12:30   ` Thomas Gleixner
2016-07-13  1:02 ` [PATCH 26/32] x86/intel_rdt_rdtgroup.c: Create info directory Fenghua Yu
2016-07-13  1:03 ` [PATCH 27/32] x86/intel_rdt_rdtgroup.c: Implement rscctrl file system commands Fenghua Yu
2016-07-13  1:03 ` [PATCH 28/32] x86/intel_rdt_rdtgroup.c: Read and write cpus Fenghua Yu
2016-07-13  1:03 ` [PATCH 29/32] x86/intel_rdt_rdtgroup.c: Tasks iterator and write Fenghua Yu
2016-07-13  1:03 ` [PATCH 30/32] x86/intel_rdt_rdtgroup.c: Process schemas input from rscctrl interface Fenghua Yu
2016-07-14  0:41   ` David Carrillo-Cisneros
2016-07-14  6:11     ` Thomas Gleixner
2016-07-14  6:16       ` Yu, Fenghua
2016-07-14  6:32     ` Yu, Fenghua
2016-07-13  1:03 ` [PATCH 31/32] MAINTAINERS: Add maintainer for Intel RDT resource allocation Fenghua Yu
2016-07-13  1:03 ` [PATCH 32/32] x86/Makefile: Build intel_rdt_rdtgroup.c Fenghua Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1468371785-53231-6-git-send-email-fenghua.yu@intel.com \
    --to=fenghua.yu@intel.com \
    --cc=bp@suse.de \
    --cc=davidcc@google.com \
    --cc=eranian@google.com \
    --cc=h.peter.anvin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mtosatti@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=sai.praneeth.prakhya@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vikas.shivappa@linux.intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).