* [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

Another attempt at the cqm2 series -

Cqm (cache quality monitoring) is part of Intel RDT (Resource Director
Technology), which enables monitoring and control of shared processor
resources via an MSR interface.

The current upstream cqm (cache monitoring) code has major issues which make
the feature almost unusable. This series tries to fix them, and also
addresses Thomas's comments on previous versions of the cqm2 patch series by
better documenting/organizing what we are trying to fix.

	Changes in V5
- Based on PeterZ's feedback, removed the file interface in the perf_event
cgroup to start and stop continuous monitoring.
- Based on Andi's feedback and references, David has sent a patch optimizing
the perf overhead as a separate patch, which is generic and not cqm
specific.

This is a continuation of the patch series David (davidcc@google.com)
previously posted; hence it is based on his patches and tries to fix the
same issues. The patches apply on 4.10-rc2.

Below are the issues and the fixes we attempt:

- Issue(1): Inaccurate data for per-package and systemwide monitoring; it
just prints zeros or arbitrary numbers.

Fix: The patches fix this by throwing an error if the mode is not supported.
The supported modes are task monitoring and cgroup monitoring. The
per-package data for, say, socket x is returned with the
-C <cpu on socket x> -G cgrpy option (see the example below). The systemwide
data can be looked up by monitoring the root cgroup.
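
As an illustration (the event name and the -C/-G usage below follow the
documentation patch later in this series; cgrpy and package_x are just
placeholder names):

$perf stat -I 1000 -e intel_cqm_llc/llc_occupancy/ -C <cpu_y on package_x> -G cgrpy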

- Issue(2): RMIDs are global; they don't scale with more packages, and hence
we also run out of RMIDs very soon.

Fix: Support per-pkg RMIDs, which scale better with more packages and give
us more RMIDs to use, allocated only when needed (i.e. when tasks are
actually scheduled on the package).

- Issue(3): Cgroup monitoring is incomplete: no hierarchical monitoring
support, and inconsistent or wrong data is seen when monitoring a cgroup.

Fix: The patches add full cgroup monitoring support. Different cgroups in
the same hierarchy can be monitored together and separately, and a task can
also be monitored together with the cgroup it belongs to.

- Issue(4): A lot of inconsistent data is currently seen when we monitor
different kinds of events, such as cgroup and task events, *together*.

Fix: The patches add support to monitor a cgroup x and a task p1 within
cgroup x, and also to monitor different cgroups and tasks together (see the
example below).
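
For illustration (event name as in the documentation patch; p1 and PID1 are
placeholders), the two invocations below could now run concurrently, one
monitoring the cgroup and the other a task inside it:

$perf stat -I 1000 -e intel_cqm_llc/llc_occupancy/ -a -G p1
$perf stat -I 1000 -e intel_cqm_llc/llc_occupancy/ -p PID1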

- Issue(5): CAT and cqm/mbm write the same PQR_ASSOC MSR separately.
Fix: Integrate the sched_in code and write the PQR MSR only once on every switch_to.

- Issue(6): RMID recycling leads to inaccurate data, complicates the code
and increases the code footprint. Currently, it almost makes the feature
*unusable* as we only see zeros and inconsistent data once we run out of
RMIDs in the lifetime of a system boot. The only way to get right numbers
is to reboot the system once we run out of RMIDs.

Root cause: Recycling steals an RMID from an existing event x and gives
it to another event y. However, due to the nature of monitoring
llc_occupancy, we may miss tracking an unknown (possibly large) part of
the cache fills during the time the event does not have an RMID. Hence the
user ends up with inaccurate data for both events x and y, and the
inaccuracy is arbitrary and cannot be measured. Even if an event x gets
another RMID very soon after losing the previous one, we still miss all the
occupancy data that was tied to the previous RMID, which means we cannot
get accurate data even when the event has an RMID for most of the time.
There is no way to guarantee accurate results with recycling, and the data
is inaccurate by an arbitrary degree. The fact that an event can lose an
RMID at any time complicates a lot of code in sched_in, init, count and
read. It also complicates mbm, as we may lose the RMID at any time and
hence need to keep a history of all the old counts.

Fix: Recycling is removed, based on Tony's original observation that it
introduces a lot of code, fails to provide accurate data and hence has
questionable benefits. In spite of several attempts to improve recycling,
there is no way to guarantee accurate data as explained above, and the
incorrectness is of arbitrary degree (we can't say, for example, that the
data is off by x%). As a fix we introduce per-pkg RMIDs to mitigate the
scarcity of RMIDs to a large extent - this is because RMIDs are plenty,
about 2 to 4 per logical processor/SMT thread on each package. So on a
2-socket BDW system with, say, 44 logical processors/SMT threads we have
176 RMIDs on each package (a total of 2 x 176 = 352 RMIDs). Also, cgroups
are fully supported, so many threads, like all threads in one VM/container,
can be grouped and use just one RMID. The RMIDs scale with the number of
sockets. If we still run out of RMIDs, perf read throws an error because we
are not able to monitor, as we have run out of a limited h/w resource.

This may be better than recycling (even with a better version than the one
upstream), where the user thinks events are being monitored but they are
actually not monitored for arbitrary amounts of time, resulting in
inaccurate data of arbitrary degree. The inaccurate data defeats the
purpose of RDT, whose goal is to provide consistent system behaviour by
giving the ability to monitor and control processor resources in an
accurate and reliable fashion. The fix instead helps provide accurate data
and to a large extent mitigates the RMID scarcity.

What's working now (unit tested):
Task monitoring, cgroup hierarchical monitoring, monitoring multiple
cgroups, cgroup and task in the same cgroup,
per-pkg rmids, error on read.

TBD:
- Most of MBM is working but will need updates for hierarchical
monitoring and the other new-feature-related changes we introduce.

Below is a list of patches and what each patch fixes; each commit
message also gives details on what the patch actually fixes among the
bunch:

[PATCH 02/12] x86/cqm: Remove cqm recycling/conflict handling

Before the patch: The user sees only zeros or wrong data once we run out
of RMIDs.
After: The user sees either correct data or an error that we have run out
of RMIDs.

[PATCH 03/12] x86/rdt: Add rdt common/cqm compile option
[PATCH 04/12] x86/cqm: Add Per pkg rmid support

Before patch: RMIDs are global.
Tests: Available RMIDs increase by x times, where x is the # of packages.
Adds LAZY RMID allocation - RMIDs are allocated during the first sched_in.

[PATCH 05/12] x86/cqm,perf/core: Cgroup support prepare
[PATCH 06/12] x86/cqm: Add cgroup hierarchical monitoring support
[PATCH 07/12] x86/rdt,cqm: Scheduling support update

Before patch: cgroup monitoring is not fully supported.
After: cgroup monitoring is fully supported, including hierarchical
monitoring.

[PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup

Before patch: A cgroup and a task could not be monitored together, which
would result in a lot of inconsistent data.
After: A task and a cgroup can be monitored together; monitoring a task
within a cgroup together with that cgroup is also supported.

[PATCH 9/12] x86/cqm: Add RMID reuse

Before patch: Once an RMID is used, it is never used again.
After: We reuse the RMIDs which are freed. The user can specify NOLAZY RMID
allocation, and open fails if we fail to get all RMIDs at open.

[PATCH 10/12] perf/core,x86/cqm: Add read for Cgroup events,per pkg
[PATCH 11/12] perf/stat: fix bug in handling events in error state
[PATCH 12/12] perf/stat: revamp read error handling, snapshot and

Patches 1/12 - 9/12 add all the features, but the data is not visible to
perf/core or the perf user mode tools. Patches 11-12 fix this and make the
data available to the perf user mode.


* [PATCH 01/12] Documentation, x86/cqm: Intel Resource Monitoring Documentation
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

Add documentation on the usage of cqm and mbm events via the perf
interface, with examples.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 Documentation/x86/intel_rdt_mon_ui.txt | 62 ++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 Documentation/x86/intel_rdt_mon_ui.txt

diff --git a/Documentation/x86/intel_rdt_mon_ui.txt b/Documentation/x86/intel_rdt_mon_ui.txt
new file mode 100644
index 0000000..881fa58
--- /dev/null
+++ b/Documentation/x86/intel_rdt_mon_ui.txt
@@ -0,0 +1,62 @@
+User Interface for Resource Monitoring in Intel Resource Director Technology
+
+Vikas Shivappa<vikas.shivappa@intel.com>
+David Carrillo-Cisneros<davidcc@google.com>
+Stephane Eranian <eranian@google.com>
+
+This feature is enabled by the CONFIG_INTEL_RDT_M Kconfig and the
+X86 /proc/cpuinfo flag bits cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
+
+Resource Monitoring
+-------------------
+Resource Monitoring includes cqm(cache quality monitoring) and
+mbm(memory bandwidth monitoring) and uses the perf interface. A light
+weight interface to enable monitoring without perf is enabled as well.
+
+CQM provides OS/VMM a way to monitor llc occupancy. It measures the
+amount of L3 cache fills per task or cgroup.
+
+MBM provides OS/VMM a way to monitor bandwidth from one level of cache
+to another. The current patches support L3 external bandwidth
+monitoring. It supports both 'local bandwidth' and 'total bandwidth'
+monitoring for the socket. Local bandwidth measures the amount of data
+sent through the memory controller on the socket and total b/w measures
+the total system bandwidth.
+
+To check the monitoring events enabled:
+
+$ ./tools/perf/perf list | grep -i cqm
+intel_cqm/llc_occupancy/                           [Kernel PMU event]
+intel_cqm/local_bytes/                             [Kernel PMU event]
+intel_cqm/total_bytes/                             [Kernel PMU event]
+
+Monitoring tasks and cgroups using perf
+---------------------------------------
+Monitoring tasks and cgroup is like using any other perf event.
+
+$perf stat -I 1000 -e intel_cqm_llc/local_bytes/ -p PID1
+
+This will monitor the local_bytes event of the PID1 and report once
+every 1000ms
+
+$mkdir /sys/fs/cgroup/perf_event/p1
+$echo PID1 > /sys/fs/cgroup/perf_event/p1/tasks
+$echo PID2 > /sys/fs/cgroup/perf_event/p1/tasks
+
+$perf stat -I 1000 -e intel_cqm_llc/llc_occupancy/ -a -G p1
+
+This will monitor the llc_occupancy event of the perf cgroup p1 in
+interval mode.
+
+Hierarchical monitoring should work just like with other events, and users
+can also monitor a task within a cgroup and the cgroup together, or
+different cgroups in the same hierarchy can be monitored together.
+
+The events are associated with RMIDs and are grouped when optimal. The
+RMIDs are a limited hardware resource; if they run out, reads on the
+events simply return an error.
+
+To obtain per-package data for a cgroup (on package x), provide any cpu
+in the package as input to -C:
+
+$perf stat -I 1000 -e intel_cqm_llc/llc_occupancy/ -C <cpu_y on package_x> -G p1
-- 
1.9.1


* [PATCH 02/12] x86/cqm: Remove cqm recycling/conflict handling
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: David Carrillo-Cisneros <davidcc@google.com>

Until now we only supported global RMIDs, not per-pkg RMIDs, so the RMIDs
did not scale and we would more easily run out of them. When we ran out of
RMIDs we did RMID recycling, which led to several issues, some of which are
listed below, and complications which may outweigh the benefits.

RMID recycling 'steals' an RMID from an event x that is being monitored and
gives it to an event y that is also being monitored.
- This does not guarantee that we get correct data for both events, as
there are always times when an event is not being monitored. Hence the user
could be given incorrect data. The extent of the error is arbitrary and
unknown, as we cannot measure how much of the occupancy we missed.
- It complicated the usage of RMIDs, the reading of data and MBM counting
a lot, because when reading the counters we had to keep a history of
previous counts and also make sure that no one had stolen the RMID we
wanted to use. All of this had a large code footprint.
- When a process changes RMID, the cache fills it had under the old RMID
are no longer tracked - so we are not even guaranteeing correct data for
the time it is monitored, even if it had an RMID for almost all of that
time.
- There was inconsistent data in the current code due to the way RMIDs
were recycled: we ended up stealing RMIDs from monitored events before
using the freed RMIDs on the limbo list. This led to 0 counts as soon as
one ran out of RMIDs in the lifetime of a system boot, which almost makes
the feature unusable. Also, the state transitions were messy.

This patch removes RMID recycling and just throws an error when there are
no more RMIDs to monitor with. H/w provides ~4 RMIDs per logical processor
on each package, and if we use all of them, and only when the tasks are
actually scheduled on the packages, the problem of scarcity can be
mitigated.

The conflict handling is also removed, as it resulted in a lot of
inconsistent data when different kinds of events, like systemwide, cgroup
and task, were monitored together.

David's original patch modified by Vikas <vikas.shivappa@linux.intel.com>
to only remove the recycling parts from the code and edit the commit message.

Tests: cqm should either give correct numbers for task events
or just throw an error that it has run out of RMIDs.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c | 647 ++------------------------------------------
 1 file changed, 30 insertions(+), 617 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 8c00dc0..7c37a25 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -4,6 +4,8 @@
  * Based very, very heavily on work by Peter Zijlstra.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/perf_event.h>
 #include <linux/slab.h>
 #include <asm/cpu_device_id.h>
@@ -18,6 +20,7 @@
  * Guaranteed time in ms as per SDM where MBM counters will not overflow.
  */
 #define MBM_CTR_OVERFLOW_TIME	1000
+#define RMID_DEFAULT_QUEUE_TIME	250
 
 static u32 cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
@@ -91,15 +94,6 @@ struct sample {
 #define QOS_MBM_TOTAL_EVENT_ID	0x02
 #define QOS_MBM_LOCAL_EVENT_ID	0x03
 
-/*
- * This is central to the rotation algorithm in __intel_cqm_rmid_rotate().
- *
- * This rmid is always free and is guaranteed to have an associated
- * near-zero occupancy value, i.e. no cachelines are tagged with this
- * RMID, once __intel_cqm_rmid_rotate() returns.
- */
-static u32 intel_cqm_rotation_rmid;
-
 #define INVALID_RMID		(-1)
 
 /*
@@ -112,7 +106,7 @@ struct sample {
  */
 static inline bool __rmid_valid(u32 rmid)
 {
-	if (!rmid || rmid == INVALID_RMID)
+	if (!rmid || rmid > cqm_max_rmid)
 		return false;
 
 	return true;
@@ -137,8 +131,7 @@ static u64 __rmid_read(u32 rmid)
 }
 
 enum rmid_recycle_state {
-	RMID_YOUNG = 0,
-	RMID_AVAILABLE,
+	RMID_AVAILABLE = 0,
 	RMID_DIRTY,
 };
 
@@ -228,7 +221,7 @@ static void __put_rmid(u32 rmid)
 	entry = __rmid_entry(rmid);
 
 	entry->queue_time = jiffies;
-	entry->state = RMID_YOUNG;
+	entry->state = RMID_DIRTY;
 
 	list_add_tail(&entry->list, &cqm_rmid_limbo_lru);
 }
@@ -281,10 +274,6 @@ static int intel_cqm_setup_rmid_cache(void)
 	entry = __rmid_entry(0);
 	list_del(&entry->list);
 
-	mutex_lock(&cache_mutex);
-	intel_cqm_rotation_rmid = __get_rmid();
-	mutex_unlock(&cache_mutex);
-
 	return 0;
 
 fail:
@@ -343,92 +332,6 @@ static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event)
 }
 #endif
 
-/*
- * Determine if @a's tasks intersect with @b's tasks
- *
- * There are combinations of events that we explicitly prohibit,
- *
- *		   PROHIBITS
- *     system-wide    -> 	cgroup and task
- *     cgroup 	      ->	system-wide
- *     		      ->	task in cgroup
- *     task 	      -> 	system-wide
- *     		      ->	task in cgroup
- *
- * Call this function before allocating an RMID.
- */
-static bool __conflict_event(struct perf_event *a, struct perf_event *b)
-{
-#ifdef CONFIG_CGROUP_PERF
-	/*
-	 * We can have any number of cgroups but only one system-wide
-	 * event at a time.
-	 */
-	if (a->cgrp && b->cgrp) {
-		struct perf_cgroup *ac = a->cgrp;
-		struct perf_cgroup *bc = b->cgrp;
-
-		/*
-		 * This condition should have been caught in
-		 * __match_event() and we should be sharing an RMID.
-		 */
-		WARN_ON_ONCE(ac == bc);
-
-		if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) ||
-		    cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup))
-			return true;
-
-		return false;
-	}
-
-	if (a->cgrp || b->cgrp) {
-		struct perf_cgroup *ac, *bc;
-
-		/*
-		 * cgroup and system-wide events are mutually exclusive
-		 */
-		if ((a->cgrp && !(b->attach_state & PERF_ATTACH_TASK)) ||
-		    (b->cgrp && !(a->attach_state & PERF_ATTACH_TASK)))
-			return true;
-
-		/*
-		 * Ensure neither event is part of the other's cgroup
-		 */
-		ac = event_to_cgroup(a);
-		bc = event_to_cgroup(b);
-		if (ac == bc)
-			return true;
-
-		/*
-		 * Must have cgroup and non-intersecting task events.
-		 */
-		if (!ac || !bc)
-			return false;
-
-		/*
-		 * We have cgroup and task events, and the task belongs
-		 * to a cgroup. Check for for overlap.
-		 */
-		if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) ||
-		    cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup))
-			return true;
-
-		return false;
-	}
-#endif
-	/*
-	 * If one of them is not a task, same story as above with cgroups.
-	 */
-	if (!(a->attach_state & PERF_ATTACH_TASK) ||
-	    !(b->attach_state & PERF_ATTACH_TASK))
-		return true;
-
-	/*
-	 * Must be non-overlapping.
-	 */
-	return false;
-}
-
 struct rmid_read {
 	u32 rmid;
 	u32 evt_type;
@@ -458,461 +361,14 @@ static void cqm_mask_call(struct rmid_read *rr)
 }
 
 /*
- * Exchange the RMID of a group of events.
- */
-static u32 intel_cqm_xchg_rmid(struct perf_event *group, u32 rmid)
-{
-	struct perf_event *event;
-	struct list_head *head = &group->hw.cqm_group_entry;
-	u32 old_rmid = group->hw.cqm_rmid;
-
-	lockdep_assert_held(&cache_mutex);
-
-	/*
-	 * If our RMID is being deallocated, perform a read now.
-	 */
-	if (__rmid_valid(old_rmid) && !__rmid_valid(rmid)) {
-		struct rmid_read rr = {
-			.rmid = old_rmid,
-			.evt_type = group->attr.config,
-			.value = ATOMIC64_INIT(0),
-		};
-
-		cqm_mask_call(&rr);
-		local64_set(&group->count, atomic64_read(&rr.value));
-	}
-
-	raw_spin_lock_irq(&cache_lock);
-
-	group->hw.cqm_rmid = rmid;
-	list_for_each_entry(event, head, hw.cqm_group_entry)
-		event->hw.cqm_rmid = rmid;
-
-	raw_spin_unlock_irq(&cache_lock);
-
-	/*
-	 * If the allocation is for mbm, init the mbm stats.
-	 * Need to check if each event in the group is mbm event
-	 * because there could be multiple type of events in the same group.
-	 */
-	if (__rmid_valid(rmid)) {
-		event = group;
-		if (is_mbm_event(event->attr.config))
-			init_mbm_sample(rmid, event->attr.config);
-
-		list_for_each_entry(event, head, hw.cqm_group_entry) {
-			if (is_mbm_event(event->attr.config))
-				init_mbm_sample(rmid, event->attr.config);
-		}
-	}
-
-	return old_rmid;
-}
-
-/*
- * If we fail to assign a new RMID for intel_cqm_rotation_rmid because
- * cachelines are still tagged with RMIDs in limbo, we progressively
- * increment the threshold until we find an RMID in limbo with <=
- * __intel_cqm_threshold lines tagged. This is designed to mitigate the
- * problem where cachelines tagged with an RMID are not steadily being
- * evicted.
- *
- * On successful rotations we decrease the threshold back towards zero.
- *
  * __intel_cqm_max_threshold provides an upper bound on the threshold,
  * and is measured in bytes because it's exposed to userland.
  */
 static unsigned int __intel_cqm_threshold;
 static unsigned int __intel_cqm_max_threshold;
 
-/*
- * Test whether an RMID has a zero occupancy value on this cpu.
- */
-static void intel_cqm_stable(void *arg)
-{
-	struct cqm_rmid_entry *entry;
-
-	list_for_each_entry(entry, &cqm_rmid_limbo_lru, list) {
-		if (entry->state != RMID_AVAILABLE)
-			break;
-
-		if (__rmid_read(entry->rmid) > __intel_cqm_threshold)
-			entry->state = RMID_DIRTY;
-	}
-}
-
-/*
- * If we have group events waiting for an RMID that don't conflict with
- * events already running, assign @rmid.
- */
-static bool intel_cqm_sched_in_event(u32 rmid)
-{
-	struct perf_event *leader, *event;
-
-	lockdep_assert_held(&cache_mutex);
-
-	leader = list_first_entry(&cache_groups, struct perf_event,
-				  hw.cqm_groups_entry);
-	event = leader;
-
-	list_for_each_entry_continue(event, &cache_groups,
-				     hw.cqm_groups_entry) {
-		if (__rmid_valid(event->hw.cqm_rmid))
-			continue;
-
-		if (__conflict_event(event, leader))
-			continue;
-
-		intel_cqm_xchg_rmid(event, rmid);
-		return true;
-	}
-
-	return false;
-}
-
-/*
- * Initially use this constant for both the limbo queue time and the
- * rotation timer interval, pmu::hrtimer_interval_ms.
- *
- * They don't need to be the same, but the two are related since if you
- * rotate faster than you recycle RMIDs, you may run out of available
- * RMIDs.
- */
-#define RMID_DEFAULT_QUEUE_TIME 250	/* ms */
-
-static unsigned int __rmid_queue_time_ms = RMID_DEFAULT_QUEUE_TIME;
-
-/*
- * intel_cqm_rmid_stabilize - move RMIDs from limbo to free list
- * @nr_available: number of freeable RMIDs on the limbo list
- *
- * Quiescent state; wait for all 'freed' RMIDs to become unused, i.e. no
- * cachelines are tagged with those RMIDs. After this we can reuse them
- * and know that the current set of active RMIDs is stable.
- *
- * Return %true or %false depending on whether stabilization needs to be
- * reattempted.
- *
- * If we return %true then @nr_available is updated to indicate the
- * number of RMIDs on the limbo list that have been queued for the
- * minimum queue time (RMID_AVAILABLE), but whose data occupancy values
- * are above __intel_cqm_threshold.
- */
-static bool intel_cqm_rmid_stabilize(unsigned int *available)
-{
-	struct cqm_rmid_entry *entry, *tmp;
-
-	lockdep_assert_held(&cache_mutex);
-
-	*available = 0;
-	list_for_each_entry(entry, &cqm_rmid_limbo_lru, list) {
-		unsigned long min_queue_time;
-		unsigned long now = jiffies;
-
-		/*
-		 * We hold RMIDs placed into limbo for a minimum queue
-		 * time. Before the minimum queue time has elapsed we do
-		 * not recycle RMIDs.
-		 *
-		 * The reasoning is that until a sufficient time has
-		 * passed since we stopped using an RMID, any RMID
-		 * placed onto the limbo list will likely still have
-		 * data tagged in the cache, which means we'll probably
-		 * fail to recycle it anyway.
-		 *
-		 * We can save ourselves an expensive IPI by skipping
-		 * any RMIDs that have not been queued for the minimum
-		 * time.
-		 */
-		min_queue_time = entry->queue_time +
-			msecs_to_jiffies(__rmid_queue_time_ms);
-
-		if (time_after(min_queue_time, now))
-			break;
-
-		entry->state = RMID_AVAILABLE;
-		(*available)++;
-	}
-
-	/*
-	 * Fast return if none of the RMIDs on the limbo list have been
-	 * sitting on the queue for the minimum queue time.
-	 */
-	if (!*available)
-		return false;
-
-	/*
-	 * Test whether an RMID is free for each package.
-	 */
-	on_each_cpu_mask(&cqm_cpumask, intel_cqm_stable, NULL, true);
-
-	list_for_each_entry_safe(entry, tmp, &cqm_rmid_limbo_lru, list) {
-		/*
-		 * Exhausted all RMIDs that have waited min queue time.
-		 */
-		if (entry->state == RMID_YOUNG)
-			break;
-
-		if (entry->state == RMID_DIRTY)
-			continue;
-
-		list_del(&entry->list);	/* remove from limbo */
-
-		/*
-		 * The rotation RMID gets priority if it's
-		 * currently invalid. In which case, skip adding
-		 * the RMID to the the free lru.
-		 */
-		if (!__rmid_valid(intel_cqm_rotation_rmid)) {
-			intel_cqm_rotation_rmid = entry->rmid;
-			continue;
-		}
-
-		/*
-		 * If we have groups waiting for RMIDs, hand
-		 * them one now provided they don't conflict.
-		 */
-		if (intel_cqm_sched_in_event(entry->rmid))
-			continue;
-
-		/*
-		 * Otherwise place it onto the free list.
-		 */
-		list_add_tail(&entry->list, &cqm_rmid_free_lru);
-	}
-
-
-	return __rmid_valid(intel_cqm_rotation_rmid);
-}
-
-/*
- * Pick a victim group and move it to the tail of the group list.
- * @next: The first group without an RMID
- */
-static void __intel_cqm_pick_and_rotate(struct perf_event *next)
-{
-	struct perf_event *rotor;
-	u32 rmid;
-
-	lockdep_assert_held(&cache_mutex);
-
-	rotor = list_first_entry(&cache_groups, struct perf_event,
-				 hw.cqm_groups_entry);
-
-	/*
-	 * The group at the front of the list should always have a valid
-	 * RMID. If it doesn't then no groups have RMIDs assigned and we
-	 * don't need to rotate the list.
-	 */
-	if (next == rotor)
-		return;
-
-	rmid = intel_cqm_xchg_rmid(rotor, INVALID_RMID);
-	__put_rmid(rmid);
-
-	list_rotate_left(&cache_groups);
-}
-
-/*
- * Deallocate the RMIDs from any events that conflict with @event, and
- * place them on the back of the group list.
- */
-static void intel_cqm_sched_out_conflicting_events(struct perf_event *event)
-{
-	struct perf_event *group, *g;
-	u32 rmid;
-
-	lockdep_assert_held(&cache_mutex);
-
-	list_for_each_entry_safe(group, g, &cache_groups, hw.cqm_groups_entry) {
-		if (group == event)
-			continue;
-
-		rmid = group->hw.cqm_rmid;
-
-		/*
-		 * Skip events that don't have a valid RMID.
-		 */
-		if (!__rmid_valid(rmid))
-			continue;
-
-		/*
-		 * No conflict? No problem! Leave the event alone.
-		 */
-		if (!__conflict_event(group, event))
-			continue;
-
-		intel_cqm_xchg_rmid(group, INVALID_RMID);
-		__put_rmid(rmid);
-	}
-}
-
-/*
- * Attempt to rotate the groups and assign new RMIDs.
- *
- * We rotate for two reasons,
- *   1. To handle the scheduling of conflicting events
- *   2. To recycle RMIDs
- *
- * Rotating RMIDs is complicated because the hardware doesn't give us
- * any clues.
- *
- * There's problems with the hardware interface; when you change the
- * task:RMID map cachelines retain their 'old' tags, giving a skewed
- * picture. In order to work around this, we must always keep one free
- * RMID - intel_cqm_rotation_rmid.
- *
- * Rotation works by taking away an RMID from a group (the old RMID),
- * and assigning the free RMID to another group (the new RMID). We must
- * then wait for the old RMID to not be used (no cachelines tagged).
- * This ensure that all cachelines are tagged with 'active' RMIDs. At
- * this point we can start reading values for the new RMID and treat the
- * old RMID as the free RMID for the next rotation.
- *
- * Return %true or %false depending on whether we did any rotating.
- */
-static bool __intel_cqm_rmid_rotate(void)
-{
-	struct perf_event *group, *start = NULL;
-	unsigned int threshold_limit;
-	unsigned int nr_needed = 0;
-	unsigned int nr_available;
-	bool rotated = false;
-
-	mutex_lock(&cache_mutex);
-
-again:
-	/*
-	 * Fast path through this function if there are no groups and no
-	 * RMIDs that need cleaning.
-	 */
-	if (list_empty(&cache_groups) && list_empty(&cqm_rmid_limbo_lru))
-		goto out;
-
-	list_for_each_entry(group, &cache_groups, hw.cqm_groups_entry) {
-		if (!__rmid_valid(group->hw.cqm_rmid)) {
-			if (!start)
-				start = group;
-			nr_needed++;
-		}
-	}
-
-	/*
-	 * We have some event groups, but they all have RMIDs assigned
-	 * and no RMIDs need cleaning.
-	 */
-	if (!nr_needed && list_empty(&cqm_rmid_limbo_lru))
-		goto out;
-
-	if (!nr_needed)
-		goto stabilize;
-
-	/*
-	 * We have more event groups without RMIDs than available RMIDs,
-	 * or we have event groups that conflict with the ones currently
-	 * scheduled.
-	 *
-	 * We force deallocate the rmid of the group at the head of
-	 * cache_groups. The first event group without an RMID then gets
-	 * assigned intel_cqm_rotation_rmid. This ensures we always make
-	 * forward progress.
-	 *
-	 * Rotate the cache_groups list so the previous head is now the
-	 * tail.
-	 */
-	__intel_cqm_pick_and_rotate(start);
-
-	/*
-	 * If the rotation is going to succeed, reduce the threshold so
-	 * that we don't needlessly reuse dirty RMIDs.
-	 */
-	if (__rmid_valid(intel_cqm_rotation_rmid)) {
-		intel_cqm_xchg_rmid(start, intel_cqm_rotation_rmid);
-		intel_cqm_rotation_rmid = __get_rmid();
-
-		intel_cqm_sched_out_conflicting_events(start);
-
-		if (__intel_cqm_threshold)
-			__intel_cqm_threshold--;
-	}
-
-	rotated = true;
-
-stabilize:
-	/*
-	 * We now need to stablize the RMID we freed above (if any) to
-	 * ensure that the next time we rotate we have an RMID with zero
-	 * occupancy value.
-	 *
-	 * Alternatively, if we didn't need to perform any rotation,
-	 * we'll have a bunch of RMIDs in limbo that need stabilizing.
-	 */
-	threshold_limit = __intel_cqm_max_threshold / cqm_l3_scale;
-
-	while (intel_cqm_rmid_stabilize(&nr_available) &&
-	       __intel_cqm_threshold < threshold_limit) {
-		unsigned int steal_limit;
-
-		/*
-		 * Don't spin if nobody is actively waiting for an RMID,
-		 * the rotation worker will be kicked as soon as an
-		 * event needs an RMID anyway.
-		 */
-		if (!nr_needed)
-			break;
-
-		/* Allow max 25% of RMIDs to be in limbo. */
-		steal_limit = (cqm_max_rmid + 1) / 4;
-
-		/*
-		 * We failed to stabilize any RMIDs so our rotation
-		 * logic is now stuck. In order to make forward progress
-		 * we have a few options:
-		 *
-		 *   1. rotate ("steal") another RMID
-		 *   2. increase the threshold
-		 *   3. do nothing
-		 *
-		 * We do both of 1. and 2. until we hit the steal limit.
-		 *
-		 * The steal limit prevents all RMIDs ending up on the
-		 * limbo list. This can happen if every RMID has a
-		 * non-zero occupancy above threshold_limit, and the
-		 * occupancy values aren't dropping fast enough.
-		 *
-		 * Note that there is prioritisation at work here - we'd
-		 * rather increase the number of RMIDs on the limbo list
-		 * than increase the threshold, because increasing the
-		 * threshold skews the event data (because we reuse
-		 * dirty RMIDs) - threshold bumps are a last resort.
-		 */
-		if (nr_available < steal_limit)
-			goto again;
-
-		__intel_cqm_threshold++;
-	}
-
-out:
-	mutex_unlock(&cache_mutex);
-	return rotated;
-}
-
-static void intel_cqm_rmid_rotate(struct work_struct *work);
-
-static DECLARE_DELAYED_WORK(intel_cqm_rmid_work, intel_cqm_rmid_rotate);
-
 static struct pmu intel_cqm_pmu;
 
-static void intel_cqm_rmid_rotate(struct work_struct *work)
-{
-	unsigned long delay;
-
-	__intel_cqm_rmid_rotate();
-
-	delay = msecs_to_jiffies(intel_cqm_pmu.hrtimer_interval_ms);
-	schedule_delayed_work(&intel_cqm_rmid_work, delay);
-}
-
 static u64 update_sample(unsigned int rmid, u32 evt_type, int first)
 {
 	struct sample *mbm_current;
@@ -984,11 +440,10 @@ static void init_mbm_sample(u32 rmid, u32 evt_type)
  *
  * If we're part of a group, we use the group's RMID.
  */
-static void intel_cqm_setup_event(struct perf_event *event,
+static int intel_cqm_setup_event(struct perf_event *event,
 				  struct perf_event **group)
 {
 	struct perf_event *iter;
-	bool conflict = false;
 	u32 rmid;
 
 	event->hw.is_group_event = false;
@@ -1001,26 +456,24 @@ static void intel_cqm_setup_event(struct perf_event *event,
 			*group = iter;
 			if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
 				init_mbm_sample(rmid, event->attr.config);
-			return;
+			return 0;
 		}
 
-		/*
-		 * We only care about conflicts for events that are
-		 * actually scheduled in (and hence have a valid RMID).
-		 */
-		if (__conflict_event(iter, event) && __rmid_valid(rmid))
-			conflict = true;
 	}
 
-	if (conflict)
-		rmid = INVALID_RMID;
-	else
-		rmid = __get_rmid();
+	rmid = __get_rmid();
+
+	if (!__rmid_valid(rmid)) {
+		pr_info("out of RMIDs\n");
+		return -EINVAL;
+	}
 
 	if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
 		init_mbm_sample(rmid, event->attr.config);
 
 	event->hw.cqm_rmid = rmid;
+
+	return 0;
 }
 
 static void intel_cqm_event_read(struct perf_event *event)
@@ -1166,7 +619,6 @@ static void mbm_hrtimer_init(void)
 
 static u64 intel_cqm_event_count(struct perf_event *event)
 {
-	unsigned long flags;
 	struct rmid_read rr = {
 		.evt_type = event->attr.config,
 		.value = ATOMIC64_INIT(0),
@@ -1206,24 +658,11 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 	 * Notice that we don't perform the reading of an RMID
 	 * atomically, because we can't hold a spin lock across the
 	 * IPIs.
-	 *
-	 * Speculatively perform the read, since @event might be
-	 * assigned a different (possibly invalid) RMID while we're
-	 * busying performing the IPI calls. It's therefore necessary to
-	 * check @event's RMID afterwards, and if it has changed,
-	 * discard the result of the read.
 	 */
 	rr.rmid = ACCESS_ONCE(event->hw.cqm_rmid);
-
-	if (!__rmid_valid(rr.rmid))
-		goto out;
-
 	cqm_mask_call(&rr);
+	local64_set(&event->count, atomic64_read(&rr.value));
 
-	raw_spin_lock_irqsave(&cache_lock, flags);
-	if (event->hw.cqm_rmid == rr.rmid)
-		local64_set(&event->count, atomic64_read(&rr.value));
-	raw_spin_unlock_irqrestore(&cache_lock, flags);
 out:
 	return __perf_event_count(event);
 }
@@ -1238,34 +677,16 @@ static void intel_cqm_event_start(struct perf_event *event, int mode)
 
 	event->hw.cqm_state &= ~PERF_HES_STOPPED;
 
-	if (state->rmid_usecnt++) {
-		if (!WARN_ON_ONCE(state->rmid != rmid))
-			return;
-	} else {
-		WARN_ON_ONCE(state->rmid);
-	}
-
 	state->rmid = rmid;
 	wrmsr(MSR_IA32_PQR_ASSOC, rmid, state->closid);
 }
 
 static void intel_cqm_event_stop(struct perf_event *event, int mode)
 {
-	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-
 	if (event->hw.cqm_state & PERF_HES_STOPPED)
 		return;
 
 	event->hw.cqm_state |= PERF_HES_STOPPED;
-
-	intel_cqm_event_read(event);
-
-	if (!--state->rmid_usecnt) {
-		state->rmid = 0;
-		wrmsr(MSR_IA32_PQR_ASSOC, 0, state->closid);
-	} else {
-		WARN_ON_ONCE(!state->rmid);
-	}
 }
 
 static int intel_cqm_event_add(struct perf_event *event, int mode)
@@ -1342,8 +763,8 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 static int intel_cqm_event_init(struct perf_event *event)
 {
 	struct perf_event *group = NULL;
-	bool rotate = false;
 	unsigned long flags;
+	int ret = 0;
 
 	if (event->attr.type != intel_cqm_pmu.type)
 		return -ENOENT;
@@ -1373,46 +794,36 @@ static int intel_cqm_event_init(struct perf_event *event)
 
 	mutex_lock(&cache_mutex);
 
+	/* Will also set rmid, return error on RMID not being available*/
+	if (intel_cqm_setup_event(event, &group)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/*
 	 * Start the mbm overflow timers when the first event is created.
 	*/
 	if (mbm_enabled && list_empty(&cache_groups))
 		mbm_start_timers();
 
-	/* Will also set rmid */
-	intel_cqm_setup_event(event, &group);
-
 	/*
 	* Hold the cache_lock as mbm timer handlers be
 	* scanning the list of events.
 	*/
 	raw_spin_lock_irqsave(&cache_lock, flags);
 
-	if (group) {
+	if (group)
 		list_add_tail(&event->hw.cqm_group_entry,
 			      &group->hw.cqm_group_entry);
-	} else {
+	else
 		list_add_tail(&event->hw.cqm_groups_entry,
 			      &cache_groups);
 
-		/*
-		 * All RMIDs are either in use or have recently been
-		 * used. Kick the rotation worker to clean/free some.
-		 *
-		 * We only do this for the group leader, rather than for
-		 * every event in a group to save on needless work.
-		 */
-		if (!__rmid_valid(event->hw.cqm_rmid))
-			rotate = true;
-	}
-
 	raw_spin_unlock_irqrestore(&cache_lock, flags);
+out:
 	mutex_unlock(&cache_mutex);
 
-	if (rotate)
-		schedule_delayed_work(&intel_cqm_rmid_work, 0);
-
-	return 0;
+	return ret;
 }
 
 EVENT_ATTR_STR(llc_occupancy, intel_cqm_llc, "event=0x01");
@@ -1706,6 +1117,8 @@ static int __init intel_cqm_init(void)
 	__intel_cqm_max_threshold =
 		boot_cpu_data.x86_cache_size * 1024 / (cqm_max_rmid + 1);
 
+	__intel_cqm_threshold = __intel_cqm_max_threshold / cqm_l3_scale;
+
 	snprintf(scale, sizeof(scale), "%u", cqm_l3_scale);
 	str = kstrdup(scale, GFP_KERNEL);
 	if (!str) {
-- 
1.9.1


* [PATCH 03/12] x86/rdt: Add rdt common/cqm compile option
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

Add a compile option INTEL_RDT which enables the common code for all of
RDT (Resource Director Technology), and a specific INTEL_RDT_M which
enables the code for RDT monitoring. CQM (cache quality monitoring) and
mbm (memory b/w monitoring) are part of Intel RDT monitoring.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>

Conflicts:
	arch/x86/Kconfig
---
 arch/x86/Kconfig               | 17 +++++++++++++++++
 arch/x86/events/intel/Makefile |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e487493..b2f4b24 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -412,11 +412,28 @@ config GOLDFISH
        def_bool y
        depends on X86_GOLDFISH
 
+config INTEL_RDT
+	bool
+
+config INTEL_RDT_M
+	bool "Intel Resource Director Technology Monitoring support"
+	default n
+	depends on X86 && CPU_SUP_INTEL
+	select INTEL_RDT
+	help
+	  Select to enable resource monitoring which is a sub-feature of
+	  Intel Resource Director Technology(RDT). More information about
+	  RDT can be found in the Intel x86 Architecture Software
+	  Developer Manual.
+
+	  Say N if unsure.
+
 config INTEL_RDT_A
 	bool "Intel Resource Director Technology Allocation support"
 	default n
 	depends on X86 && CPU_SUP_INTEL
 	select KERNFS
+	select INTEL_RDT
 	help
 	  Select to enable resource allocation which is a sub-feature of
 	  Intel Resource Director Technology(RDT). More information about
diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index 06c2baa..2e002a5 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -1,4 +1,5 @@
-obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o cqm.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o
+obj-$(CONFIG_INTEL_RDT_M)		+= cqm.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= lbr.o p4.o p6.o pt.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)	+= intel-rapl-perf.o
-- 
1.9.1


* [PATCH 04/12] x86/cqm: Add Per pkg rmid support
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

The RMID is currently global, and this patch extends it to per-pkg RMIDs.
The h/w provides a set of RMIDs on each package, and the same task can
hence be associated with different RMIDs on each package.

The patch introduces a new cqm_pkgs_data structure to keep track of the
per-package free list, limbo list and other locking structures.
The corresponding rmid field in the perf_event is changed to hold an
array of u32 RMIDs instead of a single u32.

The RMIDs are not assigned at the time of event creation; they are
assigned lazily, at the first sched_in time for a task, so an RMID is
never allocated if a task is not scheduled on a package. This makes better
use of the RMIDs, and it scales with the increasing number of
sockets/packages.

Locking:
event list - perf init and terminate hold the mutex. The spin lock is held
to guard against the mbm hrtimer.
per pkg free and limbo lists - global spin lock. Used by
get_rmid, put_rmid, perf start and terminate.

Tests: Available RMIDs increase by x times, where x is the number of
sockets, and the usage is dynamic so we save more.

Patch is based on David Carrillo-Cisneros <davidcc@google.com> patches
in cqm2 series.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c | 340 ++++++++++++++++++++++++--------------------
 arch/x86/events/intel/cqm.h |  37 +++++
 include/linux/perf_event.h  |   2 +-
 3 files changed, 226 insertions(+), 153 deletions(-)
 create mode 100644 arch/x86/events/intel/cqm.h

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 7c37a25..68fd1da 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -11,6 +11,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/intel_rdt_common.h>
 #include "../perf_event.h"
+#include "cqm.h"
 
 #define MSR_IA32_QM_CTR		0x0c8e
 #define MSR_IA32_QM_EVTSEL	0x0c8d
@@ -25,7 +26,7 @@
 static u32 cqm_max_rmid = -1;
 static unsigned int cqm_l3_scale; /* supposedly cacheline size */
 static bool cqm_enabled, mbm_enabled;
-unsigned int mbm_socket_max;
+unsigned int cqm_socket_max;
 
 /*
  * The cached intel_pqr_state is strictly per CPU and can never be
@@ -83,6 +84,8 @@ struct sample {
  */
 static cpumask_t cqm_cpumask;
 
+struct pkg_data **cqm_pkgs_data;
+
 #define RMID_VAL_ERROR		(1ULL << 63)
 #define RMID_VAL_UNAVAIL	(1ULL << 62)
 
@@ -142,50 +145,11 @@ struct cqm_rmid_entry {
 	unsigned long queue_time;
 };
 
-/*
- * cqm_rmid_free_lru - A least recently used list of RMIDs.
- *
- * Oldest entry at the head, newest (most recently used) entry at the
- * tail. This list is never traversed, it's only used to keep track of
- * the lru order. That is, we only pick entries of the head or insert
- * them on the tail.
- *
- * All entries on the list are 'free', and their RMIDs are not currently
- * in use. To mark an RMID as in use, remove its entry from the lru
- * list.
- *
- *
- * cqm_rmid_limbo_lru - list of currently unused but (potentially) dirty RMIDs.
- *
- * This list is contains RMIDs that no one is currently using but that
- * may have a non-zero occupancy value associated with them. The
- * rotation worker moves RMIDs from the limbo list to the free list once
- * the occupancy value drops below __intel_cqm_threshold.
- *
- * Both lists are protected by cache_mutex.
- */
-static LIST_HEAD(cqm_rmid_free_lru);
-static LIST_HEAD(cqm_rmid_limbo_lru);
-
-/*
- * We use a simple array of pointers so that we can lookup a struct
- * cqm_rmid_entry in O(1). This alleviates the callers of __get_rmid()
- * and __put_rmid() from having to worry about dealing with struct
- * cqm_rmid_entry - they just deal with rmids, i.e. integers.
- *
- * Once this array is initialized it is read-only. No locks are required
- * to access it.
- *
- * All entries for all RMIDs can be looked up in the this array at all
- * times.
- */
-static struct cqm_rmid_entry **cqm_rmid_ptrs;
-
-static inline struct cqm_rmid_entry *__rmid_entry(u32 rmid)
+static inline struct cqm_rmid_entry *__rmid_entry(u32 rmid, int domain)
 {
 	struct cqm_rmid_entry *entry;
 
-	entry = cqm_rmid_ptrs[rmid];
+	entry = &cqm_pkgs_data[domain]->cqm_rmid_ptrs[rmid];
 	WARN_ON(entry->rmid != rmid);
 
 	return entry;
@@ -196,91 +160,56 @@ static inline struct cqm_rmid_entry *__rmid_entry(u32 rmid)
  *
  * We expect to be called with cache_mutex held.
  */
-static u32 __get_rmid(void)
+static u32 __get_rmid(int domain)
 {
+	struct list_head *cqm_flist;
 	struct cqm_rmid_entry *entry;
 
-	lockdep_assert_held(&cache_mutex);
+	lockdep_assert_held(&cache_lock);
 
-	if (list_empty(&cqm_rmid_free_lru))
+	cqm_flist = &cqm_pkgs_data[domain]->cqm_rmid_free_lru;
+
+	if (list_empty(cqm_flist))
 		return INVALID_RMID;
 
-	entry = list_first_entry(&cqm_rmid_free_lru, struct cqm_rmid_entry, list);
+	entry = list_first_entry(cqm_flist, struct cqm_rmid_entry, list);
 	list_del(&entry->list);
 
 	return entry->rmid;
 }
 
-static void __put_rmid(u32 rmid)
+static void __put_rmid(u32 rmid, int domain)
 {
 	struct cqm_rmid_entry *entry;
 
-	lockdep_assert_held(&cache_mutex);
+	lockdep_assert_held(&cache_lock);
 
-	WARN_ON(!__rmid_valid(rmid));
-	entry = __rmid_entry(rmid);
+	WARN_ON(!rmid);
+	entry = __rmid_entry(rmid, domain);
 
 	entry->queue_time = jiffies;
 	entry->state = RMID_DIRTY;
 
-	list_add_tail(&entry->list, &cqm_rmid_limbo_lru);
+	list_add_tail(&entry->list, &cqm_pkgs_data[domain]->cqm_rmid_limbo_lru);
 }
 
 static void cqm_cleanup(void)
 {
 	int i;
 
-	if (!cqm_rmid_ptrs)
+	if (!cqm_pkgs_data)
 		return;
 
-	for (i = 0; i < cqm_max_rmid; i++)
-		kfree(cqm_rmid_ptrs[i]);
-
-	kfree(cqm_rmid_ptrs);
-	cqm_rmid_ptrs = NULL;
-	cqm_enabled = false;
-}
-
-static int intel_cqm_setup_rmid_cache(void)
-{
-	struct cqm_rmid_entry *entry;
-	unsigned int nr_rmids;
-	int r = 0;
-
-	nr_rmids = cqm_max_rmid + 1;
-	cqm_rmid_ptrs = kzalloc(sizeof(struct cqm_rmid_entry *) *
-				nr_rmids, GFP_KERNEL);
-	if (!cqm_rmid_ptrs)
-		return -ENOMEM;
-
-	for (; r <= cqm_max_rmid; r++) {
-		struct cqm_rmid_entry *entry;
-
-		entry = kmalloc(sizeof(*entry), GFP_KERNEL);
-		if (!entry)
-			goto fail;
-
-		INIT_LIST_HEAD(&entry->list);
-		entry->rmid = r;
-		cqm_rmid_ptrs[r] = entry;
-
-		list_add_tail(&entry->list, &cqm_rmid_free_lru);
+	for (i = 0; i < cqm_socket_max; i++) {
+		if (cqm_pkgs_data[i]) {
+			kfree(cqm_pkgs_data[i]->cqm_rmid_ptrs);
+			kfree(cqm_pkgs_data[i]);
+		}
 	}
-
-	/*
-	 * RMID 0 is special and is always allocated. It's used for all
-	 * tasks that are not monitored.
-	 */
-	entry = __rmid_entry(0);
-	list_del(&entry->list);
-
-	return 0;
-
-fail:
-	cqm_cleanup();
-	return -ENOMEM;
+	kfree(cqm_pkgs_data);
 }
 
+
 /*
  * Determine if @a and @b measure the same set of tasks.
  *
@@ -333,13 +262,13 @@ static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event)
 #endif
 
 struct rmid_read {
-	u32 rmid;
+	u32 *rmid;
 	u32 evt_type;
 	atomic64_t value;
 };
 
 static void __intel_cqm_event_count(void *info);
-static void init_mbm_sample(u32 rmid, u32 evt_type);
+static void init_mbm_sample(u32 *rmid, u32 evt_type);
 static void __intel_mbm_event_count(void *info);
 
 static bool is_cqm_event(int e)
@@ -420,10 +349,11 @@ static void __intel_mbm_event_init(void *info)
 {
 	struct rmid_read *rr = info;
 
-	update_sample(rr->rmid, rr->evt_type, 1);
+	if (__rmid_valid(rr->rmid[pkg_id]))
+		update_sample(rr->rmid[pkg_id], rr->evt_type, 1);
 }
 
-static void init_mbm_sample(u32 rmid, u32 evt_type)
+static void init_mbm_sample(u32 *rmid, u32 evt_type)
 {
 	struct rmid_read rr = {
 		.rmid = rmid,
@@ -444,7 +374,7 @@ static int intel_cqm_setup_event(struct perf_event *event,
 				  struct perf_event **group)
 {
 	struct perf_event *iter;
-	u32 rmid;
+	u32 *rmid, sizet;
 
 	event->hw.is_group_event = false;
 	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
@@ -454,24 +384,20 @@ static int intel_cqm_setup_event(struct perf_event *event,
 			/* All tasks in a group share an RMID */
 			event->hw.cqm_rmid = rmid;
 			*group = iter;
-			if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
+			if (is_mbm_event(event->attr.config))
 				init_mbm_sample(rmid, event->attr.config);
 			return 0;
 		}
-
-	}
-
-	rmid = __get_rmid();
-
-	if (!__rmid_valid(rmid)) {
-		pr_info("out of RMIDs\n");
-		return -EINVAL;
 	}
 
-	if (is_mbm_event(event->attr.config) && __rmid_valid(rmid))
-		init_mbm_sample(rmid, event->attr.config);
-
-	event->hw.cqm_rmid = rmid;
+	/*
+	 * RMIDs are allocated in LAZY mode by default only when
+	 * tasks monitored are scheduled in.
+	 */
+	sizet = sizeof(u32) * cqm_socket_max;
+	event->hw.cqm_rmid = kzalloc(sizet, GFP_KERNEL);
+	if (!event->hw.cqm_rmid)
+		return -ENOMEM;
 
 	return 0;
 }
@@ -489,7 +415,7 @@ static void intel_cqm_event_read(struct perf_event *event)
 		return;
 
 	raw_spin_lock_irqsave(&cache_lock, flags);
-	rmid = event->hw.cqm_rmid;
+	rmid = event->hw.cqm_rmid[pkg_id];
 
 	if (!__rmid_valid(rmid))
 		goto out;
@@ -515,12 +441,12 @@ static void __intel_cqm_event_count(void *info)
 	struct rmid_read *rr = info;
 	u64 val;
 
-	val = __rmid_read(rr->rmid);
-
-	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
-		return;
-
-	atomic64_add(val, &rr->value);
+	if (__rmid_valid(rr->rmid[pkg_id])) {
+		val = __rmid_read(rr->rmid[pkg_id]);
+		if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+			return;
+		atomic64_add(val, &rr->value);
+	}
 }
 
 static inline bool cqm_group_leader(struct perf_event *event)
@@ -533,10 +459,12 @@ static void __intel_mbm_event_count(void *info)
 	struct rmid_read *rr = info;
 	u64 val;
 
-	val = rmid_read_mbm(rr->rmid, rr->evt_type);
-	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
-		return;
-	atomic64_add(val, &rr->value);
+	if (__rmid_valid(rr->rmid[pkg_id])) {
+		val = rmid_read_mbm(rr->rmid[pkg_id], rr->evt_type);
+		if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+			return;
+		atomic64_add(val, &rr->value);
+	}
 }
 
 static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
@@ -559,7 +487,7 @@ static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
 	}
 
 	list_for_each_entry(iter, &cache_groups, hw.cqm_groups_entry) {
-		grp_rmid = iter->hw.cqm_rmid;
+		grp_rmid = iter->hw.cqm_rmid[pkg_id];
 		if (!__rmid_valid(grp_rmid))
 			continue;
 		if (is_mbm_event(iter->attr.config))
@@ -572,7 +500,7 @@ static enum hrtimer_restart mbm_hrtimer_handle(struct hrtimer *hrtimer)
 			if (!iter1->hw.is_group_event)
 				break;
 			if (is_mbm_event(iter1->attr.config))
-				update_sample(iter1->hw.cqm_rmid,
+				update_sample(iter1->hw.cqm_rmid[pkg_id],
 					      iter1->attr.config, 0);
 		}
 	}
@@ -610,7 +538,7 @@ static void mbm_hrtimer_init(void)
 	struct hrtimer *hr;
 	int i;
 
-	for (i = 0; i < mbm_socket_max; i++) {
+	for (i = 0; i < cqm_socket_max; i++) {
 		hr = &mbm_timers[i];
 		hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 		hr->function = mbm_hrtimer_handle;
@@ -667,16 +595,39 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 	return __perf_event_count(event);
 }
 
+void alloc_needed_pkg_rmid(u32 *cqm_rmid)
+{
+	unsigned long flags;
+	u32 rmid;
+
+	if (WARN_ON(!cqm_rmid))
+		return;
+
+	if (cqm_rmid[pkg_id])
+		return;
+
+	raw_spin_lock_irqsave(&cache_lock, flags);
+
+	rmid = __get_rmid(pkg_id);
+	if (__rmid_valid(rmid))
+		cqm_rmid[pkg_id] = rmid;
+
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+}
+
 static void intel_cqm_event_start(struct perf_event *event, int mode)
 {
 	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-	u32 rmid = event->hw.cqm_rmid;
+	u32 rmid;
 
 	if (!(event->hw.cqm_state & PERF_HES_STOPPED))
 		return;
 
 	event->hw.cqm_state &= ~PERF_HES_STOPPED;
 
+	alloc_needed_pkg_rmid(event->hw.cqm_rmid);
+
+	rmid = event->hw.cqm_rmid[pkg_id];
 	state->rmid = rmid;
 	wrmsr(MSR_IA32_PQR_ASSOC, rmid, state->closid);
 }
@@ -691,22 +642,27 @@ static void intel_cqm_event_stop(struct perf_event *event, int mode)
 
 static int intel_cqm_event_add(struct perf_event *event, int mode)
 {
-	unsigned long flags;
-	u32 rmid;
-
-	raw_spin_lock_irqsave(&cache_lock, flags);
-
 	event->hw.cqm_state = PERF_HES_STOPPED;
-	rmid = event->hw.cqm_rmid;
 
-	if (__rmid_valid(rmid) && (mode & PERF_EF_START))
+	if ((mode & PERF_EF_START))
 		intel_cqm_event_start(event, mode);
 
-	raw_spin_unlock_irqrestore(&cache_lock, flags);
-
 	return 0;
 }
 
+static inline void
+	cqm_event_free_rmid(struct perf_event *event)
+{
+	u32 *rmid = event->hw.cqm_rmid;
+	int d;
+
+	for (d = 0; d < cqm_socket_max; d++) {
+		if (__rmid_valid(rmid[d]))
+			__put_rmid(rmid[d], d);
+	}
+	kfree(event->hw.cqm_rmid);
+	list_del(&event->hw.cqm_groups_entry);
+}
 static void intel_cqm_event_destroy(struct perf_event *event)
 {
 	struct perf_event *group_other = NULL;
@@ -737,16 +693,11 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 		 * If there was a group_other, make that leader, otherwise
 		 * destroy the group and return the RMID.
 		 */
-		if (group_other) {
+		if (group_other)
 			list_replace(&event->hw.cqm_groups_entry,
 				     &group_other->hw.cqm_groups_entry);
-		} else {
-			u32 rmid = event->hw.cqm_rmid;
-
-			if (__rmid_valid(rmid))
-				__put_rmid(rmid);
-			list_del(&event->hw.cqm_groups_entry);
-		}
+		else
+			cqm_event_free_rmid(event);
 	}
 
 	raw_spin_unlock_irqrestore(&cache_lock, flags);
@@ -794,7 +745,7 @@ static int intel_cqm_event_init(struct perf_event *event)
 
 	mutex_lock(&cache_mutex);
 
-	/* Will also set rmid, return error on RMID not being available*/
+	/* Delay allocating RMIDs */
 	if (intel_cqm_setup_event(event, &group)) {
 		ret = -EINVAL;
 		goto out;
@@ -1036,12 +987,95 @@ static void mbm_cleanup(void)
 	{}
 };
 
+static int pkg_data_init_cpu(int cpu)
+{
+	struct cqm_rmid_entry *ccqm_rmid_ptrs = NULL, *entry = NULL;
+	int curr_pkgid = topology_physical_package_id(cpu);
+	struct pkg_data *pkg_data = NULL;
+	int i = 0, nr_rmids, ret = 0;
+
+	if (cqm_pkgs_data[curr_pkgid])
+		return 0;
+
+	pkg_data = kzalloc_node(sizeof(struct pkg_data),
+				GFP_KERNEL, cpu_to_node(cpu));
+	if (!pkg_data)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&pkg_data->cqm_rmid_free_lru);
+	INIT_LIST_HEAD(&pkg_data->cqm_rmid_limbo_lru);
+
+	mutex_init(&pkg_data->pkg_data_mutex);
+	raw_spin_lock_init(&pkg_data->pkg_data_lock);
+
+	pkg_data->rmid_work_cpu = cpu;
+
+	nr_rmids = cqm_max_rmid + 1;
+	ccqm_rmid_ptrs = kzalloc(sizeof(struct cqm_rmid_entry) *
+			   nr_rmids, GFP_KERNEL);
+	if (!ccqm_rmid_ptrs) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	for (; i <= cqm_max_rmid; i++) {
+		entry = &ccqm_rmid_ptrs[i];
+		INIT_LIST_HEAD(&entry->list);
+		entry->rmid = i;
+
+		list_add_tail(&entry->list, &pkg_data->cqm_rmid_free_lru);
+	}
+
+	pkg_data->cqm_rmid_ptrs = ccqm_rmid_ptrs;
+	cqm_pkgs_data[curr_pkgid] = pkg_data;
+
+	/*
+	 * RMID 0 is special and is always allocated. It's used for all
+	 * tasks that are not monitored.
+	 */
+	entry = __rmid_entry(0, curr_pkgid);
+	list_del(&entry->list);
+
+	return 0;
+fail:
+	kfree(ccqm_rmid_ptrs);
+	ccqm_rmid_ptrs = NULL;
+	kfree(pkg_data);
+	pkg_data = NULL;
+	cqm_pkgs_data[curr_pkgid] = NULL;
+	return ret;
+}
+
+static int cqm_init_pkgs_data(void)
+{
+	int i, cpu, ret = 0;
+
+	cqm_pkgs_data = kzalloc(
+		sizeof(struct pkg_data *) * cqm_socket_max,
+		GFP_KERNEL);
+	if (!cqm_pkgs_data)
+		return -ENOMEM;
+
+	for (i = 0; i < cqm_socket_max; i++)
+		cqm_pkgs_data[i] = NULL;
+
+	for_each_online_cpu(cpu) {
+		ret = pkg_data_init_cpu(cpu);
+		if (ret)
+			goto fail;
+	}
+
+	return 0;
+fail:
+	cqm_cleanup();
+	return ret;
+}
+
 static int intel_mbm_init(void)
 {
 	int ret = 0, array_size, maxid = cqm_max_rmid + 1;
 
-	mbm_socket_max = topology_max_packages();
-	array_size = sizeof(struct sample) * maxid * mbm_socket_max;
+	array_size = sizeof(struct sample) * maxid * cqm_socket_max;
 	mbm_local = kmalloc(array_size, GFP_KERNEL);
 	if (!mbm_local)
 		return -ENOMEM;
@@ -1052,7 +1086,7 @@ static int intel_mbm_init(void)
 		goto out;
 	}
 
-	array_size = sizeof(struct hrtimer) * mbm_socket_max;
+	array_size = sizeof(struct hrtimer) * cqm_socket_max;
 	mbm_timers = kmalloc(array_size, GFP_KERNEL);
 	if (!mbm_timers) {
 		ret = -ENOMEM;
@@ -1128,7 +1162,8 @@ static int __init intel_cqm_init(void)
 
 	event_attr_intel_cqm_llc_scale.event_str = str;
 
-	ret = intel_cqm_setup_rmid_cache();
+	cqm_socket_max = topology_max_packages();
+	ret = cqm_init_pkgs_data();
 	if (ret)
 		goto out;
 
@@ -1171,6 +1206,7 @@ static int __init intel_cqm_init(void)
 	if (ret) {
 		kfree(str);
 		cqm_cleanup();
+		cqm_enabled = false;
 		mbm_cleanup();
 	}
 
diff --git a/arch/x86/events/intel/cqm.h b/arch/x86/events/intel/cqm.h
new file mode 100644
index 0000000..4415497
--- /dev/null
+++ b/arch/x86/events/intel/cqm.h
@@ -0,0 +1,37 @@
+#ifndef _ASM_X86_CQM_H
+#define _ASM_X86_CQM_H
+
+#ifdef CONFIG_INTEL_RDT_M
+
+#include <linux/perf_event.h>
+
+/**
+ * struct pkg_data - cqm per package (socket) metadata
+ * @cqm_rmid_free_lru    A least recently used list of free RMIDs.
+ *     These RMIDs are guaranteed to have an occupancy less than the
+ *     threshold occupancy.
+ * @cqm_rmid_limbo_lru   List of currently unused but (potentially)
+ *     dirty RMIDs.
+ *     This list contains RMIDs that no one is currently using but that
+ *     may have an occupancy value > __intel_cqm_threshold. The user can
+ *     change the threshold occupancy value.
+ * @cqm_rmid_ptrs - The array backing the entries on the limbo and free lists.
+ * @intel_cqm_rmid_work - Work to reuse the RMIDs that have been freed.
+ * @rmid_work_cpu - The CPU in the package on which the work is scheduled.
+ */
+struct pkg_data {
+	struct list_head	cqm_rmid_free_lru;
+	struct list_head	cqm_rmid_limbo_lru;
+
+	struct cqm_rmid_entry	*cqm_rmid_ptrs;
+
+	struct mutex		pkg_data_mutex;
+	raw_spinlock_t		pkg_data_lock;
+
+	struct delayed_work	intel_cqm_rmid_work;
+	atomic_t		reuse_scheduled;
+
+	int			rmid_work_cpu;
+};
+#endif
+#endif
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecd..a8f4749 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -141,7 +141,7 @@ struct hw_perf_event {
 		};
 		struct { /* intel_cqm */
 			int			cqm_state;
-			u32			cqm_rmid;
+			u32			*cqm_rmid;
 			int			is_group_event;
 			struct list_head	cqm_events_entry;
 			struct list_head	cqm_groups_entry;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/12] x86/cqm,perf/core: Cgroup support prepare
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (3 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 04/12] x86/cqm: Add Per pkg rmid support Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 06/12] x86/cqm: Add cgroup hierarchical monitoring support Vikas Shivappa
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: David Carrillo-Cisneros <davidcc@google.com>

cgroup hierarchy monitoring is not currently supported. This patch
builds the necessary data structures, the cgroup APIs (alloc, free,
attach and can_attach) and the quirks needed to support cgroup
hierarchy monitoring in later patches.

- Introduce an architecture-specific data structure, arch_info, in
perf_cgroup to keep track of RMIDs and cgroup hierarchical monitoring.
- perf sched_in calls all the cgroup ancestors when a cgroup is
scheduled in. This does not work for cqm because we have one common
per-package RMID associated with a task and hence cannot write a
different RMID into the MSR for each event. The cqm driver therefore
sets the PERF_EV_CAP_CGROUP_NO_RECURSION flag, which tells perf not to
call all ancestor cgroups for each event and lets the driver handle
hierarchical monitoring for cgroups.
- Introduce event_terminate, since event_destroy is called after the
cgroup is disassociated from the event; the new callback lets cqm clean
up its cgroup-specific arch_info while the cgroup is still attached.
- Add the cgroup APIs for alloc, free, attach and can_attach.

The above framework will be used to build different cgroup features in
later patches.
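
For reference, a condensed sketch of how the hook wiring fits together
(this is just the pattern from the hunks below collected in one place,
with the percpu info allocation elided; it is not new code):

/* include/linux/perf_event.h: default no-op fallbacks */
#ifndef perf_cgroup_arch_css_alloc
#define perf_cgroup_arch_css_alloc(parent_css, new_css) 0
#endif

/* arch/x86/include/asm/perf_event.h: x86/cqm overrides the fallback */
#define perf_cgroup_arch_css_alloc	perf_cgroup_arch_css_alloc
int perf_cgroup_arch_css_alloc(struct cgroup_subsys_state *parent_css,
			       struct cgroup_subsys_state *new_css);

/* kernel/events/core.c: generic code calls the hook unconditionally */
static struct cgroup_subsys_state *
perf_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct perf_cgroup *jc = kzalloc(sizeof(*jc), GFP_KERNEL);
	int ret;

	if (!jc)
		return ERR_PTR(-ENOMEM);
	jc->arch_info = NULL;

	/* (percpu info allocation elided in this sketch) */
	ret = perf_cgroup_arch_css_alloc(parent_css, &jc->css);
	if (ret)
		return ERR_PTR(ret);

	return &jc->css;
}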

Tests: Same as before. Cgroup monitoring still doesn't work, but this is
the preparation needed to get it working.

Patch modified/refactored by Vikas Shivappa
<vikas.shivappa@linux.intel.com> to support recycling removal.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c       | 19 ++++++++++++++++++-
 arch/x86/include/asm/perf_event.h | 27 +++++++++++++++++++++++++++
 include/linux/perf_event.h        | 32 ++++++++++++++++++++++++++++++++
 kernel/events/core.c              | 28 +++++++++++++++++++++++++++-
 4 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 68fd1da..a9bd7bd 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -741,7 +741,13 @@ static int intel_cqm_event_init(struct perf_event *event)
 	INIT_LIST_HEAD(&event->hw.cqm_group_entry);
 	INIT_LIST_HEAD(&event->hw.cqm_groups_entry);
 
-	event->destroy = intel_cqm_event_destroy;
+	/*
+	 * The CQM driver handles cgroup recursion itself. Since only one
+	 * RMID can be programmed at a time on each core, it is
+	 * incompatible with the way the generic code handles
+	 * cgroup hierarchies.
+	 */
+	event->event_caps |= PERF_EV_CAP_CGROUP_NO_RECURSION;
 
 	mutex_lock(&cache_mutex);
 
@@ -918,6 +924,17 @@ static int intel_cqm_event_init(struct perf_event *event)
 	.read		     = intel_cqm_event_read,
 	.count		     = intel_cqm_event_count,
 };
+#ifdef CONFIG_CGROUP_PERF
+int perf_cgroup_arch_css_alloc(struct cgroup_subsys_state *parent_css,
+				      struct cgroup_subsys_state *new_css)
+{}
+void perf_cgroup_arch_css_free(struct cgroup_subsys_state *css)
+{}
+void perf_cgroup_arch_attach(struct cgroup_taskset *tset)
+{}
+int perf_cgroup_arch_can_attach(struct cgroup_taskset *tset)
+{}
+#endif
 
 static inline void cqm_pick_event_reader(int cpu)
 {
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f353061..f38c7f0 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -299,4 +299,31 @@ static inline void perf_check_microcode(void) { }
 
 #define arch_perf_out_copy_user copy_from_user_nmi
 
+/*
+ * Hooks for architecture specific features of perf_event cgroup.
+ * Currently used by Intel's CQM.
+ */
+#ifdef CONFIG_INTEL_RDT_M
+#ifdef CONFIG_CGROUP_PERF
+
+#define perf_cgroup_arch_css_alloc	perf_cgroup_arch_css_alloc
+
+int perf_cgroup_arch_css_alloc(struct cgroup_subsys_state *parent_css,
+				      struct cgroup_subsys_state *new_css);
+
+#define perf_cgroup_arch_css_free	perf_cgroup_arch_css_free
+
+void perf_cgroup_arch_css_free(struct cgroup_subsys_state *css);
+
+#define perf_cgroup_arch_attach		perf_cgroup_arch_attach
+
+void perf_cgroup_arch_attach(struct cgroup_taskset *tset);
+
+#define perf_cgroup_arch_can_attach	perf_cgroup_arch_can_attach
+
+int perf_cgroup_arch_can_attach(struct cgroup_taskset *tset);
+
+#endif
+
+#endif
 #endif /* _ASM_X86_PERF_EVENT_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a8f4749..410642a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -300,6 +300,12 @@ struct pmu {
 	int (*event_init)		(struct perf_event *event);
 
 	/*
+	 * Terminate the event for this PMU. Optional complement to a
+	 * successful event_init. Called before the event fields are torn down.
+	 */
+	void (*event_terminate)		(struct perf_event *event);
+
+	/*
 	 * Notification that the event was mapped or unmapped.  Called
 	 * in the context of the mapping task.
 	 */
@@ -516,9 +522,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_SOFTWARE: Is a software event.
  * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
  * from any CPU in the package where it is active.
+ * PERF_EV_CAP_CGROUP_NO_RECURSION: A cgroup event that handles its own
+ * cgroup scoping. It does not need to be enabled for all of its
+ * descendant cgroups.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_CGROUP_NO_RECURSION	BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
@@ -823,6 +833,8 @@ struct perf_cgroup_info {
 };
 
 struct perf_cgroup {
+	/* Architecture specific information. */
+	void				 *arch_info;
 	struct cgroup_subsys_state	css;
 	struct perf_cgroup_info	__percpu *info;
 };
@@ -844,6 +856,7 @@ struct perf_cgroup {
 
 #ifdef CONFIG_PERF_EVENTS
 
+extern int is_cgroup_event(struct perf_event *event);
 extern void *perf_aux_output_begin(struct perf_output_handle *handle,
 				   struct perf_event *event);
 extern void perf_aux_output_end(struct perf_output_handle *handle,
@@ -1387,4 +1400,23 @@ ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
 #define perf_event_exit_cpu	NULL
 #endif
 
+/*
+ * Hooks for architecture specific extensions for perf_cgroup.
+ */
+#ifndef perf_cgroup_arch_css_alloc
+#define perf_cgroup_arch_css_alloc(parent_css, new_css) 0
+#endif
+
+#ifndef perf_cgroup_arch_css_free
+#define perf_cgroup_arch_css_free(css) do { } while (0)
+#endif
+
+#ifndef perf_cgroup_arch_attach
+#define perf_cgroup_arch_attach(tskset) do { } while (0)
+#endif
+
+#ifndef perf_cgroup_arch_can_attach
+#define perf_cgroup_arch_can_attach(tskset) 0
+#endif
+
 #endif /* _LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab15509..229f611 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -590,6 +590,9 @@ static inline u64 perf_event_clock(struct perf_event *event)
 	if (!cpuctx->cgrp)
 		return false;
 
+	if (event->event_caps & PERF_EV_CAP_CGROUP_NO_RECURSION)
+		return cpuctx->cgrp->css.cgroup == event->cgrp->css.cgroup;
+
 	/*
 	 * Cgroup scoping is recursive.  An event enabled for a cgroup is
 	 * also enabled for all its descendant cgroups.  If @cpuctx's
@@ -606,7 +609,7 @@ static inline void perf_detach_cgroup(struct perf_event *event)
 	event->cgrp = NULL;
 }
 
-static inline int is_cgroup_event(struct perf_event *event)
+int is_cgroup_event(struct perf_event *event)
 {
 	return event->cgrp != NULL;
 }
@@ -4019,6 +4022,9 @@ static void _free_event(struct perf_event *event)
 		mutex_unlock(&event->mmap_mutex);
 	}
 
+	if (event->pmu->event_terminate)
+		event->pmu->event_terminate(event);
+
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
 
@@ -9246,6 +9252,8 @@ static void account_event(struct perf_event *event)
 	exclusive_event_destroy(event);
 
 err_pmu:
+	if (event->pmu->event_terminate)
+		event->pmu->event_terminate(event);
 	if (event->destroy)
 		event->destroy(event);
 	module_put(pmu->module);
@@ -10748,6 +10756,7 @@ static int __init perf_event_sysfs_init(void)
 perf_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct perf_cgroup *jc;
+	int ret;
 
 	jc = kzalloc(sizeof(*jc), GFP_KERNEL);
 	if (!jc)
@@ -10759,6 +10768,12 @@ static int __init perf_event_sysfs_init(void)
 		return ERR_PTR(-ENOMEM);
 	}
 
+	jc->arch_info = NULL;
+
+	ret = perf_cgroup_arch_css_alloc(parent_css, &jc->css);
+	if (ret)
+		return ERR_PTR(ret);
+
 	return &jc->css;
 }
 
@@ -10766,6 +10781,8 @@ static void perf_cgroup_css_free(struct cgroup_subsys_state *css)
 {
 	struct perf_cgroup *jc = container_of(css, struct perf_cgroup, css);
 
+	perf_cgroup_arch_css_free(css);
+
 	free_percpu(jc->info);
 	kfree(jc);
 }
@@ -10786,11 +10803,20 @@ static void perf_cgroup_attach(struct cgroup_taskset *tset)
 
 	cgroup_taskset_for_each(task, css, tset)
 		task_function_call(task, __perf_cgroup_move, task);
+
+	perf_cgroup_arch_attach(tset);
+}
+
+static int perf_cgroup_can_attach(struct cgroup_taskset *tset)
+{
+	return perf_cgroup_arch_can_attach(tset);
 }
 
+
 struct cgroup_subsys perf_event_cgrp_subsys = {
 	.css_alloc	= perf_cgroup_css_alloc,
 	.css_free	= perf_cgroup_css_free,
+	.can_attach	= perf_cgroup_can_attach,
 	.attach		= perf_cgroup_attach,
 };
 #endif /* CONFIG_CGROUP_PERF */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/12] x86/cqm: Add cgroup hierarchical monitoring support
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (4 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 05/12] x86/cqm,perf/core: Cgroup support prepare Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 07/12] x86/rdt,cqm: Scheduling support update Vikas Shivappa
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: David Carrillo-Cisneros <davidcc@google.com>

This patch adds support for monitoring a cgroup hierarchy. The
arch_info introduced in perf_cgroup is used to maintain the cgroup's
RMID and hierarchy information.

Since cgroups support hierarchical monitoring, a cgroup is always
monitoring for some ancestor. By default the root is always monitored
with RMID 0, so when any cgroup is first created it reports its data to
the root. mfa, or 'monitor for ancestor', keeps track of which ancestor
the cgroup is actually monitoring for, i.e. which ancestor it has to
report its data to.

By default, every cgroup's mfa points to the root (see the sketch after
this list):
1. event init: whenever a new cgroup x starts being monitored, the mfa
of the monitored cgroup's descendants is pointed at cgroup x.
2. switch_to: the task finds the cgroup it is associated with; if that
cgroup is itself being monitored, the cgroup uses its own RMID (a), else
it uses the RMID of its mfa (b).
3. read: during the read call, cgroup x simply adds the counts of those
descendants that had cgroup x as their mfa and were also being
monitored (to account for scenario (a) in switch_to).
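
A rough sketch of the three steps above, illustrative only: x_info
stands for cgroup x's cgrp_cqm_info, rmid_read() stands in for the
per-package read done via delta_local(), cqminfo_from_tsk() is the
lookup added in a later patch, and the switch_to selection itself also
lands in a later patch.

/* 1. event init: cgroup x starts being monitored */
css_for_each_descendant_pre(pos_css, &x->css)
	if (x_info->level > css_to_cqm_info(pos_css)->mfa->level)
		css_to_cqm_info(pos_css)->mfa = x_info;

/* 2. switch_to: pick the RMID to program for the current task */
ccinfo = cqminfo_from_tsk(current);
rmid = ccinfo->mon_enabled ? ccinfo->rmid[pkg_id]
			   : ccinfo->mfa->rmid[pkg_id];

/* 3. read: cgroup x adds the counts of its monitored descendants */
css_for_each_descendant_pre(pos_css, &x->css)
	if (pos_css != &x->css && css_to_cqm_info(pos_css)->mon_enabled)
		count += rmid_read(css_to_cqm_info(pos_css)->rmid[pkg_id]);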

Locking: cgroup traversal takes rcu_read_lock; cgroup->arch_info is
protected by a mutex held across css_alloc, css_free, event terminate
and init.

Tests: Cgroup monitoring should work. Monitoring multiple cgroups in the
same hierarchy works. Monitoring a cgroup and a task within the same
cgroup doesn't work yet.

Patch modified/refactored by Vikas Shivappa
<vikas.shivappa@linux.intel.com> to support recycling removal.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c             | 227 +++++++++++++++++++++++++++-----
 arch/x86/include/asm/intel_rdt_common.h |  64 +++++++++
 2 files changed, 257 insertions(+), 34 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index a9bd7bd..c6479ae 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -85,6 +85,7 @@ struct sample {
 static cpumask_t cqm_cpumask;
 
 struct pkg_data **cqm_pkgs_data;
+struct cgrp_cqm_info cqm_rootcginfo;
 
 #define RMID_VAL_ERROR		(1ULL << 63)
 #define RMID_VAL_UNAVAIL	(1ULL << 62)
@@ -193,6 +194,11 @@ static void __put_rmid(u32 rmid, int domain)
 	list_add_tail(&entry->list, &cqm_pkgs_data[domain]->cqm_rmid_limbo_lru);
 }
 
+static bool is_task_event(struct perf_event *e)
+{
+	return (e->attach_state & PERF_ATTACH_TASK);
+}
+
 static void cqm_cleanup(void)
 {
 	int i;
@@ -209,7 +215,6 @@ static void cqm_cleanup(void)
 	kfree(cqm_pkgs_data);
 }
 
-
 /*
  * Determine if @a and @b measure the same set of tasks.
  *
@@ -224,20 +229,18 @@ static bool __match_event(struct perf_event *a, struct perf_event *b)
 		return false;
 
 #ifdef CONFIG_CGROUP_PERF
-	if (a->cgrp != b->cgrp)
-		return false;
-#endif
-
-	/* If not task event, we're machine wide */
-	if (!(b->attach_state & PERF_ATTACH_TASK))
+	if ((is_cgroup_event(a) && is_cgroup_event(b)) &&
+		(a->cgrp == b->cgrp))
 		return true;
+#endif
 
 	/*
 	 * Events that target same task are placed into the same cache group.
 	 * Mark it as a multi event group, so that we update ->count
 	 * for every event rather than just the group leader later.
 	 */
-	if (a->hw.target == b->hw.target) {
+	if ((is_task_event(a) && is_task_event(b)) &&
+		(a->hw.target == b->hw.target)) {
 		b->hw.is_group_event = true;
 		return true;
 	}
@@ -365,6 +368,63 @@ static void init_mbm_sample(u32 *rmid, u32 evt_type)
 	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
 }
 
+static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid)
+{
+	if (rmid != NULL) {
+		cqm_info->mon_enabled = true;
+		cqm_info->rmid = rmid;
+	} else {
+		cqm_info->mon_enabled = false;
+		cqm_info->rmid = NULL;
+	}
+}
+
+static void cqm_assign_hier_rmid(struct cgroup_subsys_state *rcss, u32 *rmid)
+{
+	struct cgrp_cqm_info *ccqm_info, *rcqm_info;
+	struct cgroup_subsys_state *pos_css;
+
+	rcu_read_lock();
+
+	rcqm_info = css_to_cqm_info(rcss);
+
+	/* Enable or disable monitoring based on rmid.*/
+	cqm_enable_mon(rcqm_info, rmid);
+
+	pos_css = css_next_descendant_pre(rcss, rcss);
+	while (pos_css) {
+		ccqm_info = css_to_cqm_info(pos_css);
+
+		/*
+		 * Monitoring is being enabled.
+		 * Update the descendants to monitor for you, unless
+		 * they were already monitoring for a descendant of yours.
+		 */
+		if (rmid && (rcqm_info->level > ccqm_info->mfa->level))
+			ccqm_info->mfa = rcqm_info;
+
+		/*
+		 * Monitoring is being disabled.
+		 * Update the descendants who were monitoring for you
+		 * to monitor for the ancestor you were monitoring for.
+		 */
+		if (!rmid && (ccqm_info->mfa == rcqm_info))
+			ccqm_info->mfa = rcqm_info->mfa;
+		pos_css = css_next_descendant_pre(pos_css, rcss);
+	}
+	rcu_read_unlock();
+}
+
+static int cqm_assign_rmid(struct perf_event *event, u32 *rmid)
+{
+#ifdef CONFIG_CGROUP_PERF
+	if (is_cgroup_event(event)) {
+		cqm_assign_hier_rmid(&event->cgrp->css, rmid);
+	}
+#endif
+	return 0;
+}
+
 /*
  * Find a group and setup RMID.
  *
@@ -402,11 +462,14 @@ static int intel_cqm_setup_event(struct perf_event *event,
 	return 0;
 }
 
+static u64 cqm_read_subtree(struct perf_event *event, struct rmid_read *rr);
+
 static void intel_cqm_event_read(struct perf_event *event)
 {
-	unsigned long flags;
-	u32 rmid;
-	u64 val;
+	struct rmid_read rr = {
+		.evt_type = event->attr.config,
+		.value = ATOMIC64_INIT(0),
+	};
 
 	/*
 	 * Task events are handled by intel_cqm_event_count().
@@ -414,26 +477,9 @@ static void intel_cqm_event_read(struct perf_event *event)
 	if (event->cpu == -1)
 		return;
 
-	raw_spin_lock_irqsave(&cache_lock, flags);
-	rmid = event->hw.cqm_rmid[pkg_id];
-
-	if (!__rmid_valid(rmid))
-		goto out;
-
-	if (is_mbm_event(event->attr.config))
-		val = rmid_read_mbm(rmid, event->attr.config);
-	else
-		val = __rmid_read(rmid);
-
-	/*
-	 * Ignore this reading on error states and do not update the value.
-	 */
-	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
-		goto out;
+	rr.rmid = ACCESS_ONCE(event->hw.cqm_rmid);
 
-	local64_set(&event->count, val);
-out:
-	raw_spin_unlock_irqrestore(&cache_lock, flags);
+	cqm_read_subtree(event, &rr);
 }
 
 static void __intel_cqm_event_count(void *info)
@@ -545,6 +591,55 @@ static void mbm_hrtimer_init(void)
 	}
 }
 
+static void cqm_mask_call_local(struct rmid_read *rr)
+{
+	if (is_mbm_event(rr->evt_type))
+		__intel_mbm_event_count(rr);
+	else
+		__intel_cqm_event_count(rr);
+}
+
+static inline void
+	delta_local(struct perf_event *event, struct rmid_read *rr, u32 *rmid)
+{
+	atomic64_set(&rr->value, 0);
+	rr->rmid = ACCESS_ONCE(rmid);
+
+	cqm_mask_call_local(rr);
+	local64_add(atomic64_read(&rr->value), &event->count);
+}
+
+/*
+ * Since cgroups are hierarchical, add the counts of
+ * the descendants that are being monitored as well.
+ */
+static u64 cqm_read_subtree(struct perf_event *event, struct rmid_read *rr)
+{
+#ifdef CONFIG_CGROUP_PERF
+
+	struct cgroup_subsys_state *rcss, *pos_css;
+	struct cgrp_cqm_info *ccqm_info;
+
+	cqm_mask_call_local(rr);
+	local64_set(&event->count, atomic64_read(&(rr->value)));
+
+	if (is_task_event(event))
+		return __perf_event_count(event);
+
+	rcu_read_lock();
+	rcss = &event->cgrp->css;
+	css_for_each_descendant_pre(pos_css, rcss) {
+		ccqm_info = (css_to_cqm_info(pos_css));
+
+		/* Add the descendant 'monitored cgroup' counts */
+		if (pos_css != rcss && ccqm_info->mon_enabled)
+			delta_local(event, rr, ccqm_info->rmid);
+	}
+	rcu_read_unlock();
+#endif
+	return __perf_event_count(event);
+}
+
 static u64 intel_cqm_event_count(struct perf_event *event)
 {
 	struct rmid_read rr = {
@@ -603,7 +698,7 @@ void alloc_needed_pkg_rmid(u32 *cqm_rmid)
 	if (WARN_ON(!cqm_rmid))
 		return;
 
-	if (cqm_rmid[pkg_id])
+	if (cqm_rmid == cqm_rootcginfo.rmid || cqm_rmid[pkg_id])
 		return;
 
 	raw_spin_lock_irqsave(&cache_lock, flags);
@@ -661,9 +756,11 @@ static int intel_cqm_event_add(struct perf_event *event, int mode)
 			__put_rmid(rmid[d], d);
 	}
 	kfree(event->hw.cqm_rmid);
+	cqm_assign_rmid(event, NULL);
 	list_del(&event->hw.cqm_groups_entry);
 }
-static void intel_cqm_event_destroy(struct perf_event *event)
+
+static void intel_cqm_event_terminate(struct perf_event *event)
 {
 	struct perf_event *group_other = NULL;
 	unsigned long flags;
@@ -917,6 +1014,7 @@ static int intel_cqm_event_init(struct perf_event *event)
 	.attr_groups	     = intel_cqm_attr_groups,
 	.task_ctx_nr	     = perf_sw_context,
 	.event_init	     = intel_cqm_event_init,
+	.event_terminate     = intel_cqm_event_terminate,
 	.add		     = intel_cqm_event_add,
 	.del		     = intel_cqm_event_stop,
 	.start		     = intel_cqm_event_start,
@@ -924,12 +1022,67 @@ static int intel_cqm_event_init(struct perf_event *event)
 	.read		     = intel_cqm_event_read,
 	.count		     = intel_cqm_event_count,
 };
+
 #ifdef CONFIG_CGROUP_PERF
 int perf_cgroup_arch_css_alloc(struct cgroup_subsys_state *parent_css,
 				      struct cgroup_subsys_state *new_css)
-{}
+{
+	struct cgrp_cqm_info *cqm_info, *pcqm_info;
+	struct perf_cgroup *new_cgrp;
+
+	if (!parent_css) {
+		cqm_rootcginfo.level = 0;
+
+		cqm_rootcginfo.mon_enabled = true;
+		cqm_rootcginfo.cont_mon = true;
+		cqm_rootcginfo.mfa = NULL;
+		INIT_LIST_HEAD(&cqm_rootcginfo.tskmon_rlist);
+
+		if (new_css) {
+			new_cgrp = css_to_perf_cgroup(new_css);
+			new_cgrp->arch_info = &cqm_rootcginfo;
+		}
+		return 0;
+	}
+
+	mutex_lock(&cache_mutex);
+
+	new_cgrp = css_to_perf_cgroup(new_css);
+
+	cqm_info = kzalloc(sizeof(struct cgrp_cqm_info), GFP_KERNEL);
+	if (!cqm_info) {
+		mutex_unlock(&cache_mutex);
+		return -ENOMEM;
+	}
+
+	pcqm_info = (css_to_cqm_info(parent_css));
+	cqm_info->level = pcqm_info->level + 1;
+	cqm_info->rmid = pcqm_info->rmid;
+
+	cqm_info->cont_mon = false;
+	cqm_info->mon_enabled = false;
+	INIT_LIST_HEAD(&cqm_info->tskmon_rlist);
+	if (!pcqm_info->mfa)
+		cqm_info->mfa = pcqm_info;
+	else
+		cqm_info->mfa = pcqm_info->mfa;
+
+	new_cgrp->arch_info = cqm_info;
+	mutex_unlock(&cache_mutex);
+
+	return 0;
+}
+
 void perf_cgroup_arch_css_free(struct cgroup_subsys_state *css)
-{}
+{
+	struct perf_cgroup *cgrp = css_to_perf_cgroup(css);
+
+	mutex_lock(&cache_mutex);
+	kfree(cgrp_to_cqm_info(cgrp));
+	cgrp->arch_info = NULL;
+	mutex_unlock(&cache_mutex);
+}
+
 void perf_cgroup_arch_attach(struct cgroup_taskset *tset)
 {}
 int perf_cgroup_arch_can_attach(struct cgroup_taskset *tset)
@@ -1053,6 +1206,12 @@ static int pkg_data_init_cpu(int cpu)
 	entry = __rmid_entry(0, curr_pkgid);
 	list_del(&entry->list);
 
+	cqm_rootcginfo.rmid = kzalloc(sizeof(u32) * cqm_socket_max, GFP_KERNEL);
+	if (!cqm_rootcginfo.rmid) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
 	return 0;
 fail:
 	kfree(ccqm_rmid_ptrs);
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index b31081b..e11ed5e 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -24,4 +24,68 @@ struct intel_pqr_state {
 
 DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
 
+/**
+ * struct cgrp_cqm_info - perf_event cgroup metadata for cqm
+ * @cont_mon     Continuous monitoring flag
+ * @mon_enabled  Whether monitoring is enabled
+ * @level        Level in the cgroup tree. Root is level 0.
+ * @rmid         The RMIDs of the cgroup (one per package).
+ * @mfa          'Monitoring for ancestor' points to the cqm_info
+ *  of the ancestor the cgroup is monitoring for. 'Monitoring for ancestor'
+ *  means you will use an ancestor's RMID at sched_in if you are
+ *  not monitoring yourself.
+ *
+ *  Due to the hierarchical nature of cgroups, every cgroup just
+ *  monitors for the 'nearest monitored ancestor' at all times.
+ *  Since the root cgroup is always monitored, all descendants
+ *  at boot time monitor for root and hence every mfa points to root,
+ *  except for root->mfa which is NULL.
+ *  1. RMID setup: when cgroup x starts monitoring:
+ *    for each descendant y, if y's mfa->level < x->level, then
+ *    y->mfa = x. (Where the level of the root node = 0...)
+ *  2. sched_in: during sched_in for x
+ *    if (x->mon_enabled) choose x->rmid
+ *    else choose x->mfa->rmid.
+ *  3. read: for each descendant y of cgroup x
+ *     if (y->mon_enabled) count += rmid_read(y->rmid).
+ *  4. evt_destroy: for each descendant y of x, if (y->mfa == x) then
+ *     y->mfa = x->mfa. Meaning, if any descendant was monitoring for x,
+ *     set that descendant to monitor for the cgroup x was monitoring for.
+ *
+ * @tskmon_rlist List of tasks being monitored in the cgroup.
+ *  When a task which belongs to a cgroup x is being monitored, it always uses
+ *  its own task->rmid even if cgroup x is monitored during sched_in.
+ *  To account for the counts of such tasks, the cgroup keeps this list
+ *  and walks it during read.
+ *
+ *  Perf handles hierarchy for other events, but because RMIDs are per package
+ *  this is handled here.
+ */
+struct cgrp_cqm_info {
+	bool cont_mon;
+	bool mon_enabled;
+	int level;
+	u32 *rmid;
+	struct cgrp_cqm_info *mfa;
+	struct list_head tskmon_rlist;
+};
+
+struct tsk_rmid_entry {
+	u32 *rmid;
+	struct list_head list;
+};
+
+#ifdef CONFIG_CGROUP_PERF
+
+# define css_to_perf_cgroup(css_) container_of(css_, struct perf_cgroup, css)
+# define cgrp_to_cqm_info(cgrp_) ((struct cgrp_cqm_info *)cgrp_->arch_info)
+# define css_to_cqm_info(css_) cgrp_to_cqm_info(css_to_perf_cgroup(css_))
+
+#else
+
+# define css_to_perf_cgroup(css_) NULL
+# define cgrp_to_cqm_info(cgrp_) NULL
+# define css_to_cqm_info(css_) NULL
+
+#endif
 #endif /* _ASM_X86_INTEL_RDT_COMMON_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/12] x86/rdt,cqm: Scheduling support update
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (5 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 06/12] x86/cqm: Add cgroup hierarchical monitoring support Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together Vikas Shivappa
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

Introduce a scheduling hook, finish_arch_pre_lock_switch, which is
called just after perf sched_in during a context switch. This hook
handles both the CAT and CQM sched_in scenarios.
The IA32_PQR_ASSOC MSR is used by both CAT (cache allocation) and CQM,
and this patch integrates the two MSR writes into one. The common
sched_in path checks whether the per-CPU cached value has a different
RMID or CLOSid than the task and only then does the MSR write.

During sched_in the task uses its task RMID if the task is monitored,
or else it uses its cgroup's RMID (see the sketch below).
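
Roughly, the unified decision added in this patch looks like the
following condensed version of __intel_rdt_sched_in() below;
cgroup_rmid() is only a stand-in for the cgroup lookup done by
get_cgroup_sched_rmid():

void __intel_rdt_sched_in(void)
{
	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
	int closid = 0;
	u32 rmid = 0;

	if (static_branch_likely(&rdt_enable_key))
		closid = current->closid ? current->closid
					 : this_cpu_read(cpu_closid);

	if (static_branch_unlikely(&cqm_enable_key))
		rmid = state->next_task_rmid ? state->next_task_rmid
					     : cgroup_rmid(current);

	/* One MSR write covering both CAT and CQM, only on change. */
	if (closid != state->closid || rmid != state->rmid) {
		state->closid = closid;
		state->rmid = rmid;
		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
	}
}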

This patch is based on David Carrillo-Cisneros's <davidcc@google.com>
patches in the cqm2 series.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c              | 45 ++++++++++-----
 arch/x86/include/asm/intel_pqr_common.h  | 38 +++++++++++++
 arch/x86/include/asm/intel_rdt.h         | 39 -------------
 arch/x86/include/asm/intel_rdt_common.h  | 11 ++++
 arch/x86/include/asm/processor.h         |  4 ++
 arch/x86/kernel/cpu/Makefile             |  1 +
 arch/x86/kernel/cpu/intel_rdt_common.c   | 98 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/intel_rdt_rdtgroup.c |  4 +-
 arch/x86/kernel/process_32.c             |  4 --
 arch/x86/kernel/process_64.c             |  4 --
 kernel/sched/core.c                      |  1 +
 kernel/sched/sched.h                     |  3 +
 12 files changed, 188 insertions(+), 64 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pqr_common.h
 create mode 100644 arch/x86/kernel/cpu/intel_rdt_common.c

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index c6479ae..597a184 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -28,13 +28,6 @@
 static bool cqm_enabled, mbm_enabled;
 unsigned int cqm_socket_max;
 
-/*
- * The cached intel_pqr_state is strictly per CPU and can never be
- * updated from a remote CPU. Both functions which modify the state
- * (intel_cqm_event_start and intel_cqm_event_stop) are called with
- * interrupts disabled, which is sufficient for the protection.
- */
-DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
 static struct hrtimer *mbm_timers;
 /**
  * struct sample - mbm event's (local or total) data
@@ -74,6 +67,8 @@ struct sample {
 static DEFINE_MUTEX(cache_mutex);
 static DEFINE_RAW_SPINLOCK(cache_lock);
 
+DEFINE_STATIC_KEY_FALSE(cqm_enable_key);
+
 /*
  * Groups of events that have the same target(s), one RMID per group.
  */
@@ -108,7 +103,7 @@ struct sample {
  * Likewise, an rmid value of -1 is used to indicate "no rmid currently
  * assigned" and is used as part of the rotation code.
  */
-static inline bool __rmid_valid(u32 rmid)
+bool __rmid_valid(u32 rmid)
 {
 	if (!rmid || rmid > cqm_max_rmid)
 		return false;
@@ -161,7 +156,7 @@ static inline struct cqm_rmid_entry *__rmid_entry(u32 rmid, int domain)
  *
  * We expect to be called with cache_mutex held.
  */
-static u32 __get_rmid(int domain)
+u32 __get_rmid(int domain)
 {
 	struct list_head *cqm_flist;
 	struct cqm_rmid_entry *entry;
@@ -368,6 +363,23 @@ static void init_mbm_sample(u32 *rmid, u32 evt_type)
 	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
 }
 
+#ifdef CONFIG_CGROUP_PERF
+struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
+{
+	struct cgrp_cqm_info *ccinfo = NULL;
+	struct perf_cgroup *pcgrp;
+
+	pcgrp = perf_cgroup_from_task(tsk, NULL);
+
+	if (!pcgrp)
+		return NULL;
+	else
+		ccinfo = cgrp_to_cqm_info(pcgrp);
+
+	return ccinfo;
+}
+#endif
+
 static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid)
 {
 	if (rmid != NULL) {
@@ -713,26 +725,27 @@ void alloc_needed_pkg_rmid(u32 *cqm_rmid)
 static void intel_cqm_event_start(struct perf_event *event, int mode)
 {
 	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-	u32 rmid;
 
 	if (!(event->hw.cqm_state & PERF_HES_STOPPED))
 		return;
 
 	event->hw.cqm_state &= ~PERF_HES_STOPPED;
 
-	alloc_needed_pkg_rmid(event->hw.cqm_rmid);
-
-	rmid = event->hw.cqm_rmid[pkg_id];
-	state->rmid = rmid;
-	wrmsr(MSR_IA32_PQR_ASSOC, rmid, state->closid);
+	if (is_task_event(event)) {
+		alloc_needed_pkg_rmid(event->hw.cqm_rmid);
+		state->next_task_rmid = event->hw.cqm_rmid[pkg_id];
+	}
 }
 
 static void intel_cqm_event_stop(struct perf_event *event, int mode)
 {
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+
 	if (event->hw.cqm_state & PERF_HES_STOPPED)
 		return;
 
 	event->hw.cqm_state |= PERF_HES_STOPPED;
+	state->next_task_rmid = 0;
 }
 
 static int intel_cqm_event_add(struct perf_event *event, int mode)
@@ -1366,6 +1379,8 @@ static int __init intel_cqm_init(void)
 	if (mbm_enabled)
 		pr_info("Intel MBM enabled\n");
 
+	static_branch_enable(&cqm_enable_key);
+
 	/*
 	 * Setup the hot cpu notifier once we are sure cqm
 	 * is enabled to avoid notifier leak.
diff --git a/arch/x86/include/asm/intel_pqr_common.h b/arch/x86/include/asm/intel_pqr_common.h
new file mode 100644
index 0000000..8fe9d8e
--- /dev/null
+++ b/arch/x86/include/asm/intel_pqr_common.h
@@ -0,0 +1,38 @@
+#ifndef _ASM_X86_INTEL_PQR_COMMON_H
+#define _ASM_X86_INTEL_PQR_COMMON_H
+
+#ifdef CONFIG_INTEL_RDT
+
+#include <linux/jump_label.h>
+#include <linux/types.h>
+#include <asm/percpu.h>
+#include <asm/msr.h>
+#include <asm/intel_rdt_common.h>
+
+void __intel_rdt_sched_in(void);
+
+/*
+ * intel_rdt_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_ASSOC
+ *
+ * The following considerations are made so that this has minimal impact
+ * on the scheduler hot path:
+ * - This will stay a no-op unless we are running on an Intel SKU
+ *   which supports resource control and we enable it by mounting the
+ *   resctrl file system, or the SKU supports resource monitoring.
+ * - Caches the per-CPU CLOSid/RMID values and does the MSR write only
+ *   when a task with a different CLOSid/RMID is scheduled in.
+ */
+static inline void intel_rdt_sched_in(void)
+{
+	if (static_branch_likely(&rdt_enable_key) ||
+		static_branch_unlikely(&cqm_enable_key)) {
+		__intel_rdt_sched_in();
+	}
+}
+
+#else
+
+static inline void intel_rdt_sched_in(void) {}
+
+#endif
+#endif
diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 95ce5c8..3b4a099 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -5,7 +5,6 @@
 
 #include <linux/kernfs.h>
 #include <linux/jump_label.h>
-
 #include <asm/intel_rdt_common.h>
 
 #define IA32_L3_QOS_CFG		0xc81
@@ -182,43 +181,5 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
 			   struct seq_file *s, void *v);
 
-/*
- * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
- *
- * Following considerations are made so that this has minimal impact
- * on scheduler hot path:
- * - This will stay as no-op unless we are running on an Intel SKU
- *   which supports resource control and we enable by mounting the
- *   resctrl file system.
- * - Caches the per cpu CLOSid values and does the MSR write only
- *   when a task with a different CLOSid is scheduled in.
- *
- * Must be called with preemption disabled.
- */
-static inline void intel_rdt_sched_in(void)
-{
-	if (static_branch_likely(&rdt_enable_key)) {
-		struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
-		int closid;
-
-		/*
-		 * If this task has a closid assigned, use it.
-		 * Else use the closid assigned to this cpu.
-		 */
-		closid = current->closid;
-		if (closid == 0)
-			closid = this_cpu_read(cpu_closid);
-
-		if (closid != state->closid) {
-			state->closid = closid;
-			wrmsr(MSR_IA32_PQR_ASSOC, state->rmid, closid);
-		}
-	}
-}
-
-#else
-
-static inline void intel_rdt_sched_in(void) {}
-
 #endif /* CONFIG_INTEL_RDT_A */
 #endif /* _ASM_X86_INTEL_RDT_H */
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index e11ed5e..544acaa 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -18,12 +18,23 @@
  */
 struct intel_pqr_state {
 	u32			rmid;
+	u32			next_task_rmid;
 	u32			closid;
 	int			rmid_usecnt;
 };
 
 DECLARE_PER_CPU(struct intel_pqr_state, pqr_state);
 
+u32 __get_rmid(int domain);
+bool __rmid_valid(u32 rmid);
+void alloc_needed_pkg_rmid(u32 *cqm_rmid);
+struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk);
+
+extern struct cgrp_cqm_info cqm_rootcginfo;
+
+DECLARE_STATIC_KEY_FALSE(cqm_enable_key);
+DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
+
 /**
  * struct cgrp_cqm_info - perf_event cgroup metadata for cqm
  * @cont_mon     Continuous monitoring flag
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index eaf1005..ec4beed 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@
 #include <asm/nops.h>
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
+#include <asm/intel_pqr_common.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -903,4 +904,7 @@ static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
 
 void stop_this_cpu(void *dummy);
 void df_debug(struct pt_regs *regs, long error_code);
+
+#define finish_arch_pre_lock_switch intel_rdt_sched_in
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5200001..d354e84 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_CPU_SUP_CENTAUR)		+= centaur.o
 obj-$(CONFIG_CPU_SUP_TRANSMETA_32)	+= transmeta.o
 obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 
+obj-$(CONFIG_INTEL_RDT)		+= intel_rdt_common.o
 obj-$(CONFIG_INTEL_RDT_A)	+= intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_schemata.o
 
 obj-$(CONFIG_X86_MCE)			+= mcheck/
diff --git a/arch/x86/kernel/cpu/intel_rdt_common.c b/arch/x86/kernel/cpu/intel_rdt_common.c
new file mode 100644
index 0000000..c3c50cd
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_rdt_common.c
@@ -0,0 +1,98 @@
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+#include <linux/err.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpuhotplug.h>
+#include <asm/intel-family.h>
+#include <asm/intel_rdt.h>
+
+/*
+ * The cached intel_pqr_state is strictly per CPU and can never be
+ * updated from a remote CPU. Both functions which modify the state
+ * (intel_cqm_event_start and intel_cqm_event_stop) are called with
+ * interrupts disabled, which is sufficient for the protection.
+ */
+DEFINE_PER_CPU(struct intel_pqr_state, pqr_state);
+
+#define pkg_id	topology_physical_package_id(smp_processor_id())
+
+#ifdef CONFIG_INTEL_RDT_M
+static inline int get_cgroup_sched_rmid(void)
+{
+#ifdef CONFIG_CGROUP_PERF
+	struct cgrp_cqm_info *ccinfo = NULL;
+
+	ccinfo = cqminfo_from_tsk(current);
+
+	if (!ccinfo)
+		return 0;
+
+	/*
+	 * A cgroup is always monitoring for itself or
+	 * for an ancestor(default is root).
+	 */
+	if (ccinfo->mon_enabled) {
+		alloc_needed_pkg_rmid(ccinfo->rmid);
+		return ccinfo->rmid[pkg_id];
+	} else {
+		alloc_needed_pkg_rmid(ccinfo->mfa->rmid);
+		return ccinfo->mfa->rmid[pkg_id];
+	}
+#endif
+
+	return 0;
+}
+
+static inline int get_sched_in_rmid(void)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	u32 rmid = 0;
+
+	rmid = state->next_task_rmid;
+
+	return rmid ? rmid : get_cgroup_sched_rmid();
+}
+#endif
+
+/*
+ * __intel_rdt_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_ASSOC
+ *
+ * The following considerations are made so that this has minimal impact
+ * on the scheduler hot path:
+ * - This will stay a no-op unless we are running on an Intel SKU
+ *   which supports resource control and we enable it by mounting the
+ *   resctrl file system, or the SKU supports resource monitoring.
+ * - Caches the per-CPU CLOSid/RMID values and does the MSR write only
+ *   when a task with a different CLOSid/RMID is scheduled in.
+ */
+void __intel_rdt_sched_in(void)
+{
+	struct intel_pqr_state *state = this_cpu_ptr(&pqr_state);
+	int closid = 0;
+	u32 rmid = 0;
+
+#ifdef CONFIG_INTEL_RDT_A
+	if (static_branch_likely(&rdt_enable_key)) {
+		/*
+		 * If this task has a closid assigned, use it.
+		 * Else use the closid assigned to this cpu.
+		 */
+		closid = current->closid;
+		if (closid == 0)
+			closid = this_cpu_read(cpu_closid);
+	}
+#endif
+
+#ifdef CONFIG_INTEL_RDT_M
+	if (static_branch_unlikely(&cqm_enable_key))
+		rmid = get_sched_in_rmid();
+#endif
+
+	if (closid != state->closid || rmid != state->rmid) {
+
+		state->closid = closid;
+		state->rmid = rmid;
+		wrmsr(MSR_IA32_PQR_ASSOC, rmid, closid);
+	}
+}
diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
index 8af04af..8b6b429 100644
--- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
+++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
@@ -206,7 +206,7 @@ static void rdt_update_cpu_closid(void *closid)
 	 * executing task might have its own closid selected. Just reuse
 	 * the context switch code.
 	 */
-	intel_rdt_sched_in();
+	__intel_rdt_sched_in();
 }
 
 /*
@@ -328,7 +328,7 @@ static void move_myself(struct callback_head *head)
 
 	preempt_disable();
 	/* update PQR_ASSOC MSR to make resource group go into effect */
-	intel_rdt_sched_in();
+	__intel_rdt_sched_in();
 	preempt_enable();
 
 	kfree(callback);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a0ac3e8..d0d7441 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -53,7 +53,6 @@
 #include <asm/debugreg.h>
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
-#include <asm/intel_rdt.h>
 
 void __show_regs(struct pt_regs *regs, int all)
 {
@@ -297,8 +296,5 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 
 	this_cpu_write(current_task, next_p);
 
-	/* Load the Intel cache allocation PQR MSR. */
-	intel_rdt_sched_in();
-
 	return prev_p;
 }
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index a61e141..a76b65e 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -49,7 +49,6 @@
 #include <asm/switch_to.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/vdso.h>
-#include <asm/intel_rdt.h>
 
 __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
 
@@ -477,9 +476,6 @@ void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp)
 			loadsegment(ss, __KERNEL_DS);
 	}
 
-	/* Load the Intel cache allocation PQR MSR. */
-	intel_rdt_sched_in();
-
 	return prev_p;
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57..bf970ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2767,6 +2767,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
+	finish_arch_pre_lock_switch();
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7b34c78..61b47a5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1121,6 +1121,9 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 #ifndef prepare_arch_switch
 # define prepare_arch_switch(next)	do { } while (0)
 #endif
+#ifndef finish_arch_pre_lock_switch
+# define finish_arch_pre_lock_switch()	do { } while (0)
+#endif
 #ifndef finish_arch_post_lock_switch
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (6 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 07/12] x86/rdt,cqm: Scheduling support update Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 09/12] x86/cqm: Add RMID reuse Vikas Shivappa
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: Vikas Shivappa <vikas.shivappa@intel.com>

This patch adds support to monitor a cgroup x and a task p1
together when p1 is part of cgroup x. Since we cannot write two RMIDs
during sched_in, the driver handles this case itself.

This patch introduces a u32 *rmid in the task_struct which keeps track
of the RMIDs associated with the task. There is also a list in the
arch_info of perf_cgroup, called tskmon_rlist, which keeps track of the
tasks in the cgroup that are monitored.

The tskmon_rlist is modified in two scenarios:
- at event_init of a task p1 which is part of a cgroup, add p1 to the
cgroup->tskmon_rlist; at event_destroy, delete the task from the list.
- when a task moves from cgroup x to cgroup y, if the task was
monitored, remove the task from cgroup x's tskmon_rlist and add it to
cgroup y's tskmon_rlist.

sched_in: when the task p1 is scheduled in, we write the task RMID into
the PQR_ASSOC MSR.

read (for task p1): same as any other cqm task event.

read (for the cgroup x): when counting for the cgroup, the tskmon_rlist
is traversed and the corresponding RMID counts are added (see the
sketch below).
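
Condensed sketch of the cgroup read path after this patch, taken from
the cqm_read_subtree() hunk below (declarations and locking omitted):
the RMIDs on each cgroup's tskmon_rlist are folded in alongside the
monitored descendant cgroups.

css_for_each_descendant_pre(pos_css, rcss) {
	ccqm_info = css_to_cqm_info(pos_css);

	/* Add the descendant 'monitored cgroup' counts */
	if (pos_css != rcss && ccqm_info->mon_enabled)
		delta_local(event, rr, ccqm_info->rmid);

	/* Add this cgroup's 'monitored task' counts */
	list_for_each_entry(entry, &ccqm_info->tskmon_rlist, list)
		delta_local(event, rr, entry->rmid);
}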

Tests: Monitoring a cgroup x and a task within the cgroup x should
work.

Patch is heavily based on a prior version of the cqm2 series by David
Carrillo-Cisneros <davidcc@google.com>.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c | 137 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/sched.h       |   3 +
 2 files changed, 137 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 597a184..e1765e3 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -363,6 +363,36 @@ static void init_mbm_sample(u32 *rmid, u32 evt_type)
 	on_each_cpu_mask(&cqm_cpumask, __intel_mbm_event_init, &rr, 1);
 }
 
+static inline int add_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
+{
+	struct tsk_rmid_entry *entry;
+
+	entry = kzalloc(sizeof(struct tsk_rmid_entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&entry->list);
+	entry->rmid = rmid;
+
+	list_add_tail(&entry->list, l);
+
+	return 0;
+}
+
+static inline void del_cgrp_tskmon_entry(u32 *rmid, struct list_head *l)
+{
+	struct tsk_rmid_entry *entry = NULL, *tmp1;
+
+	list_for_each_entry_safe(entry, tmp1, l, list) {
+		if (entry->rmid == rmid) {
+
+			list_del(&entry->list);
+			kfree(entry);
+			break;
+		}
+	}
+}
+
 #ifdef CONFIG_CGROUP_PERF
 struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
 {
@@ -380,6 +410,49 @@ struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk)
 }
 #endif
 
+static inline void
+	cgrp_tskmon_update(struct task_struct *tsk, u32 *rmid, bool ena)
+{
+	struct cgrp_cqm_info *ccinfo = NULL;
+
+#ifdef CONFIG_CGROUP_PERF
+	ccinfo = cqminfo_from_tsk(tsk);
+#endif
+	if (!ccinfo)
+		return;
+
+	if (ena)
+		add_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
+	else
+		del_cgrp_tskmon_entry(rmid, &ccinfo->tskmon_rlist);
+}
+
+static int cqm_assign_task_rmid(struct perf_event *event, u32 *rmid)
+{
+	struct task_struct *tsk;
+	int ret = 0;
+
+	rcu_read_lock();
+	tsk = event->hw.target;
+	if (pid_alive(tsk)) {
+		get_task_struct(tsk);
+
+		if (rmid != NULL)
+			cgrp_tskmon_update(tsk, rmid, true);
+		else
+			cgrp_tskmon_update(tsk, tsk->rmid, false);
+
+		tsk->rmid = rmid;
+
+		put_task_struct(tsk);
+	} else {
+		ret = -EINVAL;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static inline void cqm_enable_mon(struct cgrp_cqm_info *cqm_info, u32 *rmid)
 {
 	if (rmid != NULL) {
@@ -429,8 +502,12 @@ static void cqm_assign_hier_rmid(struct cgroup_subsys_state *rcss, u32 *rmid)
 
 static int cqm_assign_rmid(struct perf_event *event, u32 *rmid)
 {
+	if (is_task_event(event)) {
+		if (cqm_assign_task_rmid(event, rmid))
+			return -EINVAL;
+	}
 #ifdef CONFIG_CGROUP_PERF
-	if (is_cgroup_event(event)) {
+	else if (is_cgroup_event(event)) {
 		cqm_assign_hier_rmid(&event->cgrp->css, rmid);
 	}
 #endif
@@ -631,6 +708,8 @@ static u64 cqm_read_subtree(struct perf_event *event, struct rmid_read *rr)
 
 	struct cgroup_subsys_state *rcss, *pos_css;
 	struct cgrp_cqm_info *ccqm_info;
+	struct tsk_rmid_entry *entry;
+	struct list_head *l;
 
 	cqm_mask_call_local(rr);
 	local64_set(&event->count, atomic64_read(&(rr->value)));
@@ -646,6 +725,13 @@ static u64 cqm_read_subtree(struct perf_event *event, struct rmid_read *rr)
 		/* Add the descendant 'monitored cgroup' counts */
 		if (pos_css != rcss && ccqm_info->mon_enabled)
 			delta_local(event, rr, ccqm_info->rmid);
+
+		/* Add your own and your descendants' 'monitored task' counts */
+		if (!list_empty(&ccqm_info->tskmon_rlist)) {
+			l = &ccqm_info->tskmon_rlist;
+			list_for_each_entry(entry, l, list)
+				delta_local(event, rr, entry->rmid);
+		}
 	}
 	rcu_read_unlock();
 #endif
@@ -1096,10 +1182,55 @@ void perf_cgroup_arch_css_free(struct cgroup_subsys_state *css)
 	mutex_unlock(&cache_mutex);
 }
 
+/*
+ * Called while attaching/detaching a task to/from a cgroup.
+ */
+static bool is_task_monitored(struct task_struct *tsk)
+{
+	return (tsk->rmid != NULL);
+}
+
 void perf_cgroup_arch_attach(struct cgroup_taskset *tset)
-{}
+{
+	struct cgroup_subsys_state *new_css;
+	struct cgrp_cqm_info *cqm_info;
+	struct task_struct *task;
+
+	mutex_lock(&cache_mutex);
+
+	cgroup_taskset_for_each(task, new_css, tset) {
+		if (!is_task_monitored(task))
+			continue;
+
+		cqm_info = cqminfo_from_tsk(task);
+		if (cqm_info)
+			add_cgrp_tskmon_entry(task->rmid,
+					     &cqm_info->tskmon_rlist);
+	}
+	mutex_unlock(&cache_mutex);
+}
+
 int perf_cgroup_arch_can_attach(struct cgroup_taskset *tset)
-{}
+{
+	struct cgroup_subsys_state *new_css;
+	struct cgrp_cqm_info *cqm_info;
+	struct task_struct *task;
+
+	mutex_lock(&cache_mutex);
+	cgroup_taskset_for_each(task, new_css, tset) {
+		if (!is_task_monitored(task))
+			continue;
+		cqm_info = cqminfo_from_tsk(task);
+
+		if (cqm_info)
+			del_cgrp_tskmon_entry(task->rmid,
+					     &cqm_info->tskmon_rlist);
+
+	}
+	mutex_unlock(&cache_mutex);
+
+	return 0;
+}
 #endif
 
 static inline void cqm_pick_event_reader(int cpu)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d19052..8072583 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1824,6 +1824,9 @@ struct task_struct {
 #ifdef CONFIG_INTEL_RDT_A
 	int closid;
 #endif
+#ifdef CONFIG_INTEL_RDT_M
+	u32 *rmid;
+#endif
 #ifdef CONFIG_FUTEX
 	struct robust_list_head __user *robust_list;
 #ifdef CONFIG_COMPAT
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/12] x86/cqm: Add RMID reuse
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (7 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 10/12] perf/core,x86/cqm: Add read for Cgroup events,per pkg reads Vikas Shivappa
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

When an RMID is freed by an event it cannot be reused immediately, as
the RMID may still have some cache occupancy attributed to it. Hence
when an RMID is freed it goes onto the limbo list rather than the free
list. This patch adds support to periodically check the occupancy values
of such RMIDs and move each one to the free list once its occupancy
drops below the threshold occupancy value (see the sketch below). The
threshold occupancy value can be modified by the user.
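
Condensed flow of the per-package reuse worker added below (locking,
declarations and the rescheduling logic omitted):

/* Mark limbo RMIDs whose occupancy has dropped below the threshold... */
list_for_each_entry(entry, &pdata->cqm_rmid_limbo_lru, list)
	if (__rmid_read(entry->rmid) < __intel_cqm_threshold)
		entry->state = RMID_AVAILABLE;

/* ...then move them from the limbo list back to the free list. */
list_for_each_entry_safe(entry, tmp, &pdata->cqm_rmid_limbo_lru, list) {
	if (entry->state == RMID_DIRTY)
		continue;
	list_del(&entry->list);
	list_add_tail(&entry->list, &pdata->cqm_rmid_free_lru);
}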

Tests: Before this patch, task monitoring would simply throw an error
once all RMIDs had been used up within the lifetime of a system boot.
After this patch, freed RMIDs can be reused.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index e1765e3..92efe12 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -174,6 +174,13 @@ u32 __get_rmid(int domain)
 	return entry->rmid;
 }
 
+static void cqm_schedule_rmidwork(int domain);
+
+static inline bool is_first_cqmwork(int domain)
+{
+	return (!atomic_cmpxchg(&cqm_pkgs_data[domain]->reuse_scheduled, 0, 1));
+}
+
 static void __put_rmid(u32 rmid, int domain)
 {
 	struct cqm_rmid_entry *entry;
@@ -294,6 +301,93 @@ static void cqm_mask_call(struct rmid_read *rr)
 static unsigned int __intel_cqm_threshold;
 static unsigned int __intel_cqm_max_threshold;
 
+/*
+ * Test whether an RMID's occupancy has dropped below the threshold on this CPU.
+ */
+static void intel_cqm_stable(void)
+{
+	struct cqm_rmid_entry *entry;
+	struct list_head *llist;
+
+	llist = &cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru;
+	list_for_each_entry(entry, llist, list) {
+
+		if (__rmid_read(entry->rmid) < __intel_cqm_threshold)
+			entry->state = RMID_AVAILABLE;
+	}
+}
+
+static void __intel_cqm_rmid_reuse(void)
+{
+	struct cqm_rmid_entry *entry, *tmp;
+	struct list_head *llist, *flist;
+	struct pkg_data *pdata;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&cache_lock, flags);
+	pdata = cqm_pkgs_data[pkg_id];
+	llist = &pdata->cqm_rmid_limbo_lru;
+	flist = &pdata->cqm_rmid_free_lru;
+
+	if (list_empty(llist))
+		goto end;
+	/*
+	 * Test whether an RMID is free
+	 */
+	intel_cqm_stable();
+
+	list_for_each_entry_safe(entry, tmp, llist, list) {
+
+		if (entry->state == RMID_DIRTY)
+			continue;
+		/*
+		 * Otherwise remove from limbo and place it onto the free list.
+		 */
+		list_del(&entry->list);
+		list_add_tail(&entry->list, flist);
+	}
+
+end:
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+}
+
+static bool reschedule_cqm_work(void)
+{
+	unsigned long flags;
+	bool nwork = false;
+
+	raw_spin_lock_irqsave(&cache_lock, flags);
+
+	if (!list_empty(&cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru))
+		nwork = true;
+	else
+		atomic_set(&cqm_pkgs_data[pkg_id]->reuse_scheduled, 0U);
+
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+
+	return nwork;
+}
+
+static void cqm_schedule_rmidwork(int domain)
+{
+	struct delayed_work *dwork;
+	unsigned long delay;
+
+	dwork = &cqm_pkgs_data[domain]->intel_cqm_rmid_work;
+	delay = msecs_to_jiffies(RMID_DEFAULT_QUEUE_TIME);
+
+	schedule_delayed_work_on(cqm_pkgs_data[domain]->rmid_work_cpu,
+			     dwork, delay);
+}
+
+static void intel_cqm_rmid_reuse(struct work_struct *work)
+{
+	__intel_cqm_rmid_reuse();
+
+	if (reschedule_cqm_work())
+		cqm_schedule_rmidwork(pkg_id);
+}
+
 static struct pmu intel_cqm_pmu;
 
 static u64 update_sample(unsigned int rmid, u32 evt_type, int first)
@@ -548,6 +642,8 @@ static int intel_cqm_setup_event(struct perf_event *event,
 	if (!event->hw.cqm_rmid)
 		return -ENOMEM;
 
+	cqm_assign_rmid(event, event->hw.cqm_rmid);
+
 	return 0;
 }
 
@@ -863,6 +959,7 @@ static void intel_cqm_event_terminate(struct perf_event *event)
 {
 	struct perf_event *group_other = NULL;
 	unsigned long flags;
+	int d;
 
 	mutex_lock(&cache_mutex);
 	/*
@@ -905,6 +1002,13 @@ static void intel_cqm_event_terminate(struct perf_event *event)
 		mbm_stop_timers();
 
 	mutex_unlock(&cache_mutex);
+
+	for (d = 0; d < cqm_socket_max; d++) {
+
+		if (cqm_pkgs_data[d] != NULL && is_first_cqmwork(d)) {
+			cqm_schedule_rmidwork(d);
+		}
+	}
 }
 
 static int intel_cqm_event_init(struct perf_event *event)
@@ -1322,6 +1426,9 @@ static int pkg_data_init_cpu(int cpu)
 	mutex_init(&pkg_data->pkg_data_mutex);
 	raw_spin_lock_init(&pkg_data->pkg_data_lock);
 
+	INIT_DEFERRABLE_WORK(
+		&pkg_data->intel_cqm_rmid_work, intel_cqm_rmid_reuse);
+
 	pkg_data->rmid_work_cpu = cpu;
 
 	nr_rmids = cqm_max_rmid + 1;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/12] perf/core,x86/cqm: Add read for Cgroup events,per pkg reads.
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (8 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 09/12] x86/cqm: Add RMID reuse Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 11/12] perf/stat: fix bug in handling events in error state Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 12/12] perf/stat: revamp read error handling, snapshot and per_pkg events Vikas Shivappa
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

For cqm cgroup events, the event can be read even if it was not active
on the CPU on which it is being read. This is because the RMIDs are per
package, so if we read the llc_occupancy value on a CPU x we are really
reading the occupancy for the package to which CPU x belongs.

This patch adds a PERF_EV_CAP_INACTIVE_CPU_READ_PKG capability to
indicate this behaviour of cqm and also changes perf/core to still
perform the read for cgroup events even when the event is inactive on
the CPU. Task events have event->cpu as -1, so this does not apply to
them (see the sketch below).
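
The perf/core side of the change is not fully shown in this excerpt;
conceptually the readability check reduces to something like the sketch
below. event_readable_on_this_cpu() and the package comparison are
illustrative only, not the literal core hunk.

static bool event_readable_on_this_cpu(struct perf_event *event)
{
	if (event->state == PERF_EVENT_STATE_ACTIVE)
		return true;

	/* RMIDs are per package: any CPU in the package can read them. */
	return (event->event_caps & PERF_EV_CAP_INACTIVE_CPU_READ_PKG) &&
	       topology_physical_package_id(smp_processor_id()) ==
	       topology_physical_package_id(event->cpu);
}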

Tests: Before this patch, perf stat -C <cpux> would not return a count
to the perf/core. After this patch the package count is returned to the
perf/core. We still don't see the count in perf user mode - that is
fixed in the next patches.
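
For illustration only, a minimal user-space sketch of the scenario above
(not part of this patch) might look like the following. It assumes the
intel_cqm PMU is exposed under /sys/bus/event_source/devices/intel_cqm
and that llc_occupancy is event 0x01 there; check the PMU's events/ and
format/ sysfs files on the target system.

/*
 * Illustrative only: read llc_occupancy for a cgroup on one cpu's package.
 * Build with gcc and run with a perf_event cgroup directory as argument.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int cqm_pmu_type(void)
{
	FILE *f = fopen("/sys/bus/event_source/devices/intel_cqm/type", "r");
	int type = -1;

	if (f) {
		if (fscanf(f, "%d", &type) != 1)
			type = -1;
		fclose(f);
	}
	return type;
}

int main(int argc, char **argv)
{
	struct perf_event_attr attr;
	uint64_t count;
	int type = cqm_pmu_type();
	int cgrp_fd, fd, cpu = 0;

	if (argc < 2 || type < 0)
		return 1;
	/* argv[1]: perf_event cgroup dir, e.g. /sys/fs/cgroup/perf_event/cgrp_y */
	cgrp_fd = open(argv[1], O_RDONLY);
	if (cgrp_fd < 0)
		return 1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = 0x01;	/* llc_occupancy; assumption, see events/ dir */

	/* cgroup mode: pid is the cgroup fd, cpu selects the package to read */
	fd = syscall(__NR_perf_event_open, &attr, cgrp_fd, cpu, -1,
		     PERF_FLAG_PID_CGROUP);
	if (fd < 0)
		return 1;

	sleep(1);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("llc_occupancy raw count on cpu %d's package: %llu\n",
		       cpu, (unsigned long long)count);
	return 0;
}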

The patch is based on David Carrillo-Cisneros' <davidcc@google.com>
patches in the cqm2 series.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c             | 31 ++++++++++++++++++++++++-------
 arch/x86/include/asm/intel_rdt_common.h |  2 +-
 include/linux/perf_event.h              | 19 ++++++++++++++++---
 kernel/events/core.c                    | 16 ++++++++++++----
 4 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 92efe12..3f5860c 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -111,6 +111,13 @@ bool __rmid_valid(u32 rmid)
 	return true;
 }
 
+static inline bool __rmid_valid_raw(u32 rmid)
+{
+	if (rmid > cqm_max_rmid)
+		return false;
+	return true;
+}
+
 static u64 __rmid_read(u32 rmid)
 {
 	u64 val;
@@ -884,16 +891,16 @@ static u64 intel_cqm_event_count(struct perf_event *event)
 	return __perf_event_count(event);
 }
 
-void alloc_needed_pkg_rmid(u32 *cqm_rmid)
+u32 alloc_needed_pkg_rmid(u32 *cqm_rmid)
 {
 	unsigned long flags;
 	u32 rmid;
 
 	if (WARN_ON(!cqm_rmid))
-		return;
+		return -EINVAL;
 
 	if (cqm_rmid == cqm_rootcginfo.rmid || cqm_rmid[pkg_id])
-		return;
+		return 0;
 
 	raw_spin_lock_irqsave(&cache_lock, flags);
 
@@ -902,6 +909,8 @@ void alloc_needed_pkg_rmid(u32 *cqm_rmid)
 		cqm_rmid[pkg_id] = rmid;
 
 	raw_spin_unlock_irqrestore(&cache_lock, flags);
+
+	return rmid;
 }
 
 static void intel_cqm_event_start(struct perf_event *event, int mode)
@@ -913,10 +922,8 @@ static void intel_cqm_event_start(struct perf_event *event, int mode)
 
 	event->hw.cqm_state &= ~PERF_HES_STOPPED;
 
-	if (is_task_event(event)) {
-		alloc_needed_pkg_rmid(event->hw.cqm_rmid);
+	if (is_task_event(event))
 		state->next_task_rmid = event->hw.cqm_rmid[pkg_id];
-	}
 }
 
 static void intel_cqm_event_stop(struct perf_event *event, int mode)
@@ -932,10 +939,19 @@ static void intel_cqm_event_stop(struct perf_event *event, int mode)
 
 static int intel_cqm_event_add(struct perf_event *event, int mode)
 {
+	u32 rmid;
+
 	event->hw.cqm_state = PERF_HES_STOPPED;
 
-	if ((mode & PERF_EF_START))
+	/*
+	 * If Lazy RMID alloc fails indicate the error to the user.
+	*/
+	if ((mode & PERF_EF_START)) {
+		rmid = alloc_needed_pkg_rmid(event->hw.cqm_rmid);
+		if (!__rmid_valid_raw(rmid))
+			return -EINVAL;
 		intel_cqm_event_start(event, mode);
+	}
 
 	return 0;
 }
@@ -1048,6 +1064,7 @@ static int intel_cqm_event_init(struct perf_event *event)
 	 * cgroup hierarchies.
 	 */
 	event->event_caps |= PERF_EV_CAP_CGROUP_NO_RECURSION;
+	event->event_caps |= PERF_EV_CAP_INACTIVE_CPU_READ_PKG;
 
 	mutex_lock(&cache_mutex);
 
diff --git a/arch/x86/include/asm/intel_rdt_common.h b/arch/x86/include/asm/intel_rdt_common.h
index 544acaa..fcaaaeb 100644
--- a/arch/x86/include/asm/intel_rdt_common.h
+++ b/arch/x86/include/asm/intel_rdt_common.h
@@ -27,7 +27,7 @@ struct intel_pqr_state {
 
 u32 __get_rmid(int domain);
 bool __rmid_valid(u32 rmid);
-void alloc_needed_pkg_rmid(u32 *cqm_rmid);
+u32 alloc_needed_pkg_rmid(u32 *cqm_rmid);
 struct cgrp_cqm_info *cqminfo_from_tsk(struct task_struct *tsk);
 
 extern struct cgrp_cqm_info cqm_rootcginfo;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 410642a..adfddec 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -525,10 +525,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_CGROUP_NO_RECURSION: A cgroup event that handles its own
  * cgroup scoping. It does not need to be enabled for all of its descendants
  * cgroups.
+ * PERF_EV_CAP_INACTIVE_CPU_READ_PKG: A cgroup event where we can read
+ * the package count on any cpu on the pkg even if inactive.
  */
-#define PERF_EV_CAP_SOFTWARE		BIT(0)
-#define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
-#define PERF_EV_CAP_CGROUP_NO_RECURSION	BIT(2)
+#define PERF_EV_CAP_SOFTWARE                    BIT(0)
+#define PERF_EV_CAP_READ_ACTIVE_PKG             BIT(1)
+#define PERF_EV_CAP_CGROUP_NO_RECURSION         BIT(2)
+#define PERF_EV_CAP_INACTIVE_CPU_READ_PKG       BIT(3)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
@@ -722,6 +725,16 @@ struct perf_event {
 #endif /* CONFIG_PERF_EVENTS */
 };
 
+#ifdef CONFIG_PERF_EVENTS
+static inline bool __perf_can_read_inactive(struct perf_event *event)
+{
+	if ((event->group_caps & PERF_EV_CAP_INACTIVE_CPU_READ_PKG))
+		return true;
+
+	return false;
+}
+#endif
+
 /**
  * struct perf_event_context - event context structure
  *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 229f611..e71ca66 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3443,9 +3443,13 @@ struct perf_read_data {
 
 static int find_cpu_to_read(struct perf_event *event, int local_cpu)
 {
+	bool active = event->state == PERF_EVENT_STATE_ACTIVE;
 	int event_cpu = event->oncpu;
 	u16 local_pkg, event_pkg;
 
+	if (__perf_can_read_inactive(event) && !active)
+		event_cpu = event->cpu;
+
 	if (event->group_caps & PERF_EV_CAP_READ_ACTIVE_PKG) {
 		event_pkg =  topology_physical_package_id(event_cpu);
 		local_pkg =  topology_physical_package_id(local_cpu);
@@ -3467,6 +3471,7 @@ static void __perf_event_read(void *info)
 	struct perf_event_context *ctx = event->ctx;
 	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 	struct pmu *pmu = event->pmu;
+	bool read_inactive = __perf_can_read_inactive(event);
 
 	/*
 	 * If this is a task context, we need to check whether it is
@@ -3475,7 +3480,7 @@ static void __perf_event_read(void *info)
 	 * event->count would have been updated to a recent sample
 	 * when the event was scheduled out.
 	 */
-	if (ctx->task && cpuctx->task_ctx != ctx)
+	if (ctx->task && cpuctx->task_ctx != ctx && !read_inactive)
 		return;
 
 	raw_spin_lock(&ctx->lock);
@@ -3485,7 +3490,7 @@ static void __perf_event_read(void *info)
 	}
 
 	update_event_times(event);
-	if (event->state != PERF_EVENT_STATE_ACTIVE)
+	if (ctx->task && cpuctx->task_ctx != ctx && !read_inactive)
 		goto unlock;
 
 	if (!data->group) {
@@ -3500,7 +3505,8 @@ static void __perf_event_read(void *info)
 
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
 		update_event_times(sub);
-		if (sub->state == PERF_EVENT_STATE_ACTIVE) {
+		if (sub->state == PERF_EVENT_STATE_ACTIVE ||
+		    __perf_can_read_inactive(sub)) {
 			/*
 			 * Use sibling's PMU rather than @event's since
 			 * sibling could be on different (eg: software) PMU.
@@ -3578,13 +3584,15 @@ u64 perf_event_read_local(struct perf_event *event)
 
 static int perf_event_read(struct perf_event *event, bool group)
 {
+	bool active = event->state == PERF_EVENT_STATE_ACTIVE;
 	int ret = 0, cpu_to_read, local_cpu;
 
 	/*
 	 * If event is enabled and currently active on a CPU, update the
 	 * value in the event structure:
 	 */
-	if (event->state == PERF_EVENT_STATE_ACTIVE) {
+	if (active || __perf_can_read_inactive(event)) {
+
 		struct perf_read_data data = {
 			.event = event,
 			.group = group,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 11/12] perf/stat: fix bug in handling events in error state
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (9 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 10/12] perf/core,x86/cqm: Add read for Cgroup events,per pkg reads Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  2017-01-06 21:56 ` [PATCH 12/12] perf/stat: revamp read error handling, snapshot and per_pkg events Vikas Shivappa
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: Stephane Eranian <eranian@google.com>

When an event is in error state, read() returns 0
instead of the size of the buffer. In certain modes,
such as interval printing, ignoring the 0 return value
may cause bogus count deltas to be computed and
thus invalid results printed.

This patch fixes the problem by modifying read_counters()
to mark the event as not scaled (scaled = -1), forcing
the printout routine to show <NOT COUNTED>.
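
As a rough sketch of the convention (not the tool's code), a read helper
that refuses to treat a zero-byte read() as a valid sample could look
like this; the -ENODATA return value is an arbitrary choice for the
sketch:

/* Sketch only: treat a short or zero-byte read of a perf counter as an error. */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

static int read_perf_count(int fd, uint64_t *vals, size_t n)
{
	ssize_t ret = read(fd, vals, n * sizeof(uint64_t));

	if (ret < 0)
		return -errno;
	if ((size_t)ret < n * sizeof(uint64_t))
		return -ENODATA;	/* event in error state: short/zero read */
	return 0;
}

The caller can then flag the counter as <NOT COUNTED> instead of
computing a delta against stale values.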

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 tools/perf/builtin-stat.c | 12 +++++++++---
 tools/perf/util/evsel.c   |  4 ++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a02f2e9..9f109b9 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -310,8 +310,12 @@ static int read_counter(struct perf_evsel *counter)
 			struct perf_counts_values *count;
 
 			count = perf_counts(counter->counts, cpu, thread);
-			if (perf_evsel__read(counter, cpu, thread, count))
+			if (perf_evsel__read(counter, cpu, thread, count)) {
+				counter->counts->scaled = -1;
+				perf_counts(counter->counts, cpu, thread)->ena = 0;
+				perf_counts(counter->counts, cpu, thread)->run = 0;
 				return -1;
+			}
 
 			if (STAT_RECORD) {
 				if (perf_evsel__write_stat_event(counter, cpu, thread, count)) {
@@ -336,12 +340,14 @@ static int read_counter(struct perf_evsel *counter)
 static void read_counters(void)
 {
 	struct perf_evsel *counter;
+	int ret;
 
 	evlist__for_each_entry(evsel_list, counter) {
-		if (read_counter(counter))
+		ret = read_counter(counter);
+		if (ret)
 			pr_debug("failed to read counter %s\n", counter->name);
 
-		if (perf_stat_process_counter(&stat_config, counter))
+		if (ret == 0 && perf_stat_process_counter(&stat_config, counter))
 			pr_warning("failed to process counter %s\n", counter->name);
 	}
 }
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 04e536a..7aa10e3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1232,7 +1232,7 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
-	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) < 0)
+	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) <= 0)
 		return -errno;
 
 	return 0;
@@ -1250,7 +1250,7 @@ int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 	if (evsel->counts == NULL && perf_evsel__alloc_counts(evsel, cpu + 1, thread + 1) < 0)
 		return -ENOMEM;
 
-	if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) < 0)
+	if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) <= 0)
 		return -errno;
 
 	perf_evsel__compute_deltas(evsel, cpu, thread, &count);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 12/12] perf/stat: revamp read error handling, snapshot and per_pkg events
  2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
                   ` (10 preceding siblings ...)
  2017-01-06 21:56 ` [PATCH 11/12] perf/stat: fix bug in handling events in error state Vikas Shivappa
@ 2017-01-06 21:56 ` Vikas Shivappa
  11 siblings, 0 replies; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 21:56 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: linux-kernel, x86, hpa, tglx, mingo, peterz, ravi.v.shankar,
	tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

From: David Carrillo-Cisneros <davidcc@google.com>

A package-wide event can return a valid read even if it has not run on a
specific cpu; this does not fit well with the assumption that run == 0
is equivalent to <not counted>.

To fix the problem, this patch defines special error values (~0ULL) for
val, run and ena, and uses them to signal read errors, allowing run == 0
to be a valid value for package events. A new value, (NA), is output on
a read error and when the event has not been enabled (time enabled == 0).

Finally, this patch revamps the calculation of deltas and scaling for
snapshot events, removing the calculation of deltas for time running and
time enabled of snapshot events, as it should be.

Tests: After this patch, user mode can see the package llc_occupancy
count when the user runs perf stat -C <cpux> -G cgroup_y, and the
systemwide count for the cgroup when run with the -a option.
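
A rough sketch of the convention described above (illustrative only, not
the tool changes below; it collapses <not counted> and NA into one
sentinel for brevity):

/* Sketch: ~0ULL marks a failed read; per-pkg events may have run == 0. */
#include <stdio.h>
#include <stdint.h>

#define COUNTS_NA  (~0ULL)

struct counts { uint64_t val, ena, run; };

/* Returns the extrapolated value, or COUNTS_NA when nothing can be reported. */
static uint64_t scale_count(struct counts *c, int per_pkg, int snapshot)
{
	if (c->val == COUNTS_NA || c->ena == COUNTS_NA || c->run == COUNTS_NA)
		return COUNTS_NA;			/* read error: print NA */
	if (c->run == 0)
		return per_pkg ? c->val : COUNTS_NA;	/* per-pkg: 0 run is valid */
	if (snapshot || c->run >= c->ena)
		return c->val;				/* no extrapolation needed */
	return (uint64_t)((double)c->val * c->ena / c->run + 0.5);
}

int main(void)
{
	struct counts c = { .val = 100, .ena = 4, .run = 2 };

	/* 100 counted in half the enabled time -> extrapolated to 200. */
	printf("%llu\n", (unsigned long long)scale_count(&c, 0, 0));
	return 0;
}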

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 tools/perf/builtin-stat.c | 36 +++++++++++++++++++++++-----------
 tools/perf/util/counts.h  | 19 ++++++++++++++++++
 tools/perf/util/evsel.c   | 49 ++++++++++++++++++++++++++++++++++++-----------
 tools/perf/util/evsel.h   |  8 ++++++--
 tools/perf/util/stat.c    | 35 +++++++++++----------------------
 5 files changed, 99 insertions(+), 48 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9f109b9..69dd3a4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -311,10 +311,8 @@ static int read_counter(struct perf_evsel *counter)
 
 			count = perf_counts(counter->counts, cpu, thread);
 			if (perf_evsel__read(counter, cpu, thread, count)) {
-				counter->counts->scaled = -1;
-				perf_counts(counter->counts, cpu, thread)->ena = 0;
-				perf_counts(counter->counts, cpu, thread)->run = 0;
-				return -1;
+				/* do not write stat for failed reads. */
+				continue;
 			}
 
 			if (STAT_RECORD) {
@@ -725,12 +723,16 @@ static int run_perf_stat(int argc, const char **argv)
 
 static void print_running(u64 run, u64 ena)
 {
+	bool is_na = run == PERF_COUNTS_NA || ena == PERF_COUNTS_NA || !ena;
+
 	if (csv_output) {
-		fprintf(stat_config.output, "%s%" PRIu64 "%s%.2f",
-					csv_sep,
-					run,
-					csv_sep,
-					ena ? 100.0 * run / ena : 100.0);
+		if (is_na)
+			fprintf(stat_config.output, "%sNA%sNA", csv_sep, csv_sep);
+		else
+			fprintf(stat_config.output, "%s%" PRIu64 "%s%.2f",
+				csv_sep, run, csv_sep, 100.0 * run / ena);
+	} else if (is_na) {
+		fprintf(stat_config.output, "  (NA)");
 	} else if (run != ena) {
 		fprintf(stat_config.output, "  (%.2f%%)", 100.0 * run / ena);
 	}
@@ -1103,7 +1105,7 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
 		if (counter->cgrp)
 			os.nfields++;
 	}
-	if (run == 0 || ena == 0 || counter->counts->scaled == -1) {
+	if (run == PERF_COUNTS_NA || ena == PERF_COUNTS_NA || counter->counts->scaled == -1) {
 		if (metric_only) {
 			pm(&os, NULL, "", "", 0);
 			return;
@@ -1209,12 +1211,17 @@ static void print_aggr(char *prefix)
 		id = aggr_map->map[s];
 		first = true;
 		evlist__for_each_entry(evsel_list, counter) {
+			bool all_nan = true;
 			val = ena = run = 0;
 			nr = 0;
 			for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
 				s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
 				if (s2 != id)
 					continue;
+				/* skip NA reads. */
+				if (perf_counts_values__is_na(perf_counts(counter->counts, cpu, 0)))
+					continue;
+				all_nan = false;
 				val += perf_counts(counter->counts, cpu, 0)->val;
 				ena += perf_counts(counter->counts, cpu, 0)->ena;
 				run += perf_counts(counter->counts, cpu, 0)->run;
@@ -1228,6 +1235,10 @@ static void print_aggr(char *prefix)
 				fprintf(output, "%s", prefix);
 
 			uval = val * counter->scale;
+			if (all_nan) {
+				run = PERF_COUNTS_NA;
+				ena = PERF_COUNTS_NA;
+			}
 			printout(id, nr, counter, uval, prefix, run, ena, 1.0);
 			if (!metric_only)
 				fputc('\n', output);
@@ -1306,7 +1317,10 @@ static void print_counter(struct perf_evsel *counter, char *prefix)
 		if (prefix)
 			fprintf(output, "%s", prefix);
 
-		uval = val * counter->scale;
+		if (val != PERF_COUNTS_NA)
+			uval = val * counter->scale;
+		else
+			uval = NAN;
 		printout(cpu, 0, counter, uval, prefix, run, ena, 1.0);
 
 		fputc('\n', output);
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 34d8baa..b65e97a 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -3,6 +3,9 @@
 
 #include "xyarray.h"
 
+/* Not Available (NA) value. Any operation with a NA equals a NA. */
+#define PERF_COUNTS_NA ((u64)~0ULL)
+
 struct perf_counts_values {
 	union {
 		struct {
@@ -14,6 +17,22 @@ struct perf_counts_values {
 	};
 };
 
+static inline void
+perf_counts_values__make_na(struct perf_counts_values *values)
+{
+	values->val = PERF_COUNTS_NA;
+	values->ena = PERF_COUNTS_NA;
+	values->run = PERF_COUNTS_NA;
+}
+
+static inline bool
+perf_counts_values__is_na(struct perf_counts_values *values)
+{
+	return values->val == PERF_COUNTS_NA ||
+	       values->ena == PERF_COUNTS_NA ||
+	       values->run == PERF_COUNTS_NA;
+}
+
 struct perf_counts {
 	s8			  scaled;
 	struct perf_counts_values aggr;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7aa10e3..bbcdcd7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1191,6 +1191,9 @@ void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu, int thread,
 	if (!evsel->prev_raw_counts)
 		return;
 
+	if (perf_counts_values__is_na(count))
+		return;
+
 	if (cpu == -1) {
 		tmp = evsel->prev_raw_counts->aggr;
 		evsel->prev_raw_counts->aggr = *count;
@@ -1199,26 +1202,43 @@ void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu, int thread,
 		*perf_counts(evsel->prev_raw_counts, cpu, thread) = *count;
 	}
 
-	count->val = count->val - tmp.val;
+	/* Snapshot events do not calculate deltas for count values. */
+	if (!evsel->snapshot)
+		count->val = count->val - tmp.val;
 	count->ena = count->ena - tmp.ena;
 	count->run = count->run - tmp.run;
 }
 
 void perf_counts_values__scale(struct perf_counts_values *count,
-			       bool scale, s8 *pscaled)
+			       bool scale, bool per_pkg, bool snapshot, s8 *pscaled)
 {
 	s8 scaled = 0;
 
+	if (perf_counts_values__is_na(count)) {
+		if (pscaled)
+			*pscaled = -1;
+		return;
+	}
+
 	if (scale) {
-		if (count->run == 0) {
+		/*
+		 * per-pkg events can have run == 0 in a CPU and still be
+		 * valid.
+		 */
+		if (count->run == 0 && !per_pkg) {
 			scaled = -1;
 			count->val = 0;
 		} else if (count->run < count->ena) {
 			scaled = 1;
-			count->val = (u64)((double) count->val * count->ena / count->run + 0.5);
+			/* Snapshot events do not scale counts values. */
+			if (!snapshot && count->run)
+				count->val = (u64)((double) count->val * count->ena /
+					     count->run + 0.5);
 		}
-	} else
-		count->ena = count->run = 0;
+
+	} else {
+		count->run = count->ena;
+	}
 
 	if (pscaled)
 		*pscaled = scaled;
@@ -1232,8 +1252,10 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
-	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) <= 0)
+	if (readn(FD(evsel, cpu, thread), count, sizeof(*count)) <= 0) {
+		perf_counts_values__make_na(count);
 		return -errno;
+	}
 
 	return 0;
 }
@@ -1241,6 +1263,7 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 			      int cpu, int thread, bool scale)
 {
+	int ret = 0;
 	struct perf_counts_values count;
 	size_t nv = scale ? 3 : 1;
 
@@ -1250,13 +1273,17 @@ int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 	if (evsel->counts == NULL && perf_evsel__alloc_counts(evsel, cpu + 1, thread + 1) < 0)
 		return -ENOMEM;
 
-	if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) <= 0)
-		return -errno;
+	if (readn(FD(evsel, cpu, thread), &count, nv * sizeof(u64)) <= 0) {
+		perf_counts_values__make_na(&count);
+		ret = -errno;
+		goto exit;
+	}
 
 	perf_evsel__compute_deltas(evsel, cpu, thread, &count);
-	perf_counts_values__scale(&count, scale, NULL);
+	perf_counts_values__scale(&count, scale, evsel->per_pkg, evsel->snapshot, NULL);
+exit:
 	*perf_counts(evsel->counts, cpu, thread) = count;
-	return 0;
+	return ret;
 }
 
 static int get_group_fd(struct perf_evsel *evsel, int cpu, int thread)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 06ef6f2..9788ca1 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -82,6 +82,10 @@ struct perf_evsel_config_term {
  * @is_pos: the position (counting backwards) of the event id (PERF_SAMPLE_ID or
  *          PERF_SAMPLE_IDENTIFIER) in a non-sample event i.e. if sample_id_all
  *          is used there is an id sample appended to non-sample events
+ * @snapshot: an event that whose raw value cannot be extrapolated based on
+ *	    the ratio of running/enabled time.
+ * @per_pkg: an event that runs package wide. All cores in same package will
+ *	    read the same value, even if running time == 0.
  * @priv:   And what is in its containing unnamed union are tool specific
  */
 struct perf_evsel {
@@ -153,8 +157,8 @@ static inline int perf_evsel__nr_cpus(struct perf_evsel *evsel)
 	return perf_evsel__cpus(evsel)->nr;
 }
 
-void perf_counts_values__scale(struct perf_counts_values *count,
-			       bool scale, s8 *pscaled);
+void perf_counts_values__scale(struct perf_counts_values *count, bool scale,
+			       bool per_pkg, bool snapshot, s8 *pscaled);
 
 void perf_evsel__compute_deltas(struct perf_evsel *evsel, int cpu, int thread,
 				struct perf_counts_values *count);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 39345c2d..514b953 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -202,7 +202,7 @@ static void zero_per_pkg(struct perf_evsel *counter)
 }
 
 static int check_per_pkg(struct perf_evsel *counter,
-			 struct perf_counts_values *vals, int cpu, bool *skip)
+			 int cpu, bool *skip)
 {
 	unsigned long *mask = counter->per_pkg_mask;
 	struct cpu_map *cpus = perf_evsel__cpus(counter);
@@ -224,17 +224,6 @@ static int check_per_pkg(struct perf_evsel *counter,
 		counter->per_pkg_mask = mask;
 	}
 
-	/*
-	 * we do not consider an event that has not run as a good
-	 * instance to mark a package as used (skip=1). Otherwise
-	 * we may run into a situation where the first CPU in a package
-	 * is not running anything, yet the second is, and this function
-	 * would mark the package as used after the first CPU and would
-	 * not read the values from the second CPU.
-	 */
-	if (!(vals->run && vals->ena))
-		return 0;
-
 	s = cpu_map__get_socket(cpus, cpu, NULL);
 	if (s < 0)
 		return -1;
@@ -249,30 +238,27 @@ static int check_per_pkg(struct perf_evsel *counter,
 		       struct perf_counts_values *count)
 {
 	struct perf_counts_values *aggr = &evsel->counts->aggr;
-	static struct perf_counts_values zero;
 	bool skip = false;
 
-	if (check_per_pkg(evsel, count, cpu, &skip)) {
+	if (check_per_pkg(evsel, cpu, &skip)) {
 		pr_err("failed to read per-pkg counter\n");
 		return -1;
 	}
 
-	if (skip)
-		count = &zero;
-
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
 	case AGGR_SOCKET:
 	case AGGR_NONE:
-		if (!evsel->snapshot)
-			perf_evsel__compute_deltas(evsel, cpu, thread, count);
-		perf_counts_values__scale(count, config->scale, NULL);
+		perf_evsel__compute_deltas(evsel, cpu, thread, count);
+		perf_counts_values__scale(count, config->scale,
+					  evsel->per_pkg, evsel->snapshot, NULL);
 		if (config->aggr_mode == AGGR_NONE)
 			perf_stat__update_shadow_stats(evsel, count->values, cpu);
 		break;
 	case AGGR_GLOBAL:
-		aggr->val += count->val;
+		if (!skip)
+			aggr->val += count->val;
 		if (config->scale) {
 			aggr->ena += count->ena;
 			aggr->run += count->run;
@@ -337,9 +323,10 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (config->aggr_mode != AGGR_GLOBAL)
 		return 0;
 
-	if (!counter->snapshot)
-		perf_evsel__compute_deltas(counter, -1, -1, aggr);
-	perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
+	perf_evsel__compute_deltas(counter, -1, -1, aggr);
+	perf_counts_values__scale(aggr, config->scale,
+				  counter->per_pkg, counter->snapshot,
+				  &counter->counts->scaled);
 
 	for (i = 0; i < 3; i++)
 		update_stats(&ps->res_stats[i], count[i]);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 09/12] x86/cqm: Add RMID reuse
  2017-01-17 16:59   ` Thomas Gleixner
@ 2017-01-18  0:26     ` Shivappa Vikas
  0 siblings, 0 replies; 16+ messages in thread
From: Shivappa Vikas @ 2017-01-18  0:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Vikas Shivappa, vikas.shivappa, davidcc, eranian, linux-kernel,
	x86, hpa, mingo, peterz, ravi.v.shankar, tony.luck, fenghua.yu,
	andi.kleen, h.peter.anvin



On Tue, 17 Jan 2017, Thomas Gleixner wrote:

> On Fri, 6 Jan 2017, Vikas Shivappa wrote:
>> +static void cqm_schedule_rmidwork(int domain);
>
> This forward declaration is required because all callers of that function
> are coming _after_ the function implementation, right?
>
>> +static inline bool is_first_cqmwork(int domain)
>> +{
>> +	return (!atomic_cmpxchg(&cqm_pkgs_data[domain]->reuse_scheduled, 0, 1));
>
> What's the purpose of these outer brackets? Enhanced readability?

Will fix the coding style and the ordering of the function declarations.

>
>> +}
>> +
>>  static void __put_rmid(u32 rmid, int domain)
>>  {
>>  	struct cqm_rmid_entry *entry;
>> @@ -294,6 +301,93 @@ static void cqm_mask_call(struct rmid_read *rr)
>>  static unsigned int __intel_cqm_threshold;
>>  static unsigned int __intel_cqm_max_threshold;
>>
>> +/*
>> + * Test whether an RMID has a zero occupancy value on this cpu.
>
> This tests whether the occupancy value is less than
> __intel_cqm_threshold. Unless I'm missing something the value can be set by
> user space and therefore is not necessarily zero.
>
> Your commentary is really useful: It's either wrong or superfluous or
> nonexistent.

Will fix, yes it's the user-configurable value.

>
>> + */
>> +static void intel_cqm_stable(void)
>> +{
>> +	struct cqm_rmid_entry *entry;
>> +	struct list_head *llist;
>> +
>> +	llist = &cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru;
>> +	list_for_each_entry(entry, llist, list) {
>> +
>> +		if (__rmid_read(entry->rmid) < __intel_cqm_threshold)
>> +			entry->state = RMID_AVAILABLE;
>> +	}
>> +}
>> +
>> +static void __intel_cqm_rmid_reuse(void)
>> +{
>> +	struct cqm_rmid_entry *entry, *tmp;
>> +	struct list_head *llist, *flist;
>> +	struct pkg_data *pdata;
>> +	unsigned long flags;
>> +
>> +	raw_spin_lock_irqsave(&cache_lock, flags);
>> +	pdata = cqm_pkgs_data[pkg_id];
>> +	llist = &pdata->cqm_rmid_limbo_lru;
>> +	flist = &pdata->cqm_rmid_free_lru;
>> +
>> +	if (list_empty(llist))
>> +		goto end;
>> +	/*
>> +	 * Test whether an RMID is free
>> +	 */
>> +	intel_cqm_stable();
>> +
>> +	list_for_each_entry_safe(entry, tmp, llist, list) {
>> +
>> +		if (entry->state == RMID_DIRTY)
>> +			continue;
>> +		/*
>> +		 * Otherwise remove from limbo and place it onto the free list.
>> +		 */
>> +		list_del(&entry->list);
>> +		list_add_tail(&entry->list, flist);
>
> This is really a performance optimized implementation. Iterate the limbo
> list first and check the occupancy value, conditionally update the state
> and then iterate the same list and conditionally move the entries which got
> just updated to the free list. Of course all of this happens with
> interrupts disabled and a global lock held to enhance it further.

Will fix this. This can really use the per-package lock as the global
cache_lock is protecting the event list. Also yes, we don't need two loops
now as it is per package.

>
>> +	}
>> +
>> +end:
>> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
>> +}
>> +
>> +static bool reschedule_cqm_work(void)
>> +{
>> +	unsigned long flags;
>> +	bool nwork = false;
>> +
>> +	raw_spin_lock_irqsave(&cache_lock, flags);
>> +
>> +	if (!list_empty(&cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru))
>> +		nwork = true;
>> +	else
>> +		atomic_set(&cqm_pkgs_data[pkg_id]->reuse_scheduled, 0U);
>
> 0U because the 'val' argument of atomic_set() is 'int', right?

Will fix

>
>> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
>> +
>> +	return nwork;
>> +}
>> +
>> +static void cqm_schedule_rmidwork(int domain)
>> +{
>> +	struct delayed_work *dwork;
>> +	unsigned long delay;
>> +
>> +	dwork = &cqm_pkgs_data[domain]->intel_cqm_rmid_work;
>> +	delay = msecs_to_jiffies(RMID_DEFAULT_QUEUE_TIME);
>> +
>> +	schedule_delayed_work_on(cqm_pkgs_data[domain]->rmid_work_cpu,
>> +			     dwork, delay);
>> +}
>> +
>> +static void intel_cqm_rmid_reuse(struct work_struct *work)
>
> Your naming conventions are really random. cqm_* intel_cqm_* and then
> totally random function names. Is there any deeper thought involved or is it
> just what it looks like: random?
>
>> +{
>> +	__intel_cqm_rmid_reuse();
>> +
>> +	if (reschedule_cqm_work())
>> +		cqm_schedule_rmidwork(pkg_id);
>
> Great stuff. You first try to move the limbo RMIDs to the free list and
> then you reevaluate the limbo list again. Thereby acquiring and dropping
> the global cache lock several times.
>
> Dammit, the stupid reuse function already knows whether the list is empty
> or not. But returning that information would make too much sense.

Will fix. The global lock is not needed either.
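
Roughly along these lines, perhaps (untested sketch, assuming the
per-package pkg_data_lock protects the limbo and free lists and that the
work always runs on a cpu of the target package):

/*
 * Untested sketch: single pass under the per-package lock; the return
 * value tells the caller whether the work needs to be rescheduled.
 * Must run on a cpu of 'domain' since __rmid_read() uses the local MSRs.
 */
static bool __intel_cqm_rmid_reuse(int domain)
{
	struct pkg_data *pdata = cqm_pkgs_data[domain];
	struct cqm_rmid_entry *entry, *tmp;
	unsigned long flags;
	bool limbo_empty;

	raw_spin_lock_irqsave(&pdata->pkg_data_lock, flags);
	list_for_each_entry_safe(entry, tmp, &pdata->cqm_rmid_limbo_lru, list) {
		/* Occupancy still above the user threshold, keep it in limbo. */
		if (__rmid_read(entry->rmid) >= __intel_cqm_threshold)
			continue;
		entry->state = RMID_AVAILABLE;
		list_move_tail(&entry->list, &pdata->cqm_rmid_free_lru);
	}
	limbo_empty = list_empty(&pdata->cqm_rmid_limbo_lru);
	raw_spin_unlock_irqrestore(&pdata->pkg_data_lock, flags);

	return !limbo_empty;
}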

>
>> +}
>> +
>>  static struct pmu intel_cqm_pmu;
>>
>>  static u64 update_sample(unsigned int rmid, u32 evt_type, int first)
>> @@ -548,6 +642,8 @@ static int intel_cqm_setup_event(struct perf_event *event,
>>  	if (!event->hw.cqm_rmid)
>>  		return -ENOMEM;
>>
>> +	cqm_assign_rmid(event, event->hw.cqm_rmid);
>
> Oh, so now there is a second user which calls cqm_assign_rmid(). Finally it
> might actually do something. Whether that something is useful and
> functional, I seriously doubt it.
>
>> +
>>  	return 0;
>>  }
>>
>> @@ -863,6 +959,7 @@ static void intel_cqm_event_terminate(struct perf_event *event)
>>  {
>>  	struct perf_event *group_other = NULL;
>>  	unsigned long flags;
>> +	int d;
>>
>>  	mutex_lock(&cache_mutex);
>>  	/*
>> @@ -905,6 +1002,13 @@ static void intel_cqm_event_terminate(struct perf_event *event)
>>  		mbm_stop_timers();
>>
>>  	mutex_unlock(&cache_mutex);
>> +
>> +	for (d = 0; d < cqm_socket_max; d++) {
>> +
>> +		if (cqm_pkgs_data[d] != NULL && is_first_cqmwork(d)) {
>> +			cqm_schedule_rmidwork(d);
>> +		}
>
> Let's look at the logic of this.
>
> When the event terminates, then you unconditionally schedule the rmid work
> on ALL packages whether the event was freed or not. Really brilliant.
>
> There is no reason to schedule anything if the event was never using any
> rmid on a given package or if the RMIDs are not freed because there is a
> new group leader.

Will fix it to not schedule the work if the event did not use the package.
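
Something like this, perhaps (untested sketch; assumes the per-package
RMID array is still populated at this point in terminate):

	for (d = 0; d < cqm_socket_max; d++) {
		if (!cqm_pkgs_data[d])
			continue;
		/* The event never held an RMID on this package, nothing to recycle. */
		if (!__rmid_valid(event->hw.cqm_rmid[d]))
			continue;
		if (!atomic_cmpxchg(&cqm_pkgs_data[d]->reuse_scheduled, 0, 1))
			cqm_schedule_rmidwork(d);
	}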

>
> The NOHZ FULL people will love that extra work activity for nothing.
>
> Also the detection whether the work is already scheduled is intuitive as
> usual: is_first_cqmwork() ? What???? !cqmwork_scheduled() would be too
> simple to understand, right?
>
> Oh well.
>
>> +	}
>>  }
>>
>>  static int intel_cqm_event_init(struct perf_event *event)
>> @@ -1322,6 +1426,9 @@ static int pkg_data_init_cpu(int cpu)
>>  	mutex_init(&pkg_data->pkg_data_mutex);
>>  	raw_spin_lock_init(&pkg_data->pkg_data_lock);
>>
>> +	INIT_DEFERRABLE_WORK(
>> +		&pkg_data->intel_cqm_rmid_work, intel_cqm_rmid_reuse);
>
> Stop this crappy formatting.
>
> 	INIT_DEFERRABLE_WORK(&pkg_data->intel_cqm_rmid_work,
> 			     intel_cqm_rmid_reuse);
>
> is the canonical way to do line breaks. Is it really that hard?

Will fix the formatting.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 09/12] x86/cqm: Add RMID reuse
  2017-01-06 22:00 ` [PATCH 09/12] x86/cqm: Add RMID reuse Vikas Shivappa
@ 2017-01-17 16:59   ` Thomas Gleixner
  2017-01-18  0:26     ` Shivappa Vikas
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2017-01-17 16:59 UTC (permalink / raw)
  To: Vikas Shivappa
  Cc: vikas.shivappa, davidcc, eranian, linux-kernel, x86, hpa, mingo,
	peterz, ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen,
	h.peter.anvin

On Fri, 6 Jan 2017, Vikas Shivappa wrote:
> +static void cqm_schedule_rmidwork(int domain);

This forward declaration is required because all callers of that function
are coming _after_ the function implementation, right?

> +static inline bool is_first_cqmwork(int domain)
> +{
> +	return (!atomic_cmpxchg(&cqm_pkgs_data[domain]->reuse_scheduled, 0, 1));

What's the purpose of these outer brackets? Enhanced readability?

> +}
> +
>  static void __put_rmid(u32 rmid, int domain)
>  {
>  	struct cqm_rmid_entry *entry;
> @@ -294,6 +301,93 @@ static void cqm_mask_call(struct rmid_read *rr)
>  static unsigned int __intel_cqm_threshold;
>  static unsigned int __intel_cqm_max_threshold;
>  
> +/*
> + * Test whether an RMID has a zero occupancy value on this cpu.

This tests whether the occupancy value is less than
__intel_cqm_threshold. Unless I'm missing something the value can be set by
user space and therefore is not necessarily zero.

Your commentary is really useful: It's either wrong or superfluous or
nonexistent.

> + */
> +static void intel_cqm_stable(void)
> +{
> +	struct cqm_rmid_entry *entry;
> +	struct list_head *llist;
> +
> +	llist = &cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru;
> +	list_for_each_entry(entry, llist, list) {
> +
> +		if (__rmid_read(entry->rmid) < __intel_cqm_threshold)
> +			entry->state = RMID_AVAILABLE;
> +	}
> +}
> +
> +static void __intel_cqm_rmid_reuse(void)
> +{
> +	struct cqm_rmid_entry *entry, *tmp;
> +	struct list_head *llist, *flist;
> +	struct pkg_data *pdata;
> +	unsigned long flags;
> +
> +	raw_spin_lock_irqsave(&cache_lock, flags);
> +	pdata = cqm_pkgs_data[pkg_id];
> +	llist = &pdata->cqm_rmid_limbo_lru;
> +	flist = &pdata->cqm_rmid_free_lru;
> +
> +	if (list_empty(llist))
> +		goto end;
> +	/*
> +	 * Test whether an RMID is free
> +	 */
> +	intel_cqm_stable();
> +
> +	list_for_each_entry_safe(entry, tmp, llist, list) {
> +
> +		if (entry->state == RMID_DIRTY)
> +			continue;
> +		/*
> +		 * Otherwise remove from limbo and place it onto the free list.
> +		 */
> +		list_del(&entry->list);
> +		list_add_tail(&entry->list, flist);

This is really a performance optimized implementation. Iterate the limbo
list first and check the occupancy value, conditionally update the state
and then iterate the same list and conditionally move the entries which got
just updated to the free list. Of course all of this happens with
interrupts disabled and a global lock held to enhance it further.

> +	}
> +
> +end:
> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
> +}
> +
> +static bool reschedule_cqm_work(void)
> +{
> +	unsigned long flags;
> +	bool nwork = false;
> +
> +	raw_spin_lock_irqsave(&cache_lock, flags);
> +
> +	if (!list_empty(&cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru))
> +		nwork = true;
> +	else
> +		atomic_set(&cqm_pkgs_data[pkg_id]->reuse_scheduled, 0U);

0U because the 'val' argument of atomic_set() is 'int', right?

> +	raw_spin_unlock_irqrestore(&cache_lock, flags);
> +
> +	return nwork;
> +}
> +
> +static void cqm_schedule_rmidwork(int domain)
> +{
> +	struct delayed_work *dwork;
> +	unsigned long delay;
> +
> +	dwork = &cqm_pkgs_data[domain]->intel_cqm_rmid_work;
> +	delay = msecs_to_jiffies(RMID_DEFAULT_QUEUE_TIME);
> +
> +	schedule_delayed_work_on(cqm_pkgs_data[domain]->rmid_work_cpu,
> +			     dwork, delay);
> +}
> +
> +static void intel_cqm_rmid_reuse(struct work_struct *work)

Your naming conventions are really random. cqm_* intel_cqm_* and then
totally random function names. Is there any deeper thought involved or is it
just what it looks like: random?

> +{
> +	__intel_cqm_rmid_reuse();
> +
> +	if (reschedule_cqm_work())
> +		cqm_schedule_rmidwork(pkg_id);

Great stuff. You first try to move the limbo RMIDs to the free list and
then you reevaluate the limbo list again. Thereby acquiring and dropping
the global cache lock several times.

Dammit, the stupid reuse function already knows whether the list is empty
or not. But returning that information would make too much sense.

> +}
> +
>  static struct pmu intel_cqm_pmu;
>  
>  static u64 update_sample(unsigned int rmid, u32 evt_type, int first)
> @@ -548,6 +642,8 @@ static int intel_cqm_setup_event(struct perf_event *event,
>  	if (!event->hw.cqm_rmid)
>  		return -ENOMEM;
>  
> +	cqm_assign_rmid(event, event->hw.cqm_rmid);

Oh, so now there is a second user which calls cqm_assign_rmid(). Finally it
might actually do something. Whether that something is useful and
functional, I seriously doubt it.

> +
>  	return 0;
>  }
>  
> @@ -863,6 +959,7 @@ static void intel_cqm_event_terminate(struct perf_event *event)
>  {
>  	struct perf_event *group_other = NULL;
>  	unsigned long flags;
> +	int d;
>  
>  	mutex_lock(&cache_mutex);
>  	/*
> @@ -905,6 +1002,13 @@ static void intel_cqm_event_terminate(struct perf_event *event)
>  		mbm_stop_timers();
>  
>  	mutex_unlock(&cache_mutex);
> +
> +	for (d = 0; d < cqm_socket_max; d++) {
> +
> +		if (cqm_pkgs_data[d] != NULL && is_first_cqmwork(d)) {
> +			cqm_schedule_rmidwork(d);
> +		}

Let's look at the logic of this.

When the event terminates, then you unconditionally schedule the rmid work
on ALL packages whether the event was freed or not. Really brilliant.

There is no reason to schedule anything if the event was never using any
rmid on a given package or if the RMIDs are not freed because there is a
new group leader.

The NOHZ FULL people will love that extra work activity for nothing.

Also the detection whether the work is already scheduled is intuitive as
usual: is_first_cqmwork() ? What???? !cqmwork_scheduled() would be too
simple to understand, right?

Oh well.

> +	}
>  }
>  
>  static int intel_cqm_event_init(struct perf_event *event)
> @@ -1322,6 +1426,9 @@ static int pkg_data_init_cpu(int cpu)
>  	mutex_init(&pkg_data->pkg_data_mutex);
>  	raw_spin_lock_init(&pkg_data->pkg_data_lock);
>  
> +	INIT_DEFERRABLE_WORK(
> +		&pkg_data->intel_cqm_rmid_work, intel_cqm_rmid_reuse);

Stop this crappy formatting.

	INIT_DEFERRABLE_WORK(&pkg_data->intel_cqm_rmid_work,
			     intel_cqm_rmid_reuse);

is the canonical way to do line breaks. Is it really that hard?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 09/12] x86/cqm: Add RMID reuse
  2017-01-06 21:59 [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes Vikas Shivappa
@ 2017-01-06 22:00 ` Vikas Shivappa
  2017-01-17 16:59   ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Vikas Shivappa @ 2017-01-06 22:00 UTC (permalink / raw)
  To: vikas.shivappa, vikas.shivappa
  Cc: davidcc, eranian, linux-kernel, x86, hpa, tglx, mingo, peterz,
	ravi.v.shankar, tony.luck, fenghua.yu, andi.kleen, h.peter.anvin

When an RMID is freed by an event it cannot be reused immediately as the
RMID may still have some cache occupancy. Hence when an RMID is freed it
goes into the limbo list and not the free list. This patch adds support
to periodically check the occupancy values of such RMIDs and move them to
the free list once their occupancy drops below the threshold occupancy
value. The threshold occupancy value can be modified by the user based on
their requirements.

Tests: Before the patch, task monitoring would just throw an error once
the RMIDs were used up in the lifetime of a system boot.
After this patch, we are able to reuse the RMIDs that have been freed.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
---
 arch/x86/events/intel/cqm.c | 107 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index e1765e3..92efe12 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -174,6 +174,13 @@ u32 __get_rmid(int domain)
 	return entry->rmid;
 }
 
+static void cqm_schedule_rmidwork(int domain);
+
+static inline bool is_first_cqmwork(int domain)
+{
+	return (!atomic_cmpxchg(&cqm_pkgs_data[domain]->reuse_scheduled, 0, 1));
+}
+
 static void __put_rmid(u32 rmid, int domain)
 {
 	struct cqm_rmid_entry *entry;
@@ -294,6 +301,93 @@ static void cqm_mask_call(struct rmid_read *rr)
 static unsigned int __intel_cqm_threshold;
 static unsigned int __intel_cqm_max_threshold;
 
+/*
+ * Test whether an RMID has a zero occupancy value on this cpu.
+ */
+static void intel_cqm_stable(void)
+{
+	struct cqm_rmid_entry *entry;
+	struct list_head *llist;
+
+	llist = &cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru;
+	list_for_each_entry(entry, llist, list) {
+
+		if (__rmid_read(entry->rmid) < __intel_cqm_threshold)
+			entry->state = RMID_AVAILABLE;
+	}
+}
+
+static void __intel_cqm_rmid_reuse(void)
+{
+	struct cqm_rmid_entry *entry, *tmp;
+	struct list_head *llist, *flist;
+	struct pkg_data *pdata;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&cache_lock, flags);
+	pdata = cqm_pkgs_data[pkg_id];
+	llist = &pdata->cqm_rmid_limbo_lru;
+	flist = &pdata->cqm_rmid_free_lru;
+
+	if (list_empty(llist))
+		goto end;
+	/*
+	 * Test whether an RMID is free
+	 */
+	intel_cqm_stable();
+
+	list_for_each_entry_safe(entry, tmp, llist, list) {
+
+		if (entry->state == RMID_DIRTY)
+			continue;
+		/*
+		 * Otherwise remove from limbo and place it onto the free list.
+		 */
+		list_del(&entry->list);
+		list_add_tail(&entry->list, flist);
+	}
+
+end:
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+}
+
+static bool reschedule_cqm_work(void)
+{
+	unsigned long flags;
+	bool nwork = false;
+
+	raw_spin_lock_irqsave(&cache_lock, flags);
+
+	if (!list_empty(&cqm_pkgs_data[pkg_id]->cqm_rmid_limbo_lru))
+		nwork = true;
+	else
+		atomic_set(&cqm_pkgs_data[pkg_id]->reuse_scheduled, 0U);
+
+	raw_spin_unlock_irqrestore(&cache_lock, flags);
+
+	return nwork;
+}
+
+static void cqm_schedule_rmidwork(int domain)
+{
+	struct delayed_work *dwork;
+	unsigned long delay;
+
+	dwork = &cqm_pkgs_data[domain]->intel_cqm_rmid_work;
+	delay = msecs_to_jiffies(RMID_DEFAULT_QUEUE_TIME);
+
+	schedule_delayed_work_on(cqm_pkgs_data[domain]->rmid_work_cpu,
+			     dwork, delay);
+}
+
+static void intel_cqm_rmid_reuse(struct work_struct *work)
+{
+	__intel_cqm_rmid_reuse();
+
+	if (reschedule_cqm_work())
+		cqm_schedule_rmidwork(pkg_id);
+}
+
 static struct pmu intel_cqm_pmu;
 
 static u64 update_sample(unsigned int rmid, u32 evt_type, int first)
@@ -548,6 +642,8 @@ static int intel_cqm_setup_event(struct perf_event *event,
 	if (!event->hw.cqm_rmid)
 		return -ENOMEM;
 
+	cqm_assign_rmid(event, event->hw.cqm_rmid);
+
 	return 0;
 }
 
@@ -863,6 +959,7 @@ static void intel_cqm_event_terminate(struct perf_event *event)
 {
 	struct perf_event *group_other = NULL;
 	unsigned long flags;
+	int d;
 
 	mutex_lock(&cache_mutex);
 	/*
@@ -905,6 +1002,13 @@ static void intel_cqm_event_terminate(struct perf_event *event)
 		mbm_stop_timers();
 
 	mutex_unlock(&cache_mutex);
+
+	for (d = 0; d < cqm_socket_max; d++) {
+
+		if (cqm_pkgs_data[d] != NULL && is_first_cqmwork(d)) {
+			cqm_schedule_rmidwork(d);
+		}
+	}
 }
 
 static int intel_cqm_event_init(struct perf_event *event)
@@ -1322,6 +1426,9 @@ static int pkg_data_init_cpu(int cpu)
 	mutex_init(&pkg_data->pkg_data_mutex);
 	raw_spin_lock_init(&pkg_data->pkg_data_lock);
 
+	INIT_DEFERRABLE_WORK(
+		&pkg_data->intel_cqm_rmid_work, intel_cqm_rmid_reuse);
+
 	pkg_data->rmid_work_cpu = cpu;
 
 	nr_rmids = cqm_max_rmid + 1;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-01-18  0:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-06 21:56 [PATCH 00/12 V5] Cqm2: Intel Cache quality of monitoring fixes Vikas Shivappa
2017-01-06 21:56 ` [PATCH 01/12] Documentation, x86/cqm: Intel Resource Monitoring Documentation Vikas Shivappa
2017-01-06 21:56 ` [PATCH 02/12] x86/cqm: Remove cqm recycling/conflict handling Vikas Shivappa
2017-01-06 21:56 ` [PATCH 03/12] x86/rdt: Add rdt common/cqm compile option Vikas Shivappa
2017-01-06 21:56 ` [PATCH 04/12] x86/cqm: Add Per pkg rmid support Vikas Shivappa
2017-01-06 21:56 ` [PATCH 05/12] x86/cqm,perf/core: Cgroup support prepare Vikas Shivappa
2017-01-06 21:56 ` [PATCH 06/12] x86/cqm: Add cgroup hierarchical monitoring support Vikas Shivappa
2017-01-06 21:56 ` [PATCH 07/12] x86/rdt,cqm: Scheduling support update Vikas Shivappa
2017-01-06 21:56 ` [PATCH 08/12] x86/cqm: Add support for monitoring task and cgroup together Vikas Shivappa
2017-01-06 21:56 ` [PATCH 09/12] x86/cqm: Add RMID reuse Vikas Shivappa
2017-01-06 21:56 ` [PATCH 10/12] perf/core,x86/cqm: Add read for Cgroup events,per pkg reads Vikas Shivappa
2017-01-06 21:56 ` [PATCH 11/12] perf/stat: fix bug in handling events in error state Vikas Shivappa
2017-01-06 21:56 ` [PATCH 12/12] perf/stat: revamp read error handling, snapshot and per_pkg events Vikas Shivappa
2017-01-06 21:59 [PATCH 00/12] Cqm2: Intel Cache quality monitoring fixes Vikas Shivappa
2017-01-06 22:00 ` [PATCH 09/12] x86/cqm: Add RMID reuse Vikas Shivappa
2017-01-17 16:59   ` Thomas Gleixner
2017-01-18  0:26     ` Shivappa Vikas
