From: David Carrillo-Cisneros <davidcc@google.com>
To: linux-kernel@vger.kernel.org
Cc: "x86@kernel.org" <x86@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Vegard Nossum <vegard.nossum@gmail.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Nilay Vaish <nilayvaish@gmail.com>, Borislav Petkov <bp@suse.de>,
Vikas Shivappa <vikas.shivappa@linux.intel.com>,
Ravi V Shankar <ravi.v.shankar@intel.com>,
Fenghua Yu <fenghua.yu@intel.com>, Paul Turner <pjt@google.com>,
Stephane Eranian <eranian@google.com>,
David Carrillo-Cisneros <davidcc@google.com>
Subject: [PATCH v3 42/46] perf/x86/intel/cmt: add rmid stealing
Date: Sat, 29 Oct 2016 17:38:39 -0700
Message-ID: <1477787923-61185-43-git-send-email-davidcc@google.com>
In-Reply-To: <1477787923-61185-1-git-send-email-davidcc@google.com>
Add rmid rotation code to steal an rmid whenever not enough pmonrs are
being reactivated. More details in the code's comments and in the
illustrative sketches after the '---' marker below.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
---
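Note for reviewers (not for the changelog): below is a minimal
user-space sketch of steal conditions 1 and 2 from can_steal_rmid().
The toy_* names, the millisecond clock and the slice constants are
hypothetical stand-ins; the kernel code uses jiffies, atomics and the
__cmt_pre_mon_slice/__cmt_min_mon_slice globals. Condition 3 is omitted
because it needs cross-package pmonr state.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical slice lengths, in ms (the kernel uses jiffies). */
  static const uint64_t pre_mon_slice = 2000;
  static const uint64_t min_mon_slice = 5000;

  struct toy_monr {
          uint64_t last_rmid_recoup;      /* 0 if no rmid was ever stolen */
          unsigned int nr_dep_pmonrs;     /* pmonrs waiting for a rmid */
  };

  static bool toy_can_steal_rmid(const struct toy_monr *monr, uint64_t now)
  {
          if (monr->nr_dep_pmonrs)
                  return false;   /* condition 3 not modeled here */
          /* Condition 1: no pmonr in this monr was ever stolen. */
          if (!monr->last_rmid_recoup)
                  return true;
          /* Condition 2: rmids recouped and kept for the minimum slices. */
          return now > monr->last_rmid_recoup + pre_mon_slice + min_mon_slice;
  }

  int main(void)
  {
          struct toy_monr m = { .last_rmid_recoup = 1000, .nr_dep_pmonrs = 0 };

          printf("%d\n", toy_can_steal_rmid(&m, 3000)); /* 0: too early */
          printf("%d\n", toy_can_steal_rmid(&m, 9000)); /* 1: may steal */
          return 0;
  }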
arch/x86/events/intel/cmt.c | 149 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 144 insertions(+), 5 deletions(-)
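Also for reviewers: a hedged user-space sketch of the dirty-goal
arithmetic wired up by this patch. The input values are made up; in the
kernel, max_rmid comes from the hardware, active_goal from the minimum
progress SLO, and dirty_cushion is the hardcoded 2 introduced here.

  #include <stdio.h>

  static unsigned int min_u(unsigned int a, unsigned int b)
  {
          return a < b ? a : b;
  }

  int main(void)
  {
          unsigned int max_rmid = 255;      /* hypothetical: rmids 0..255 */
          unsigned int active_goal = 10;    /* pmonrs to activate this round */
          unsigned int nr_dep_pmonrs = 3;   /* pmonrs waiting for a rmid */
          unsigned int dirty_cushion = 2;   /* spare dirty rmids */

          /*
           * Cap from intel_cmt_rmid_rotation_work(): at most a quarter of
           * all rmids, and at most one above the activation goal.
           */
          unsigned int max_dirty_goal =
                  min_u(active_goal + 1, (max_rmid + 1) / 4);
          /* Per-iteration target from __intel_cmt_rmid_rotate(). */
          unsigned int dirty_goal =
                  min_u(max_dirty_goal, nr_dep_pmonrs + dirty_cushion);

          /* Prints: max_dirty_goal=11 dirty_goal=5 */
          printf("max_dirty_goal=%u dirty_goal=%u\n",
                 max_dirty_goal, dirty_goal);
          return 0;
  }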
diff --git a/arch/x86/events/intel/cmt.c b/arch/x86/events/intel/cmt.c
index ba82f95..e677511 100644
--- a/arch/x86/events/intel/cmt.c
+++ b/arch/x86/events/intel/cmt.c
@@ -1368,6 +1368,106 @@ static int try_activate_dep_dirty_pmonrs(struct pkg_data *pkgd)
 	return nr_reused;
 }
 
+/**
+ * can_steal_rmid() - Tell whether this pmonr's rmid can be stolen.
+ * @pmonr:	The pmonr to check.
+ *
+ * The "rmid cycle" of a pmonr starts when an Active pmonr gets its rmid
+ * stolen and completes when it receives a rmid again.
+ * A monr "rmid recoup" occurs when all of its non-Off/Unused pmonrs
+ * obtain a rmid (i.e. when all pmonrs that need a rmid have one).
+ *
+ * A pmonr's rmid can be stolen if any of the following holds:
+ * 1) No other pmonr in pmonr's monr has been stolen before.
+ * 2) Some pmonrs have had rmids stolen, but rmids for all pmonrs have been
+ * recovered (rmid recoup) and kept for at least
+ * __cmt_pre_mon_slice + __cmt_min_mon_slice time.
+ * 3) At least one of the pmonrs with pkgid smaller than @pmonr's has not
+ * completed its first "rmid cycle". Once this condition is false, the pmonr
+ * will have completed its last "rmid cycle" and stealing will no longer be
+ * allowed.
+ * This guarantees that the last "rmid cycle" of a pmonr occurs in
+ * pkgid order, preventing rmid deadlocks. It also guarantees that all
+ * pmonrs will eventually have a last "rmid cycle", recovering all
+ * required rmids.
+ */
+static bool can_steal_rmid(struct pmonr *pmonr)
+{
+	union pmonr_rmids rmids;
+	struct monr *monr = pmonr->monr;
+	struct pkg_data *pkgd = NULL;
+	struct pmonr *pos_pmonr;
+	bool need_rmid_state;
+	u64 last_all_active, next_steal_time, last_pmonr_active;
+
+	last_all_active = atomic64_read(&monr->last_rmid_recoup);
+	/*
+	 * Can steal if no pmonr has been stolen, or all non-Unused pmonrs
+	 * have been in the Active state for long enough.
+	 */
+	if (!atomic_read(&monr->nr_dep_pmonrs)) {
+		/* Check steal condition 1. */
+		if (!last_all_active)
+			return true;
+		next_steal_time = last_all_active +
+			__cmt_pre_mon_slice + __cmt_min_mon_slice;
+		/* Check steal condition 2: recouped rmids kept long enough. */
+		if (time_after64(get_jiffies_64(), next_steal_time))
+			return true;
+
+		return false;
+	}
+
+	rcu_read_lock();
+
+	/* Check for steal condition 3 without locking. */
+	while ((pkgd = cmt_pkgs_data_next_rcu(pkgd))) {
+		/* To avoid deadlocks, wait for pmonrs in pkgid order. */
+		if (pkgd->pkgid >= pmonr->pkgd->pkgid)
+			break;
+		pos_pmonr = pkgd_pmonr(pkgd, monr);
+		rmids.value = atomic64_read(&pos_pmonr->atomic_rmids);
+		last_pmonr_active =
+			atomic64_read(&pos_pmonr->last_enter_active);
+
+		/* pmonrs in Dep_{Idle,Dirty} states are waiting for a rmid. */
+		need_rmid_state = rmids.sched_rmid != INVALID_RMID &&
+				  rmids.sched_rmid != rmids.read_rmid;
+
+		/* Test if pos_pmonr has finished its first rmid cycle. */
+		if (need_rmid_state && last_all_active <= last_pmonr_active) {
+			rcu_read_unlock();
+
+			return true;
+		}
+	}
+	rcu_read_unlock();
+
+	return false;
+}
+
+/* Steal as many rmids as possible, up to @max_to_steal. */
+static int try_steal_active_pmonrs(struct pkg_data *pkgd,
+				   unsigned int max_to_steal)
+{
+	struct pmonr *pmonr, *tmp;
+	unsigned long flags;
+	int nr_stolen = 0;
+
+	raw_spin_lock_irqsave(&pkgd->lock, flags);
+
+	list_for_each_entry_safe(pmonr, tmp, &pkgd->active_pmonrs, rot_entry) {
+		if (!can_steal_rmid(pmonr))
+			continue;
+		pmonr_active_to_dep_dirty(pmonr);
+		nr_stolen++;
+		if (nr_stolen == max_to_steal)
+			break;
+	}
+	raw_spin_unlock_irqrestore(&pkgd->lock, flags);
+
+	return nr_stolen;
+}
+
 static inline int __try_use_free_rmid(struct pkg_data *pkgd, u32 rmid)
 {
 	struct pmonr *pmonr;
@@ -1485,9 +1585,17 @@ static int try_free_dirty_rmids(struct pkg_data *pkgd,
  * @pkgd:		The package data to rotate rmids on.
  * @active_goal:	Target min nr of pmonrs to put in Active state.
  * @max_dirty_thld:	Upper bound for dirty_thld, in CMT cache units.
+ * @max_dirty_goal:	Max nr of rmids to leave dirty, waiting to drop
+ *			occupancy.
+ * @dirty_cushion:	Nr of rmids to try to leave dirty on top of the nr
+ *			of pmonrs that need a rmid (Dep_Idle), in case some
+ *			dirty rmids do not drop occupancy fast enough.
  *
  * The goals for each iteration of rotation logic are:
  * 1) to activate @active_goal pmonrs.
+ * 2) if any pmonr is waiting for a rmid (Dep_Idle), to steal enough rmids
+ * to meet its dirty_goal. The dirty_goal is an estimate of the nr of dirty
+ * rmids required so that the next call reaches its @active_goal.
  *
  * In order to activate Dep_{Dirty,Idle} pmonrs, rotation logic:
  * 1) activate eligible Dep_Dirty pmonrs: These pmonrs can reuse their former
@@ -1503,12 +1611,14 @@ static int try_free_dirty_rmids(struct pkg_data *pkgd,
  * rmid.
  */
 static int __intel_cmt_rmid_rotate(struct pkg_data *pkgd,
-		unsigned int active_goal, unsigned int max_dirty_thld)
+		unsigned int active_goal, unsigned int max_dirty_thld,
+		unsigned int max_dirty_goal, unsigned int dirty_cushion)
 {
 	unsigned int dirty_thld = 0, min_dirty, nr_activated;
-	unsigned int nr_dep_pmonrs;
+	unsigned int nr_to_steal, nr_stolen;
+	unsigned int nr_dirty, dirty_goal, nr_dep_pmonrs;
 	unsigned long flags, *rmids_bm = NULL;
-	bool do_active_goal, read_dirty = true, dirty_is_max;
+	bool do_active_goal, do_dirty_goal, read_dirty = true, dirty_is_max;
 
 	lockdep_assert_held(&pkgd->mutex);
@@ -1534,6 +1644,7 @@ static int __intel_cmt_rmid_rotate(struct pkg_data *pkgd,
 	raw_spin_lock_irqsave(&pkgd->lock, flags);
 	nr_activated += __try_use_free_rmids(pkgd);
+	nr_dirty = pkgd->nr_dirty_rmids;
 	nr_dep_pmonrs = pkgd->nr_dep_pmonrs;
 	raw_spin_unlock_irqrestore(&pkgd->lock, flags);
@@ -1544,14 +1655,27 @@ static int __intel_cmt_rmid_rotate(struct pkg_data *pkgd,
 	dirty_is_max = dirty_thld >= max_dirty_thld;
 	do_active_goal = nr_activated < active_goal && !dirty_is_max;
+	dirty_goal = min(max_dirty_goal, nr_dep_pmonrs + dirty_cushion);
+	do_dirty_goal = nr_dirty < dirty_goal;
+
 	/*
 	 * Since Dep_Dirty pmonrs have their own dirty rmid, only Dep_Idle
 	 * pmonrs are waiting for a rmid to be available. Stop if no pmonr
 	 * waits for a rmid or there are no goals to pursue.
 	 */
-	if (!nr_dep_pmonrs || !do_active_goal)
+	if (!nr_dep_pmonrs || (!do_dirty_goal && !do_active_goal))
 		goto exit;
 
+	if (do_dirty_goal) {
+		nr_to_steal = dirty_goal - nr_dirty;
+		nr_stolen = try_steal_active_pmonrs(pkgd, nr_to_steal);
+		/*
+		 * We already tried to steal from all Active pmonrs; it makes
+		 * no sense to reattempt in this call.
+		 */
+		max_dirty_goal = 0;
+	}
+
 	/*
 	 * Try to activate more pmonrs by increasing the dirty threshold.
 	 * Using the minimum observed occupancy in dirty rmids guarantees to
@@ -1633,6 +1757,7 @@ static void intel_cmt_rmid_rotation_work(struct work_struct *work)
 	/* not precise elapsed time, but good enough for rotation purposes. */
 	unsigned int elapsed_ms = intel_cmt_pmu.hrtimer_interval_ms;
 	unsigned int active_goal, max_dirty_threshold;
+	unsigned int dirty_cushion, max_dirty_goal;
 
 	pkgd = container_of(to_delayed_work(work),
 			    struct pkg_data, rotation_work);
@@ -1649,7 +1774,21 @@ static void intel_cmt_rmid_rotation_work(struct work_struct *work)
 	active_goal = max(1u, (elapsed_ms * __cmt_min_progress_rate) / 1000);
 	max_dirty_threshold = READ_ONCE(__cmt_max_threshold) / cmt_l3_scale;
 
-	__intel_cmt_rmid_rotate(pkgd, active_goal, max_dirty_threshold);
+	/*
+	 * Upper bound on the nr of dirty rmids, so that there is a good
+	 * chance of finding enough free rmids in the next iteration of
+	 * rotation logic.
+	 */
+	max_dirty_goal = min(active_goal + 1, (pkgd->max_rmid + 1) / 4);
+
+	/*
+	 * Nr of extra rmids to keep dirty in case some don't drop occupancy.
+	 * To be calculated in a sensible manner once statistics about rmid
+	 * recycling rate are in place.
+	 */
+	dirty_cushion = 2;
+
+	__intel_cmt_rmid_rotate(pkgd, active_goal, max_dirty_threshold,
+				max_dirty_goal, dirty_cushion);
 
 	if (intel_cmt_need_rmid_rotation(pkgd))
 		__intel_cmt_schedule_rotation_for_pkg(pkgd);
--
2.8.0.rc3.226.g39d4020