Date: Thu, 15 Jan 2015 19:37:23 +0000
From: Matt Fleming
To: Peter Zijlstra
Cc: Ingo Molnar, Jiri Olsa, Arnaldo Carvalho de Melo, Andi Kleen,
	Thomas Gleixner, linux-kernel@vger.kernel.org, "H. Peter Anvin",
	Kanaka Juvva, Matt Fleming
Subject: Re: [PATCH v4 10/11] perf/x86/intel: Perform rotation on Intel CQM RMIDs
Message-ID: <20150115193723.GA12079@codeblueprint.co.uk>
References: <1415999712-5850-1-git-send-email-matt@console-pimps.org>
	<1415999712-5850-11-git-send-email-matt@console-pimps.org>
	<20150106171712.GH3337@twins.programming.kicks-ass.net>
	<20150109121401.GB495@console-pimps.org>
	<20150109130250.GH29390@twins.programming.kicks-ass.net>
	<20150109152442.GG495@console-pimps.org>
	<20150109155835.GJ29390@twins.programming.kicks-ass.net>
In-Reply-To: <20150109155835.GJ29390@twins.programming.kicks-ass.net>

On Fri, 09 Jan, at 04:58:35PM, Peter Zijlstra wrote:
> 
> Yeah, that'll work, when the free+limbo count is 1/4th the total we
> should stop pulling more plugs.

Perhaps something like this? It favours stealing more RMIDs over
increasing the "dirty threshold".

---

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index fc1a90245601..af58f233c93c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -490,29 +490,27 @@ static unsigned int __rmid_queue_time_ms = RMID_DEFAULT_QUEUE_TIME;
 
 /*
  * intel_cqm_rmid_stabilize - move RMIDs from limbo to free list
- * @available: are there freeable RMIDs on the limbo list?
+ * @nr_available: number of freeable RMIDs on the limbo list
  *
  * Quiescent state; wait for all 'freed' RMIDs to become unused, i.e. no
  * cachelines are tagged with those RMIDs. After this we can reuse them
  * and know that the current set of active RMIDs is stable.
  *
- * Return %true or %false depending on whether we were able to stabilize
- * an RMID for intel_cqm_rotation_rmid.
+ * Return %true or %false depending on whether stabilization needs to be
+ * reattempted.
  *
- * If we return %false then @available is updated to indicate the reason
- * we couldn't stabilize any RMIDs. @available is %false if no suitable
- * RMIDs were found on the limbo list to recycle, i.e. no RMIDs had been
- * on the list for the minimum queue time. If @available is %true then,
- * we found suitable RMIDs to recycle but none had an associated
- * occupancy value below __intel_cqm_threshold and the threshold should
- * be increased and stabilization reattempted.
+ * If we return %true then @nr_available is updated to indicate the
+ * number of RMIDs on the limbo list that have been queued for the
+ * minimum queue time (RMID_AVAILABLE), but whose data occupancy values
+ * are above __intel_cqm_threshold.
  */
-static bool intel_cqm_rmid_stabilize(bool *available)
+static bool intel_cqm_rmid_stabilize(unsigned int *available)
 {
 	struct cqm_rmid_entry *entry, *tmp;
 
 	lockdep_assert_held(&cache_mutex);
 
+	*available = 0;
 	list_for_each_entry(entry, &cqm_rmid_limbo_lru, list) {
 		unsigned long min_queue_time;
 		unsigned long now = jiffies;
@@ -539,7 +537,7 @@ static bool intel_cqm_rmid_stabilize(bool *available)
 			break;
 
 		entry->state = RMID_AVAILABLE;
-		*available = true;
+		(*available)++;
 	}
 
 	/*
@@ -547,7 +545,7 @@ static bool intel_cqm_rmid_stabilize(bool *available)
 	 * sitting on the queue for the minimum queue time.
 	 */
 	if (!*available)
-		return false;
+		return true;
 
 	/*
 	 * Test whether an RMID is free for each package.
@@ -684,9 +682,10 @@ static void intel_cqm_sched_out_events(struct perf_event *event)
 static bool __intel_cqm_rmid_rotate(void)
 {
 	struct perf_event *group, *start = NULL;
+	unsigned int threshold_limit;
 	unsigned int nr_needed = 0;
+	unsigned int nr_available;
 	bool rotated = false;
-	bool available;
 
 	mutex_lock(&cache_mutex);
 
@@ -756,14 +755,41 @@ stabilize:
 	 * Alternatively, if we didn't need to perform any rotation,
 	 * we'll have a bunch of RMIDs in limbo that need stabilizing.
 	 */
-	if (!intel_cqm_rmid_stabilize(&available)) {
-		unsigned int limit;
+	threshold_limit = __intel_cqm_max_threshold / cqm_l3_scale;
+
+	while (intel_cqm_rmid_stabilize(&nr_available) &&
+	       __intel_cqm_threshold < threshold_limit) {
+		unsigned int steal_limit;
+
+		/* Allow max 25% of RMIDs to be in limbo. */
+		steal_limit = (cqm_max_rmid + 1) / 4;
 
-		limit = __intel_cqm_max_threshold / cqm_l3_scale;
-		if (available && __intel_cqm_threshold < limit) {
-			__intel_cqm_threshold++;
+		/*
+		 * We failed to stabilize any RMIDs so our rotation
+		 * logic is now stuck. In order to make forward progress
+		 * we have a few options:
+		 *
+		 * 1. rotate ("steal") another RMID
+		 * 2. increase the threshold
+		 * 3. do nothing
+		 *
+		 * We do both of 1. and 2. until we hit the steal limit.
+		 *
+		 * The steal limit prevents all RMIDs ending up on the
+		 * limbo list. This can happen if every RMID has a
+		 * non-zero occupancy above threshold_limit, and the
+		 * occupancy values aren't dropping fast enough.
+		 *
+		 * Note that there is prioritisation at work here - we'd
+		 * rather increase the number of RMIDs on the limbo list
+		 * than increase the threshold, because increasing the
+		 * threshold skews the event data (because we reuse
+		 * dirty RMIDs) - threshold bumps are a last resort.
+		 */
+		if (nr_available < steal_limit)
 			goto again;
-		}
+
+		__intel_cqm_threshold++;
 	}
 
 out:

-- 
Matt Fleming, Intel Open Source Technology Center
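
For illustration only (not part of the patch or the kernel), here is a
minimal stand-alone C sketch of the decision rule the new rotation loop
implements: keep stealing RMIDs while fewer than 25% of them sit on the
limbo list, and only bump the occupancy threshold once that limit is
reached. The names steal_limit and should_steal_rmid are hypothetical.

/*
 * Stand-alone sketch of the steal-vs-threshold decision made on each
 * pass of the rotation loop. Hypothetical names, user-space code.
 */
#include <stdbool.h>
#include <stdio.h>

/* Allow at most 25% of all RMIDs to sit on the limbo list. */
static unsigned int steal_limit(unsigned int max_rmid)
{
	return (max_rmid + 1) / 4;
}

/*
 * Prefer stealing another RMID while under the steal limit; only bump
 * the occupancy threshold (which skews event data because dirty RMIDs
 * get reused) once the limit is reached.
 */
static bool should_steal_rmid(unsigned int nr_limbo, unsigned int max_rmid)
{
	return nr_limbo < steal_limit(max_rmid);
}

int main(void)
{
	/* Example: 71 RMIDs (max_rmid = 70) gives a steal limit of 17. */
	printf("steal another RMID:     %d\n", should_steal_rmid(12, 70));
	printf("bump threshold instead: %d\n", !should_steal_rmid(20, 70));
	return 0;
}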