Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics

From: Peter Zijlstra <peterz@infradead.org>
To: kan.liang@linux.intel.com
Cc: acme@kernel.org, mingo@redhat.com, linux-kernel@vger.kernel.org,
	tglx@linutronix.de, jolsa@kernel.org, eranian@google.com,
	alexander.shishkin@linux.intel.com, ak@linux.intel.com
Subject: Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
Date: Mon, 30 Sep 2019 16:53:21 +0200
Message-ID: <20190930145321.GF4581@hirez.programming.kicks-ass.net>
In-Reply-To: <20190930140755.GE4581@hirez.programming.kicks-ass.net>

On Mon, Sep 30, 2019 at 04:07:55PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 30, 2019 at 03:06:15PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@linux.intel.com wrote:
> 
> > > +static bool is_first_topdown_event_in_group(struct perf_event *event)
> > > +{
> > > +	struct perf_event *first = NULL;
> > > +
> > > +	if (is_topdown_event(event->group_leader))
> > > +		first = event->group_leader;
> > > +	else {
> > > +		for_each_sibling_event(first, event->group_leader)
> > > +			if (is_topdown_event(first))
> > > +				break;
> > > +	}
> > > +
> > > +	if (event == first)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > 
> > > +static u64 icl_update_topdown_event(struct perf_event *event)
> > > +{
> > > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > +	struct perf_event *other;
> > > +	u64 slots, metrics;
> > > +	int idx;
> > > +
> > > +	/*
> > > +	 * Only need to update all events for the first
> > > +	 * slots/metrics event in a group
> > > +	 */
> > > +	if (event && !is_first_topdown_event_in_group(event))
> > > +		return 0;
> > 
> > This is pretty crap and approaches O(n^2); let me think if there's
> > anything saner to do here.
> 
> This is also really complicated in the case where we do
> perf_remove_from_context() in the 'wrong' order.
> 
> In that case we get detached events that are not up-to-date (and never
> will be). It doesn't look like that matters, but it is weird.

So we get called either from the PMI or from read(). In the PMI there is
the perf_output_read_group() path, and that too appears broken vs the
above: it assumes perf_event_count() is up-to-date after calling
pmu->read(), which isn't true.

Now, I'm thinking that is already broken vs TXN_READ, so we should fix
that with something like the patch below (needs to be tested on
Power-hv-24x7).
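
To illustrate why a count can still be stale right after pmu->read(), here
is a minimal userspace sketch of a PMU that defers reads inside a READ
transaction and only performs them at commit time. All model_* names and
the TXN_READ value are hypothetical stand-ins, not kernel API; the deferred
behaviour is an assumption about how a driver may implement TXN_READ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXN_READ 0x4u	/* hypothetical stand-in for PERF_PMU_TXN_READ */
#define MAX_QUEUED 8

struct model_event {
	uint64_t hw;	/* value currently held by the "hardware" counter */
	uint64_t count;	/* software copy, what perf_event_count() would report */
};

struct model_pmu {
	unsigned int txn_flags;
	struct model_event *queued[MAX_QUEUED];
	size_t nr_queued;
};

static void model_start_txn(struct model_pmu *pmu, unsigned int flags)
{
	pmu->txn_flags = flags;
	pmu->nr_queued = 0;
}

/* Inside a READ transaction the read is only queued; count stays stale. */
static void model_read(struct model_pmu *pmu, struct model_event *ev)
{
	if (pmu->txn_flags & TXN_READ) {
		if (pmu->nr_queued < MAX_QUEUED)
			pmu->queued[pmu->nr_queued++] = ev;
		return;
	}
	ev->count = ev->hw;
}

/* Commit performs every queued read in one go and ends the transaction. */
static int model_commit_txn(struct model_pmu *pmu)
{
	for (size_t i = 0; i < pmu->nr_queued; i++)
		pmu->queued[i]->count = pmu->queued[i]->hw;
	pmu->nr_queued = 0;
	pmu->txn_flags = 0;
	return 0;
}

/* Usage: counts must not be consumed until after the commit. */
static void model_demo(void)
{
	struct model_pmu pmu = { 0 };
	struct model_event leader = { .hw = 100, .count = 1 };
	struct model_event sub = { .hw = 200, .count = 2 };

	model_start_txn(&pmu, TXN_READ);
	model_read(&pmu, &leader);
	model_read(&pmu, &sub);
	assert(leader.count == 1 && sub.count == 2);	/* still stale */

	model_commit_txn(&pmu);
	assert(leader.count == 100 && sub.count == 200);
}
```

In this model, reading the count between model_read() and
model_commit_txn() returns the old value, which is why all reads need to be
issued, and the transaction committed, before any count is consumed.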

---

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6272,10 +6272,22 @@ static void perf_output_read_group(struc
 	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
 		values[n++] = running;
 
+	if (leader->nr_siblings > 1)
+		leader->pmu->start_txn(leader->pmu, PERF_PMU_TXN_READ);
+
 	if ((leader != event) &&
 	    (leader->state == PERF_EVENT_STATE_ACTIVE))
 		leader->pmu->read(leader);
 
+	for_each_sibling_event(sub, leader) {
+		if ((sub != event) &&
+		    (sub->state == PERF_EVENT_STATE_ACTIVE))
+			sub->pmu->read(sub);
+	}
+
+	if (leader->nr_siblings > 1)
+		leader->pmu->commit_txn(leader->pmu);
+
 	values[n++] = perf_event_count(leader);
 	if (read_format & PERF_FORMAT_ID)
 		values[n++] = primary_event_id(leader);
@@ -6285,10 +6297,6 @@ static void perf_output_read_group(struc
 	for_each_sibling_event(sub, leader) {
 		n = 0;
 
-		if ((sub != event) &&
-		    (sub->state == PERF_EVENT_STATE_ACTIVE))
-			sub->pmu->read(sub);
-
 		values[n++] = perf_event_count(sub);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);


After that, I think we can simply do something like:

icl_update_topdown_event(..)
{
	int idx = event->hw.idx;

	if (is_metric_idx(idx))
		return 0;

	// must be FIXED_SLOTS

	/* do the thing and update SLOTS and METRIC together */
}

Hmmm?
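
For illustration only, a userspace sketch of that shape: metric events
return early, and the SLOTS event updates itself and every metric in one
pass. The index layout, the model_* names, the 8-bit-fraction-of-0xff
scaling and the return value are assumptions made for this sketch, not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index layout, for illustration only. */
enum { IDX_SLOTS = 0, IDX_METRIC_BASE = 1, NR_METRICS = 4 };

struct model_cpu {
	uint64_t hw_slots;	/* stand-in for the SLOTS fixed counter */
	uint64_t hw_metrics;	/* stand-in for the metrics register, 8 bits each */
	uint64_t count[IDX_METRIC_BASE + NR_METRICS];	/* per-event counts */
	unsigned int hw_reads;	/* how often the "hardware" was read */
};

static int model_is_metric_idx(int idx)
{
	return idx >= IDX_METRIC_BASE && idx < IDX_METRIC_BASE + NR_METRICS;
}

/* Returns the slots value when it did the update, 0 otherwise (assumed). */
static uint64_t model_update_topdown(struct model_cpu *cpu, int idx)
{
	uint64_t slots, metrics;

	/* Metric events are refreshed by the SLOTS event; nothing to do. */
	if (model_is_metric_idx(idx))
		return 0;

	/* Otherwise idx must be IDX_SLOTS: read both values once. */
	slots = cpu->hw_slots;
	metrics = cpu->hw_metrics;
	cpu->hw_reads++;

	cpu->count[IDX_SLOTS] = slots;
	for (int i = 0; i < NR_METRICS; i++) {
		uint64_t frac = (metrics >> (i * 8)) & 0xff;

		/* Assumed scaling: each field is a fraction of 0xff. */
		cpu->count[IDX_METRIC_BASE + i] = slots * frac / 0xff;
	}
	return slots;
}

/* Usage: only the SLOTS call touches the hardware. */
static void model_topdown_demo(void)
{
	struct model_cpu cpu = { .hw_slots = 2550, .hw_metrics = 0x66333333 };

	assert(model_update_topdown(&cpu, IDX_METRIC_BASE) == 0);
	assert(cpu.hw_reads == 0);

	assert(model_update_topdown(&cpu, IDX_SLOTS) == 2550);
	assert(cpu.hw_reads == 1);
	assert(cpu.count[IDX_METRIC_BASE] == 510);
	assert(cpu.count[IDX_METRIC_BASE + 3] == 1020);
}
```

The point of the shape is that no per-call walk of the group is needed:
the decision is made from the event's own index.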