[PATCH 0/2] perf/x86/amd: Add support for Large Increment per Cycle Events

From: Kim Phillips @ 2019-11-14 18:37 UTC
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel, Kim Phillips

This patch series adds support for Large Increment per Cycle Events,
which is needed to count events like Retired SSE/AVX FLOPs.
The first patch constrains Large Increment events to the even PMCs,
and the second patch changes the scheduler to accommodate and
program the new Merge event needed on the odd counters.
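As a rough illustration of the even/odd split, here is a minimal user-space sketch. It assumes six core counters (as on Family 17h) and the AMD PERF_CTL layout, in which the enable bit is bit 22 and EventSelect[11:8] sits in bits 35:32, so the Merge event 0xFFF encodes as 0xF in bits 35:32 plus 0xFF in bits 7:0. The names evtsel[], even_counter_mask() and lg_inc_enable() are hypothetical stand-ins, not the patch's code. The odd counter is programmed before the even one, matching the ordering this version's changelog describes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: six core counters, as on Family 17h. */
#define NUM_COUNTERS     6
/* Enable bit (bit 22) of the AMD PERF_CTL event-select register. */
#define EVENTSEL_ENABLE  (1ULL << 22)
/* Merge event 0xFFF: EventSelect[7:0] = 0xFF in bits 7:0,
 * EventSelect[11:8] = 0xF in bits 35:32. */
#define AMD_MERGE_EVENT  ((0xFULL << 32) | 0xFFULL)

/* Hypothetical stand-in for the per-counter PERF_CTL MSRs. */
static uint64_t evtsel[NUM_COUNTERS];

/* Constraint mask for Large Increment events: even counters only. */
static unsigned int even_counter_mask(int num_counters)
{
	unsigned int mask = 0;

	for (int i = 0; i < num_counters; i += 2)
		mask |= 1u << i;
	return mask;
}

/*
 * Enable a Large Increment event already scheduled on even counter idx.
 * The odd neighbour receives the Merge event and is written first.
 */
static void lg_inc_enable(int idx, uint64_t config)
{
	assert((idx & 1) == 0 && idx + 1 < NUM_COUNTERS);
	evtsel[idx + 1] = AMD_MERGE_EVENT | EVENTSEL_ENABLE;
	evtsel[idx] = config | EVENTSEL_ENABLE;
}
```

With six counters, even_counter_mask(6) yields 0x15 (counters 0, 2 and 4), so at most three Large Increment events can be active at once, each consuming one odd counter for its Merge event.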

The RFC was posted here:

https://lkml.org/lkml/2019/8/26/828

Changes since then mostly consist of fixing interoperation with the
watchdog, splitting, rewording, and addressing Peter Zijlstra's
comments:

 - Mentioned programming the odd counter before the even counter
   in the commit text, as is now also done in the code.

 - Do the programming of the counters in the enable/disable paths
   instead of the commit_scheduler hook.

 - Instead of looping to re-count all large increment events, have
   collect_events() and a new amd_put_event_constraints_f17h()
   update a new cpuc variable, 'n_lg_inc'.  Now the scheduler
   does a simple subtraction to get the target gpmax value.

 - Amend the fastpath's used_mask code to fix a problem where
   counter programming was being overwritten when running with
   the watchdog enabled.

 - Omit the superfluous __set_bit(idx + 1) in __perf_sched_find_counter
   and clear the large increment's sched->state.used bit in the
   path where a failure to schedule is determined due to the
   next counter already being used (thanks Nathan Fontenot).

 - Broaden new PMU initialization code to run on families 17h and
   above.

 - Make the new is_large_inc(struct perf_event) common to all x86
   paths, as is_pebs_pt() is.  That way, the raw event code checker
   amd_is_lg_inc_event_code() can stay in its vendor-specific area,
   events/amd/core.c.

 - The __set_bit and WARN_ON(!gpmax) comments have all been addressed.

 - WRT changing the naming to PAIR, etc.: I dislike the idea because
   the h/w documentation consistently refers to this now relatively
   old feature as "Large Increment per Cycle" events, and to the
   secondary event needed as the "Merge event (0xFFF)".  When I
   started this project, the biggest problem was disambiguating
   between the Large Increment event (FLOPs, or others) and the
   Merge event (0xFFF) itself.  Different phases had "Merge" for
   the Merge event vs. "Merged" for the Large Increment event(s),
   or "Mergee", which made it too easy to mistake one for the other
   when reading the source code.  So I opted for two distinctly
   different base terms/stem words, Large Increment (lg_inc) and
   Merge, to match the documentation, which basically has it right.
   Changing the term to "pair" would have created the same "pair" vs.
   "paired" vs. "pairer" etc. confusion, so I dropped it.

 - WRT the comment "How about you make __perf_sched_find_count() set
   the right value? That already knows it did this.", I didn't see
   how I could avoid still having to do the constraint flags &
   LARGE_INC check in perf_assign_events() to re-adjust the
   assignment in the assign array, or in sched.state.counter.  This
   code is really only needed after the counter assignment is made,
   in order to program the h/w correctly.

Kim Phillips (2):
  perf/x86/amd: Constrain Large Increment per Cycle events
  perf/x86/amd: Add support for Large Increment per Cycle Events

 arch/x86/events/amd/core.c   | 110 +++++++++++++++++++++++++----------
 arch/x86/events/core.c       |  46 ++++++++++++++-
 arch/x86/events/perf_event.h |  21 +++++++
 3 files changed, 145 insertions(+), 32 deletions(-)

-- 
2.24.0
