* [MODERATED] [PATCH v1 0/9] PERFv1 0
@ 2019-02-05  1:14 Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 1/9] PERFv1 7 Andi Kleen
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Walnut is a functional (not security) issue with TSX. The upcoming
microcode updates on Skylake may corrupt perfmon counter 3
when RTM transactions are used.

There is a new MSR that allows RTM to be force-aborted, freeing
counter 3.

The following patchkit adds support to perf to avoid
using counter 3, or to disable TSX when counter 3 is needed
for perf.

There are per-perf-event and global options to set the
default.
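
For concreteness, a rough user-space sketch of the global knob
(illustrative only, not part of the series; the attribute name and its
expected location under the core PMU sysfs directory follow patch 6 and
may still change). The per-event option is illustrated with patch 5
below.

    #include <stdio.h>

    /* Flip the global default so all counters (including counter 3) are
     * schedulable; RTM is then force-aborted whenever counter 3 is in
     * use. Returns 0 on success, -1 on error. */
    static int enable_all_counters(void)
    {
        const char *path =
            "/sys/bus/event_source/devices/cpu/enable_all_counters";
        FILE *f = fopen(path, "w");

        if (!f)
            return -1;
        fputs("1\n", f);
        return fclose(f);
    }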

This patch sets the default to TSX enabled, but
that could be easily changed.

We can have a discussion on the trade-offs of the default
setting. I suspect it's a decision that should be made by Linus,
as it may impact user programs either way.

The trade-offs for setting the option default are:
    
Using 4 (or 8 with HT off) events in perf versus
allowing RTM usage while perf is active.
    
- Existing programs that use perf groups with 4 counters
  may not retrieve perfmon data anymore. Perf usages
  that use fewer than four (or 7 with HT off) counters
  are not impacted. Perf usages that don't use groups
  will still work, but will see increased multiplexing.
    
- TSX programs should not functionally break from
  forcing RTM to abort because they always need a valid
  fallback path. However, they will see significantly
  lower performance if they rely on TSX for performance
  (all RTM transactions will run and only abort at the end),
  potentially slowing them down so much that it is
  equivalent to functional breakage.

Patches are against tip/perf/core as of 
commit ca3bb3d027f69ac3ab1dafb32bde2f5a3a44439c (tip/perf/core)
Author: Elena Reshetova <elena.reshetova@intel.com>

-Andi

v1: Initial post

Andi Kleen (9):
  x86/pmu/intel: Export number of counters in caps
  x86/pmu/intel: Add flags argument to x86_pmu::disable
  x86/pmu/intel: Support excluding GP counters in scheduler
  x86/pmu/intel: Handle TSX with counter 3 on Skylake
  x86/pmu/intel: Add perf event attribute to enable counter 3
  x86/pmu/intel: Add global option for enable all counters
  perf stat: Make all existing groups weak
  perf stat: Don't count EL for --transaction with three counters
  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests

 arch/x86/events/core.c             | 52 ++++++++++++++++---
 arch/x86/events/intel/core.c       | 82 ++++++++++++++++++++++++++++--
 arch/x86/events/intel/knc.c        |  2 +-
 arch/x86/events/intel/p4.c         |  4 +-
 arch/x86/events/intel/p6.c         |  2 +-
 arch/x86/events/intel/uncore.c     |  2 +-
 arch/x86/events/perf_event.h       | 19 +++++--
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  3 ++
 arch/x86/include/asm/perf_event.h  |  2 +-
 arch/x86/kvm/cpuid.c               |  3 +-
 arch/x86/kvm/vmx/nested.c          | 11 +++-
 arch/x86/kvm/vmx/vmx.c             | 37 +++++++++++++-
 arch/x86/kvm/vmx/vmx.h             |  1 +
 include/linux/perf_event.h         |  1 +
 tools/perf/builtin-stat.c          | 38 ++++++++++----
 tools/perf/util/pmu.c              | 10 ++++
 tools/perf/util/pmu.h              |  1 +
 18 files changed, 236 insertions(+), 35 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 1/9] PERFv1 7
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 2/9] PERFv1 6 Andi Kleen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Export number of counters in caps

Export the number of generic and fixed counters in the core PMU in caps/.
This helps users and tools with constructing groups. It will also become
more important with upcoming patches that can change the number of counters.
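
Not part of the patch, just an illustration of how a tool could consume
the new file; the path mirrors the existing caps/max_precise attribute
of the core PMU (patch 8 of this series does the equivalent through
perf_pmu__scan_file()):

    #include <stdio.h>

    /* Returns the number of generic counters, or -1 on error. */
    static int read_num_counters(void)
    {
        const char *path =
            "/sys/bus/event_source/devices/cpu/caps/num_counter";
        FILE *f = fopen(path, "r");
        int num = -1;

        if (!f)
            return -1;
        if (fscanf(f, "%d", &num) != 1)
            num = -1;
        fclose(f);
        return num;
    }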

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..58e659bfc2d9 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2248,8 +2248,28 @@ static ssize_t max_precise_show(struct device *cdev,
 
 static DEVICE_ATTR_RO(max_precise);
 
+static ssize_t num_counter_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+}
+
+static DEVICE_ATTR_RO(num_counter);
+
+static ssize_t num_counter_fixed_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters_fixed);
+}
+
+static DEVICE_ATTR_RO(num_counter_fixed);
+
 static struct attribute *x86_pmu_caps_attrs[] = {
 	&dev_attr_max_precise.attr,
+	&dev_attr_num_counter.attr,
+	&dev_attr_num_counter_fixed.attr,
 	NULL
 };
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 2/9] PERFv1 6
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 1/9] PERFv1 7 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05 14:46   ` [MODERATED] " Peter Zijlstra
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 3/9] PERFv1 9 Andi Kleen
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Add flags argument to x86_pmu::disable

Add a flags argument to pass the PERF_EF_* flags down to x86_pmu::disable.
This will be used in a follow-on patch to avoid some redundant
MSR writes when reprogramming counters for updates.
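
As a sketch of the intended use (simplified; the real consumer is the
Skylake workaround added later in this series): a disable callback can
now tell that the counter is only being stopped for an update and will
be reused, and skip the extra work in that case.

    #include <linux/perf_event.h>

    /* Simplified sketch, not new kernel code. */
    static void example_pmu_disable(struct perf_event *event, int flags)
    {
        if (flags & PERF_EF_UPDATE) {
            /* Counter is stopped for an update/reprogram and will be
             * reused; keep the expensive MSR state as it is. */
            return;
        }
        /* Full teardown path for a counter that is really freed. */
    }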

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c       | 2 +-
 arch/x86/events/intel/core.c | 4 ++--
 arch/x86/events/intel/knc.c  | 2 +-
 arch/x86/events/intel/p4.c   | 4 ++--
 arch/x86/events/intel/p6.c   | 2 +-
 arch/x86/events/perf_event.h | 4 ++--
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 58e659bfc2d9..a63b6b587bbb 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1350,7 +1350,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 
 	if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) {
-		x86_pmu.disable(event);
+		x86_pmu.disable(event, flags);
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index daafb893449b..84ae6aaf8866 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2042,7 +2042,7 @@ static inline bool event_is_checkpointed(struct perf_event *event)
 	return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
 }
 
-static void intel_pmu_disable_event(struct perf_event *event)
+static void intel_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -2065,7 +2065,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
-	x86_pmu_disable_event(event);
+	x86_pmu_disable_event(event, flags);
 }
 
 static void intel_pmu_del_event(struct perf_event *event)
diff --git a/arch/x86/events/intel/knc.c b/arch/x86/events/intel/knc.c
index 618001c208e8..18e348f9f5e4 100644
--- a/arch/x86/events/intel/knc.c
+++ b/arch/x86/events/intel/knc.c
@@ -174,7 +174,7 @@ static void knc_pmu_enable_all(int added)
 }
 
 static inline void
-knc_pmu_disable_event(struct perf_event *event)
+knc_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	u64 val;
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index dee579efb2b2..f7b0889543fa 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -900,7 +900,7 @@ static void p4_pmu_disable_pebs(void)
 	 */
 }
 
-static inline void p4_pmu_disable_event(struct perf_event *event)
+static inline void p4_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -922,7 +922,7 @@ static void p4_pmu_disable_all(void)
 		struct perf_event *event = cpuc->events[idx];
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
-		p4_pmu_disable_event(event);
+		p4_pmu_disable_event(event, 0);
 	}
 
 	p4_pmu_disable_pebs();
diff --git a/arch/x86/events/intel/p6.c b/arch/x86/events/intel/p6.c
index 408879b0c0d4..f8470ed4ebf3 100644
--- a/arch/x86/events/intel/p6.c
+++ b/arch/x86/events/intel/p6.c
@@ -156,7 +156,7 @@ static void p6_pmu_enable_all(int added)
 }
 
 static inline void
-p6_pmu_disable_event(struct perf_event *event)
+p6_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	u64 val = P6_NOP_EVENT;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..e9262f05c82f 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -520,7 +520,7 @@ struct x86_pmu {
 	void		(*disable_all)(void);
 	void		(*enable_all)(int added);
 	void		(*enable)(struct perf_event *);
-	void		(*disable)(struct perf_event *);
+	void		(*disable)(struct perf_event *, int);
 	void		(*add)(struct perf_event *);
 	void		(*del)(struct perf_event *);
 	void		(*read)(struct perf_event *event);
@@ -788,7 +788,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
 
 void x86_pmu_stop(struct perf_event *event, int flags);
 
-static inline void x86_pmu_disable_event(struct perf_event *event)
+static inline void x86_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 3/9] PERFv1 9
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 1/9] PERFv1 7 Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 2/9] PERFv1 6 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05 13:58   ` [MODERATED] " Peter Zijlstra
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 4/9] PERFv1 2 Andi Kleen
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Add support for excluding GP counters per event in the x86 event
scheduler. This is currently used for a bug workaround in a follow-on
patch, but will later also be useful for more efficient PMU
virtualization in KVM.
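
A usage sketch of the new facility (simplified; the actual consumer is
the Skylake counter 3 workaround later in this series): an event marks
the counters it must stay off, and x86_schedule_events() masks them out
of every constraint and lowers gpmax accordingly.

    #include <linux/bits.h>
    #include <linux/perf_event.h>

    /* Illustrative only: keep this event off general-purpose counter 3
     * by recording it in the new per-event exclude mask. */
    static void example_hw_config(struct perf_event *event)
    {
        event->hw.exclude = BIT(3);
    }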

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c         | 24 +++++++++++++++++-------
 arch/x86/events/intel/uncore.c |  2 +-
 arch/x86/events/perf_event.h   |  3 ++-
 include/linux/perf_event.h     |  1 +
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index a63b6b587bbb..da1200ee1917 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -685,6 +685,7 @@ struct perf_sched {
 	int			max_events;
 	int			max_gp;
 	int			saved_states;
+	u64			exclude;
 	struct event_constraint	**constraints;
 	struct sched_state	state;
 	struct sched_state	saved[SCHED_STATES_MAX];
@@ -694,7 +695,7 @@ struct perf_sched {
  * Initialize interator that runs through all events and counters.
  */
 static void perf_sched_init(struct perf_sched *sched, struct event_constraint **constraints,
-			    int num, int wmin, int wmax, int gpmax)
+			    int num, int wmin, int wmax, int gpmax, u64 exclude)
 {
 	int idx;
 
@@ -703,6 +704,7 @@ static void perf_sched_init(struct perf_sched *sched, struct event_constraint **
 	sched->max_weight	= wmax;
 	sched->max_gp		= gpmax;
 	sched->constraints	= constraints;
+	sched->exclude		= exclude;
 
 	for (idx = 0; idx < num; idx++) {
 		if (constraints[idx]->weight == wmin)
@@ -745,6 +747,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
 {
 	struct event_constraint *c;
 	int idx;
+	u64 idxmsk;
 
 	if (!sched->state.unassigned)
 		return false;
@@ -753,10 +756,12 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
 		return false;
 
 	c = sched->constraints[sched->state.event];
+	idxmsk = c->idxmsk64 & ~sched->exclude;
 	/* Prefer fixed purpose counters */
-	if (c->idxmsk64 & (~0ULL << INTEL_PMC_IDX_FIXED)) {
+	if (idxmsk & (~0ULL << INTEL_PMC_IDX_FIXED)) {
 		idx = INTEL_PMC_IDX_FIXED;
-		for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_MAX) {
+		for_each_set_bit_from(idx, (unsigned long *)&idxmsk,
+				      X86_PMC_IDX_MAX) {
 			if (!__test_and_set_bit(idx, sched->state.used))
 				goto done;
 		}
@@ -764,7 +769,7 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
 
 	/* Grab the first unused counter starting with idx */
 	idx = sched->state.counter;
-	for_each_set_bit_from(idx, c->idxmsk, INTEL_PMC_IDX_FIXED) {
+	for_each_set_bit_from(idx, (unsigned long *)&idxmsk, INTEL_PMC_IDX_FIXED) {
 		if (!__test_and_set_bit(idx, sched->state.used)) {
 			if (sched->state.nr_gp++ >= sched->max_gp)
 				return false;
@@ -827,11 +832,12 @@ static bool perf_sched_next_event(struct perf_sched *sched)
  * Assign a counter for each event.
  */
 int perf_assign_events(struct event_constraint **constraints, int n,
-			int wmin, int wmax, int gpmax, int *assign)
+			int wmin, int wmax, int gpmax, int *assign,
+			u64 exclude)
 {
 	struct perf_sched sched;
 
-	perf_sched_init(&sched, constraints, n, wmin, wmax, gpmax);
+	perf_sched_init(&sched, constraints, n, wmin, wmax, gpmax, exclude);
 
 	do {
 		if (!perf_sched_find_counter(&sched))
@@ -851,6 +857,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	struct perf_event *e;
 	int i, wmin, wmax, unsched = 0;
 	struct hw_perf_event *hwc;
+	u64 exclude = 0;
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
 
@@ -873,6 +880,8 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 		hwc = &cpuc->event_list[i]->hw;
 		c = cpuc->event_constraint[i];
 
+		exclude |= hwc->exclude;
+
 		/* never assigned */
 		if (hwc->idx == -1)
 			break;
@@ -894,6 +903,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	if (i != n) {
 		int gpmax = x86_pmu.num_counters;
 
+		gpmax = max_t(int, 0, gpmax - hweight32(exclude));
 		/*
 		 * Do not allow scheduling of more than half the available
 		 * generic counters.
@@ -909,7 +919,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 			gpmax /= 2;
 
 		unsched = perf_assign_events(cpuc->event_constraint, n, wmin,
-					     wmax, gpmax, assign);
+					     wmax, gpmax, assign, exclude);
 	}
 
 	/*
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index d516161c00c4..cfaae41ffbb4 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -442,7 +442,7 @@ static int uncore_assign_events(struct intel_uncore_box *box, int assign[], int
 	/* slow path */
 	if (i != n)
 		ret = perf_assign_events(box->event_constraint, n,
-					 wmin, wmax, n, assign);
+					 wmin, wmax, n, assign, 0);
 
 	if (!assign || ret) {
 		for (i = 0; i < n; i++)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e9262f05c82f..2ff0d9cee779 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -783,7 +783,8 @@ static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 void x86_pmu_enable_all(int added);
 
 int perf_assign_events(struct event_constraint **constraints, int n,
-			int wmin, int wmax, int gpmax, int *assign);
+		       int wmin, int wmax, int gpmax, int *assign,
+		       u64 exclude);
 int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign);
 
 void x86_pmu_stop(struct perf_event *event, int flags);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6cb5d483ab34..e7caf100d2bb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -125,6 +125,7 @@ struct hw_perf_event {
 			u64		last_tag;
 			unsigned long	config_base;
 			unsigned long	event_base;
+			u64		exclude;
 			int		event_base_rdpmc;
 			int		idx;
 			int		last_cpu;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 4/9] PERFv1 2
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (2 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 3/9] PERFv1 9 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05 14:51   ` [MODERATED] " Peter Zijlstra
  2019-02-05 15:16   ` Peter Zijlstra
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 5/9] PERFv1 8 Andi Kleen
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Handle TSX with counter 3 on Skylake

On Skylake with recent microcode updates, due to erratum XXX,
perfmon general-purpose counter 3 can be corrupted when RTM transactions
are executed.

The microcode provides a new MSR to force disable RTM
(make all RTM transactions abort).

This patch adds the low-level code to manage this MSR. Depending
on a per-event flag, TSX is disabled while a perf event that
uses counter 3 is running. When the event flag is not set,
counter 3 is avoided instead. When the event is scheduled
out again, TSX is re-enabled.

This patch avoids reprogramming the MSR when counter 3
is reused for a different event.

Future patches will add the user interfaces to enable this
functionality.
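
In short, the intended MSR handling looks like this (a simplified
sketch of the code below, using the constants this patch introduces):

    /*
     * Event with the force-abort flag gets scheduled on counter 3:
     *     wrmsrl(MSR_TSX_FORCE_ABORT, MSR_TFA_RTM_FORCE_ABORT);
     *     -> all RTM transactions abort, counter 3 counts correctly.
     *
     * Counter 3 is released again (and not merely reprogrammed, i.e.
     * PERF_EF_UPDATE is not set in the disable flags):
     *     wrmsrl(MSR_TSX_FORCE_ABORT, 0);
     *     -> RTM transactions can succeed again.
     */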

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c       | 42 ++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h       |  7 ++++-
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  3 +++
 4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 84ae6aaf8866..fb1731af7897 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2042,6 +2042,18 @@ static inline bool event_is_checkpointed(struct perf_event *event)
 	return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
 }
 
+static void intel_pmu_disable_fra(struct perf_event *event, int flags)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (event->hw.idx == 3 &&
+	    cpuc->force_rtm_abort_active &&
+	    !(flags & PERF_EF_UPDATE)) {
+		wrmsrl(MSR_TSX_FORCE_ABORT, 0);
+		cpuc->force_rtm_abort_active = false;
+	}
+}
+
 static void intel_pmu_disable_event(struct perf_event *event, int flags)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -2060,6 +2072,9 @@ static void intel_pmu_disable_event(struct perf_event *event, int flags)
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_disable(event);
 
+	if (static_cpu_has(X86_FEATURE_TSX_FORCE_ABORT))
+		intel_pmu_disable_fra(event, flags);
+
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_disable_fixed(hwc);
 		return;
@@ -2084,6 +2099,30 @@ static void intel_pmu_read_event(struct perf_event *event)
 		x86_perf_event_update(event);
 }
 
+static bool __use_counter3(struct perf_event *event)
+{
+	return event->hw.flags & PERF_X86_EVENT_ABORT_TSX;
+}
+
+static bool use_counter3(struct perf_event *event)
+{
+	return __use_counter3(event) ||
+		(event->group_leader && __use_counter3(event->group_leader));
+}
+
+static void intel_pmu_enable_fra(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (use_counter3(event) &&
+	    event->hw.idx == 3 &&
+	    !cpuc->force_rtm_abort_active) {
+		/* No RMW, so other bits get clobbered */
+		wrmsrl(MSR_TSX_FORCE_ABORT, MSR_TFA_RTM_FORCE_ABORT);
+		cpuc->force_rtm_abort_active = true;
+	}
+}
+
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -2141,6 +2180,9 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
+	if (static_cpu_has(X86_FEATURE_TSX_FORCE_ABORT))
+		intel_pmu_enable_fra(event);
+
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_enable_fixed(event);
 		return;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2ff0d9cee779..23d3751f0128 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -70,7 +70,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_EXCL_ACCT	0x0200 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0800 /* use large PEBS */
-
+#define PERF_X86_EVENT_ABORT_TSX	0x1000 /* force abort TSX */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -242,6 +242,11 @@ struct cpu_hw_events {
 	struct intel_excl_cntrs		*excl_cntrs;
 	int excl_thread_id; /* 0 or 1 */
 
+	/*
+	 * Manage using counter 3 on Skylake with TSX.
+	 */
+	bool		force_rtm_abort_active; /* We set the MSR */
+
 	/*
 	 * AMD specific bits
 	 */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6d6122524711..981ff9479648 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8e40c2446fd1..d7f29ca27ca1 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -666,6 +666,9 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+#define MSR_TSX_FORCE_ABORT 		0x0000010F
+#define MSR_TFA_RTM_FORCE_ABORT 			BIT_ULL(0)
+
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
 #define MSR_IA32_MCG_EBX		0x00000181
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 5/9] PERFv1 8
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (3 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 4/9] PERFv1 2 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 6/9] PERFv1 5 Andi Kleen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Add perf event attribute to enable
 counter 3

Add a "force_rtm_abort" attribute per perf event attribute
to allow user programs to opt in to use counter 3 and disable
TSX while the perf event is active.

This allows non-root programs to use groups with 4 counters,
even if TSX is fully enabled.

The actual defaults depend on further patches.

We use config2 bit 0 in the core PMU for this interface.
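
Not part of the patch: a minimal raw perf_event_open() sketch of the
opt-in, assuming the config2 encoding above. A tool would normally go
through the "force_rtm_abort" format attribute instead (for example
something along the lines of cpu/event=...,force_rtm_abort=1/).

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Open a cycles event that opts in to counter 3 (and therefore to
     * RTM being force-aborted while it is scheduled). Illustrative only. */
    static int open_cycles_with_counter3(void)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.config2 = 1;    /* force_rtm_abort = config2:0 */

        return syscall(__NR_perf_event_open, &attr,
                       0 /* pid: self */, -1 /* any cpu */,
                       -1 /* no group */, 0 /* flags */);
    }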

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 21 +++++++++++++++++++++
 arch/x86/events/perf_event.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index fb1731af7897..85c59e56ed1a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3183,6 +3183,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (ret)
 		return ret;
 
+	if (static_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+		if (event->attr.config2 & FORCE_RTM_ABORT)
+			event->hw.flags |= PERF_X86_EVENT_ABORT_TSX;
+		else
+			event->hw.exclude = BIT(3);
+	}
+
 	if (event->attr.precise_ip) {
 		if (!event->attr.freq) {
 			event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD;
@@ -3635,6 +3642,8 @@ PMU_FORMAT_ATTR(ldlat, "config1:0-15");
 
 PMU_FORMAT_ATTR(frontend, "config1:0-23");
 
+PMU_FORMAT_ATTR(force_rtm_abort, "config2:0");
+
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -3670,6 +3679,11 @@ static struct attribute *skl_format_attr[] = {
 	NULL,
 };
 
+static struct attribute *skl_extra_format_attr[] = {
+	&format_attr_force_rtm_abort.attr,
+	NULL,
+};
+
 static __initconst const struct x86_pmu core_pmu = {
 	.name			= "core",
 	.handle_irq		= x86_pmu_handle_irq,
@@ -4135,6 +4149,7 @@ __init int intel_pmu_init(void)
 	struct attribute **mem_attr = NULL;
 	struct attribute **tsx_attr = NULL;
 	struct attribute **to_free = NULL;
+	struct attribute **to_free2 = NULL;
 	union cpuid10_edx edx;
 	union cpuid10_eax eax;
 	union cpuid10_ebx ebx;
@@ -4599,6 +4614,11 @@ __init int intel_pmu_init(void)
 			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
 		pr_cont("Skylake events, ");
 		name = "skylake";
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			extra_attr = merge_attr(extra_attr, skl_extra_format_attr);
+			to_free2 = extra_attr;
+			x86_pmu.attrs = merge_attr(x86_pmu.attrs, skl_extra_attr);
+		}
 		break;
 
 	default:
@@ -4712,6 +4732,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
 	kfree(to_free);
+	kfree(to_free2);
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 23d3751f0128..a6413db1d09a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -775,6 +775,8 @@ int x86_pmu_hw_config(struct perf_event *event);
 
 void x86_pmu_disable_all(void);
 
+#define FORCE_RTM_ABORT (1U << 0)
+
 static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 					  u64 enable_mask)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 6/9] PERFv1 5
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (4 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 5/9] PERFv1 8 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05 15:26   ` [MODERATED] " Peter Zijlstra
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 7/9] PERFv1 1 Andi Kleen
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Add global option for enable all
 counters

Add a global setting that allows disabling RTM when counter 3
is needed for some perf event group. Otherwise RTM is
not impacted.

This allows existing programs that want to use groups
with all four counters to run without changes.

This patch sets the default to TSX enabled, but
that could be easily changed.

The trade-offs for setting the option default are:

Using 4 (or 8 with HT off) events in perf versus
allowing RTM usage while perf is active.

- Existing programs that use perf groups with 4 counters
may not retrieve perfmon data anymore. Perf usages
that use fewer than four (or 7 with HT off) counters
are not impacted. Perf usages that don't use groups
will still work, but will see increased multiplexing.

- TSX programs should not functionally break from
forcing RTM to abort because they always need a valid
fallback path. However, they will see significantly
lower performance if they rely on TSX for performance
(all RTM transactions will run and only abort at the end),
potentially slowing them down so much that it is
equivalent to functional breakage.
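
Not part of the patch: when the global default is flipped the other way
(enable_all_counters set to 1), an individual event can still keep RTM
usable by setting the new "allow_rtm" bit this patch adds, a sketch:

    #include <linux/perf_event.h>

    /* Illustrative only: an event that prefers working RTM over
     * counter 3 opts out again via config2 bit 1 ("allow_rtm");
     * the scheduler then simply avoids counter 3 for it. */
    static void keep_rtm_enabled(struct perf_event_attr *attr)
    {
        attr->config2 |= 1ULL << 1;    /* allow_rtm = config2:1 */
    }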

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c       |  8 +++++++-
 arch/x86/events/intel/core.c | 13 ++++++++++++-
 arch/x86/events/perf_event.h |  3 +++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index da1200ee1917..1d570b29130c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2258,11 +2258,17 @@ static ssize_t max_precise_show(struct device *cdev,
 
 static DEVICE_ATTR_RO(max_precise);
 
+bool perf_enable_all_counters __read_mostly;
+
 static ssize_t num_counter_show(struct device *cdev,
 				  struct device_attribute *attr,
 				  char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+	int num = x86_pmu.num_counters;
+	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT) &&
+		!perf_enable_all_counters && num > 0)
+		num--;
+	return snprintf(buf, PAGE_SIZE, "%d\n", num);
 }
 
 static DEVICE_ATTR_RO(num_counter);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 85c59e56ed1a..4e74064c62c1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3184,7 +3184,9 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		return ret;
 
 	if (static_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
-		if (event->attr.config2 & FORCE_RTM_ABORT)
+		if ((event->attr.config2 & FORCE_RTM_ABORT) ||
+		    (perf_enable_all_counters &&
+		     !(event->attr.config2 & ALLOW_RTM)))
 			event->hw.flags |= PERF_X86_EVENT_ABORT_TSX;
 		else
 			event->hw.exclude = BIT(3);
@@ -3643,6 +3645,7 @@ PMU_FORMAT_ATTR(ldlat, "config1:0-15");
 PMU_FORMAT_ATTR(frontend, "config1:0-23");
 
 PMU_FORMAT_ATTR(force_rtm_abort, "config2:0");
+PMU_FORMAT_ATTR(allow_rtm, "config2:1");
 
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
@@ -3681,6 +3684,7 @@ static struct attribute *skl_format_attr[] = {
 
 static struct attribute *skl_extra_format_attr[] = {
 	&format_attr_force_rtm_abort.attr,
+	&format_attr_allow_rtm.attr,
 	NULL,
 };
 
@@ -4143,6 +4147,13 @@ get_events_attrs(struct attribute **base,
 	return attrs;
 }
 
+static DEVICE_BOOL_ATTR(enable_all_counters, 0644, perf_enable_all_counters);
+
+static struct attribute *skl_extra_attr[] = {
+	&dev_attr_enable_all_counters.attr.attr,
+	NULL,
+};
+
 __init int intel_pmu_init(void)
 {
 	struct attribute **extra_attr = NULL;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a6413db1d09a..29a7fc8f9f1a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -776,6 +776,7 @@ int x86_pmu_hw_config(struct perf_event *event);
 void x86_pmu_disable_all(void);
 
 #define FORCE_RTM_ABORT (1U << 0)
+#define ALLOW_RTM	(1U << 1)
 
 static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 					  u64 enable_mask)
@@ -863,6 +864,8 @@ static inline int amd_pmu_init(void)
 
 #endif /* CONFIG_CPU_SUP_AMD */
 
+extern bool perf_enable_all_counters;
+
 #ifdef CONFIG_CPU_SUP_INTEL
 
 static inline bool intel_pmu_has_bts(struct perf_event *event)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 7/9] PERFv1 1
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (5 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 6/9] PERFv1 5 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 8/9] PERFv1 4 Andi Kleen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  perf stat: Make all existing groups weak

Now that we may only have three counters, make the --topdown and
--transaction groups weak so that they still work.
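
For example, the --transaction group is then requested as
"{instructions,cycles,cpu/cycles-t/,...}:W"; when only three counters
are available and the full group no longer fits, perf falls back to
scheduling the events individually instead of failing the whole group.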

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e587808591e8..c94f5ed135f1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -101,7 +101,7 @@ static const char *transaction_attrs = {
 	"cpu/tx-start/,"
 	"cpu/el-start/,"
 	"cpu/cycles-ct/"
-	"}"
+	"}:W"
 };
 
 /* More limited version when the CPU does not have all events. */
@@ -112,7 +112,7 @@ static const char * transaction_limited_attrs = {
 	"cycles,"
 	"cpu/cycles-t/,"
 	"cpu/tx-start/"
-	"}"
+	"}:W"
 };
 
 static const char * topdown_attrs[] = {
@@ -999,7 +999,7 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	attr[i - off] = NULL;
 
-	*str = malloc(len + 1 + 2);
+	*str = malloc(len + 1 + 2 + 2);
 	if (!*str)
 		return -1;
 	s = *str;
@@ -1016,6 +1016,8 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	if (use_group) {
 		s[-1] = '}';
+		*s++ = ':';
+		*s++ = 'W';
 		*s = 0;
 	} else
 		s[-1] = 0;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 8/9] PERFv1 4
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (6 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 7/9] PERFv1 1 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 9/9] PERFv1 3 Andi Kleen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

When the system only has three counters, HLE (EL) is also not
available. Detect that there are only three counters and
automatically disable el-start for --transaction.
This avoids event multiplexing in this situation with no loss
of information.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 30 ++++++++++++++++++++++--------
 tools/perf/util/pmu.c     | 10 ++++++++++
 tools/perf/util/pmu.h     |  1 +
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c94f5ed135f1..59a5bf0389b7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -104,6 +104,18 @@ static const char *transaction_attrs = {
 	"}:W"
 };
 
+static const char *transaction_noel_attrs = {
+	"task-clock,"
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/tx-start/,"
+	"cpu/cycles-ct/"
+	"}:W"
+};
+
+
 /* More limited version when the CPU does not have all events. */
 static const char * transaction_limited_attrs = {
 	"task-clock,"
@@ -1160,6 +1172,8 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
+		const char *attrs = transaction_limited_attrs;
+
 		/* Handle -T as -M transaction. Once platform specific metrics
 		 * support has been added to the json files, all archictures
 		 * will use this approach. To determine transaction support
@@ -1173,16 +1187,16 @@ static int add_default_attributes(void)
 		}
 
 		if (pmu_have_event("cpu", "cycles-ct") &&
-		    pmu_have_event("cpu", "el-start"))
-			err = parse_events(evsel_list, transaction_attrs,
-					   &errinfo);
-		else
-			err = parse_events(evsel_list,
-					   transaction_limited_attrs,
-					   &errinfo);
+		    pmu_have_event("cpu", "el-start")) {
+			if (pmu_num_counters("cpu") == 3)
+				attrs = transaction_noel_attrs;
+			else
+				attrs = transaction_attrs;
+		}
+		err = parse_events(evsel_list, attrs, &errinfo);
 		if (err) {
 			fprintf(stderr, "Cannot set up transaction events\n");
-			parse_events_print_error(&errinfo, transaction_attrs);
+			parse_events_print_error(&errinfo, attrs);
 			return -1;
 		}
 		return 0;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 11a234740632..5e17409b7ff6 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1495,3 +1495,13 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+int pmu_num_counters(const char *pname)
+{
+	unsigned long num;
+	struct perf_pmu *pmu = perf_pmu__find(pname);
+
+	if (pmu && perf_pmu__scan_file(pmu, "caps/num_counter", "%ld", &num) == 1)
+		return num;
+	return -1;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 76fecec7b3f9..6e772243fd1d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -85,6 +85,7 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
 void print_pmu_events(const char *event_glob, bool name_only, bool quiet,
 		      bool long_desc, bool details_flag);
 bool pmu_have_event(const char *pname, const char *name);
+int pmu_num_counters(const char *);
 
 int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt, ...) __scanf(3, 4);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] [PATCH v1 9/9] PERFv1 3
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (7 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 8/9] PERFv1 4 Andi Kleen
@ 2019-02-05  1:14 ` Andi Kleen
  2019-02-05 15:28   ` [MODERATED] " Peter Zijlstra
  2019-02-05 14:03 ` [MODERATED] Re: [PATCH v1 0/9] PERFv1 0 Greg KH
  2019-02-05 15:32 ` Peter Zijlstra
  10 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05  1:14 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Recent microcode for Skylake added a new CPUID bit and MSR to control
TSX aborting and enabling PMU counter 3. This patch adds support
for controlling the new MSR from KVM guests.

The implementation is similar to how SPEC_CTRL is handled. Initially
the MSR is just intercepted, to avoid any impact on entry/exit.
When the guest uses the MSR for the first time, the MSR is added to the
entry/exit list so it is context-switched against the host.

Also export the CPUID bit.

Open: nesting support still needs to be tested.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c      |  4 +++-
 arch/x86/include/asm/perf_event.h |  2 +-
 arch/x86/kvm/cpuid.c              |  3 ++-
 arch/x86/kvm/vmx/nested.c         | 11 ++++++++-
 arch/x86/kvm/vmx/vmx.c            | 37 ++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h            |  1 +
 6 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4e74064c62c1..12bf728250bf 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3240,8 +3240,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	return 0;
 }
 
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
+struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, int *tsx_force_abort)
 {
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	*tsx_force_abort = cpuc->force_rtm_abort_active;
 	if (x86_pmu.guest_get_msrs)
 		return x86_pmu.guest_get_msrs(nr);
 	*nr = 0;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8bdf74902293..d6131cd66d14 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -275,7 +275,7 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };
 
-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, int *tsx_force_abort);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bbffa6c54697..2570b17ac372 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,7 +409,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
+		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+		F(TSX_FORCE_ABORT);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8ff20523661b..d61ad51bcf66 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -525,6 +525,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	 */
 	bool pred_cmd = !msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD);
 	bool spec_ctrl = !msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL);
+	bool tsx_force_abort = !msr_write_intercepted_l01(vcpu, MSR_TSX_FORCE_ABORT);
 
 	/* Nothing to do if the MSR bitmap is not in use.  */
 	if (!cpu_has_vmx_msr_bitmap() ||
@@ -532,7 +533,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 		return false;
 
 	if (!nested_cpu_has_virt_x2apic_mode(vmcs12) &&
-	    !pred_cmd && !spec_ctrl)
+	    !pred_cmd && !spec_ctrl && !tsx_force_abort)
 		return false;
 
 	page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap);
@@ -587,6 +588,14 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 					MSR_IA32_PRED_CMD,
 					MSR_TYPE_W);
 
+	if (tsx_force_abort)
+		nested_vmx_disable_intercept_for_msr(
+					msr_bitmap_l1, msr_bitmap_l0,
+					MSR_TSX_FORCE_ABORT,
+					MSR_TYPE_R | MSR_TYPE_W);
+
+
+
 	kunmap(page);
 	kvm_release_page_clean(page);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 95d618045001..9c0f07fa87df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1682,6 +1682,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		msr_info->data = to_vmx(vcpu)->spec_ctrl;
 		break;
+	case MSR_TSX_FORCE_ABORT:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_TSX_FORCE_ABORT))
+			return 1;
+
+		msr_info->data = to_vmx(vcpu)->tsx_force_abort;
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
@@ -1894,6 +1901,20 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
 					      MSR_TYPE_W);
 		break;
+	case MSR_TSX_FORCE_ABORT:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_TSX_FORCE_ABORT))
+			return 1;
+		if (data & ~1ULL)
+			return 1;
+		vmx->tsx_force_abort = data;
+		/* When it is disabled use whatever the host uses */
+		if (!data)
+			break;
+		vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_TSX_FORCE_ABORT,
+					      MSR_TYPE_RW);
+		break;
+
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated)
 			return 1;
@@ -6310,8 +6331,22 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 {
 	int i, nr_msrs;
 	struct perf_guest_switch_msr *msrs;
+	int tsx_force_abort;
 
-	msrs = perf_guest_get_msrs(&nr_msrs);
+	msrs = perf_guest_get_msrs(&nr_msrs, &tsx_force_abort);
+
+	/*
+	 * vmx->tsx_force_abort is only the value from the first guest
+	 * MSR write, but if the guest resets it back to zero we still
+	 * allow the host value to win. This allows perfmon monitoring
+	 * of the guest from the host.
+	 */
+	if (vmx->tsx_force_abort == tsx_force_abort)
+		clear_atomic_switch_msr(vmx, MSR_TSX_FORCE_ABORT);
+	else
+		add_atomic_switch_msr(vmx, MSR_TSX_FORCE_ABORT,
+				      vmx->tsx_force_abort,
+				      tsx_force_abort, false);
 
 	if (!msrs)
 		return;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 99328954c2fc..f0151d8aedfe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -193,6 +193,7 @@ struct vcpu_vmx {
 
 	u64		      arch_capabilities;
 	u64		      spec_ctrl;
+	u64		      tsx_force_abort;
 
 	u32 vm_entry_controls_shadow;
 	u32 vm_exit_controls_shadow;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 3/9] PERFv1 9
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 3/9] PERFv1 9 Andi Kleen
@ 2019-02-05 13:58   ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 13:58 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:04PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/pmu/intel: Support excluding GP counters in
>  scheduler
> 
> Add support for excluding GP counters per event in the x86 event
> scheduler. This is right now used for a bug workaround in a followon
> patch, but will be also later useful for more efficient PMU
> virtualization in KVM.

No it will not... I've said it before, KVM will not constrain counter
scheduling like that.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 0/9] PERFv1 0
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (8 preceding siblings ...)
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 9/9] PERFv1 3 Andi Kleen
@ 2019-02-05 14:03 ` Greg KH
  2019-02-05 16:36   ` Andi Kleen
  2019-02-05 15:32 ` Peter Zijlstra
  10 siblings, 1 reply; 30+ messages in thread
From: Greg KH @ 2019-02-05 14:03 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:01PM -0800, speck for Andi Kleen wrote:
> v1: Initial post
> 
> Andi Kleen (9):
>   x86/pmu/intel: Export number of counters in caps
>   x86/pmu/intel: Add flags argument to x86_pmu::disable
>   x86/pmu/intel: Support excluding GP counters in scheduler
>   x86/pmu/intel: Handle TSX with counter 3 on Skylake
>   x86/pmu/intel: Add perf event attribute to enable counter 3
>   x86/pmu/intel: Add global option for enable all counters
>   perf stat: Make all existing groups weak
>   perf stat: Don't count EL for --transaction with three counters
>   kvm: vmx: Support TSX_FORCE_ABORT in KVM guests

Meta-comment, can you fix up and use the updated script to post these?
The [XX/NN] and "N" numbers do not match up at all and it's hard to
figure out which to trust.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 2/9] PERFv1 6
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 2/9] PERFv1 6 Andi Kleen
@ 2019-02-05 14:46   ` Peter Zijlstra
  2019-02-05 16:37     ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 14:46 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:03PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/pmu/intel: Add flags argument to x86_pmu::disable
> 
> Add a flags argument to pass the HEF_* flags down to x86_pmu::disable.
> This will be used in a followon patch to avoid some redundant
> MSR writes when reprograming counters for updates.

Neither this nor that follow up patch explains what disable needs these
flags for. Now I could probably reverse engineer it; but you really
should know better by now.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 4/9] PERFv1 2
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 4/9] PERFv1 2 Andi Kleen
@ 2019-02-05 14:51   ` Peter Zijlstra
  2019-02-05 16:35     ` Andi Kleen
  2019-02-05 15:16   ` Peter Zijlstra
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 14:51 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:05PM -0800, speck for Andi Kleen wrote:
> @@ -242,6 +242,11 @@ struct cpu_hw_events {
>  	struct intel_excl_cntrs		*excl_cntrs;
>  	int excl_thread_id; /* 0 or 1 */
>  
> +	/*
> +	 * Manage using counter 3 on Skylake with TSX.
> +	 */
> +	bool		force_rtm_abort_active; /* We set the MSR */
> +
>  	/*
>  	 * AMD specific bits
>  	 */

How often must we repeat; no bool in composite types... and could you
really not come up with a longer name?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 4/9] PERFv1 2
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 4/9] PERFv1 2 Andi Kleen
  2019-02-05 14:51   ` [MODERATED] " Peter Zijlstra
@ 2019-02-05 15:16   ` Peter Zijlstra
  2019-02-05 16:52     ` Andi Kleen
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 15:16 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:05PM -0800, speck for Andi Kleen wrote:
> +static bool __use_counter3(struct perf_event *event)
> +{
> +	return event->hw.flags & PERF_X86_EVENT_ABORT_TSX;
> +}
> +
> +static bool use_counter3(struct perf_event *event)
> +{
> +	return __use_counter3(event) ||
> +		(event->group_leader && __use_counter3(event->group_leader));
> +}

See; I _really_ hate this per-event noodling.

Why not just have the sysctl tell us if PMC3 is available or not and set
FRA when PMC3 gets selected, the end.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 6/9] PERFv1 5
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 6/9] PERFv1 5 Andi Kleen
@ 2019-02-05 15:26   ` Peter Zijlstra
  2019-02-05 16:45     ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 15:26 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:07PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/pmu/intel: Add global option for enable all
>  counters
> 
> Add a global setting to allow to disable RTM when counter 3
> is needed for some perf event group. Otherwise RTM will
> not be impacted.
> 
> This allows existing programs that want to use groups
> with all four counters to run without changes.
> 
> This patch sets the default to TSX enabled, but
> that could be easily changed.
> 
> The trade offs for setting the option default are:
> 
> Using 4 (or 8 with HT off) events in perf versus
> allowing RTM usage while perf is active.
> 
> - Existing programs that use perf groups with 4 counters
> may not retrieve perfmon data anymore. Perf usages
> that use less than four (or 7 with HT off) counters
> are not impacted. Perf usages that don't use group
> will still work, but will see increase multiplexing.
> 
> - TSX programs should not functionally break from
> forcing RTM to abort because they always need a valid
> fall back path. However they will see significantly
> lower performance if they rely on TSX for performance
> (all RTM transactions will run and only abort at the end),
> potentially slowing them down so much that it is
> equivalent to functional breakage.

So you know tglx and me have been arguing for the opposite default. This
will 'silently' break stuff that relies on having the counters present
-- and there are people who do indeed rely on that.

At the same time; by default there are no (with the possible exception
of the NMI watchdog) users of the PMU, and I don't think TSX is in fact
used by default either.

This means that only people that explicitly use TSX and perf at the same
time are impacted with that default, and the rest will actually work.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 9/9] PERFv1 3
  2019-02-05  1:14 ` [MODERATED] [PATCH v1 9/9] PERFv1 3 Andi Kleen
@ 2019-02-05 15:28   ` Peter Zijlstra
  2019-02-05 16:48     ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 15:28 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:10PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> 
> Recent microcode for Skylake added a new CPUID bit and MSR to control
> TSX aborting and enabling PMU counter 3. This patch adds support
> for controlling counter 3 from KVM guests.
> 
> The implementation is similar to how SPEC_CTRL is implemented. Initially
> the MSR is just intercepted, to avoid any impact on the entry/exit.
> When the guest uses the MSR the first time add the MSR to the
> entry/exit list to context switch it against the host.
> 

What this does not explain is how the guest can ever possibly use this
correctly, since it doesn't have access to PMC3.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 0/9] PERFv1 0
  2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
                   ` (9 preceding siblings ...)
  2019-02-05 14:03 ` [MODERATED] Re: [PATCH v1 0/9] PERFv1 0 Greg KH
@ 2019-02-05 15:32 ` Peter Zijlstra
  10 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 15:32 UTC (permalink / raw)
  To: speck

On Mon, Feb 04, 2019 at 05:14:01PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  PERFv1
> 
> Walnut is an functional (not security) issue with TSX. The upcoming
> microcode updates on Skylake may corrupt perfmon counter 3
> when RTM transactions are used.
> 
> There is a new MSR that allows to force abort RTM, and free
> counter 3.
> 
> The following patchkit adds the support to perf to avoid
> using counter 3, or disabling TSX when counter 3 is needed
> for perf.
> 
> There are per perf event and global options to set the
> default.
> 
> This patch sets the default to TSX enabled, but
> that could be easily changed.
> 
> We can have a discussion on the trade offs of the default
> setting. I suspect it's a decision that should be made by Linus,
> as it may impact user programs either way.
> 
> The trade offs for setting the option default are:
>     
> Using 4 (or 8 with HT off) events in perf versus
> allowing RTM usage while perf is active.
>     
> - Existing programs that use perf groups with 4 counters
>   may not retrieve perfmon data anymore. Perf usages
>   that use less than four (or 7 with HT off) counters
>   are not impacted. Perf usages that don't use group
>   will still work, but will see increase multiplexing.
>     
> - TSX programs should not functionally break from
>   forcing RTM to abort because they always need a valid
>   fall back path. However they will see significantly
>   lower performance if they rely on TSX for performance
>   (all RTM transactions will run and only abort at the end),
>   potentially slowing them down so much that it is
>   equivalent to functional breakage.

Like said; I absolutely _hate_ all of this. But I think the very worst
of it is that this PMC3 corruption is on-by-default in the ucode. This
means that people who update their ucode but don't update their kernel
are utterly screwed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 4/9] PERFv1 2
  2019-02-05 14:51   ` [MODERATED] " Peter Zijlstra
@ 2019-02-05 16:35     ` Andi Kleen
  2019-02-05 16:53       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:35 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 03:51:09PM +0100, speck for Peter Zijlstra wrote:
> On Mon, Feb 04, 2019 at 05:14:05PM -0800, speck for Andi Kleen wrote:
> > @@ -242,6 +242,11 @@ struct cpu_hw_events {
> >  	struct intel_excl_cntrs		*excl_cntrs;
> >  	int excl_thread_id; /* 0 or 1 */
> >  
> > +	/*
> > +	 * Manage using counter 3 on Skylake with TSX.
> > +	 */
> > +	bool		force_rtm_abort_active; /* We set the MSR */
> > +
> >  	/*
> >  	 * AMD specific bits
> >  	 */
> 
> How often must we repeat; no bool in composite types... and could you
> really not come up with a longer name?

Okay.

However let me point to https://blogs.msdn.microsoft.com/oldnewthing/20081126-00/?p=20073

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 0/9] PERFv1 0
  2019-02-05 14:03 ` [MODERATED] Re: [PATCH v1 0/9] PERFv1 0 Greg KH
@ 2019-02-05 16:36   ` Andi Kleen
  2019-02-05 18:20     ` Greg KH
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:36 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 03:03:29PM +0100, speck for Greg KH wrote:
> On Mon, Feb 04, 2019 at 05:14:01PM -0800, speck for Andi Kleen wrote:
> > v1: Initial post
> > 
> > Andi Kleen (9):
> >   x86/pmu/intel: Export number of counters in caps
> >   x86/pmu/intel: Add flags argument to x86_pmu::disable
> >   x86/pmu/intel: Support excluding GP counters in scheduler
> >   x86/pmu/intel: Handle TSX with counter 3 on Skylake
> >   x86/pmu/intel: Add perf event attribute to enable counter 3
> >   x86/pmu/intel: Add global option for enable all counters
> >   perf stat: Make all existing groups weak
> >   perf stat: Don't count EL for --transaction with three counters
> >   kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> 
> Meta-comment, can you fix up and use the updated script to post these?
> The [XX/NN] and "N" numbers do not match up at all and it's hard to
> figure out which to trust.

I'm just using Thomas' script.  Let me post an mbox.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 2/9] PERFv1 6
  2019-02-05 14:46   ` [MODERATED] " Peter Zijlstra
@ 2019-02-05 16:37     ` Andi Kleen
  0 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:37 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 03:46:27PM +0100, speck for Peter Zijlstra wrote:
> On Mon, Feb 04, 2019 at 05:14:03PM -0800, speck for Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > Subject:  x86/pmu/intel: Add flags argument to x86_pmu::disable
> > 
> > Add a flags argument to pass the HEF_* flags down to x86_pmu::disable.
> > This will be used in a follow-on patch to avoid some redundant
> > MSR writes when reprogramming counters for updates.
> 
> Neither this nor that follow up patch explains what disable needs these
> flags for. Now I could probably reverse engineer it; but you really
> should know better by now.

Just so it knows that the counter is getting reused and can skip
the extra MSR disable. I'll mention it.
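
Roughly like this, as a sketch with made-up names (the real code uses
struct hw_perf_event, wrmsrl() and the flag constant added by the
patch, not these demo_* stand-ins):

#include <stdint.h>

/* Stand-in for struct hw_perf_event; config_base is the MSR holding
 * the counter's event configuration. */
struct demo_hw_event {
	uint32_t config_base;
};

/* Made-up flag meaning "this counter is about to be reprogrammed". */
#define DEMO_EF_REUSED	0x08

/* Stand-in for the real MSR write. */
static void demo_wrmsr(uint32_t msr, uint64_t val)
{
	(void)msr;
	(void)val;
}

/* Passing the scheduler flags down to ->disable() lets the driver skip
 * clearing the config MSR when the counter is only being reused and
 * will be rewritten immediately anyway. */
static void demo_pmu_disable(struct demo_hw_event *hwc, int flags)
{
	if (flags & DEMO_EF_REUSED)
		return;
	demo_wrmsr(hwc->config_base, 0);
}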

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 6/9] PERFv1 5
  2019-02-05 15:26   ` [MODERATED] " Peter Zijlstra
@ 2019-02-05 16:45     ` Andi Kleen
  0 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:45 UTC (permalink / raw)
  To: speck

> So you know tglx and me have been arguing for the opposite default. This
> will 'silently' break stuff that relies on having the counters present
> -- and there are people who do indeed rely on that.

Right.
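
Concretely, the usage that breaks is a hard group of four generic
hardware events; with only three usable GP counters such a group
never schedules at all. A minimal sketch of that usage (error
handling omitted, event choice arbitrary):

/* Linux only; build with: gcc -o grp grp.c */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_hw_counter(__u64 config, int group_fd)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = config;
	attr.disabled = (group_fd == -1);	/* only the leader starts disabled */

	/* pid = 0 (this task), cpu = -1 (any CPU) */
	return syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0);
}

int main(void)
{
	/* Four generic events in one group: they schedule all or nothing. */
	int leader = open_hw_counter(PERF_COUNT_HW_CPU_CYCLES, -1);

	open_hw_counter(PERF_COUNT_HW_INSTRUCTIONS, leader);
	open_hw_counter(PERF_COUNT_HW_BRANCH_INSTRUCTIONS, leader);
	open_hw_counter(PERF_COUNT_HW_BRANCH_MISSES, leader);

	ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
	/* ... run the workload, read the counters ... */
	ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);
	return 0;
}

That is also why the series makes the perf stat groups weak, so they
degrade to best effort instead of silently counting nothing.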

> 
> At the same time; by default there are no (with the possible exception
> of the NMI watchdog) users of the PMU, 

It seems there is more and more perf use for various purposes,
but it's hard to say.

> and I don't think TSX is in fact used by default either.

There are workloads that use it by default and need it for
performance.

> 
> This means that only people that explicitly use TSX and perf at the same
> time are impacted with that default, and the rest will actually work.

That's right.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 9/9] PERFv1 3
  2019-02-05 15:28   ` [MODERATED] " Peter Zijlstra
@ 2019-02-05 16:48     ` Andi Kleen
  2019-02-05 17:13       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:48 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 04:28:04PM +0100, speck for Peter Zijlstra wrote:
> On Mon, Feb 04, 2019 at 05:14:10PM -0800, speck for Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > Subject:  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> > 
> > Recent microcode for Skylake added a new CPUID bit and MSR to control
> > TSX aborting and enabling PMU counter 3. This patch adds support
> > for controlling counter 3 from KVM guests.
> > 
> > The implementation is similar to how SPEC_CTRL is implemented. Initially
> > the MSR is just intercepted, to avoid any impact on the entry/exit.
> > When the guest uses the MSR the first time add the MSR to the
> > entry/exit list to context switch it against the host.
> > 
> 
> What this does not explain is how the guest can ever possibly use this
> correctly, since it doesn't have access to PMC3.

Yes, you're right. In its current form it is only useful for
people who really want to disable TSX and don't care about the PMU.

But we should really care about the PMU.
I guess we can force the flag on the KVM PMU events instead.

Then the host perf will do the right thing.

It won't allow a guest to disable TSX for some other reason,
but I guess that's ok.
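
For reference, the lazy switching described in the quoted changelog
boils down to this kind of pattern (a sketch with stand-in types and
names, not the actual vmx.c code):

#include <stdbool.h>
#include <stdint.h>

struct demo_vcpu {
	uint64_t tsx_force_abort;	/* the guest's view of the MSR */
	bool	 msr_switched;		/* already on the entry/exit switch list? */
};

/* Stand-in for adding the MSR to the VM entry/exit switch list. */
static void demo_add_to_switch_list(struct demo_vcpu *vcpu, uint32_t msr)
{
	(void)msr;
	vcpu->msr_switched = true;
}

/* Keep intercepting the MSR until the guest actually writes it; only
 * then pay for context switching it against the host value on every
 * entry/exit, similar to how the SPEC_CTRL handling works. */
static void demo_wrmsr_tsx_force_abort(struct demo_vcpu *vcpu,
				       uint32_t msr, uint64_t data)
{
	if (!vcpu->msr_switched)
		demo_add_to_switch_list(vcpu, msr);
	vcpu->tsx_force_abort = data;
}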

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 4/9] PERFv1 2
  2019-02-05 15:16   ` Peter Zijlstra
@ 2019-02-05 16:52     ` Andi Kleen
  0 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 16:52 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 04:16:57PM +0100, speck for Peter Zijlstra wrote:
> On Mon, Feb 04, 2019 at 05:14:05PM -0800, speck for Andi Kleen wrote:
> > +static bool __use_counter3(struct perf_event *event)
> > +{
> > +	return event->hw.flags & PERF_X86_EVENT_ABORT_TSX;
> > +}
> > +
> > +static bool use_counter3(struct perf_event *event)
> > +{
> > +	return __use_counter3(event) ||
> > +		(event->group_leader && __use_counter3(event->group_leader));
> > +}
> 
> See; I _really_ hate this per-event noodling.
> 
> Why not just have the sysctl tell us if PMC3 is available or not and set
> FRA when PMC3 gets selected, teh end.

I was really trying to allow non-root users to control this.

Past experience with perf sysctls is that non-root users need to
change them frequently, and "sudo" is not a solution for everyone.

It would be needed for the proper KVM support at least,
unless you want KVM to toggle the sysctl.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 4/9] PERFv1 2
  2019-02-05 16:35     ` Andi Kleen
@ 2019-02-05 16:53       ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 16:53 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 08:35:58AM -0800, speck for Andi Kleen wrote:
> On Tue, Feb 05, 2019 at 03:51:09PM +0100, speck for Peter Zijlstra wrote:
> > On Mon, Feb 04, 2019 at 05:14:05PM -0800, speck for Andi Kleen wrote:
> > > @@ -242,6 +242,11 @@ struct cpu_hw_events {
> > >  	struct intel_excl_cntrs		*excl_cntrs;
> > >  	int excl_thread_id; /* 0 or 1 */
> > >  
> > > +	/*
> > > +	 * Manage using counter 3 on Skylake with TSX.
> > > +	 */
> > > +	bool		force_rtm_abort_active; /* We set the MSR */
> > > +
> > >  	/*
> > >  	 * AMD specific bits
> > >  	 */
> > 
> > How often must we repeat; no bool in composite types... and could you
> > really not come up with a longer name?
> 
> Okay.
> 
> However let me point to https://blogs.msdn.microsoft.com/oldnewthing/20081126-00/?p=20073

I'm not saying you have to use a bitfield, but use a type with
well-defined storage size.

Note how that article assumes BOOL := int; this is not true in general.
The C spec does not define sizeof(_Bool) and leaves it to the ABI specs.

For linux-x86 _Bool ends up being u8 for example. For Darwin-PowerPC
_Bool is u32 (just like that windows thing).

And us having to tell you things over and over is just annoying.
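
Something like this would cover both points (struct and field names are
illustrative only, not a proposed final version):

#include <linux/types.h>	/* u8 */

struct demo_cpu_hw_events {
	/*
	 * Nonzero once this CPU has written MSR_TSX_FORCE_ABORT to free
	 * counter 3, so the MSR gets restored when the last such event
	 * is scheduled out.
	 */
	u8	perf_ctr3_force_rtm_abort_set;
};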

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 9/9] PERFv1 3
  2019-02-05 16:48     ` Andi Kleen
@ 2019-02-05 17:13       ` Peter Zijlstra
  2019-02-05 17:26         ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 17:13 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 08:48:29AM -0800, speck for Andi Kleen wrote:
> On Tue, Feb 05, 2019 at 04:28:04PM +0100, speck for Peter Zijlstra wrote:
> > On Mon, Feb 04, 2019 at 05:14:10PM -0800, speck for Andi Kleen wrote:
> > > From: Andi Kleen <ak@linux.intel.com>
> > > Subject:  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> > > 
> > > Recent microcode for Skylake added a new CPUID bit and MSR to control
> > > TSX aborting and enabling PMU counter 3. This patch adds support
> > > for controlling counter 3 from KVM guests.
> > > 
> > > The implementation is similar to how SPEC_CTRL is implemented. Initially
> > > the MSR is just intercepted, to avoid any impact on the entry/exit.
> > > When the guest uses the MSR the first time add the MSR to the
> > > entry/exit list to context switch it against the host.
> > > 
> > 
> > What this does not explain is how the guest can ever possibly use this
> > correctly, since it doesn't have access to PMC3.
> 
> Yes you're right. In the current form it is only useful for
> people who really want to disable TSX and don't care about the PMU.

We should probably have a notsx boot option that completely disables the
feature, including its userspace enumeration. Some security-conscious
folks might actually like that. TSX has been a boon to all the
fault-driven exploit muck.

And userspace should be able to handle that; we killed TSX on some early
models before, right?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 9/9] PERFv1 3
  2019-02-05 17:13       ` Peter Zijlstra
@ 2019-02-05 17:26         ` Andi Kleen
  2019-02-05 18:30           ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Andi Kleen @ 2019-02-05 17:26 UTC (permalink / raw)
  To: speck

> We should probably have a notsx boot option that completely disables the
> feature including its userspace enumeration. Some security consious
> folks might actually like that. TSX has been a boon to all the fault
> driven exploit muck.

TSX generally only messes with exploit detection approaches, which
are IMHO all snake oil anyway because they can always be circumvented
in other ways. If you dig into it more deeply there is no clear security
case for actually disabling TSX. In some cases it can make exploits
easier to write, but that alone isn't a good reason to disable it.

> And userspace should be able to do this; we killed TSX on some early
> models before, right?

Only on the parts without many cores; the large-core-count parts
always had it.

Functionally yes, but there are workloads that rely on it for
performance, which would be effectively broken too.
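
The pattern those workloads use is the usual RTM lock elision with a
fallback path; a self-contained sketch (not taken from any particular
workload):

/* Build with: gcc -mrtm -c rtm_demo.c */
#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock;	/* 0 = free, 1 = held */
static long counter;

static void take_lock(void) { while (atomic_exchange(&fallback_lock, 1)) ; }
static void drop_lock(void) { atomic_store(&fallback_lock, 0); }

/*
 * Try the critical section as a transaction, fall back to the real
 * lock on abort.  With TSX_FORCE_ABORT set every transaction still
 * runs to _xend() and then aborts, so this stays correct but takes
 * the slow path every time.
 */
static void increment(void)
{
	if (_xbegin() == _XBEGIN_STARTED) {
		if (atomic_load(&fallback_lock))	/* a writer is on the slow path */
			_xabort(0x01);
		counter++;
		_xend();
		return;
	}
	take_lock();
	counter++;
	drop_lock();
}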

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 0/9] PERFv1 0
  2019-02-05 16:36   ` Andi Kleen
@ 2019-02-05 18:20     ` Greg KH
  2019-02-06  1:16       ` Andi Kleen
  0 siblings, 1 reply; 30+ messages in thread
From: Greg KH @ 2019-02-05 18:20 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 08:36:48AM -0800, speck for Andi Kleen wrote:
> On Tue, Feb 05, 2019 at 03:03:29PM +0100, speck for Greg KH wrote:
> > On Mon, Feb 04, 2019 at 05:14:01PM -0800, speck for Andi Kleen wrote:
> > > v1: Initial post
> > > 
> > > Andi Kleen (9):
> > >   x86/pmu/intel: Export number of counters in caps
> > >   x86/pmu/intel: Add flags argument to x86_pmu::disable
> > >   x86/pmu/intel: Support excluding GP counters in scheduler
> > >   x86/pmu/intel: Handle TSX with counter 3 on Skylake
> > >   x86/pmu/intel: Add perf event attribute to enable counter 3
> > >   x86/pmu/intel: Add global option for enable all counters
> > >   perf stat: Make all existing groups weak
> > >   perf stat: Don't count EL for --transaction with three counters
> > >   kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> > 
> > Meta-comment, can you fix up and use the updated script to post these?
> > The [XX/NN] and "N" numbers do not match up at all and it's hard to
> > figure out which to trust.
> 
> I'm just using Thomas' script.  Let me post a mbox.

He posted a fix for the script to address this issue, so if you could
use that next time, that would make reviewing a lot easier.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 9/9] PERFv1 3
  2019-02-05 17:26         ` Andi Kleen
@ 2019-02-05 18:30           ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-02-05 18:30 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 09:26:13AM -0800, speck for Andi Kleen wrote:
> > We should probably have a notsx boot option that completely disables the
> > feature including its userspace enumeration. Some security consious
> > folks might actually like that. TSX has been a boon to all the fault
> > driven exploit muck.
> 
> TSX generally only messes with exploit detection approaches, which
> are IMHO all snake oil anyways because they can be always circumvented
> in other ways. If you dig into it more deeply there's no clear security case
> for actually disabling TSX. In some cases it can make exploit 
> easier to write, but that alone isn't a good reason to disable it.

Making exploits harder is enough for some.

> > And userspace should be able to do this; we killed TSX on some early
> > models before, right?
> 
> On parts of the models without many cores. The large core count parts
> always had it.

Doesn't matter that some always had it; userspace now knows how to
handle no TSX.

> Functionally yes, but there are workloads that rely on it for performance,
> which would be effectively broken too.

Which is a choice the admin should be able to make.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [MODERATED] Re: [PATCH v1 0/9] PERFv1 0
  2019-02-05 18:20     ` Greg KH
@ 2019-02-06  1:16       ` Andi Kleen
  0 siblings, 0 replies; 30+ messages in thread
From: Andi Kleen @ 2019-02-06  1:16 UTC (permalink / raw)
  To: speck

On Tue, Feb 05, 2019 at 07:20:00PM +0100, speck for Greg KH wrote:
> On Tue, Feb 05, 2019 at 08:36:48AM -0800, speck for Andi Kleen wrote:
> > On Tue, Feb 05, 2019 at 03:03:29PM +0100, speck for Greg KH wrote:
> > > On Mon, Feb 04, 2019 at 05:14:01PM -0800, speck for Andi Kleen wrote:
> > > > v1: Initial post
> > > > 
> > > > Andi Kleen (9):
> > > >   x86/pmu/intel: Export number of counters in caps
> > > >   x86/pmu/intel: Add flags argument to x86_pmu::disable
> > > >   x86/pmu/intel: Support excluding GP counters in scheduler
> > > >   x86/pmu/intel: Handle TSX with counter 3 on Skylake
> > > >   x86/pmu/intel: Add perf event attribute to enable counter 3
> > > >   x86/pmu/intel: Add global option for enable all counters
> > > >   perf stat: Make all existing groups weak
> > > >   perf stat: Don't count EL for --transaction with three counters
> > > >   kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> > > 
> > > Meta-comment, can you fix up and use the updated script to post these?
> > > The [XX/NN] and "N" numbers do not match up at all and it's hard to
> > > figure out which to trust.
> > 
> > I'm just using Thomas' script.  Let me post a mbox.
> 
> He posted a fix for the script to address this issue, so if you could
> use that next time, that would make reviewing a lot easier.

I can't find it.

I'll do the next post with the old script; if someone sends a new
script I will use it after that.

However, I expect any serious use to be done using the mbox
anyway, so hopefully it's not too big a problem.

-Andi

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-02-06  1:16 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-05  1:14 [MODERATED] [PATCH v1 0/9] PERFv1 0 Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 1/9] PERFv1 7 Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 2/9] PERFv1 6 Andi Kleen
2019-02-05 14:46   ` [MODERATED] " Peter Zijlstra
2019-02-05 16:37     ` Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 3/9] PERFv1 9 Andi Kleen
2019-02-05 13:58   ` [MODERATED] " Peter Zijlstra
2019-02-05  1:14 ` [MODERATED] [PATCH v1 4/9] PERFv1 2 Andi Kleen
2019-02-05 14:51   ` [MODERATED] " Peter Zijlstra
2019-02-05 16:35     ` Andi Kleen
2019-02-05 16:53       ` Peter Zijlstra
2019-02-05 15:16   ` Peter Zijlstra
2019-02-05 16:52     ` Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 5/9] PERFv1 8 Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 6/9] PERFv1 5 Andi Kleen
2019-02-05 15:26   ` [MODERATED] " Peter Zijlstra
2019-02-05 16:45     ` Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 7/9] PERFv1 1 Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 8/9] PERFv1 4 Andi Kleen
2019-02-05  1:14 ` [MODERATED] [PATCH v1 9/9] PERFv1 3 Andi Kleen
2019-02-05 15:28   ` [MODERATED] " Peter Zijlstra
2019-02-05 16:48     ` Andi Kleen
2019-02-05 17:13       ` Peter Zijlstra
2019-02-05 17:26         ` Andi Kleen
2019-02-05 18:30           ` Peter Zijlstra
2019-02-05 14:03 ` [MODERATED] Re: [PATCH v1 0/9] PERFv1 0 Greg KH
2019-02-05 16:36   ` Andi Kleen
2019-02-05 18:20     ` Greg KH
2019-02-06  1:16       ` Andi Kleen
2019-02-05 15:32 ` Peter Zijlstra
