* [MODERATED] [PATCH v3 0/6] PERFv3
@ 2019-02-07 23:41 Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 1/6] PERFv3 Andi Kleen
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Walnut is a functional (not security) issue with TSX. The upcoming
microcode updates on Skylake may corrupt perfmon counter 3
when RTM transactions are used.

There is a new MSR that allows forcing all RTM transactions to abort,
which frees up counter 3.
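
For reference, a minimal sketch of what driving the new MSR looks like
(the MSR and bit names are taken from the msr-index.h hunk added later
in this series; the real logic lives in the perf scheduling code):

	/* assumes <asm/msr.h>; not part of the patch kit itself */
	static void tfa_set_force_abort(bool on)
	{
		/* bit 0 of MSR 0x10F makes all RTM transactions abort */
		wrmsrl(MSR_TSX_FORCE_ABORT, on ? MSR_TFA_RTM_FORCE_ABORT : 0);
	}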

The following patch kit adds support to perf to either avoid
using counter 3, or to disable TSX when counter 3 is needed
by perf.

There are per-perf-event and global options to set the
default.
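
For illustration only, here is a tiny user space helper that flips the
global default through the sysfs knob added in patch 2 (the path
follows that patch and is an assumption, not a settled ABI):

	#include <stdio.h>

	static int set_enable_all_counters(int on)
	{
		FILE *f = fopen("/sys/devices/cpu/enable_all_counters", "w");

		if (!f)
			return -1;
		fprintf(f, "%d\n", on);
		return fclose(f);
	}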

This patch sets the default to TSX enabled, but
that could be easily changed.

We can have a discussion on the trade-offs of the default
setting. I suspect it's a decision that should be made by Linus,
as it may impact user programs either way.

The trade-offs for setting the option default are:

Using 4 (or 8 with HT off) events in perf versus
allowing RTM usage while perf is active.
    
- Existing programs that use perf groups with 4 counters
  may not retrieve perfmon data anymore (see the sketch
  after this list). Perf usages that use three or fewer
  counters (or seven or fewer with HT off) are not impacted.
  Perf usages that don't use groups will still work, but will
  see increased multiplexing.
    
- TSX programs should not functionally break from
  forcing RTM to abort because they always need a valid
  fallback path. However, they will see significantly
  lower performance if they rely on TSX for performance
  (all RTM transactions will run and only abort at the end),
  potentially slowing them down so much that it is
  equivalent to functional breakage.
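
To make the first trade-off concrete, here is a hypothetical user space
sketch (not part of this patch kit) that opens a group of four events
which normally need generic counters; with TSX left enabled only three
generic counters remain, so such a group may never get scheduled and
reads can return zero counts:

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	static int open_hw_event(__u64 config, int group_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = config;
		/* error handling omitted for brevity */
		return syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0);
	}

	int main(void)
	{
		int leader = open_hw_event(PERF_COUNT_HW_CACHE_REFERENCES, -1);

		open_hw_event(PERF_COUNT_HW_CACHE_MISSES, leader);
		open_hw_event(PERF_COUNT_HW_BRANCH_INSTRUCTIONS, leader);
		open_hw_event(PERF_COUNT_HW_BRANCH_MISSES, leader);
		return 0;
	}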

Patches are against tip/perf/core as of 
commit ca3bb3d027f69ac3ab1dafb32bde2f5a3a44439c (tip/perf/core)
Author: Elena Reshetova <elena.reshetova@intel.com>

-Andi

v1: Initial post

v2: Minor updates in code (see individual patches).
Removed the optimization that avoided rewriting the MSR on update;
it caused missing MSR updates in some cases.
Redid the KVM code to always intercept the MSR and pass the correct
flag to host perf.

v3: Use Peter's scheduling patch, with some changes and cleanups.
Dropped some obsolete patches.
KVM now always forces the guest state and doesn't rely on the host state.

Andi Kleen (6):
  x86/pmu/intel: Export number of counters in caps
  x86/pmu/intel: Handle TSX with counter 3 on Skylake
  x86/pmu/intel: Add perf event attribute to control RTM
  perf stat: Make all existing groups weak
  perf stat: Don't count EL for --transaction with three counters
  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests

 arch/x86/events/core.c             | 24 ++++++++
 arch/x86/events/intel/core.c       | 94 +++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h       | 13 ++++-
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/include/asm/msr-index.h   |  5 ++
 arch/x86/kvm/cpuid.c               |  3 +-
 arch/x86/kvm/pmu.c                 | 19 +++---
 arch/x86/kvm/pmu.h                 |  6 +-
 arch/x86/kvm/pmu_amd.c             |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c       | 20 ++++++-
 tools/perf/builtin-stat.c          | 38 ++++++++----
 tools/perf/util/pmu.c              | 10 ++++
 tools/perf/util/pmu.h              |  1 +
 14 files changed, 211 insertions(+), 26 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 1/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-08  8:45   ` [MODERATED] " Peter Zijlstra
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 2/6] PERFv3 Andi Kleen
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Export number of counters in caps

Export the number of generic and fixed counters of the core PMU in caps/.
This helps users and tools with constructing groups. It will also become
more important with upcoming patches that can change the number of
counters.
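
For illustration (not part of this patch), a tool could read the new
files like this; /sys/devices/cpu/caps/ is the usual location of the
core PMU caps directory:

	#include <stdio.h>

	static int read_cpu_cap(const char *name)
	{
		char path[128];
		int val = -1;
		FILE *f;

		snprintf(path, sizeof(path), "/sys/devices/cpu/caps/%s", name);
		f = fopen(path, "r");
		if (!f)
			return -1;
		if (fscanf(f, "%d", &val) != 1)
			val = -1;
		fclose(f);
		return val;
	}

	int main(void)
	{
		printf("generic counters: %d\n", read_cpu_cap("num_counter"));
		printf("fixed counters:   %d\n", read_cpu_cap("num_counter_fixed"));
		return 0;
	}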

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a19712e20..58e659bfc2d9 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2248,8 +2248,28 @@ static ssize_t max_precise_show(struct device *cdev,
 
 static DEVICE_ATTR_RO(max_precise);
 
+static ssize_t num_counter_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+}
+
+static DEVICE_ATTR_RO(num_counter);
+
+static ssize_t num_counter_fixed_show(struct device *cdev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters_fixed);
+}
+
+static DEVICE_ATTR_RO(num_counter_fixed);
+
 static struct attribute *x86_pmu_caps_attrs[] = {
 	&dev_attr_max_precise.attr,
+	&dev_attr_num_counter.attr,
+	&dev_attr_num_counter_fixed.attr,
 	NULL
 };
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 2/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 1/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-08  0:51   ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Andrew Cooper
  2019-02-08  8:50   ` [MODERATED] Re: [PATCH v3 2/6] PERFv3 Peter Zijlstra
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 3/6] PERFv3 Andi Kleen
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  x86/pmu/intel: Handle TSX with counter 3 on Skylake

Most of the code is from Peter Zijlstra at this point,
based on earlier code from AK.

On Skylake with recent microcode updates, due to erratum XXX,
general purpose perfmon counter 3 can be corrupted when RTM
transactions are executed.

The microcode provides a new MSR to force disable RTM
(make all RTM transactions abort).

This patch adds the low-level code to manage this MSR.
Depending on a global flag (/sys/devices/cpu/enable_all_counters),
events are either scheduled or not scheduled on generic counter 3.

When the flag is set and an event uses counter 3, TSX is disabled
while the event is active.

This patch assumes that the kernel is using
RETPOLINE (or IBRS); otherwise speculative execution could
still corrupt counter 3 in very unlikely cases.

The enable_all_counters flag default is set to zero in this
patch. This default could be changed.

The trade-offs for setting the option default are:

    Using 4 (or 8 with HT off) events in perf versus
    allowing RTM usage while perf is active.

    - Existing programs that use perf groups with 4 counters
    may not retrieve perfmon data anymore. Perf usages
    that use three or fewer counters (or seven or fewer with
    HT off) are not impacted. Perf usages that don't use
    groups will still work, but will see increased multiplexing.

    - TSX programs should not functionally break from
    forcing RTM to abort because they always need a valid
    fallback path. However, they will see significantly
    lower performance if they rely on TSX for performance
    (all RTM transactions will run and only abort at the end),
    potentially slowing them down so much that it is
    equivalent to functional breakage.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
v2:
Use u8 instead of bool
Rename force_rtm_abort_active.
v3:
Use correct patch version that actually compiles.
v4:
Switch to Peter's implementation with some updates by AK.
Now the TFA state is checked for in enable_all,
and the extra mask is handled by get_constraint
Use a temporary constraint instead of modifying the globals.
---
 arch/x86/events/core.c             |  6 ++-
 arch/x86/events/intel/core.c       | 64 +++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h       | 10 ++++-
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  5 +++
 5 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 58e659bfc2d9..f5d1435c6071 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2252,7 +2252,11 @@ static ssize_t num_counter_show(struct device *cdev,
 				  struct device_attribute *attr,
 				  char *buf)
 {
-	return snprintf(buf, PAGE_SIZE, "%d\n", x86_pmu.num_counters);
+	int num = x86_pmu.num_counters;
+	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT) &&
+		perf_enable_all_counters && num > 0)
+		num--;
+	return snprintf(buf, PAGE_SIZE, "%d\n", num);
 }
 
 static DEVICE_ATTR_RO(num_counter);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index daafb893449b..b4162b4b0899 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1999,6 +1999,30 @@ static void intel_pmu_nhm_enable_all(int added)
 	intel_pmu_enable_all(added);
 }
 
+static void intel_skl_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 val;
+
+	/*
+	 * The perf code is not expected to execute RTM instructions
+	 * (and also cannot misspeculate into them due to RETPOLINE
+	 * use), so PMC3 should be 'stable'; IOW the values we
+	 * just potentially programmed into it, should still be there.
+	 *
+	 * If we programmed PMC3, make sure to set TFA before we make
+	 * things go and possibly encounter RTM instructions.
+	 * Similarly, if PMC3 got unused, make sure to clear TFA.
+	 */
+	val = MSR_TFA_RTM_FORCE_ABORT * test_bit(3, cpuc->active_mask);
+	if (cpuc->tfa_shadow != val) {
+		cpuc->tfa_shadow = val;
+		wrmsrl(MSR_TSX_FORCE_ABORT, val);
+	}
+
+	intel_pmu_enable_all(added);
+}
+
 static void enable_counter_freeze(void)
 {
 	update_debugctlmsr(get_debugctlmsr() |
@@ -3345,6 +3369,34 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 	return c;
 }
 
+bool perf_enable_all_counters __read_mostly;
+
+/*
+ * On Skylake counter 3 may get corrupted when RTM is used.
+ * Either avoid counter 3, or disable RTM when counter 3 used.
+ */
+
+static struct event_constraint *
+skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+			  struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	c = hsw_get_event_constraints(cpuc, idx, event);
+
+	if (!perf_enable_all_counters) {
+		cpuc->counter3_constraint = *c;
+		c = &cpuc->counter3_constraint;
+
+		/*
+		 * Without TFA we must not use PMC3.
+		 */
+		__clear_bit(3, c->idxmsk);
+	}
+
+	return c;
+}
+
 /*
  * Broadwell:
  *
@@ -4061,8 +4113,11 @@ static struct attribute *intel_pmu_caps_attrs[] = {
        NULL
 };
 
+DEVICE_BOOL_ATTR(enable_all_counters, 0644, perf_enable_all_counters);
+
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
+	NULL,	/* May be overriden with enable_all_counters */
 	NULL,
 };
 
@@ -4543,9 +4598,16 @@ __init int intel_pmu_init(void)
 		/* all extra regs are per-cpu when HT is on */
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			x86_pmu.enable_all = intel_skl_pmu_enable_all;
+			intel_pmu_attrs[1] = &dev_attr_enable_all_counters.attr.attr;
+			x86_pmu.get_event_constraints = skl_get_event_constraints;
+			/* Could add checking&warning for !RETPOLINE here */
+		} else {
+			x86_pmu.get_event_constraints = hsw_get_event_constraints;
+		}
 
 		x86_pmu.hw_config = hsw_hw_config;
-		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
 			hsw_format_attr : nhm_format_attr;
 		extra_attr = merge_attr(extra_attr, skl_format_attr);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..2474ebfad961 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -70,7 +70,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_EXCL_ACCT	0x0200 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0800 /* use large PEBS */
-
+#define PERF_X86_EVENT_ABORT_TSX	0x1000 /* force abort TSX */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -242,6 +242,12 @@ struct cpu_hw_events {
 	struct intel_excl_cntrs		*excl_cntrs;
 	int excl_thread_id; /* 0 or 1 */
 
+	/*
+	 * Manage using counter 3 on Skylake with TSX.
+	 */
+	int				tfa_shadow;
+	struct event_constraint		counter3_constraint;
+
 	/*
 	 * AMD specific bits
 	 */
@@ -998,6 +1004,8 @@ static inline int is_ht_workaround_enabled(void)
 	return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
 }
 
+extern bool perf_enable_all_counters;
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6d6122524711..981ff9479648 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8e40c2446fd1..492b18720dba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -666,6 +666,11 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+#define MSR_TSX_FORCE_ABORT		0x0000010F
+
+#define MSR_TFA_RTM_FORCE_ABORT_BIT	0
+#define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
+
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
 #define MSR_IA32_MCG_EBX		0x00000181
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 3/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 1/6] PERFv3 Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 2/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-08  9:02   ` [MODERATED] " Peter Zijlstra
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 4/6] PERFv3 Andi Kleen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

Add a "force_rtm_abort" attribute per perf event attribute
to allow user programs to opt in to use counter 3 and disable
TSX while the perf event is active.

Also add a "allow_rtm" attribute to allow programs to
make sure TSX is enabled during the measurement (e.g.
if they want to measure TSX itself)

This allows non-root programs to override the defaults
for their needs.

This is also needed for correct semantics in guests,
so that the KVM PMU can set the correct default
for its guests.

We use config2 bits 0/1 in the core PMU for these
interfaces.
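
For illustration, a hypothetical user space opener that sets these bits
through perf_event_attr.config2 on a raw core PMU event (error handling
omitted; the helper name is made up for this sketch):

	#include <string.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	static int open_raw_event(__u64 raw_config, int force_rtm_abort)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_RAW;
		attr.config = raw_config;
		/* bit 0: force_rtm_abort, bit 1: allow_rtm; ALLOW wins if both are set */
		attr.config2 = force_rtm_abort ? 0x1 : 0x2;
		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}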

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 32 +++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h |  3 +++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b4162b4b0899..0e24b827adf8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3371,6 +3371,12 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 
 bool perf_enable_all_counters __read_mostly;
 
+static unsigned merged_config2(struct perf_event *event)
+{
+	return (event->group_leader ? event->group_leader->attr.config2 : 0) |
+		event->attr.config2;
+}
+
 /*
  * On Skylake counter 3 may get corrupted when RTM is used.
  * Either avoid counter 3, or disable RTM when counter 3 used.
@@ -3381,10 +3387,19 @@ skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
 {
 	struct event_constraint *c;
+	u64 config2;
+	bool use_rtm;
 
 	c = hsw_get_event_constraints(cpuc, idx, event);
 
-	if (!perf_enable_all_counters) {
+	config2 = merged_config2(event);
+	/* When both flags are set ALLOW wins. */
+	if (!(config2 & (ALLOW_RTM|FORCE_RTM_ABORT)))
+		use_rtm = !perf_enable_all_counters;
+	else
+		use_rtm = config2 & ALLOW_RTM;
+
+	if (use_rtm) {
 		cpuc->counter3_constraint = *c;
 		c = &cpuc->counter3_constraint;
 
@@ -3645,6 +3660,9 @@ PMU_FORMAT_ATTR(ldlat, "config1:0-15");
 
 PMU_FORMAT_ATTR(frontend, "config1:0-23");
 
+PMU_FORMAT_ATTR(force_rtm_abort, "config2:0");
+PMU_FORMAT_ATTR(allow_rtm, "config2:1");
+
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -3680,6 +3698,12 @@ static struct attribute *skl_format_attr[] = {
 	NULL,
 };
 
+static struct attribute *skl_extra_format_attr[] = {
+	&format_attr_force_rtm_abort.attr,
+	&format_attr_allow_rtm.attr,
+	NULL,
+};
+
 static __initconst const struct x86_pmu core_pmu = {
 	.name			= "core",
 	.handle_irq		= x86_pmu_handle_irq,
@@ -4148,6 +4172,7 @@ __init int intel_pmu_init(void)
 	struct attribute **mem_attr = NULL;
 	struct attribute **tsx_attr = NULL;
 	struct attribute **to_free = NULL;
+	struct attribute **to_free2 = NULL;
 	union cpuid10_edx edx;
 	union cpuid10_eax eax;
 	union cpuid10_ebx ebx;
@@ -4619,6 +4644,10 @@ __init int intel_pmu_init(void)
 			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
 		pr_cont("Skylake events, ");
 		name = "skylake";
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			extra_attr = merge_attr(extra_attr, skl_extra_format_attr);
+			to_free2 = extra_attr;
+		}
 		break;
 
 	default:
@@ -4732,6 +4761,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
 	kfree(to_free);
+	kfree(to_free2);
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2474ebfad961..b4eecd9316f1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -776,6 +776,9 @@ int x86_pmu_hw_config(struct perf_event *event);
 
 void x86_pmu_disable_all(void);
 
+#define FORCE_RTM_ABORT BIT(0)
+#define ALLOW_RTM	BIT(1)
+
 static inline void __x86_pmu_enable_event(struct hw_perf_event *hwc,
 					  u64 enable_mask)
 {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 4/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
                   ` (2 preceding siblings ...)
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 3/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 5/6] PERFv3 Andi Kleen
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 6/6] PERFv3 Andi Kleen
  5 siblings, 0 replies; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  perf stat: Make all existing groups weak

Now that we may only have three counters, make the --topdown and
--transaction groups weak, so that they still work.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e587808591e8..c94f5ed135f1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -101,7 +101,7 @@ static const char *transaction_attrs = {
 	"cpu/tx-start/,"
 	"cpu/el-start/,"
 	"cpu/cycles-ct/"
-	"}"
+	"}:W"
 };
 
 /* More limited version when the CPU does not have all events. */
@@ -112,7 +112,7 @@ static const char * transaction_limited_attrs = {
 	"cycles,"
 	"cpu/cycles-t/,"
 	"cpu/tx-start/"
-	"}"
+	"}:W"
 };
 
 static const char * topdown_attrs[] = {
@@ -999,7 +999,7 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	attr[i - off] = NULL;
 
-	*str = malloc(len + 1 + 2);
+	*str = malloc(len + 1 + 2 + 2);
 	if (!*str)
 		return -1;
 	s = *str;
@@ -1016,6 +1016,8 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	}
 	if (use_group) {
 		s[-1] = '}';
+		*s++ = ':';
+		*s++ = 'W';
 		*s = 0;
 	} else
 		s[-1] = 0;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 5/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
                   ` (3 preceding siblings ...)
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 4/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-08  0:54   ` [MODERATED] " Andrew Cooper
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 6/6] PERFv3 Andi Kleen
  5 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  perf stat: Don't count EL for --transaction with three
 counters

When the system only has three counters, HLE (EL) is also not
available. Detect that there are only three counters and then
automatically drop the el-start event for --transaction.
This avoids event multiplexing in this situation with no loss
of information.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 30 ++++++++++++++++++++++--------
 tools/perf/util/pmu.c     | 10 ++++++++++
 tools/perf/util/pmu.h     |  1 +
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c94f5ed135f1..59a5bf0389b7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -104,6 +104,18 @@ static const char *transaction_attrs = {
 	"}:W"
 };
 
+static const char *transaction_noel_attrs = {
+	"task-clock,"
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/tx-start/,"
+	"cpu/cycles-ct/"
+	"}:W"
+};
+
+
 /* More limited version when the CPU does not have all events. */
 static const char * transaction_limited_attrs = {
 	"task-clock,"
@@ -1160,6 +1172,8 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
+		const char *attrs = transaction_limited_attrs;
+
 		/* Handle -T as -M transaction. Once platform specific metrics
 		 * support has been added to the json files, all archictures
 		 * will use this approach. To determine transaction support
@@ -1173,16 +1187,16 @@ static int add_default_attributes(void)
 		}
 
 		if (pmu_have_event("cpu", "cycles-ct") &&
-		    pmu_have_event("cpu", "el-start"))
-			err = parse_events(evsel_list, transaction_attrs,
-					   &errinfo);
-		else
-			err = parse_events(evsel_list,
-					   transaction_limited_attrs,
-					   &errinfo);
+		    pmu_have_event("cpu", "el-start")) {
+			if (pmu_num_counters("cpu") == 3)
+				attrs = transaction_noel_attrs;
+			else
+				attrs = transaction_attrs;
+		}
+		err = parse_events(evsel_list, attrs, &errinfo);
 		if (err) {
 			fprintf(stderr, "Cannot set up transaction events\n");
-			parse_events_print_error(&errinfo, transaction_attrs);
+			parse_events_print_error(&errinfo, attrs);
 			return -1;
 		}
 		return 0;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 11a234740632..5e17409b7ff6 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1495,3 +1495,13 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+int pmu_num_counters(const char *pname)
+{
+	unsigned long num;
+	struct perf_pmu *pmu = perf_pmu__find(pname);
+
+	if (pmu && perf_pmu__scan_file(pmu, "caps/num_counter", "%ld", &num) == 1)
+		return num;
+	return -1;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 76fecec7b3f9..6e772243fd1d 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -85,6 +85,7 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
 void print_pmu_events(const char *event_glob, bool name_only, bool quiet,
 		      bool long_desc, bool details_flag);
 bool pmu_have_event(const char *pname, const char *name);
+int pmu_num_counters(const char *);
 
 int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt, ...) __scanf(3, 4);
 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] [PATCH v3 6/6] PERFv3
  2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
                   ` (4 preceding siblings ...)
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 5/6] PERFv3 Andi Kleen
@ 2019-02-07 23:41 ` Andi Kleen
  2019-02-08  9:07   ` [MODERATED] " Peter Zijlstra
  5 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2019-02-07 23:41 UTC (permalink / raw)
  To: speck; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>
Subject:  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests

Recent microcode for Skylake added a new CPUID bit and an MSR that
controls TSX aborting and thereby makes PMU counter 3 usable. This
patch adds support for controlling counter 3 from KVM guests.

Intercept the MSR and set the correct attribute on the perf events
used by the virtualized PMU. TSX is only disabled when counter
3 is actually used by the host PMU. The guest can use all four
counters without multiplexing.

Also export the CPUID bit.
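
For illustration, a guest kernel that sees the exported CPUID bit could
opt in roughly like this (hypothetical guest-side code, not part of
this patch):

	/* guest opts in to force-aborting RTM, mirroring the host-side policy */
	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT))
		wrmsrl(MSR_TSX_FORCE_ABORT, MSR_TFA_RTM_FORCE_ABORT);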

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
v2:
Redid the implementation. We now intercept the MSR and pass the
correct attribute to the host perf. The MSR is only active when
counter 3 is actually used, but the guest can use all counters
without multiplexing.
v3:
Also set ALLOW_RTM to completely control the guest state independent
of the host state.
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  3 ++-
 arch/x86/kvm/pmu.c              | 19 ++++++++++++-------
 arch/x86/kvm/pmu.h              |  6 ++++--
 arch/x86/kvm/pmu_amd.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    | 20 ++++++++++++++++++--
 6 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4660ce90de7f..75f098142672 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -470,6 +470,7 @@ struct kvm_pmu {
 	u64 global_ctrl_mask;
 	u64 reserved_bits;
 	u8 version;
+	u8 force_tsx_abort;
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index bbffa6c54697..2570b17ac372 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -409,7 +409,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.edx*/
 	const u32 kvm_cpuid_7_0_edx_x86_features =
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP);
+		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+		F(TSX_FORCE_ABORT);
 
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 58ead7db71a3..427d9fc6dbda 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -99,7 +99,8 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 				  unsigned config, bool exclude_user,
 				  bool exclude_kernel, bool intr,
-				  bool in_tx, bool in_tx_cp)
+				  bool in_tx, bool in_tx_cp,
+				  int tsx_force_abort)
 {
 	struct perf_event *event;
 	struct perf_event_attr attr = {
@@ -111,6 +112,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_user = exclude_user,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
+		.config2 = tsx_force_abort,
 	};
 
 	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
@@ -140,7 +142,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			  u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
 	u8 event_select, unit_mask;
@@ -178,11 +181,13 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
 			      eventsel & ARCH_PERFMON_EVENTSEL_INT,
 			      (eventsel & HSW_IN_TX),
-			      (eventsel & HSW_IN_TX_CHECKPOINTED));
+			      (eventsel & HSW_IN_TX_CHECKPOINTED),
+			      pmu->force_tsx_abort);
 }
 EXPORT_SYMBOL_GPL(reprogram_gp_counter);
 
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
+void reprogram_fixed_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			     u8 ctrl, int idx)
 {
 	unsigned en_field = ctrl & 0x3;
 	bool pmi = ctrl & 0x8;
@@ -196,7 +201,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 			      kvm_x86_ops->pmu_ops->find_fixed_event(idx),
 			      !(en_field & 0x2), /* exclude user */
 			      !(en_field & 0x1), /* exclude kernel */
-			      pmi, false, false);
+			      pmi, false, false, pmu->force_tsx_abort);
 }
 EXPORT_SYMBOL_GPL(reprogram_fixed_counter);
 
@@ -208,12 +213,12 @@ void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx)
 		return;
 
 	if (pmc_is_gp(pmc))
-		reprogram_gp_counter(pmc, pmc->eventsel);
+		reprogram_gp_counter(pmu, pmc, pmc->eventsel);
 	else {
 		int idx = pmc_idx - INTEL_PMC_IDX_FIXED;
 		u8 ctrl = fixed_ctrl_field(pmu->fixed_ctr_ctrl, idx);
 
-		reprogram_fixed_counter(pmc, ctrl, idx);
+		reprogram_fixed_counter(pmu, pmc, ctrl, idx);
 	}
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ba8898e1a854..7c31f38be1ee 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -102,8 +102,10 @@ static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
 	return NULL;
 }
 
-void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
-void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
+void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc,
+			  u64 eventsel);
+void reprogram_fixed_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, u8 ctrl,
+			     int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 1495a735b38e..1de7fc73a634 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -250,7 +250,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data == pmc->eventsel)
 			return 0;
 		if (!(data & pmu->reserved_bits)) {
-			reprogram_gp_counter(pmc, data);
+			reprogram_gp_counter(pmu, pmc, data);
 			return 0;
 		}
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5ab4a364348e..a7d330be2c8f 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -49,7 +49,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		if (old_ctrl == new_ctrl)
 			continue;
 
-		reprogram_fixed_counter(pmc, new_ctrl, i);
+		reprogram_fixed_counter(pmu, pmc, new_ctrl, i);
 	}
 
 	pmu->fixed_ctr_ctrl = data;
@@ -148,6 +148,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	int ret;
 
 	switch (msr) {
+	case MSR_TSX_FORCE_ABORT:
+		return guest_cpuid_has(vcpu, X86_FEATURE_TSX_FORCE_ABORT);
+
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 	case MSR_CORE_PERF_GLOBAL_STATUS:
 	case MSR_CORE_PERF_GLOBAL_CTRL:
@@ -182,6 +185,11 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		*data = pmu->global_ovf_ctrl;
 		return 0;
+	case MSR_TSX_FORCE_ABORT:
+		*data = pmu->force_tsx_abort;
+		if (*data == 2)
+			*data = 0;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_fixed_pmc(pmu, msr))) {
@@ -234,6 +242,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_TSX_FORCE_ABORT:
+		if (data & ~1ULL)
+			break;
+		/* Will take effect at next enable */
+		if (!data)
+			data = BIT(1); /* Transform into ALLOW_RTM perf ABI */
+		pmu->force_tsx_abort = data;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_fixed_pmc(pmu, msr))) {
@@ -245,7 +261,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (data == pmc->eventsel)
 				return 0;
 			if (!(data & pmu->reserved_bits)) {
-				reprogram_gp_counter(pmc, data);
+				reprogram_gp_counter(pmu, pmc, data);
 				return 0;
 			}
 		}
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [SUSPECTED SPAM][PATCH v3 2/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 2/6] PERFv3 Andi Kleen
@ 2019-02-08  0:51   ` Andrew Cooper
  2019-02-08  9:01     ` Peter Zijlstra
  2019-02-08  8:50   ` [MODERATED] Re: [PATCH v3 2/6] PERFv3 Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Cooper @ 2019-02-08  0:51 UTC (permalink / raw)
  To: speck

On 07/02/2019 23:41, speck for Andi Kleen wrote:
> This patch assumes that the kernel is using
> RETPOLINE (or IBRS), otherwise speculative execution could
> still corrupt counter 3 in very unlikely cases.

What has the kernel configuration got to do with it?

It is my understanding that any execution of an XBEGIN instruction, even
speculatively, even in userspace, will result in PMC3 getting modified.

A CPU either has force abort mode active, or PMC3 can be changed behind
the kernel's back.

~Andrew


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 5/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 5/6] PERFv3 Andi Kleen
@ 2019-02-08  0:54   ` Andrew Cooper
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2019-02-08  0:54 UTC (permalink / raw)
  To: speck

On 07/02/2019 23:41, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  perf stat: Don't count EL for --transaction with three
>  counters
>
> When the system only has three counters HLE (EL) is also not
> available. Detect that there are only three counters and
> then automatically disable el-starts for --transaction.
> This avoids event multiplexing in this situation with no loss
> of information.

What about the non-HT case, where 7 counters are available?

~Andrew


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 1/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 1/6] PERFv3 Andi Kleen
@ 2019-02-08  8:45   ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  8:45 UTC (permalink / raw)
  To: speck

On Thu, Feb 07, 2019 at 03:41:03PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/pmu/intel: Export number of counters in caps
> 
> Export the number of generic and fixed counters in the core PMU in caps/
> This helps users and tools with constructing groups. Will be also more
> important with upcoming patches that can change the number of counters.

I'm still not convinced we need this. It certainly isn't needed for this
TFA crap.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 2/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 2/6] PERFv3 Andi Kleen
  2019-02-08  0:51   ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Andrew Cooper
@ 2019-02-08  8:50   ` Peter Zijlstra
  2019-02-08 17:26     ` Andi Kleen
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  8:50 UTC (permalink / raw)
  To: speck

On Thu, Feb 07, 2019 at 03:41:04PM -0800, speck for Andi Kleen wrote:
> +static struct event_constraint *
> +skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> +			  struct perf_event *event)
> +{
> +	struct event_constraint *c;
> +
> +	c = hsw_get_event_constraints(cpuc, idx, event);
> +
> +	if (!perf_enable_all_counters) {
> +		cpuc->counter3_constraint = *c;
> +		c = &cpuc->counter3_constraint;
> +
> +		/*
> +		 * Without TFA we must not use PMC3.
> +		 */
> +		__clear_bit(3, c->idxmsk);
> +	}
> +
> +	return c;
> +}

This is obviously broken... what if all 6 events are in use and have
different constraints?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [SUSPECTED SPAM][PATCH v3 2/6] PERFv3
  2019-02-08  0:51   ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Andrew Cooper
@ 2019-02-08  9:01     ` Peter Zijlstra
  2019-02-08  9:31       ` [MODERATED] Re: [PATCH " Andrew Cooper
  2019-02-08  9:39       ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  9:01 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 12:51:01AM +0000, speck for Andrew Cooper wrote:
> On 07/02/2019 23:41, speck for Andi Kleen wrote:
> > This patch assumes that the kernel is using
> > RETPOLINE (or IBRS), otherwise speculative execution could
> > still corrupt counter 3 in very unlikely cases.
> 
> What has the kernel configuration got to do with it?
> 
> It is my understanding that any execution of an XBEGIN instruction, even
> speculatively, even in userspace, will result in PMC3 getting modified.
> 
> A CPU either has force abort mode active, or PMC3 can be changed behind
> the kernel's back.

We are executing kernel code; therefore any user RTM will have aborted
and is irrelevant.

So what the kernel does is:

	/*
	 * And as noted; userspace transactions will be aborted by
	 * having entered the kernel. The kernel does not use RTM
	 * itself.
	 */


	/*
	 * stops all counters; irrespective of ucode using PMC3 or not
	 */
	GLOBAL_CTRL = 0;

	/*
	 * program PMC3
	 */
	CTRVAL3 = x;
	EVTSEL3 = y;

	/*
	 * Set the TFA bit to make ucode not touch PMC3; since there has
	 * not been an RTM instruction between GLOBAL_CTRL=0 and here,
	 * PMC3 will still be {x,y} as we just wrote.
	 *
	 * This is what requires RETPOLINE/IBRS; because otherwise
	 * speculation could see a partial kernel instruction that looks
	 * like RTM, which would mess things up.
	 */
	WRMSR(MSR_TFA, 1);

	/*
	 * Let 'er rip.
	 */
	GLOBAL_CTRL = ~0ULL;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 3/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 3/6] PERFv3 Andi Kleen
@ 2019-02-08  9:02   ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  9:02 UTC (permalink / raw)
  To: speck

On Thu, Feb 07, 2019 at 03:41:05PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  x86/pmu/intel: Add perf event attribute to control RTM
> 
> Add a "force_rtm_abort" attribute per perf event attribute
> to allow user programs to opt in to use counter 3 and disable
> TSX while the perf event is active.
> 
> Also add a "allow_rtm" attribute to allow programs to
> make sure TSX is enabled during the measurement (e.g.
> if they want to measure TSX itself)
> 
> This allows non root programs to override the defaults
> for their needs.
> 
> This is also needed for correct semantics in guests,
> so that the KVM PMU can set the correct default
> for its guests.
> 
> We use config2 bits 0/1 in the core PMU for these
> interfaces.

Again; why? This isn't needed _at_all_.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 6/6] PERFv3
  2019-02-07 23:41 ` [MODERATED] [PATCH v3 6/6] PERFv3 Andi Kleen
@ 2019-02-08  9:07   ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  9:07 UTC (permalink / raw)
  To: speck

On Thu, Feb 07, 2019 at 03:41:08PM -0800, speck for Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> Subject:  kvm: vmx: Support TSX_FORCE_ABORT in KVM guests
> 
> Recent microcode for Skylake added a new CPUID bit and MSR to control
> TSX aborting and enabling PMU counter 3. This patch adds support
> for controlling counter 3 from KVM guests.
> 
> Intercept the MSR and set the correct attribute on the perf events
> used by the virtualized PMU. TSX is only disabled when counter
> 3 is actually used by the host PMU. The guest can use all four
> counters without multiplexing.
> 
> Also export the CPUID bit.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

I would really rather not do this at all.

Keep this walnut crap as small as possible. The more code you generate,
the more we need to maintain. And very soon nobody will give a crap
about this weird bug anymore, but we're stuck with all this junk.

Also; apparently the virt people themselves already said not to expose
this, so why do you persist.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 2/6] PERFv3
  2019-02-08  9:01     ` Peter Zijlstra
@ 2019-02-08  9:31       ` Andrew Cooper
  2019-02-08  9:39       ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2019-02-08  9:31 UTC (permalink / raw)
  To: speck

On 08/02/2019 09:01, speck for Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 12:51:01AM +0000, speck for Andrew Cooper wrote:
>> On 07/02/2019 23:41, speck for Andi Kleen wrote:
>>> This patch assumes that the kernel is using
>>> RETPOLINE (or IBRS), otherwise speculative execution could
>>> still corrupt counter 3 in very unlikely cases.
>> What has the kernel configuration got to do with it?
>>
>> It is my understanding that any execution of an XBEGIN instruction, even
>> speculatively, even in userspace, will result in PMC3 getting modified.
>>
>> A CPU either has force abort mode active, or PMC3 can be changed behind
>> the kernel's back.
> We are executing kernel code; therefore any user RTM will have aborted
> and is irrelevant.
>
> So what the kernel does is:
>
> 	/*
> 	 * And as noted; userspace transactions will be aborted by
> 	 * having entered the kernel. The kernel does not use RTM
> 	 * itself.
> 	 */
>
>
> 	/*
> 	 * stops all counters; irrespective of ucode using PMC3 or not
> 	 */
> 	GLOBAL_CTRL = 0;
>
> 	/*
> 	 * program PMC3
> 	 */
> 	CTRVAL3 = x;
> 	EVTSEL3 = y;
>
> 	/*
> 	 * Set the TFA bit to make ucode not touch PMC3; since there has
> 	 * not been an RTM instruction between GLOBAL_CTRL=0 and here,
> 	 * PMC3 will still be {x,y} as we just wrote.
> 	 *
> 	 * This is what requires RETPOLINE/IBRS; because otherwise
> 	 * speculation could see a partial kernel instruction that looks
> 	 * like RTM, which would mess things up.
> 	 */
> 	WRMSR(MSR_TFA, 1);

Ok, so this is a narrow window during PMC reprogramming.

What other code exists between the PMC3 setup and TFA setup?  Even with
Retpoline, BCBS can be used by a determined attacker to hijack ret
speculation for a short period of time.  Then again, I suppose that is a
security hole in and of itself.

Also, you are blindly trusting that there is no part of your hypervisor
or SMI handler that will execute an XBEGIN instruction, even
speculatively.  The hypervisor part is easy if no one exposes TFA to
guests, but you have no control over the SMI handler, or the fact that
if one hits, your rets will start speculating into the SRAM region again.

Perhaps the risk here is sufficiently small to be fine in practice, but
it isn't zero.

~Andrew


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [SUSPECTED SPAM][PATCH v3 2/6] PERFv3
  2019-02-08  9:01     ` Peter Zijlstra
  2019-02-08  9:31       ` [MODERATED] Re: [PATCH " Andrew Cooper
@ 2019-02-08  9:39       ` Peter Zijlstra
  2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08  9:39 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 10:01:47AM +0100, Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 12:51:01AM +0000, speck for Andrew Cooper wrote:
> > On 07/02/2019 23:41, speck for Andi Kleen wrote:
> > > This patch assumes that the kernel is using
> > > RETPOLINE (or IBRS), otherwise speculative execution could
> > > still corrupt counter 3 in very unlikely cases.
> > 
> > What has the kernel configuration got to do with it?
> > 
> > It is my understanding that any execution of an XBEGIN instruction, even
> > speculatively, even in userspace, will result in PMC3 getting modified.
> > 
> > A CPU either has force abort mode active, or PMC3 can be changed behind
> > the kernel's back.
> 
> We are executing kernel code; therefore any user RTM will have aborted
> and is irrelevant.
> 
> So what the kernel does is:
> 
> 	/*
> 	 * And as noted; userspace transactions will be aborted by
> 	 * having entered the kernel. The kernel does not use RTM
> 	 * itself.
> 	 */
> 
> 
> 	/*
> 	 * stops all counters; irrespective of ucode using PMC3 or not
> 	 */
> 	GLOBAL_CTRL = 0;
> 
> 	/*
> 	 * program PMC3
> 	 */
> 	CTRVAL3 = x;
> 	EVTSEL3 = y;
> 
> 	/*
> 	 * Set the TFA bit to make ucode not touch PMC3; since there has
> 	 * not been an RTM instruction between GLOBAL_CTRL=0 and here,
> 	 * PMC3 will still be {x,y} as we just wrote.
> 	 *
> 	 * This is what requires RETPOLINE/IBRS; because otherwise
> 	 * speculation could see a partial kernel instruction that looks
> 	 * like RTM, which would mess things up.
> 	 */
> 	WRMSR(MSR_TFA, 1);
> 
> 	/*
> 	 * Let 'er rip.
> 	 */
> 	GLOBAL_CTRL = ~0ULL;

Ah, I think I found a way to avoid having to rely on this. Let me try.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] [RFC][PATCH] performance walnuts
  2019-02-08  9:39       ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Peter Zijlstra
@ 2019-02-08 10:53         ` Peter Zijlstra
  2019-02-08 18:07           ` [MODERATED] " Andi Kleen
                             ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-08 10:53 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 10:39:50AM +0100, Peter Zijlstra wrote:
> Ah, I think I found a way to avoid having to rely on this. Let me try.

Something like so. Can someone with access to a relevant machine test
this?

If it works, I'll write a Changelog and this'll be it.

---
 arch/x86/events/intel/core.c       | 127 ++++++++++++++++++++++++++++++-------
 arch/x86/events/perf_event.h       |   6 ++
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/msr-index.h   |   6 ++
 4 files changed, 116 insertions(+), 24 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e0232bdb7aff..8352c2647a1b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1999,6 +1999,39 @@ static void intel_pmu_nhm_enable_all(int added)
 	intel_pmu_enable_all(added);
 }
 
+static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
+{
+	u64 val = MSR_TFA_RTM_FORCE_ABORT * on;
+
+	if (cpuc->tfa_shadow != val) {
+		cpuc->tfa_shadow = val;
+		wrmsrl(MSR_TSX_FORCE_ABORT, val);
+	}
+}
+
+static void intel_skl_commit_scheduling(struct cpu_hw_events *cpuc, int idx, int cntr)
+{
+	/*
+	 * We're going to use PMC3, make sure TFA is set before we touch it.
+	 */
+	if (cntr == 3)
+		intel_set_tfa(cpuc, true);
+}
+
+static void intel_skl_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * If we find PMC3 is no longer used when we enable the PMU, we can
+	 * clear TFA.
+	 */
+	if (!test_bit(3, cpuc->active_mask))
+		intel_set_tfa(cpuc, false);
+
+	intel_pmu_enable_all(added);
+}
+
 static void enable_counter_freeze(void)
 {
 	update_debugctlmsr(get_debugctlmsr() |
@@ -2768,6 +2801,35 @@ intel_stop_scheduling(struct cpu_hw_events *cpuc)
 	raw_spin_unlock(&excl_cntrs->lock);
 }
 
+static struct event_constraint *
+dyn_constraint(struct cpu_hw_events *cpuc, struct event_constraint *c, int idx)
+{
+	WARN_ON_ONCE(!cpuc->constraint_list);
+
+	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
+		struct event_constraint *cx;
+
+		/*
+		 * grab pre-allocated constraint entry
+		 */
+		cx = &cpuc->constraint_list[idx];
+
+		/*
+		 * initialize dynamic constraint
+		 * with static constraint
+		 */
+		*cx = *c;
+
+		/*
+		 * mark constraint as dynamic
+		 */
+		cx->flags |= PERF_X86_EVENT_DYNAMIC;
+		c = cx;
+	}
+
+	return c;
+}
+
 static struct event_constraint *
 intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
 			   int idx, struct event_constraint *c)
@@ -2798,27 +2860,7 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
 	 * only needed when constraint has not yet
 	 * been cloned (marked dynamic)
 	 */
-	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
-		struct event_constraint *cx;
-
-		/*
-		 * grab pre-allocated constraint entry
-		 */
-		cx = &cpuc->constraint_list[idx];
-
-		/*
-		 * initialize dynamic constraint
-		 * with static constraint
-		 */
-		*cx = *c;
-
-		/*
-		 * mark constraint as dynamic, so we
-		 * can free it later on
-		 */
-		cx->flags |= PERF_X86_EVENT_DYNAMIC;
-		c = cx;
-	}
+	c = dyn_constraint(cpuc, c, idx);
 
 	/*
 	 * From here on, the constraint is dynamic.
@@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 	return c;
 }
 
+static bool allow_tsx_force_abort = true;
+
+static struct event_constraint *
+skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+			  struct perf_event *event)
+{
+	struct event_constraint *c = hsw_get_event_constraints(cpuc, idx, event);
+
+	/*
+	 * Without TFA we must not use PMC3.
+	 */
+	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
+		c = dyn_constraint(cpuc, c, idx);
+		c->idxmsk64 &= ~(1ULL << 3);
+		c->weight = hweight64(c->idxmsk64);
+	}
+
+	return c;
+}
+
 /*
  * Broadwell:
  *
@@ -3440,13 +3502,15 @@ static int intel_pmu_cpu_prepare(int cpu)
 			goto err;
 	}
 
-	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
+	if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_WALNUT)) {
 		size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);
 
 		cpuc->constraint_list = kzalloc(sz, GFP_KERNEL);
 		if (!cpuc->constraint_list)
 			goto err_shared_regs;
+	}
 
+	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
 		cpuc->excl_cntrs = allocate_excl_cntrs(cpu);
 		if (!cpuc->excl_cntrs)
 			goto err_constraint_list;
@@ -3552,9 +3616,10 @@ static void free_excl_cntrs(int cpu)
 		if (c->core_id == -1 || --c->refcnt == 0)
 			kfree(c);
 		cpuc->excl_cntrs = NULL;
-		kfree(cpuc->constraint_list);
-		cpuc->constraint_list = NULL;
 	}
+
+	kfree(cpuc->constraint_list);
+	cpuc->constraint_list = NULL;
 }
 
 static void intel_pmu_cpu_dying(int cpu)
@@ -4061,9 +4126,12 @@ static struct attribute *intel_pmu_caps_attrs[] = {
        NULL
 };
 
+DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
 	NULL,
+	NULL,
 };
 
 static __init struct attribute **
@@ -4546,6 +4614,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 
+
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
@@ -4557,6 +4626,16 @@ __init int intel_pmu_init(void)
 		tsx_attr = hsw_tsx_events_attrs;
 		intel_pmu_pebs_data_source_skl(
 			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
+
+		/* If our CPU haz a walnut */
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			x86_pmu.flags |= PMU_FL_WALNUT;
+			x86_pmu.get_event_constraints = skl_get_event_constraints;
+			x86_pmu.enable_all = intel_skl_pmu_enable_all;
+			x86_pmu.commit_scheduling = intel_skl_commit_scheduling;
+			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+		}
+
 		pr_cont("Skylake events, ");
 		name = "skylake";
 		break;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b7031bfc..44b3426c618e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -242,6 +242,11 @@ struct cpu_hw_events {
 	struct intel_excl_cntrs		*excl_cntrs;
 	int excl_thread_id; /* 0 or 1 */
 
+	/*
+	 * SKL TSX_FORCE_ABORT shadow
+	 */
+	int				tfa_shadow;
+
 	/*
 	 * AMD specific bits
 	 */
@@ -676,6 +681,7 @@ do {									\
 #define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */
 #define PMU_FL_EXCL_ENABLED	0x8 /* exclusive counter active */
 #define PMU_FL_PEBS_ALL		0x10 /* all events are valid PEBS events */
+#define PMU_FL_WALNUT		0x20 /* deal with the walnut errata */
 
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6d6122524711..981ff9479648 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8e40c2446fd1..ca5bc0eacb95 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -666,6 +666,12 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+
+#define MSR_TSX_FORCE_ABORT		0x0000010F
+
+#define MSR_TFA_RTM_FORCE_ABORT_BIT	0
+#define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
+
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
 #define MSR_IA32_MCG_EBX		0x00000181

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [PATCH v3 2/6] PERFv3
  2019-02-08  8:50   ` [MODERATED] Re: [PATCH v3 2/6] PERFv3 Peter Zijlstra
@ 2019-02-08 17:26     ` Andi Kleen
  0 siblings, 0 replies; 33+ messages in thread
From: Andi Kleen @ 2019-02-08 17:26 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 09:50:50AM +0100, speck for Peter Zijlstra wrote:
> On Thu, Feb 07, 2019 at 03:41:04PM -0800, speck for Andi Kleen wrote:
> > +static struct event_constraint *
> > +skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> > +			  struct perf_event *event)
> > +{
> > +	struct event_constraint *c;
> > +
> > +	c = hsw_get_event_constraints(cpuc, idx, event);
> > +
> > +	if (!perf_enable_all_counters) {
> > +		cpuc->counter3_constraint = *c;
> > +		c = &cpuc->counter3_constraint;
> > +
> > +		/*
> > +		 * Without TFA we must not use PMC3.
> > +		 */
> > +		__clear_bit(3, c->idxmsk);
> > +	}
> > +
> > +	return c;
> > +}
> 
> This is obviously broken... what if all 6 events are in use and have
> different constraints?

Don't understand the comment. Why 6? The intention is that if more
than 3 generic events are in use it will not schedule.

I tested this and it works.

-Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
@ 2019-02-08 18:07           ` Andi Kleen
  2019-02-11 10:40             ` Peter Zijlstra
  2019-02-09  0:28           ` [MODERATED] " Linus Torvalds
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2019-02-08 18:07 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 10:39:50AM +0100, Peter Zijlstra wrote:
> > Ah, I think I found a way to avoid having to rely on this. Let me try.
> 
> Something like so. Can someone with access to a relevant machine test
> this?
> 
> If it works, I'll write a Changelog and this'll be it.

I'll test this, but I would appreciate it if, when you spin
new versions, you at least merge the fixes from my patchkits
first.

This one is missing several.

Also note that some of the other patches have changed.

>  	 * From here on, the constraint is dynamic.
> @@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
>  	return c;
>  }
>  
> +static bool allow_tsx_force_abort = true;

The default needs more discussion.


> +
> +static struct event_constraint *
> +skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> +			  struct perf_event *event)
> +{
> +	struct event_constraint *c = hsw_get_event_constraints(cpuc, idx, event);
> +
> +	/*
> +	 * Without TFA we must not use PMC3.
> +	 */
> +	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {

This still needs the extra changes in my patchkit to allow user/kvm opt-in/out.


> @@ -4061,9 +4126,12 @@ static struct attribute *intel_pmu_caps_attrs[] = {
>         NULL
>  };
>  
> +DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);


I still think "enable_all_counters" is a far better name.

> +
>  static struct attribute *intel_pmu_attrs[] = {
>  	&dev_attr_freeze_on_smi.attr,
>  	NULL,

This needs a comment, as done in my patch.

> +	NULL,
>  };
>  
>  static __init struct attribute **
> @@ -4546,6 +4614,7 @@ __init int intel_pmu_init(void)
>  		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>  		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>  
> +

Unnecessary change.

>  		x86_pmu.hw_config = hsw_hw_config;
>  		x86_pmu.get_event_constraints = hsw_get_event_constraints;
>  		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
> @@ -4557,6 +4626,16 @@ __init int intel_pmu_init(void)
>  		tsx_attr = hsw_tsx_events_attrs;
>  		intel_pmu_pebs_data_source_skl(
>  			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
> +
> +		/* If our CPU haz a walnut */
> +		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> +			x86_pmu.flags |= PMU_FL_WALNUT;

I don't think the Walnut name will be publicly documented, so it will just be confusing.
Better to give it a descriptive name, TSX_COUNTER3 or something like that.


-Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
  2019-02-08 18:07           ` [MODERATED] " Andi Kleen
@ 2019-02-09  0:28           ` Linus Torvalds
  2019-02-09  4:34             ` Andi Kleen
  2019-02-09  8:57             ` Peter Zijlstra
  2019-02-13  2:56           ` mark gross
  2019-02-15 23:45           ` [MODERATED] Encrypted Message Jon Masters
  3 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2019-02-09  0:28 UTC (permalink / raw)
  To: speck

On Fri, Feb 8, 2019 at 2:53 AM speck for Peter Zijlstra
<speck@linutronix.de> wrote:
>
> +static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
> +{
> +       u64 val = MSR_TFA_RTM_FORCE_ABORT * on;

Have you actually verified that this does the right thing on all
versions of gcc?

In particular, I certainly hope it doesn't cause an actual multiply instruction.

It might be better to just write it as

        u64 val = on ? MSR_TFA_RTM_FORCE_ABORT : 0;

because that's what we'd hope the code generation is anyway. Instead
of just assuming gcc DTRT.

                      Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-09  0:28           ` [MODERATED] " Linus Torvalds
@ 2019-02-09  4:34             ` Andi Kleen
  2019-02-09  8:57             ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Andi Kleen @ 2019-02-09  4:34 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 04:28:41PM -0800, speck for Linus Torvalds wrote:
> On Fri, Feb 8, 2019 at 2:53 AM speck for Peter Zijlstra
> <speck@linutronix.de> wrote:
> >
> > +static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
> > +{
> > +       u64 val = MSR_TFA_RTM_FORCE_ABORT * on;
> 
> Have you actually verified that this does the right thing on all
> versions of gcc?
> 
> In particular, I certainly hope it doesn't cause an actual multiply instruction.

On Skylake IMUL has a 3-cycle latency, so it should be fine in any case.

-Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-09  0:28           ` [MODERATED] " Linus Torvalds
  2019-02-09  4:34             ` Andi Kleen
@ 2019-02-09  8:57             ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-09  8:57 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 04:28:41PM -0800, speck for Linus Torvalds wrote:
> On Fri, Feb 8, 2019 at 2:53 AM speck for Peter Zijlstra
> <speck@linutronix.de> wrote:
> >
> > +static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
> > +{
> > +       u64 val = MSR_TFA_RTM_FORCE_ABORT * on;
> 
> Have you actually verified that this does the right thing on all
> versions of gcc?
> 
> In particular, I certainly hope it doesn't cause an actual multiply instruction.
> 
> It might be better to just write it as
> 
>         u64 val = on ? MSR_TFA_RTM_FORCE_ABORT : 0;
> 
> because that's what we'd hope the code generation is anyway. Instead
> of just assuming gcc DTRT.

I did not verify in this instance; but I did look at code generation for

  b987ffc18fb3 ("x86/qspinlock: Fix compile error")

where we do a similar mult-by-bool to avoid if() and GCC got it right
there.

That said; I don't suppose there's anything wrong with using the ternary
operator here, but in general I try to avoid it.
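
For reference, a minimal stand-alone sketch of the two forms being
discussed (not from the patch itself; MSR_TFA_RTM_FORCE_ABORT is
redefined locally just so the snippet compiles on its own):

	#include <stdint.h>
	#include <stdbool.h>

	#define MSR_TFA_RTM_FORCE_ABORT	(1ULL << 0)

	/*
	 * Multiply-by-bool form, as in the patch: relies on the compiler
	 * turning the multiply by a 0/1 value into a select rather than
	 * an actual IMUL.
	 */
	static uint64_t tfa_val_mul(bool on)
	{
		return MSR_TFA_RTM_FORCE_ABORT * on;
	}

	/*
	 * Ternary form suggested by Linus: expresses the intended select
	 * directly, independent of what the optimizer does.
	 */
	static uint64_t tfa_val_ternary(bool on)
	{
		return on ? MSR_TFA_RTM_FORCE_ABORT : 0;
	}

Both compute the same value; the only difference is how much is left to
the code generator.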

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-08 18:07           ` [MODERATED] " Andi Kleen
@ 2019-02-11 10:40             ` Peter Zijlstra
  2019-02-11 14:06               ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-11 10:40 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 10:07:53AM -0800, speck for Andi Kleen wrote:
> On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:

> >  	 * From here on, the constraint is dynamic.
> > @@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> >  	return c;
> >  }
> >  
> > +static bool allow_tsx_force_abort = true;
> 
> The default needs more discussion.

This is what Thomas wants and makes sense to me. I'm done talking about
this.

> > +
> > +static struct event_constraint *
> > +skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> > +			  struct perf_event *event)
> > +{
> > +	struct event_constraint *c = hsw_get_event_constraints(cpuc, idx, event);
> > +
> > +	/*
> > +	 * Without TFA we must not use PMC3.
> > +	 */
> > +	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> 
> This still needs the extra changes in my patchkit to allow user/kvm opt-in/out.

It needs no such thing. Userspace has the one knob.

And I want to hear from Thomas and Paolo on what KVM wants; Andrew
already said he doesn't want this crap exposed to virt. And the less I
have to worry about virt the happier I am.

I've yet to see compelling arguments for making this more complicated.

Minimal and correct are the name of the game here; we need to backport
this because that fscking ucode default screws us over.

If you want complicated; you can try arguing that after we've gone
public.

> > @@ -4061,9 +4126,12 @@ static struct attribute *intel_pmu_caps_attrs[] = {
> >         NULL
> >  };
> >  
> > +DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
> 
> 
> I still think "enable_all_counters" is a far better name.

This is more explicit; the 'feature' is tsx_force_abort and will show up
under that name in /proc/cpuinfo.

> >  static struct attribute *intel_pmu_attrs[] = {
> >  	&dev_attr_freeze_on_smi.attr,
> >  	NULL,
> 
> This needs a comment, as done in my patch.

Sure can do.

> > +	NULL,
> >  };
> >  
> >  static __init struct attribute **
> > @@ -4546,6 +4614,7 @@ __init int intel_pmu_init(void)
> >  		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
> >  		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
> >  
> > +
> 
> Unnecessary change.

Already gone.

> >  		x86_pmu.hw_config = hsw_hw_config;
> >  		x86_pmu.get_event_constraints = hsw_get_event_constraints;
> >  		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
> > @@ -4557,6 +4626,16 @@ __init int intel_pmu_init(void)
> >  		tsx_attr = hsw_tsx_events_attrs;
> >  		intel_pmu_pebs_data_source_skl(
> >  			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
> > +
> > +		/* If our CPU haz a walnut */
> > +		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> > +			x86_pmu.flags |= PMU_FL_WALNUT;
> 
> I don't think the Walnut name will be publicly documented, so it will
> be just confusing.  Better to give it an descriptive name.
> TSX_COUNTER3 or something like this.

Then WTH expose us to it in the first place?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH] performance walnuts
  2019-02-11 10:40             ` Peter Zijlstra
@ 2019-02-11 14:06               ` Thomas Gleixner
  2019-02-11 20:17                 ` [MODERATED] " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2019-02-11 14:06 UTC (permalink / raw)
  To: speck

On Mon, 11 Feb 2019, speck for Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 10:07:53AM -0800, speck for Andi Kleen wrote:
> > On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:
> 
> > >  	 * From here on, the constraint is dynamic.
> > > @@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> > >  	return c;
> > >  }
> > >  
> > > +static bool allow_tsx_force_abort = true;
> > 
> > The default needs more discussion.
> 
> This is what Thomas wants and makes sense to me. I'm done talking about
> this.

Again. The broken functionality is TSX. As the cure for TSX requires
stealing a performance counter, the ucode default of preferring TSX causes
a regression on the perf side for no good reason. This is already bad
enough for updates where the ucode arrives independently of the kernel update.

But with the kernel we surely have no reason to follow the Intel marketing
decision and make the same unreasonable default. There is one known
workload which depends on TSX, and it's very reasonable to make that one
add the command line option rather than punish everyone else.

> > > +	/*
> > > +	 * Without TFA we must not use PMC3.
> > > +	 */
> > > +	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> > 
> > This still needs the extra changes in my patchkit to allow user/kvm opt-in/out.
> 
> It needs no such thing. Userspace has the one knob.
> 
> And I want to hear from Thomas and Paolo on what KVM wants; Andrew
> already said he doesn't want this crap exposed to virt. And the less I
> have to worry about virt the happier I am.

For virt the issue is even worse because letting the guest enable TSX
causes the host to have a counter stolen. So it's a host decision in the
first place. If the host decides that TSX has to abort, then the guest has
no choice whatsoever. If the host already decided to lose the counter then
it can allow the guest to fiddle with the abort state. Whether we want to
go there is a different question. Paolo?

> I've yet to see compelling arguments for making this more complicated.
> 
> Minimal and correct are the name of the game here; we need to backport
> this because that fscking ucode default screws us over.
> 
> If you want complicated; you can try arguing that after we've gone
> public.

Keep it simple. End of story.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-11 14:06               ` Thomas Gleixner
@ 2019-02-11 20:17                 ` Konrad Rzeszutek Wilk
  2019-02-11 23:39                   ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2019-02-11 20:17 UTC (permalink / raw)
  To: speck

On Mon, Feb 11, 2019 at 03:06:52PM +0100, speck for Thomas Gleixner wrote:
> On Mon, 11 Feb 2019, speck for Peter Zijlstra wrote:
> > On Fri, Feb 08, 2019 at 10:07:53AM -0800, speck for Andi Kleen wrote:
> > > On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:
> > 
> > > >  	 * From here on, the constraint is dynamic.
> > > > @@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> > > >  	return c;
> > > >  }
> > > >  
> > > > +static bool allow_tsx_force_abort = true;
> > > 
> > > The default needs more discussion.
> > 
> > This is what Thomas wants and makes sense to me. I'm done talking about
> > this.
> 
> Again. The broken functionality is TSX. As the cure for TSX requires to
> steal a performance counter, the ucode default of preferring TSX causes a
> regression on the perf side for no good reason. This is already bad enough
> for updates where the ucode comes independent of the kernel update.
> 
> But with the kernel we surely have no reason to follow the Intel marketing
> decision and make the same unreasonable default. There is one known
> workload which depends on TSX and it's very reasonable to make this one add
> the command line option and not punish everyone else.
> 
> > > > +	/*
> > > > +	 * Without TFA we must not use PMC3.
> > > > +	 */
> > > > +	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> > > 
> > > This still needs the extra changes in my patchkit to allow user/kvm opt-in/out.
> > 
> > It needs no such thing. Userspace has the one knob.
> > 
> > And I want to hear from Thomas and Paolo on what KVM wants; Andrew
> > already said he doesn't want this crap exposed to virt. And the less I
> > have to worry about virt the happier I am.
> 
> For virt the issue is even worse because letting the guest enable TSX
> causes the host to have a counter stolen. So it's a host decision in the
> first place. If the host decides that TSX has to abort, then the guest has
> no choice whatsoever. If the host already decided to lose the counter then
> it can allow the guest to fiddle with the abort state. Whether we want to
> go there is a different question. Paolo?

One extra wrinkle: if TSX aborting is enabled, should KVM not expose any perf
counters at all to the guest? That would make this a whole lot simpler.

And also not expose the TSX functionality (aka mask the CPUID) to the guest?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH] performance walnuts
  2019-02-11 20:17                 ` [MODERATED] " Konrad Rzeszutek Wilk
@ 2019-02-11 23:39                   ` Thomas Gleixner
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2019-02-11 23:39 UTC (permalink / raw)
  To: speck

On Mon, 11 Feb 2019, speck for Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 11, 2019 at 03:06:52PM +0100, speck for Thomas Gleixner wrote:
> >
> > For virt the issue is even worse because letting the guest enable TSX
> > causes the host to have a counter stolen. So it's a host decision in the
> > first place. If the host decides that TSX has to abort, then the guest has
> > no choice whatsoever. If the host already decided to lose the counter then
> > it can allow the guest to fiddle with the abort state. Whether we want to
> > go there is a different question. Paolo?
> 
> One extra wrinkle - if TSX aborting is enabled should KVM not expose any perf
> counters at all to the guest? That would make this a whole lot more simpler.

I don't think you want to glue that together. The decision whether the host
hands out access to perf to the guest should be mostly independent of this
whole disaster.

The decision you need to make is vs. the control MSR, i.e. whether you want
to allow the guest to set/clear the RTM_FORCE_ABORT bit in MSR_TSX_FORCE_ABORT.

Therefore you need to look at the host state first:

 1) Host MSR_TFA_RTM_FORCE_ABORT = ON

     A) If the MSR is not exposed to the guest, guest TSX will always abort
     	and the guest cannot do anything about it.

     B) If the MSR is exposed, you need to decide the default state:

          a) Same as the host (ON)

	  b) Default microcode state (OFF)

	You surely can trap the write to MSR_TSX_FORCE_ABORT to track the
	guest state, but if the guest has the bit cleared, then you still
	don't know whether the guest uses TSX at all.

	So you have no idea whether counter 3 is going to be wrecked or
	not. So in that case you pretty much can only prevent counter 3
	from being used on the host while the guest is scheduled in. Pretty
	crappy whichever way you look at it.

    I really don't want to allow 1) B) because then you either create
    undefined state on the host or the host perf side gets really messy.

    So the simple and straight forward solution is here 1) A)


 2) Host MSR_TFA_RTM_FORCE_ABORT = OFF

    You already decided that counter 3 is lost, so it does not matter
    whether the guest plays around with the MSR or not.

    But for simplicity's sake, I'm not opposed if you decide to not allow
    the guest to fiddle with the MSR in this case either.

    The only situation where it might be interesting to allow that in the
    guest is when you want to hand all 4 counters to the guest so the guest
    can decide itself whether it wants to use counter 3 or have TSX.

Otherwise the decision vs. the performance counters is pretty much the same
as you have now: is the host willing to hand all or some counters to the
guest, or does the host want them for monitoring from the host side? There
is no real difference.
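
A rough pseudocode sketch of that decision tree (the helper names are
made up here, not actual KVM interfaces):

	/*
	 * Hypothetical sketch only.  host_tfa_forced() stands for "the host
	 * keeps MSR_TFA_RTM_FORCE_ABORT set", i.e. allow_tsx_force_abort=0
	 * with Peter's patch; guest_owns_all_gp_counters() stands for "all
	 * 4 general purpose counters are handed to this guest".
	 */
	static bool guest_may_write_tsx_force_abort(void)
	{
		/*
		 * Case 1A: host forces RTM to abort.  Don't expose the MSR;
		 * guest TSX always aborts and PMC3 stays usable on the host.
		 */
		if (host_tfa_forced())
			return false;

		/*
		 * Case 2: the host already gave up counter 3, so the guest
		 * flipping the bit cannot wreck anything the host still
		 * relies on.  Only really interesting when the guest gets
		 * all 4 counters anyway.
		 */
		return guest_owns_all_gp_counters();
	}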

> And also not expose the TSX functionality (aka mask the CPUID) to the guest?
 
Well, the masked CPUID bit does not prevent the guest from using TSX.
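
For illustration only: a guest can probe RTM directly, no matter what
CPUID says (userspace sketch, built with gcc -mrtm; on hardware without
TSX the XBEGIN would #UD):

	#include <stdio.h>
	#include <immintrin.h>

	int main(void)
	{
		/*
		 * Executes XBEGIN regardless of the CPUID RTM bit.  If the
		 * transaction starts, TSX is usable even though CPUID was
		 * masked; with the force-abort bit set it aborts instead.
		 */
		unsigned int status = _xbegin();

		if (status == _XBEGIN_STARTED) {
			_xend();
			printf("RTM transaction committed\n");
		} else {
			printf("RTM transaction aborted (status 0x%x)\n", status);
		}
		return 0;
	}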

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
  2019-02-08 18:07           ` [MODERATED] " Andi Kleen
  2019-02-09  0:28           ` [MODERATED] " Linus Torvalds
@ 2019-02-13  2:56           ` mark gross
  2019-02-15 17:32             ` mark gross
  2019-02-15 23:45           ` [MODERATED] Encrypted Message Jon Masters
  3 siblings, 1 reply; 33+ messages in thread
From: mark gross @ 2019-02-13  2:56 UTC (permalink / raw)
  To: speck

On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 10:39:50AM +0100, Peter Zijlstra wrote:
> > Ah, I think I found a way to avoid having to rely on this. Let me try.
> 
> Something like so. Can someone with access to a relevant machine test
> this?
FWIW I have just forwarded this patch to the folks testing Andi's work.  One of
those engineers is working on getting access to this speck list at this time.

--mark
PS, I hope I don't accidentally send this in clear text.  (I'm hoping mutt
does the right thing, as I'm not finding the "encrypt" option...)


> If it works, I'll write a Changelog and this'll be it.
> 
> ---
>  arch/x86/events/intel/core.c       | 127 ++++++++++++++++++++++++++++++-------
>  arch/x86/events/perf_event.h       |   6 ++
>  arch/x86/include/asm/cpufeatures.h |   1 +
>  arch/x86/include/asm/msr-index.h   |   6 ++
>  4 files changed, 116 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index e0232bdb7aff..8352c2647a1b 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -1999,6 +1999,39 @@ static void intel_pmu_nhm_enable_all(int added)
>  	intel_pmu_enable_all(added);
>  }
>  
> +static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
> +{
> +	u64 val = MSR_TFA_RTM_FORCE_ABORT * on;
> +
> +	if (cpuc->tfa_shadow != val) {
> +		cpuc->tfa_shadow = val;
> +		wrmsrl(MSR_TSX_FORCE_ABORT, val);
> +	}
> +}
> +
> +static void intel_skl_commit_scheduling(struct cpu_hw_events *cpuc, int idx, int cntr)
> +{
> +	/*
> +	 * We're going to use PMC3, make sure TFA is set before we touch it.
> +	 */
> +	if (cntr == 3)
> +		intel_set_tfa(cpuc, true);
> +}
> +
> +static void intel_skl_pmu_enable_all(int added)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +
> +	/*
> +	 * If we find PMC3 is no longer used when we enable the PMU, we can
> +	 * clear TFA.
> +	 */
> +	if (!test_bit(3, cpuc->active_mask))
> +		intel_set_tfa(cpuc, false);
> +
> +	intel_pmu_enable_all(added);
> +}
> +
>  static void enable_counter_freeze(void)
>  {
>  	update_debugctlmsr(get_debugctlmsr() |
> @@ -2768,6 +2801,35 @@ intel_stop_scheduling(struct cpu_hw_events *cpuc)
>  	raw_spin_unlock(&excl_cntrs->lock);
>  }
>  
> +static struct event_constraint *
> +dyn_constraint(struct cpu_hw_events *cpuc, struct event_constraint *c, int idx)
> +{
> +	WARN_ON_ONCE(!cpuc->constraint_list);
> +
> +	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
> +		struct event_constraint *cx;
> +
> +		/*
> +		 * grab pre-allocated constraint entry
> +		 */
> +		cx = &cpuc->constraint_list[idx];
> +
> +		/*
> +		 * initialize dynamic constraint
> +		 * with static constraint
> +		 */
> +		*cx = *c;
> +
> +		/*
> +		 * mark constraint as dynamic
> +		 */
> +		cx->flags |= PERF_X86_EVENT_DYNAMIC;
> +		c = cx;
> +	}
> +
> +	return c;
> +}
> +
>  static struct event_constraint *
>  intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
>  			   int idx, struct event_constraint *c)
> @@ -2798,27 +2860,7 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
>  	 * only needed when constraint has not yet
>  	 * been cloned (marked dynamic)
>  	 */
> -	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
> -		struct event_constraint *cx;
> -
> -		/*
> -		 * grab pre-allocated constraint entry
> -		 */
> -		cx = &cpuc->constraint_list[idx];
> -
> -		/*
> -		 * initialize dynamic constraint
> -		 * with static constraint
> -		 */
> -		*cx = *c;
> -
> -		/*
> -		 * mark constraint as dynamic, so we
> -		 * can free it later on
> -		 */
> -		cx->flags |= PERF_X86_EVENT_DYNAMIC;
> -		c = cx;
> -	}
> +	c = dyn_constraint(cpuc, c, idx);
>  
>  	/*
>  	 * From here on, the constraint is dynamic.
> @@ -3345,6 +3387,26 @@ glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
>  	return c;
>  }
>  
> +static bool allow_tsx_force_abort = true;
> +
> +static struct event_constraint *
> +skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
> +			  struct perf_event *event)
> +{
> +	struct event_constraint *c = hsw_get_event_constraints(cpuc, idx, event);
> +
> +	/*
> +	 * Without TFA we must not use PMC3.
> +	 */
> +	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
> +		c = dyn_constraint(cpuc, c, idx);
> +		c->idxmsk64 &= ~(1ULL << 3);
> +		c->weight = hweight64(c->idxmsk64);
> +	}
> +
> +	return c;
> +}
> +
>  /*
>   * Broadwell:
>   *
> @@ -3440,13 +3502,15 @@ static int intel_pmu_cpu_prepare(int cpu)
>  			goto err;
>  	}
>  
> -	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
> +	if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_WALNUT)) {
>  		size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);
>  
>  		cpuc->constraint_list = kzalloc(sz, GFP_KERNEL);
>  		if (!cpuc->constraint_list)
>  			goto err_shared_regs;
> +	}
>  
> +	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
>  		cpuc->excl_cntrs = allocate_excl_cntrs(cpu);
>  		if (!cpuc->excl_cntrs)
>  			goto err_constraint_list;
> @@ -3552,9 +3616,10 @@ static void free_excl_cntrs(int cpu)
>  		if (c->core_id == -1 || --c->refcnt == 0)
>  			kfree(c);
>  		cpuc->excl_cntrs = NULL;
> -		kfree(cpuc->constraint_list);
> -		cpuc->constraint_list = NULL;
>  	}
> +
> +	kfree(cpuc->constraint_list);
> +	cpuc->constraint_list = NULL;
>  }
>  
>  static void intel_pmu_cpu_dying(int cpu)
> @@ -4061,9 +4126,12 @@ static struct attribute *intel_pmu_caps_attrs[] = {
>         NULL
>  };
>  
> +DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
> +
>  static struct attribute *intel_pmu_attrs[] = {
>  	&dev_attr_freeze_on_smi.attr,
>  	NULL,
> +	NULL,
>  };
>  
>  static __init struct attribute **
> @@ -4546,6 +4614,7 @@ __init int intel_pmu_init(void)
>  		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>  		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>  
> +
>  		x86_pmu.hw_config = hsw_hw_config;
>  		x86_pmu.get_event_constraints = hsw_get_event_constraints;
>  		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
> @@ -4557,6 +4626,16 @@ __init int intel_pmu_init(void)
>  		tsx_attr = hsw_tsx_events_attrs;
>  		intel_pmu_pebs_data_source_skl(
>  			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
> +
> +		/* If our CPU haz a walnut */
> +		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> +			x86_pmu.flags |= PMU_FL_WALNUT;
> +			x86_pmu.get_event_constraints = skl_get_event_constraints;
> +			x86_pmu.enable_all = intel_skl_pmu_enable_all;
> +			x86_pmu.commit_scheduling = intel_skl_commit_scheduling;
> +			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
> +		}
> +
>  		pr_cont("Skylake events, ");
>  		name = "skylake";
>  		break;
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 78d7b7031bfc..44b3426c618e 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -242,6 +242,11 @@ struct cpu_hw_events {
>  	struct intel_excl_cntrs		*excl_cntrs;
>  	int excl_thread_id; /* 0 or 1 */
>  
> +	/*
> +	 * SKL TSX_FORCE_ABORT shadow
> +	 */
> +	int				tfa_shadow;
> +
>  	/*
>  	 * AMD specific bits
>  	 */
> @@ -676,6 +681,7 @@ do {									\
>  #define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */
>  #define PMU_FL_EXCL_ENABLED	0x8 /* exclusive counter active */
>  #define PMU_FL_PEBS_ALL		0x10 /* all events are valid PEBS events */
> +#define PMU_FL_WALNUT		0x20 /* deal with the walnut errata */
>  
>  #define EVENT_VAR(_id)  event_attr_##_id
>  #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 6d6122524711..981ff9479648 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -344,6 +344,7 @@
>  /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
>  #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
> +#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
>  #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
>  #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
>  #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 8e40c2446fd1..ca5bc0eacb95 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -666,6 +666,12 @@
>  
>  #define MSR_IA32_TSC_DEADLINE		0x000006E0
>  
> +
> +#define MSR_TSX_FORCE_ABORT		0x0000010F
> +
> +#define MSR_TFA_RTM_FORCE_ABORT_BIT	0
> +#define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
> +
>  /* P4/Xeon+ specific */
>  #define MSR_IA32_MCG_EAX		0x00000180
>  #define MSR_IA32_MCG_EBX		0x00000181

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-13  2:56           ` mark gross
@ 2019-02-15 17:32             ` mark gross
  2019-02-15 17:44               ` Peter Zijlstra
  2019-02-19 13:35               ` [MODERATED] [RFC][PATCH v2] " Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: mark gross @ 2019-02-15 17:32 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2776 bytes --]

On Tue, Feb 12, 2019 at 06:56:34PM -0800, speck for mark gross wrote:
> On Fri, Feb 08, 2019 at 11:53:18AM +0100, speck for Peter Zijlstra wrote:
> > On Fri, Feb 08, 2019 at 10:39:50AM +0100, Peter Zijlstra wrote:
> > > Ah, I think I found a way to avoid having to rely on this. Let me try.
> > 
> > Something like so. Can someone with access to a relevant machine test
> > this?
> FWIW I have just forwarded this patch to the folks testing Andi's work.  One of
> those enginees is working on getting his access to this spec list at this time.
>

Ok, Nelson ran the unit test on Peter's patch.  Oh, and I'm told we need to avoid
using the internal codenames in our patches.

The following is the test log; attached are 2 dmesg dumps from the failing
tests.


root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# echo 0 > /sys/devices/cpu/allow_tsx_force_abort 


root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg -C
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# ./perf stat -e '{cache-misses,cache-references,branches,branch-misses}' -a sleep 1
Killed
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg > tsx_force_abort_0.dmesg
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg -C
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# ./perf stat -e '{cache-misses,cache-references,branches}' -a sleep 1
Killed
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg > tsx_force_abort_0.dmesg-3counters
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg -C


root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# echo 1 > /sys/devices/cpu/allow_tsx_force_abort 
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# dmesg -C
root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# ./perf stat -e '{cache-misses,cache-references,branches,branch-misses}' -a sleep 1

 Performance counter stats for 'system wide':

            16,917      cache-misses              #    6.500 % of all cache refs    
           260,246      cache-references                                            
           774,346      branches                                                    
            36,353      branch-misses             #    4.69% of all branches        

       1.005494153 seconds time elapsed

root@jf1-otc-3AR3-33:/home/labuser/ocean/tools/perf# ./perf stat -e '{cache-misses,cache-references,branches}' -a sleep 1

 Performance counter stats for 'system wide':

            25,902      cache-misses              #    7.233 % of all cache refs    
           358,111      cache-references                                            
         5,683,441      branches                                                    

       1.005578086 seconds time elapsed


/home/labuser/ocean/tools/perf# ./perf --version
perf version 5.0.rc5.g088c82c



[-- Attachment #2: tsx_force_abort_0.dmesg --]
[-- Type: text/plain, Size: 5106 bytes --]

[60498.742721] BUG: unable to handle kernel paging request at ffffffffffffffd8
[60498.749676] #PF error: [WRITE]
[60498.752736] PGD ce8212067 P4D ce8212067 PUD ce8214067 PMD 0 
[60498.758395] Oops: 0002 [#9] SMP PTI
[60498.761889] CPU: 70 PID: 6847 Comm: perf Tainted: G      D W         5.0.0-rc5-1-peterz-feb8 #1
[60498.770577] Hardware name: Intel Corporation S2600WFQ/S2600WFQ, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[60498.781008] RIP: 0010:dyn_constraint.isra.15+0x26/0x60
[60498.786141] Code: 00 00 c3 90 0f 1f 44 00 00 48 85 ff 48 89 f0 74 44 8b 48 20 f6 c1 80 75 3a 48 63 d2 48 8b 30 80 c9 80 48 8d 14 92 48 8d 14 d7 <48> 89 32 48 8b 70 08 48 89 72 08 48 8b 70 10 48 89 72 10 48 8b 70
[60498.804887] RSP: 0018:ffffbe498ef3fd98 EFLAGS: 00010282
[60498.810111] RAX: ffffffffac91e0a0 RBX: 00000000ffffffff RCX: 0000000000000080
[60498.817245] RDX: ffffffffffffffd8 RSI: 000000000000000f RDI: 0000000000000000
[60498.824377] RBP: ffff9e34d32c6000 R08: ffff9e34dfaa5e60 R09: ffff9e1ce0007300
[60498.831510] R10: ffff9e34d0cc78c0 R11: ffffffffac98d3a8 R12: ffff9e34d32c6000
[60498.838642] R13: ffff9e34ad8d2000 R14: ffffffffac21a7a0 R15: 0000000000000000
[60498.845775] FS:  00007fc59591d500(0000) GS:ffff9e34dfa80000(0000) knlGS:0000000000000000
[60498.853857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60498.859605] CR2: ffffffffffffffd8 CR3: 0000002fb2966002 CR4: 00000000007606e0
[60498.866737] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60498.873870] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[60498.881002] PKRU: 55555554
[60498.883711] Call Trace:
[60498.886159]  skl_get_event_constraints+0x38/0x60
[60498.890779]  x86_pmu_event_init+0x1c0/0x210
[60498.894966]  perf_try_init_event+0x7b/0xa0
[60498.899063]  perf_event_alloc+0x7e0/0x980
[60498.903075]  __do_sys_perf_event_open+0x1f4/0xba0
[60498.907778]  do_syscall_64+0x55/0x100
[60498.911450]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[60498.916497] RIP: 0033:0x7fc593d82839
[60498.920074] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[60498.938819] RSP: 002b:00007ffeefbe5cb8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[60498.946384] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fc593d82839
[60498.953517] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 000056224565e7a8
[60498.960648] RBP: 00007ffeefbe5da0 R08: 0000000000000008 R09: 0000000000000008
[60498.967781] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
[60498.974913] R13: 0000000000000000 R14: 00005622456d7e58 R15: 000056224565e790
[60498.982046] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 bridge stp llc ebtable_filter ebtables ipmi_ssif intel_rapl skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper joydev input_leds kvm_intel mei_me lpc_ich mei ipmi_si ioatdma ipmi_devintf dca ipmi_msghandler acpi_pad acpi_power_meter mac_hid ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype xt_conntrack ip6table_filter nfsd ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast auth_rpcgss nf_nat_ftp nf_nat nfs_acl nf_conntrack_ftp sch_fq_codel lockd nf_conntrack grace nf_defrag_ipv6 parport_pc sunrpc nf_defrag_ipv4 ppdev libcrc32c iptable_filter lp parport ip_tables x_tables autofs4 hid_generic usbhid hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
[60498.982084]  fb_sys_fops megaraid_sas i40e drm ahci i2c_i801 libahci wmi
[60499.075987] CR2: ffffffffffffffd8
[60499.079305] ---[ end trace 679d8c93fc6d7975 ]---
[60499.084898] RIP: 0010:dyn_constraint.isra.15+0x26/0x60
[60499.090033] Code: 00 00 c3 90 0f 1f 44 00 00 48 85 ff 48 89 f0 74 44 8b 48 20 f6 c1 80 75 3a 48 63 d2 48 8b 30 80 c9 80 48 8d 14 92 48 8d 14 d7 <48> 89 32 48 8b 70 08 48 89 72 08 48 8b 70 10 48 89 72 10 48 8b 70
[60499.108776] RSP: 0018:ffffbe49a0d17d98 EFLAGS: 00010282
[60499.114001] RAX: ffffffffac91e0a0 RBX: 00000000ffffffff RCX: 0000000000000080
[60499.121136] RDX: ffffffffffffffd8 RSI: 000000000000000f RDI: 0000000000000000
[60499.128266] RBP: ffff9e34d3480000 R08: ffff9e34df465e60 R09: ffff9e1ce0007300
[60499.135399] R10: ffff9e34d13cb740 R11: 0000000000000002 R12: ffff9e34d3480000
[60499.142531] R13: ffff9e34d1189000 R14: ffffffffac21a7a0 R15: 0000000000000000
[60499.149663] FS:  00007fc59591d500(0000) GS:ffff9e34dfa80000(0000) knlGS:0000000000000000
[60499.157748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60499.163495] CR2: ffffffffffffffd8 CR3: 0000002fb2966002 CR4: 00000000007606e0
[60499.170624] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60499.177749] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[60499.184880] PKRU: 55555554

[-- Attachment #3: tsx_force_abort_0.dmesg-3counters --]
[-- Type: text/plain, Size: 5273 bytes --]

[60557.346702] BUG: unable to handle kernel paging request at ffffffffffffffd8
[60557.353665] #PF error: [WRITE]
[60557.356721] PGD ce8212067 P4D ce8212067 PUD ce8214067 PMD 0 
[60557.362379] Oops: 0002 [#10] SMP PTI
[60557.365962] CPU: 70 PID: 6853 Comm: perf Tainted: G      D W         5.0.0-rc5-1-peterz-feb8 #1
[60557.374651] Hardware name: Intel Corporation S2600WFQ/S2600WFQ, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[60557.385081] RIP: 0010:dyn_constraint.isra.15+0x26/0x60
[60557.390214] Code: 00 00 c3 90 0f 1f 44 00 00 48 85 ff 48 89 f0 74 44 8b 48 20 f6 c1 80 75 3a 48 63 d2 48 8b 30 80 c9 80 48 8d 14 92 48 8d 14 d7 <48> 89 32 48 8b 70 08 48 89 72 08 48 8b 70 10 48 89 72 10 48 8b 70
[60557.408960] RSP: 0018:ffffbe498f73fd98 EFLAGS: 00010282
[60557.414185] RAX: ffffffffac91e0a0 RBX: 00000000ffffffff RCX: 0000000000000080
[60557.421317] RDX: ffffffffffffffd8 RSI: 000000000000000f RDI: 0000000000000000
[60557.428450] RBP: ffff9e34d2878000 R08: ffff9e34dfaa5e60 R09: ffff9e1ce0007300
[60557.435582] R10: ffff9e34d0cc7bc0 R11: 0000000000000002 R12: ffff9e34d2878000
[60557.442715] R13: ffff9e34ad8d1800 R14: ffffffffac21a7a0 R15: 0000000000000000
[60557.449848] FS:  00007fd918475500(0000) GS:ffff9e34dfa80000(0000) knlGS:0000000000000000
[60557.457931] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60557.463675] CR2: ffffffffffffffd8 CR3: 0000002fb2960001 CR4: 00000000007606e0
[60557.470800] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60557.477933] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[60557.485064] PKRU: 55555554
[60557.487778] Call Trace:
[60557.490230]  skl_get_event_constraints+0x38/0x60
[60557.494850]  x86_pmu_event_init+0x1c0/0x210
[60557.499037]  perf_try_init_event+0x7b/0xa0
[60557.503134]  perf_event_alloc+0x7e0/0x980
[60557.507147]  __do_sys_perf_event_open+0x1f4/0xba0
[60557.511854]  do_syscall_64+0x55/0x100
[60557.515522]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[60557.520572] RIP: 0033:0x7fd9168da839
[60557.524148] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
[60557.542894] RSP: 002b:00007ffd45e7a6a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[60557.550457] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fd9168da839
[60557.557590] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 000055a9685a67a8
[60557.564722] RBP: 00007ffd45e7a790 R08: 0000000000000008 R09: 0000000000000008
[60557.571854] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000000
[60557.578986] R13: 0000000000000000 R14: 000055a9686292b8 R15: 000055a9685a6790
[60557.586117] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 bridge stp llc ebtable_filter ebtables ipmi_ssif intel_rapl skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper joydev input_leds kvm_intel mei_me lpc_ich mei ipmi_si ioatdma ipmi_devintf dca ipmi_msghandler acpi_pad acpi_power_meter mac_hid ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype xt_conntrack ip6table_filter nfsd ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast auth_rpcgss nf_nat_ftp nf_nat nfs_acl nf_conntrack_ftp sch_fq_codel lockd nf_conntrack grace nf_defrag_ipv6 parport_pc sunrpc nf_defrag_ipv4 ppdev libcrc32c iptable_filter lp parport ip_tables x_tables autofs4 hid_generic usbhid hid ast i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt
[60557.586152]  fb_sys_fops megaraid_sas i40e drm ahci i2c_i801 libahci wmi
[60557.680060] CR2: ffffffffffffffd8
[60557.683379] ---[ end trace 679d8c93fc6d7976 ]---
[60557.688978] RIP: 0010:dyn_constraint.isra.15+0x26/0x60
[60557.694115] Code: 00 00 c3 90 0f 1f 44 00 00 48 85 ff 48 89 f0 74 44 8b 48 20 f6 c1 80 75 3a 48 63 d2 48 8b 30 80 c9 80 48 8d 14 92 48 8d 14 d7 <48> 89 32 48 8b 70 08 48 89 72 08 48 8b 70 10 48 89 72 10 48 8b 70
[60557.712859] RSP: 0018:ffffbe49a0d17d98 EFLAGS: 00010282
[60557.718084] RAX: ffffffffac91e0a0 RBX: 00000000ffffffff RCX: 0000000000000080
[60557.725216] RDX: ffffffffffffffd8 RSI: 000000000000000f RDI: 0000000000000000
[60557.732349] RBP: ffff9e34d3480000 R08: ffff9e34df465e60 R09: ffff9e1ce0007300
[60557.739481] R10: ffff9e34d13cb740 R11: 0000000000000002 R12: ffff9e34d3480000
[60557.746613] R13: ffff9e34d1189000 R14: ffffffffac21a7a0 R15: 0000000000000000
[60557.753745] FS:  00007fd918475500(0000) GS:ffff9e34dfa80000(0000) knlGS:0000000000000000
[60557.761830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60557.767577] CR2: ffffffffffffffd8 CR3: 0000002fb2960001 CR4: 00000000007606e0
[60557.774710] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60557.781838] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[60557.788963] PKRU: 55555554
[60565.348258] [UFW BLOCK] IN=eno2 OUT= MAC=01:00:5e:00:00:01:00:04:96:82:75:3d:08:00 SRC=10.54.39.2 DST=224.0.0.1 LEN=32 TOS=0x00 PREC=0xC0 TTL=1 ID=14738 PROTO=2 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-15 17:32             ` mark gross
@ 2019-02-15 17:44               ` Peter Zijlstra
  2019-02-15 20:47                 ` mark gross
  2019-02-19 13:35               ` [MODERATED] [RFC][PATCH v2] " Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-15 17:44 UTC (permalink / raw)
  To: speck

On Fri, Feb 15, 2019 at 09:32:47AM -0800, speck for mark gross wrote:

> Oh I'm told we need to avoid
> using the internal codenames in our patches.

Then don't expose us to them. I will forever think of broken TSX
whenever I see a walnut now.

Anyway; I'll stare at the fails next week.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Re: [RFC][PATCH] performance walnuts
  2019-02-15 17:44               ` Peter Zijlstra
@ 2019-02-15 20:47                 ` mark gross
  2019-02-15 21:33                   ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: mark gross @ 2019-02-15 20:47 UTC (permalink / raw)
  To: speck

On Fri, Feb 15, 2019 at 06:44:10PM +0100, speck for Peter Zijlstra wrote:
> On Fri, Feb 15, 2019 at 09:32:47AM -0800, speck for mark gross wrote:
> 
> > Oh I'm told we need to avoid
> > using the internal codenames in our patches.
> 
> Then don't expose us to them. I will forever think of broken TSX
> whenever I see a walnut now.
I joined the list recently.  Was the walnut codeword exposed on this list earlier,
or did you learn about walnut as a codeword via your Intel access?


> 
> Anyway; I'll stare at the fails next week.
thanks

--mark

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC][PATCH] performance walnuts
  2019-02-15 20:47                 ` mark gross
@ 2019-02-15 21:33                   ` Thomas Gleixner
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2019-02-15 21:33 UTC (permalink / raw)
  To: speck

Mark,

On Fri, 15 Feb 2019, speck for mark gross wrote:

> On Fri, Feb 15, 2019 at 06:44:10PM +0100, speck for Peter Zijlstra wrote:
> > On Fri, Feb 15, 2019 at 09:32:47AM -0800, speck for mark gross wrote:
> > 
> > > Oh I'm told we need to avoid
> > > using the internal codenames in our patches.
> > 
> > Then don't expose us to them. I will forever think of broken TSX
> > whenever I see a walnut now.
> I joined the list recently.  Was walnut codeword exposed on this list earlier
> or did you get to know about walnut as a codeword via your Intel access?

The nut was exposed by Intel to the list:

  Date: Mon, 4 Feb 2019
  From: Andi Kleen <ak@linux.intel.com>
  Subject: PERFv1

  Walnut is an functional (not security) issue with TSX. The upcoming
  microcode updates on Skylake may corrupt perfmon counter 3
  when RTM transactions are used.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] Encrypted Message
  2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
                             ` (2 preceding siblings ...)
  2019-02-13  2:56           ` mark gross
@ 2019-02-15 23:45           ` Jon Masters
  3 siblings, 0 replies; 33+ messages in thread
From: Jon Masters @ 2019-02-15 23:45 UTC (permalink / raw)
  To: speck

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/rfc822-headers; protected-headers="v1", Size: 132 bytes --]

From: Jon Masters <jcm@redhat.com>
To: speck for Peter Zijlstra <speck@linutronix.de>
Subject: Re: [RFC][PATCH] performance walnuts

[-- Attachment #2: Type: text/plain, Size: 944 bytes --]

On 2/8/19 5:53 AM, speck for Peter Zijlstra wrote:
> +static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
> +{
> +	u64 val = MSR_TFA_RTM_FORCE_ABORT * on;
> +
> +	if (cpuc->tfa_shadow != val) {
> +		cpuc->tfa_shadow = val;
> +		wrmsrl(MSR_TSX_FORCE_ABORT, val);
> +	}
> +}

Ok let me ask a stupid question.

This MSR is exposed on a given core. What's the impact (if any) on
*other* cores that might be using TSX? For example, suppose I'm running
an application using RTM on one core while another application on
another core begins profiling. What impact does this MSR
write have on other cores? (Architecturally.)

I'm assuming the implementation of HLE relies on whatever you're doing
fitting into the local core's cache and you just abort on any snoop,
etc., so it ought to be fairly self-contained, but I want to know.

Jon.

-- 
Computer Architect | Sent with my Fedora powered laptop


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [MODERATED] [RFC][PATCH v2] performance walnuts
  2019-02-15 17:32             ` mark gross
  2019-02-15 17:44               ` Peter Zijlstra
@ 2019-02-19 13:35               ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2019-02-19 13:35 UTC (permalink / raw)
  To: speck

On Fri, Feb 15, 2019 at 09:32:47AM -0800, speck for mark gross wrote:

> [60498.742721] BUG: unable to handle kernel paging request at ffffffffffffffd8

There should've been a WARN before this.

Anyway, how's this?

It should probably be split into 4 patches or so, but let's see if this
works now. I tried it on my IVB by killing the HT errata and forcing these
bits on and using wrmsrl_safe().

The problem was that the fake cpuc used for validating event groups
was using different cpuc allocation code than the real ones.

---
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1995,7 +1995,7 @@ static int x86_pmu_commit_txn(struct pmu
  */
 static void free_fake_cpuc(struct cpu_hw_events *cpuc)
 {
-	kfree(cpuc->shared_regs);
+	intel_cpuc_finish(cpuc);
 	kfree(cpuc);
 }
 
@@ -2007,14 +2007,11 @@ static struct cpu_hw_events *allocate_fa
 	cpuc = kzalloc(sizeof(*cpuc), GFP_KERNEL);
 	if (!cpuc)
 		return ERR_PTR(-ENOMEM);
-
-	/* only needed, if we have extra_regs */
-	if (x86_pmu.extra_regs) {
-		cpuc->shared_regs = allocate_shared_regs(cpu);
-		if (!cpuc->shared_regs)
-			goto error;
-	}
 	cpuc->is_fake = 1;
+
+	if (intel_cpuc_prepare(cpuc, cpu))
+		goto error;
+
 	return cpuc;
 error:
 	free_fake_cpuc(cpuc);
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2000,6 +2000,39 @@ static void intel_pmu_nhm_enable_all(int
 	intel_pmu_enable_all(added);
 }
 
+static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
+{
+	u64 val = on ? MSR_TFA_RTM_FORCE_ABORT : 0;
+
+	if (cpuc->tfa_shadow != val) {
+		cpuc->tfa_shadow = val;
+		wrmsrl(MSR_TSX_FORCE_ABORT, val);
+	}
+}
+
+static void intel_skl_commit_scheduling(struct cpu_hw_events *cpuc, int idx, int cntr)
+{
+	/*
+	 * We're going to use PMC3, make sure TFA is set before we touch it.
+	 */
+	if (cntr == 3 && !cpuc->is_fake)
+		intel_set_tfa(cpuc, true);
+}
+
+static void intel_skl_pmu_enable_all(int added)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * If we find PMC3 is no longer used when we enable the PMU, we can
+	 * clear TFA.
+	 */
+	if (!test_bit(3, cpuc->active_mask))
+		intel_set_tfa(cpuc, false);
+
+	intel_pmu_enable_all(added);
+}
+
 static void enable_counter_freeze(void)
 {
 	update_debugctlmsr(get_debugctlmsr() |
@@ -2770,6 +2803,35 @@ intel_stop_scheduling(struct cpu_hw_even
 }
 
 static struct event_constraint *
+dyn_constraint(struct cpu_hw_events *cpuc, struct event_constraint *c, int idx)
+{
+	WARN_ON_ONCE(!cpuc->constraint_list);
+
+	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
+		struct event_constraint *cx;
+
+		/*
+		 * grab pre-allocated constraint entry
+		 */
+		cx = &cpuc->constraint_list[idx];
+
+		/*
+		 * initialize dynamic constraint
+		 * with static constraint
+		 */
+		*cx = *c;
+
+		/*
+		 * mark constraint as dynamic
+		 */
+		cx->flags |= PERF_X86_EVENT_DYNAMIC;
+		c = cx;
+	}
+
+	return c;
+}
+
+static struct event_constraint *
 intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
 			   int idx, struct event_constraint *c)
 {
@@ -2799,27 +2861,7 @@ intel_get_excl_constraints(struct cpu_hw
 	 * only needed when constraint has not yet
 	 * been cloned (marked dynamic)
 	 */
-	if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
-		struct event_constraint *cx;
-
-		/*
-		 * grab pre-allocated constraint entry
-		 */
-		cx = &cpuc->constraint_list[idx];
-
-		/*
-		 * initialize dynamic constraint
-		 * with static constraint
-		 */
-		*cx = *c;
-
-		/*
-		 * mark constraint as dynamic, so we
-		 * can free it later on
-		 */
-		cx->flags |= PERF_X86_EVENT_DYNAMIC;
-		c = cx;
-	}
+	c = dyn_constraint(cpuc, c, idx);
 
 	/*
 	 * From here on, the constraint is dynamic.
@@ -3357,6 +3399,26 @@ glp_get_event_constraints(struct cpu_hw_
 	return c;
 }
 
+static bool allow_tsx_force_abort = true;
+
+static struct event_constraint *
+skl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+			  struct perf_event *event)
+{
+	struct event_constraint *c = hsw_get_event_constraints(cpuc, idx, event);
+
+	/*
+	 * Without TFA we must not use PMC3.
+	 */
+	if (!allow_tsx_force_abort && test_bit(3, c->idxmsk)) {
+		c = dyn_constraint(cpuc, c, idx);
+		c->idxmsk64 &= ~(1ULL << 3);
+		c->weight = hweight64(c->idxmsk64);
+	}
+
+	return c;
+}
+
 /*
  * Broadwell:
  *
@@ -3410,7 +3472,7 @@ ssize_t intel_event_sysfs_show(char *pag
 	return x86_event_sysfs_show(page, config, event);
 }
 
-struct intel_shared_regs *allocate_shared_regs(int cpu)
+static struct intel_shared_regs *allocate_shared_regs(int cpu)
 {
 	struct intel_shared_regs *regs;
 	int i;
@@ -3442,23 +3504,24 @@ static struct intel_excl_cntrs *allocate
 	return c;
 }
 
-static int intel_pmu_cpu_prepare(int cpu)
-{
-	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 
+int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
+{
 	if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
 		cpuc->shared_regs = allocate_shared_regs(cpu);
 		if (!cpuc->shared_regs)
 			goto err;
 	}
 
-	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
+	if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA)) {
 		size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);
 
-		cpuc->constraint_list = kzalloc(sz, GFP_KERNEL);
+		cpuc->constraint_list = kzalloc_node(sz, GFP_KERNEL, cpu_to_node(cpu));
 		if (!cpuc->constraint_list)
 			goto err_shared_regs;
+	}
 
+	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
 		cpuc->excl_cntrs = allocate_excl_cntrs(cpu);
 		if (!cpuc->excl_cntrs)
 			goto err_constraint_list;
@@ -3480,6 +3543,11 @@ static int intel_pmu_cpu_prepare(int cpu
 	return -ENOMEM;
 }
 
+static int intel_pmu_cpu_prepare(int cpu)
+{
+	return intel_cpuc_prepare(&per_cpu(cpu_hw_events, cpu), cpu);
+}
+
 static void flip_smm_bit(void *data)
 {
 	unsigned long set = *(unsigned long *)data;
@@ -3554,9 +3622,8 @@ static void intel_pmu_cpu_starting(int c
 	}
 }
 
-static void free_excl_cntrs(int cpu)
+static void free_excl_cntrs(struct cpu_hw_events *cpuc)
 {
-	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	struct intel_excl_cntrs *c;
 
 	c = cpuc->excl_cntrs;
@@ -3564,9 +3631,10 @@ static void free_excl_cntrs(int cpu)
 		if (c->core_id == -1 || --c->refcnt == 0)
 			kfree(c);
 		cpuc->excl_cntrs = NULL;
-		kfree(cpuc->constraint_list);
-		cpuc->constraint_list = NULL;
 	}
+
+	kfree(cpuc->constraint_list);
+	cpuc->constraint_list = NULL;
 }
 
 static void intel_pmu_cpu_dying(int cpu)
@@ -3577,9 +3645,8 @@ static void intel_pmu_cpu_dying(int cpu)
 		disable_counter_freeze();
 }
 
-static void intel_pmu_cpu_dead(int cpu)
+void intel_cpuc_finish(struct cpu_hw_events *cpuc)
 {
-	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	struct intel_shared_regs *pc;
 
 	pc = cpuc->shared_regs;
@@ -3589,7 +3656,12 @@ static void intel_pmu_cpu_dead(int cpu)
 		cpuc->shared_regs = NULL;
 	}
 
-	free_excl_cntrs(cpu);
+	free_excl_cntrs(cpuc);
+}
+
+static void intel_pmu_cpu_dead(int cpu)
+{
+	intel_cpuc_finish(&per_cpu(cpu_hw_events, cpu));
 }
 
 static void intel_pmu_sched_task(struct perf_event_context *ctx,
@@ -4107,9 +4179,12 @@ static struct attribute *intel_pmu_caps_
        NULL
 };
 
+DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
 	NULL,
+	NULL,
 };
 
 static __init struct attribute **
@@ -4596,6 +4671,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 
+
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
@@ -4607,6 +4683,15 @@ __init int intel_pmu_init(void)
 		tsx_attr = hsw_tsx_events_attrs;
 		intel_pmu_pebs_data_source_skl(
 			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
+
+		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+			x86_pmu.flags |= PMU_FL_TFA;
+			x86_pmu.get_event_constraints = skl_get_event_constraints;
+			x86_pmu.enable_all = intel_skl_pmu_enable_all;
+			x86_pmu.commit_scheduling = intel_skl_commit_scheduling;
+			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+		}
+
 		pr_cont("Skylake events, ");
 		name = "skylake";
 		break;
@@ -4758,7 +4843,7 @@ static __init int fixup_ht_bug(void)
 	hardlockup_detector_perf_restart();
 
 	for_each_online_cpu(c)
-		free_excl_cntrs(c);
+		free_excl_cntrs(&per_cpu(cpu_hw_events, c));
 
 	cpus_read_unlock();
 	pr_info("PMU erratum BJ122, BV98, HSD29 workaround disabled, HT off\n");
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -243,6 +243,11 @@ struct cpu_hw_events {
 	int excl_thread_id; /* 0 or 1 */
 
 	/*
+	 * SKL TSX_FORCE_ABORT shadow
+	 */
+	u64				tfa_shadow;
+
+	/*
 	 * AMD specific bits
 	 */
 	struct amd_nb			*amd_nb;
@@ -682,6 +687,7 @@ do {									\
 #define PMU_FL_EXCL_CNTRS	0x4 /* has exclusive counter requirements  */
 #define PMU_FL_EXCL_ENABLED	0x8 /* exclusive counter active */
 #define PMU_FL_PEBS_ALL		0x10 /* all events are valid PEBS events */
+#define PMU_FL_TFA		0x20 /* deal with TSX force abort */
 
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
@@ -890,7 +896,8 @@ struct event_constraint *
 x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event);
 
-struct intel_shared_regs *allocate_shared_regs(int cpu);
+int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu);
+void intel_cpuc_finish(struct cpu_hw_events *cpuc);
 
 int intel_pmu_init(void);
 
@@ -1026,9 +1033,13 @@ static inline int intel_pmu_init(void)
 	return 0;
 }
 
-static inline struct intel_shared_regs *allocate_shared_regs(int cpu)
+static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)
+{
+	return 0;
+}
+
+static inline void intel_cpuc_finish(struct cpu_hw_event *cpuc)
 {
-	return NULL;
 }
 
 static inline int is_ht_workaround_enabled(void)
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */
 #define X86_FEATURE_AVX512_4VNNIW	(18*32+ 2) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS	(18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */
+#define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_PCONFIG		(18*32+18) /* Intel PCONFIG */
 #define X86_FEATURE_SPEC_CTRL		(18*32+26) /* "" Speculation Control (IBRS + IBPB) */
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -666,6 +666,12 @@
 
 #define MSR_IA32_TSC_DEADLINE		0x000006E0
 
+
+#define MSR_TSX_FORCE_ABORT		0x0000010F
+
+#define MSR_TFA_RTM_FORCE_ABORT_BIT	0
+#define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
+
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
 #define MSR_IA32_MCG_EBX		0x00000181

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-02-19 13:35 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-07 23:41 [MODERATED] [PATCH v3 0/6] PERFv3 Andi Kleen
2019-02-07 23:41 ` [MODERATED] [PATCH v3 1/6] PERFv3 Andi Kleen
2019-02-08  8:45   ` [MODERATED] " Peter Zijlstra
2019-02-07 23:41 ` [MODERATED] [PATCH v3 2/6] PERFv3 Andi Kleen
2019-02-08  0:51   ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Andrew Cooper
2019-02-08  9:01     ` Peter Zijlstra
2019-02-08  9:31       ` [MODERATED] Re: [PATCH " Andrew Cooper
2019-02-08  9:39       ` [MODERATED] Re: [SUSPECTED SPAM][PATCH " Peter Zijlstra
2019-02-08 10:53         ` [MODERATED] [RFC][PATCH] performance walnuts Peter Zijlstra
2019-02-08 18:07           ` [MODERATED] " Andi Kleen
2019-02-11 10:40             ` Peter Zijlstra
2019-02-11 14:06               ` Thomas Gleixner
2019-02-11 20:17                 ` [MODERATED] " Konrad Rzeszutek Wilk
2019-02-11 23:39                   ` Thomas Gleixner
2019-02-09  0:28           ` [MODERATED] " Linus Torvalds
2019-02-09  4:34             ` Andi Kleen
2019-02-09  8:57             ` Peter Zijlstra
2019-02-13  2:56           ` mark gross
2019-02-15 17:32             ` mark gross
2019-02-15 17:44               ` Peter Zijlstra
2019-02-15 20:47                 ` mark gross
2019-02-15 21:33                   ` Thomas Gleixner
2019-02-19 13:35               ` [MODERATED] [RFC][PATCH v2] " Peter Zijlstra
2019-02-15 23:45           ` [MODERATED] Encrypted Message Jon Masters
2019-02-08  8:50   ` [MODERATED] Re: [PATCH v3 2/6] PERFv3 Peter Zijlstra
2019-02-08 17:26     ` Andi Kleen
2019-02-07 23:41 ` [MODERATED] [PATCH v3 3/6] PERFv3 Andi Kleen
2019-02-08  9:02   ` [MODERATED] " Peter Zijlstra
2019-02-07 23:41 ` [MODERATED] [PATCH v3 4/6] PERFv3 Andi Kleen
2019-02-07 23:41 ` [MODERATED] [PATCH v3 5/6] PERFv3 Andi Kleen
2019-02-08  0:54   ` [MODERATED] " Andrew Cooper
2019-02-07 23:41 ` [MODERATED] [PATCH v3 6/6] PERFv3 Andi Kleen
2019-02-08  9:07   ` [MODERATED] " Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).