linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] perf/x86/intel: force reschedule on TFA changes
@ 2019-04-08 17:32 Stephane Eranian
  2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
  2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
  0 siblings, 2 replies; 13+ messages in thread
From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

This short patch series improves the TFA patch series by providing a guarantee
to users each time the allow_tsx_force_abort (TFA) sysctl control knob is
modified.

The current TFA support in perf_events operates as follows:
 - TFA=1
   The PMU has priority over TSX: if PMC3 is needed, TSX transactions are
   forced to abort. The PMU has access to PMC3 and can schedule events on it.

 - TFA=0
   TSX has priority over the PMU. If PMC3 is needed for an event, the event
   must be scheduled on another counter. PMC3 is not available.

When a sysadmin modifies TFA, the current code base changes neither the events
being measured at the time nor the actual MSR controlling TFA. If the kernel
transitions from TFA=1 to TFA=0, nothing happens until the events are
descheduled by a context switch, multiplexing, or termination of the
measurement. That means TSX transactions keep failing until then, and there is
no easy way to evaluate how long this can take.

This patch series addresses the issue by rescheduling the events as part of the
sysctl change. That way, it is guaranteed that no perf_events event is running
on PMC3 by the time the write() syscall (from the echo) returns, and that TSX
transactions may succeed from then on. Similarly, when transitioning from TFA=0
to TFA=1, the events are rescheduled and can use PMC3 immediately if needed,
and TSX transactions systematically abort by the time the write() syscall
returns.

To make this work, the patch uses an existing reschedule function in the generic
code, ctx_resched(). In V2, we export a new function called perf_ctx_resched()
which takes care of locking the contexts and invoking ctx_resched().

The patch adds an x86_get_pmu() call, which is less than ideal, but I am open
to suggestions here.

In V2, we also switched from kstrtoul() to kstrtobool() and added the proper
get_online_cpus()/put_online_cpus() calls.

Signed-off-by: Stephane Eranian <eranian@google.com>


Stephane Eranian (2):
  perf/core: add perf_ctx_resched() as global function
  perf/x86/intel: force resched when TFA sysctl is modified

 arch/x86/events/core.c       |  4 +++
 arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h |  1 +
 include/linux/perf_event.h   | 14 ++++++++++
 kernel/events/core.c         | 18 ++++++------
 5 files changed, 79 insertions(+), 11 deletions(-)

-- 
2.21.0.392.gf8f6787159e-goog



* [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function
  2019-04-08 17:32 [PATCH v2 0/2] perf/x86/intel: force reschedule on TFA changes Stephane Eranian
@ 2019-04-08 17:32 ` Stephane Eranian
  2019-04-15 15:45   ` Peter Zijlstra
  2019-04-16 11:33   ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian
  2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
  1 sibling, 2 replies; 13+ messages in thread
From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

This patch adds perf_ctx_resched(), a global function that can be called to
force rescheduling of events based on event types. The function locks both
cpuctx and task_ctx internally. It will be used by a subsequent patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
Change-Id: Icbc05e5f461fd6e091b46778fe62b23f308e2be7
---
 include/linux/perf_event.h | 14 ++++++++++++++
 kernel/events/core.c       | 18 +++++++++---------
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 085a95e2582a..ee8a275df0ed 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -822,6 +822,15 @@ struct bpf_perf_event_data_kern {
 	struct perf_event *event;
 };
 
+enum event_type_t {
+	EVENT_FLEXIBLE = 0x1,
+	EVENT_PINNED = 0x2,
+	EVENT_TIME = 0x4,
+	/* see ctx_resched() for details */
+	EVENT_CPU = 0x8,
+	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
+};
+
 #ifdef CONFIG_CGROUP_PERF
 
 /*
@@ -888,6 +897,11 @@ extern void perf_sched_cb_dec(struct pmu *pmu);
 extern void perf_sched_cb_inc(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
+
+extern void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+			     struct perf_event_context *task_ctx,
+			     enum event_type_t event_type);
+
 extern int perf_event_refresh(struct perf_event *event, int refresh);
 extern void perf_event_update_userpage(struct perf_event *event);
 extern int perf_event_release_kernel(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dfc4bab0b02b..30474064ec22 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -354,15 +354,6 @@ static void event_function_local(struct perf_event *event, event_f func, void *d
 	(PERF_SAMPLE_BRANCH_KERNEL |\
 	 PERF_SAMPLE_BRANCH_HV)
 
-enum event_type_t {
-	EVENT_FLEXIBLE = 0x1,
-	EVENT_PINNED = 0x2,
-	EVENT_TIME = 0x4,
-	/* see ctx_resched() for details */
-	EVENT_CPU = 0x8,
-	EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED,
-};
-
 /*
  * perf_sched_events : >0 events exist
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
@@ -2477,6 +2468,15 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
 	perf_pmu_enable(cpuctx->ctx.pmu);
 }
 
+void perf_ctx_resched(struct perf_cpu_context *cpuctx,
+		      struct perf_event_context *task_ctx,
+		      enum event_type_t event_type)
+{
+	perf_ctx_lock(cpuctx, task_ctx);
+	ctx_resched(cpuctx, task_ctx, event_type);
+	perf_ctx_unlock(cpuctx, task_ctx);
+}
+
 /*
  * Cross CPU call to install and enable a performance event
  *
-- 
2.21.0.392.gf8f6787159e-goog



* [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified
  2019-04-08 17:32 [PATCH v2 0/2] perf/x86/intel: force reschedule on TFA changes Stephane Eranian
  2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
@ 2019-04-08 17:32 ` Stephane Eranian
  2019-04-15 15:57   ` Peter Zijlstra
  2019-04-16 11:33   ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian
  1 sibling, 2 replies; 13+ messages in thread
From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

This patch provides a guarantee to the sysadmin that when TFA is disabled, no
PMU event is using PMC3 by the time the echo command returns. Vice versa, when
TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible
multiplexing).

$ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
     1.000123979    125,768,725,208      branches
     1.000562520    125,631,000,456      branches
     1.000942898    125,487,114,291      branches
     1.001333316    125,323,363,620      branches
     2.004721306    125,514,968,546      branches
     2.005114560    125,511,110,861      branches
     2.005482722    125,510,132,724      branches
     2.005851245    125,508,967,086      branches
     3.006323475    125,166,570,648      branches
     3.006709247    125,165,650,056      branches
     3.007086605    125,164,639,142      branches
     3.007459298    125,164,402,912      branches
     4.007922698    125,045,577,140      branches
     4.008310775    125,046,804,324      branches
     4.008670814    125,048,265,111      branches
     4.009039251    125,048,677,611      branches
     5.009503373    125,122,240,217      branches
     5.009897067    125,122,450,517      branches

Then on another connection, sysadmin does:
$ echo  1 >/sys/devices/cpu/allow_tsx_force_abort

Then perf stat adjusts the events immediately:

     5.010286029    125,121,393,483      branches
     5.010646308    125,120,556,786      branches
     6.011113588    124,963,351,832      branches
     6.011510331    124,964,267,566      branches
     6.011889913    124,964,829,130      branches
     6.012262996    124,965,841,156      branches
     7.012708299    124,419,832,234      branches [79.69%]
     7.012847908    124,416,363,853      branches [79.73%]
     7.013225462    124,400,723,712      branches [79.73%]
     7.013598191    124,376,154,434      branches [79.70%]
     8.014089834    124,250,862,693      branches [74.98%]
     8.014481363    124,267,539,139      branches [74.94%]
     8.014856006    124,259,519,786      branches [74.98%]
     8.014980848    124,225,457,969      branches [75.04%]
     9.015464576    124,204,235,423      branches [75.03%]
     9.015858587    124,204,988,490      branches [75.04%]
     9.016243680    124,220,092,486      branches [74.99%]
     9.016620104    124,231,260,146      branches [74.94%]

And vice versa if the sysadmin does:
$ echo  0 >/sys/devices/cpu/allow_tsx_force_abort

Events are again spread over the 4 counters:

    10.017096277    124,276,230,565      branches [74.96%]
    10.017237209    124,228,062,171      branches [75.03%]
    10.017478637    124,178,780,626      branches [75.03%]
    10.017853402    124,198,316,177      branches [75.03%]
    11.018334423    124,602,418,933      branches [85.40%]
    11.018722584    124,602,921,320      branches [85.42%]
    11.019095621    124,603,956,093      branches [85.42%]
    11.019467742    124,595,273,783      branches [85.42%]
    12.019945736    125,110,114,864      branches
    12.020330764    125,109,334,472      branches
    12.020688740    125,109,818,865      branches
    12.021054020    125,108,594,014      branches
    13.021516774    125,109,164,018      branches
    13.021903640    125,108,794,510      branches
    13.022270770    125,107,756,978      branches
    13.022630819    125,109,380,471      branches
    14.023114989    125,133,140,817      branches
    14.023501880    125,133,785,858      branches
    14.023868339    125,133,852,700      branches

Signed-off-by: Stephane Eranian <eranian@google.com>
Change-Id: Ib443265edce31b93ca4d10fe7695c05d00a7178e
---
 arch/x86/events/core.c       |  4 +++
 arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 87b50f4be201..fdd106267fd2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event)
 	return event->pmu == &pmu;
 }
 
+struct pmu *x86_get_pmu(void)
+{
+	return &pmu;
+}
 /*
  * Event scheduler state:
  *
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1403c05e25e2..20698e84c388 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4156,6 +4156,53 @@ static ssize_t freeze_on_smi_store(struct device *cdev,
 	return count;
 }
 
+static void update_tfa_sched(void *ignored)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct pmu *pmu = x86_get_pmu();
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+	struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+	/*
+	 * check if PMC3 is used
+	 * and if so force schedule out for all event types all contexts
+	 */
+	if (test_bit(3, cpuc->active_mask))
+		perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+}
+
+static ssize_t show_sysctl_tfa(struct device *cdev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
+}
+
+static ssize_t set_sysctl_tfa(struct device *cdev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	bool val;
+	ssize_t ret;
+
+	ret = kstrtobool(buf, &val);
+	if (ret)
+		return ret;
+
+	/* no change */
+	if (val == allow_tsx_force_abort)
+		return count;
+
+	allow_tsx_force_abort = val;
+
+	get_online_cpus();
+	on_each_cpu(update_tfa_sched, NULL, 1);
+	put_online_cpus();
+
+	return count;
+}
+
+
 static DEVICE_ATTR_RW(freeze_on_smi);
 
 static ssize_t branches_show(struct device *cdev,
@@ -4188,7 +4235,9 @@ static struct attribute *intel_pmu_caps_attrs[] = {
        NULL
 };
 
-static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+static DEVICE_ATTR(allow_tsx_force_abort, 0644,
+		   show_sysctl_tfa,
+		   set_sysctl_tfa);
 
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
@@ -4697,7 +4746,7 @@ __init int intel_pmu_init(void)
 			x86_pmu.get_event_constraints = tfa_get_event_constraints;
 			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
 			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
-			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr;
 		}
 
 		pr_cont("Skylake events, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index b4d41829da4f..49ba8171ad7a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {		\
 	.event_str_ht	= ht,						\
 }
 
+struct pmu *x86_get_pmu(void);
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
-- 
2.21.0.392.gf8f6787159e-goog



* Re: [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function
  2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
@ 2019-04-15 15:45   ` Peter Zijlstra
  2019-04-15 15:53     ` Peter Zijlstra
  2019-04-16 11:33   ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2019-04-15 15:45 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote:
> This patch adds perf_ctx_resched(), a global function that can be called
> to force rescheduling of events based on event types. The function locks
> both cpuctx and task_ctx internally. It will be used by a subsequent patch.

I took out the task_ctx and event_type arguments.

---
Subject: perf/core: Add perf_ctx_resched() as global function
From: Stephane Eranian <eranian@google.com>
Date: Mon, 8 Apr 2019 10:32:51 -0700

This patch adds perf_ctx_resched(), a global function that can be called to
force rescheduling of events. The function locks both cpuctx and task_ctx
internally. It will be used by a subsequent patch.

Cc: mingo@elte.hu
Cc: ak@linux.intel.com
Cc: jolsa@redhat.com
Cc: tglx@linutronix.de
Cc: kan.liang@intel.com
Cc: nelson.dsouza@intel.com
Cc: tonyj@suse.com
Signed-off-by: Stephane Eranian <eranian@google.com>
[peterz: simplified calling convention]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com
---
 include/linux/perf_event.h |    3 +++
 kernel/events/core.c       |    9 +++++++++
 2 files changed, 12 insertions(+)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu
 extern void perf_sched_cb_inc(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
+
+extern void perf_ctx_resched(struct perf_cpu_context *cpuctx);
+
 extern int perf_event_refresh(struct perf_event *event, int refresh);
 extern void perf_event_update_userpage(struct perf_event *event);
 extern int perf_event_release_kernel(struct perf_event *event);
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2478,6 +2478,15 @@ static void ctx_resched(struct perf_cpu_
 	perf_pmu_enable(cpuctx->ctx.pmu);
 }
 
+void perf_ctx_resched(struct perf_cpu_context *cpuctx)
+{
+	struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+	perf_ctx_lock(cpuctx, task_ctx);
+	ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+	perf_ctx_unlock(cpuctx, task_ctx);
+}
+
 /*
  * Cross CPU call to install and enable a performance event
  *


* Re: [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function
  2019-04-15 15:45   ` Peter Zijlstra
@ 2019-04-15 15:53     ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-04-15 15:53 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

On Mon, Apr 15, 2019 at 05:45:37PM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote:
> > This patch adds perf_ctx_resched(), a global function that can be called
> > to force rescheduling of events based on event types. The function locks
> > both cpuctx and task_ctx internally. It will be used by a subsequent patch.
> 
> I took out the task_ctx and event_type arguments.

> +void perf_ctx_resched(struct perf_cpu_context *cpuctx)
> +{
> +	struct perf_event_context *task_ctx = cpuctx->task_ctx;
> +
> +	perf_ctx_lock(cpuctx, task_ctx);
> +	ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
> +	perf_ctx_unlock(cpuctx, task_ctx);
> +}

Hmm... I might just rework the thing to:

  perf_pmu_resched(struct pmu *pmu);

As that is what we want; esp. in the light of that whole core context
rewrite that i'm still sitting on :/


* Re: [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified
  2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
@ 2019-04-15 15:57   ` Peter Zijlstra
  2019-04-15 17:20     ` Stephane Eranian
  2019-04-16 11:33   ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2019-04-15 15:57 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj

On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote:
> +static ssize_t set_sysctl_tfa(struct device *cdev,
> +			      struct device_attribute *attr,
> +			      const char *buf, size_t count)
> +{
> +	bool val;
> +	ssize_t ret;
> +
> +	ret = kstrtobool(buf, &val);
> +	if (ret)
> +		return ret;
> +
> +	/* no change */
> +	if (val == allow_tsx_force_abort)
> +		return count;
> +
> +	allow_tsx_force_abort = val;
> +
> +	get_online_cpus();
> +	on_each_cpu(update_tfa_sched, NULL, 1);
> +	put_online_cpus();
> +
> +	return count;
> +}

So we care about concurrent writing to that file?


* Re: [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified
  2019-04-15 15:57   ` Peter Zijlstra
@ 2019-04-15 17:20     ` Stephane Eranian
  0 siblings, 0 replies; 13+ messages in thread
From: Stephane Eranian @ 2019-04-15 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Thomas Gleixner, Andi Kleen, Liang, Kan, mingo,
	nelson.dsouza, Jiri Olsa, tonyj

On Mon, Apr 15, 2019 at 8:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote:
> > +static ssize_t set_sysctl_tfa(struct device *cdev,
> > +                           struct device_attribute *attr,
> > +                           const char *buf, size_t count)
> > +{
> > +     bool val;
> > +     ssize_t ret;
> > +
> > +     ret = kstrtobool(buf, &val);
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* no change */
> > +     if (val == allow_tsx_force_abort)
> > +             return count;
> > +
> > +     allow_tsx_force_abort = val;
> > +
> > +     get_online_cpus();
> > +     on_each_cpu(update_tfa_sched, NULL, 1);
> > +     put_online_cpus();
> > +
> > +     return count;
> > +}
>
> So we care about concurrent writing to that file?
Not likely, but we care about seeing the effects on event scheduling
before the sysctl write returns.


* [tip:perf/core] perf/core: Add perf_pmu_resched() as global function
  2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
  2019-04-15 15:45   ` Peter Zijlstra
@ 2019-04-16 11:33   ` tip-bot for Stephane Eranian
  1 sibling, 0 replies; 13+ messages in thread
From: tip-bot for Stephane Eranian @ 2019-04-16 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, jolsa, vincent.weaver, eranian, acme, hpa,
	mingo, torvalds, alexander.shishkin, peterz

Commit-ID:  c68d224e5ed15605e651e2482c6ffd95915ddf58
Gitweb:     https://git.kernel.org/tip/c68d224e5ed15605e651e2482c6ffd95915ddf58
Author:     Stephane Eranian <eranian@google.com>
AuthorDate: Mon, 8 Apr 2019 10:32:51 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 12:19:34 +0200

perf/core: Add perf_pmu_resched() as global function

This patch adds perf_pmu_resched(), a global function that can be called to
force rescheduling of events for a given PMU. The function locks both cpuctx
and task_ctx internally. It will be used by a subsequent patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
[ Simplified the calling convention. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Cc: nelson.dsouza@intel.com
Cc: tonyj@suse.com
Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/perf_event.h |  3 +++
 kernel/events/core.c       | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 085a95e2582a..f3864e1c5569 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu *pmu);
 extern void perf_sched_cb_inc(struct pmu *pmu);
 extern int perf_event_task_disable(void);
 extern int perf_event_task_enable(void);
+
+extern void perf_pmu_resched(struct pmu *pmu);
+
 extern int perf_event_refresh(struct perf_event *event, int refresh);
 extern void perf_event_update_userpage(struct perf_event *event);
 extern int perf_event_release_kernel(struct perf_event *event);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 30a572e4c6f1..abbd4b3b96c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2478,6 +2478,16 @@ static void ctx_resched(struct perf_cpu_context *cpuctx,
 	perf_pmu_enable(cpuctx->ctx.pmu);
 }
 
+void perf_pmu_resched(struct pmu *pmu)
+{
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+	struct perf_event_context *task_ctx = cpuctx->task_ctx;
+
+	perf_ctx_lock(cpuctx, task_ctx);
+	ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU);
+	perf_ctx_unlock(cpuctx, task_ctx);
+}
+
 /*
  * Cross CPU call to install and enable a performance event
  *


* [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
  2019-04-15 15:57   ` Peter Zijlstra
@ 2019-04-16 11:33   ` tip-bot for Stephane Eranian
  2019-04-16 16:28     ` Vince Weaver
  1 sibling, 1 reply; 13+ messages in thread
From: tip-bot for Stephane Eranian @ 2019-04-16 11:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, vincent.weaver, eranian, mingo, alexander.shishkin, peterz,
	tglx, torvalds, linux-kernel, jolsa, hpa

Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
Author:     Stephane Eranian <eranian@google.com>
AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 16 Apr 2019 12:19:35 +0200

perf/x86/intel: Force resched when TFA sysctl is modified

This patch provides a guarantee to the sysadmin that when TFA is disabled, no
PMU event is using PMC3 by the time the echo command returns. Vice versa, when
TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible
multiplexing).

  $ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
     1.000123979    125,768,725,208      branches
     1.000562520    125,631,000,456      branches
     1.000942898    125,487,114,291      branches
     1.001333316    125,323,363,620      branches
     2.004721306    125,514,968,546      branches
     2.005114560    125,511,110,861      branches
     2.005482722    125,510,132,724      branches
     2.005851245    125,508,967,086      branches
     3.006323475    125,166,570,648      branches
     3.006709247    125,165,650,056      branches
     3.007086605    125,164,639,142      branches
     3.007459298    125,164,402,912      branches
     4.007922698    125,045,577,140      branches
     4.008310775    125,046,804,324      branches
     4.008670814    125,048,265,111      branches
     4.009039251    125,048,677,611      branches
     5.009503373    125,122,240,217      branches
     5.009897067    125,122,450,517      branches

Then on another connection, sysadmin does:

  $ echo  1 >/sys/devices/cpu/allow_tsx_force_abort

Then perf stat adjusts the events immediately:

     5.010286029    125,121,393,483      branches
     5.010646308    125,120,556,786      branches
     6.011113588    124,963,351,832      branches
     6.011510331    124,964,267,566      branches
     6.011889913    124,964,829,130      branches
     6.012262996    124,965,841,156      branches
     7.012708299    124,419,832,234      branches [79.69%]
     7.012847908    124,416,363,853      branches [79.73%]
     7.013225462    124,400,723,712      branches [79.73%]
     7.013598191    124,376,154,434      branches [79.70%]
     8.014089834    124,250,862,693      branches [74.98%]
     8.014481363    124,267,539,139      branches [74.94%]
     8.014856006    124,259,519,786      branches [74.98%]
     8.014980848    124,225,457,969      branches [75.04%]
     9.015464576    124,204,235,423      branches [75.03%]
     9.015858587    124,204,988,490      branches [75.04%]
     9.016243680    124,220,092,486      branches [74.99%]
     9.016620104    124,231,260,146      branches [74.94%]

And vice versa if the sysadmin does:

  $ echo  0 >/sys/devices/cpu/allow_tsx_force_abort

Events are again spread over the 4 counters:

    10.017096277    124,276,230,565      branches [74.96%]
    10.017237209    124,228,062,171      branches [75.03%]
    10.017478637    124,178,780,626      branches [75.03%]
    10.017853402    124,198,316,177      branches [75.03%]
    11.018334423    124,602,418,933      branches [85.40%]
    11.018722584    124,602,921,320      branches [85.42%]
    11.019095621    124,603,956,093      branches [85.42%]
    11.019467742    124,595,273,783      branches [85.42%]
    12.019945736    125,110,114,864      branches
    12.020330764    125,109,334,472      branches
    12.020688740    125,109,818,865      branches
    12.021054020    125,108,594,014      branches
    13.021516774    125,109,164,018      branches
    13.021903640    125,108,794,510      branches
    13.022270770    125,107,756,978      branches
    13.022630819    125,109,380,471      branches
    14.023114989    125,133,140,817      branches
    14.023501880    125,133,785,858      branches
    14.023868339    125,133,852,700      branches

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Cc: nelson.dsouza@intel.com
Cc: tonyj@suse.com
Link: https://lkml.kernel.org/r/20190408173252.37932-3-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/core.c       |  4 ++++
 arch/x86/events/intel/core.c | 50 ++++++++++++++++++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h |  1 +
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 87b50f4be201..fdd106267fd2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event)
 	return event->pmu == &pmu;
 }
 
+struct pmu *x86_get_pmu(void)
+{
+	return &pmu;
+}
 /*
  * Event scheduler state:
  *
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1bb59c4c59f2..8265b5026a19 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4156,6 +4156,50 @@ done:
 	return count;
 }
 
+static void update_tfa_sched(void *ignored)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * check if PMC3 is used
+	 * and if so force schedule out for all event types all contexts
+	 */
+	if (test_bit(3, cpuc->active_mask))
+		perf_pmu_resched(x86_get_pmu());
+}
+
+static ssize_t show_sysctl_tfa(struct device *cdev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	return snprintf(buf, 40, "%d\n", allow_tsx_force_abort);
+}
+
+static ssize_t set_sysctl_tfa(struct device *cdev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t count)
+{
+	bool val;
+	ssize_t ret;
+
+	ret = kstrtobool(buf, &val);
+	if (ret)
+		return ret;
+
+	/* no change */
+	if (val == allow_tsx_force_abort)
+		return count;
+
+	allow_tsx_force_abort = val;
+
+	get_online_cpus();
+	on_each_cpu(update_tfa_sched, NULL, 1);
+	put_online_cpus();
+
+	return count;
+}
+
+
 static DEVICE_ATTR_RW(freeze_on_smi);
 
 static ssize_t branches_show(struct device *cdev,
@@ -4188,7 +4232,9 @@ static struct attribute *intel_pmu_caps_attrs[] = {
        NULL
 };
 
-static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort);
+static DEVICE_ATTR(allow_tsx_force_abort, 0644,
+		   show_sysctl_tfa,
+		   set_sysctl_tfa);
 
 static struct attribute *intel_pmu_attrs[] = {
 	&dev_attr_freeze_on_smi.attr,
@@ -4697,7 +4743,7 @@ __init int intel_pmu_init(void)
 			x86_pmu.get_event_constraints = tfa_get_event_constraints;
 			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
 			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
-			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr;
+			intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr;
 		}
 
 		pr_cont("Skylake events, ");
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index e544d83ea4b4..9e474a5f3b86 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = {		\
 	.event_str_ht	= ht,						\
 }
 
+struct pmu *x86_get_pmu(void);
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-16 11:33   ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian
@ 2019-04-16 16:28     ` Vince Weaver
  2019-04-16 17:41       ` Peter Zijlstra
  2019-04-17  6:06       ` Ingo Molnar
  0 siblings, 2 replies; 13+ messages in thread
From: Vince Weaver @ 2019-04-16 16:28 UTC (permalink / raw)
  To: vincent.weaver, mingo, eranian, peterz, alexander.shishkin, acme,
	hpa, torvalds, tglx, linux-kernel, jolsa
  Cc: linux-tip-commits

On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:

> Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> Author:     Stephane Eranian <eranian@google.com>
> AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> 
> perf/x86/intel: Force resched when TFA sysctl is modified

What's TFA?  Tuna-fish-alarm?  Nowhere in the commit or in the code does 
it ever say what a TFA is or why we'd want to resched when it is modified.

Vince

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-16 16:28     ` Vince Weaver
@ 2019-04-16 17:41       ` Peter Zijlstra
  2019-04-17  6:06       ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2019-04-16 17:41 UTC (permalink / raw)
  To: Vince Weaver
  Cc: mingo, eranian, alexander.shishkin, acme, hpa, torvalds, tglx,
	linux-kernel, jolsa, linux-tip-commits

On Tue, Apr 16, 2019 at 12:28:00PM -0400, Vince Weaver wrote:
> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> 
> > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author:     Stephane Eranian <eranian@google.com>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> > 
> > perf/x86/intel: Force resched when TFA sysctl is modified
> 
> What's TFA?  Tuna-fish-alarm?  Nowhere in the commit or in the code does 
> it ever say what a TFA is or why we'd want to resched when it is modified.

See commit: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")

Author: Peter Zijlstra (Intel) <peterz@infradead.org>
Date:   Tue Mar 5 22:23:18 2019 +0100

    perf/x86/intel: Implement support for TSX Force Abort
    
    Skylake (and later) will receive a microcode update to address a TSX
    errata. This microcode will, on execution of a TSX instruction
    (speculative or not) use (clobber) PMC3. This update will also provide
    a new MSR to change this behaviour along with a CPUID bit to enumerate
    the presence of this new MSR.
    
    When the MSR gets set; the microcode will no longer use PMC3 but will
    Force Abort every TSX transaction (upon executing COMMIT).
    
    When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
    PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
    unused.
    
    When TFA is not allowed; clear PMC3 from all constraints such that it
    will not get used.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-16 16:28     ` Vince Weaver
  2019-04-16 17:41       ` Peter Zijlstra
@ 2019-04-17  6:06       ` Ingo Molnar
  2019-04-18 21:50         ` Stephane Eranian
  1 sibling, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2019-04-17  6:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: eranian, peterz, alexander.shishkin, acme, hpa, torvalds, tglx,
	linux-kernel, jolsa, linux-tip-commits


* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> 
> > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author:     Stephane Eranian <eranian@google.com>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> > 
> > perf/x86/intel: Force resched when TFA sysctl is modified
> 
> What's TFA?  Tuna-fish-alarm?

Heh, I wish! :-)

> [...] Nowhere in the commit or in the code does it ever say what a TFA 
> is or why we'd want to resched when it is modified.

Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general 
dislike to random acronyms ...

Peter and I usually fix such changelog context omissions, but this one 
slipped through. :-/

The commit is too deep down perf/core already to rebase it just for the 
changelog, but if we are going to rebase it for some functional reason 
I'll take care of it next time around.

TFA. (Thanks For your Assistance. :-)

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-17  6:06       ` Ingo Molnar
@ 2019-04-18 21:50         ` Stephane Eranian
  0 siblings, 0 replies; 13+ messages in thread
From: Stephane Eranian @ 2019-04-18 21:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Vince Weaver, Peter Zijlstra, Alexander Shishkin,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Linus Torvalds,
	Thomas Gleixner, LKML, Jiri Olsa, linux-tip-commits

Vince,

On Tue, Apr 16, 2019 at 11:06 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> > On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> >
> > > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Author:     Stephane Eranian <eranian@google.com>
> > > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> > >
> > > perf/x86/intel: Force resched when TFA sysctl is modified
> >
> > What's TFA?  Tuna-fish-alarm?
>
> Heh, I wish! :-)
>
Sorry about the confusion. I was just trying to mimic the function
names that Peter used in the code. Hard to fit the whole sysctl name
in the title, anyway.

> > [...] Nowhere in the commit or in the code does it ever say what a TFA
> > is or why we'd want to resched when it is modified.
>
> Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general
> dislike to random acronyms ...
>
> Peter and I usually fix such changelog context omissions, but this one
> slipped through. :-/
>
> The commit is too deep down perf/core already to rebase it just for the
> changelog, but if we are going to rebase it for some functional reason
> I'll take care of it next time around.
>
> TFA. (Thanks For your Assistance. :-)
>
>         Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-04-18 21:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 17:32 [PATCH v2 0/3] perf/x86/intel: force reschedule on TFA changes Stephane Eranian
2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
2019-04-15 15:45   ` Peter Zijlstra
2019-04-15 15:53     ` Peter Zijlstra
2019-04-16 11:33   ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian
2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
2019-04-15 15:57   ` Peter Zijlstra
2019-04-15 17:20     ` Stephane Eranian
2019-04-16 11:33   ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian
2019-04-16 16:28     ` Vince Weaver
2019-04-16 17:41       ` Peter Zijlstra
2019-04-17  6:06       ` Ingo Molnar
2019-04-18 21:50         ` Stephane Eranian
