* [PATCH v2 0/3] perf/x86/intel: force reschedule on TFA changes @ 2019-04-08 17:32 Stephane Eranian 2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian 2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian 0 siblings, 2 replies; 13+ messages in thread From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw) To: linux-kernel Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj This short patch series improves the TFA patch series by adding a guarantee to users each time the allow_force_tsx_abort (TFA) sysctl control knob is modified. The current TFA support in perf_events operates as follows: - TFA=1 The PMU has priority over TSX; if PMC3 is needed, then TSX transactions are forced to abort. The PMU has access to PMC3 and can schedule events on it. - TFA=0 TSX has priority over the PMU. If PMC3 is needed for an event, then the event must be scheduled on another counter. PMC3 is not available. When a sysadmin modifies TFA, the current code base changes neither the events being measured at the time nor the actual MSR controlling TFA. If the kernel transitions from TFA=1 to TFA=0, nothing happens until the events are descheduled on a context switch, multiplexing, or termination of the measurement. That means TSX transactions still fail until then. There is no easy way to evaluate how long this can take. This patch series addresses this issue by rescheduling the events as part of the sysctl change. That way, there is a guarantee that no more perf_events events are running on PMC3 by the time the write() syscall (from the echo) returns, and that TSX transactions may succeed from then on. Similarly, when transitioning from TFA=0 to TFA=1, the events are rescheduled and can use PMC3 immediately if needed, and TSX transactions systematically abort by the time the write() syscall returns.
To make this work, the patch uses an existing reschedule function in the generic code, ctx_resched(). In V2, we export a new function called perf_ctx_resched(), which takes care of locking the contexts and invoking ctx_resched(). The patch adds an x86_get_pmu() call which is less than ideal, but I am open to suggestions here. In V2, we also switched from kstrtoul() to kstrtobool() and added the proper get_online_cpus()/put_online_cpus(). Signed-off-by: Stephane Eranian <eranian@google.com> Stephane Eranian (2): perf/core: add perf_ctx_resched() as global function perf/x86/intel: force resched when TFA sysctl is modified arch/x86/events/core.c | 4 +++ arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++-- arch/x86/events/perf_event.h | 1 + include/linux/perf_event.h | 14 ++++++++++ kernel/events/core.c | 18 ++++++------ 5 files changed, 79 insertions(+), 11 deletions(-) -- 2.21.0.392.gf8f6787159e-goog ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function 2019-04-08 17:32 [PATCH v2 0/3] perf/x86/intel: force reschedule on TFA changes Stephane Eranian @ 2019-04-08 17:32 ` Stephane Eranian 2019-04-15 15:45 ` Peter Zijlstra 2019-04-16 11:33 ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian 2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian 1 sibling, 2 replies; 13+ messages in thread From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw) To: linux-kernel Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj This patch adds perf_ctx_resched(), a global function that can be called to force rescheduling of events based on event types. The function locks both cpuctx and task_ctx internally. This will be used by a subsequent patch. Signed-off-by: Stephane Eranian <eranian@google.com> Change-Id: Icbc05e5f461fd6e091b46778fe62b23f308e2be7 --- include/linux/perf_event.h | 14 ++++++++++++++ kernel/events/core.c | 18 +++++++++--------- 2 files changed, 23 insertions(+), 9 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 085a95e2582a..ee8a275df0ed 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -822,6 +822,15 @@ struct bpf_perf_event_data_kern { struct perf_event *event; }; +enum event_type_t { + EVENT_FLEXIBLE = 0x1, + EVENT_PINNED = 0x2, + EVENT_TIME = 0x4, + /* see ctx_resched() for details */ + EVENT_CPU = 0x8, + EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED, +}; + #ifdef CONFIG_CGROUP_PERF /* @@ -888,6 +897,11 @@ extern void perf_sched_cb_dec(struct pmu *pmu); extern void perf_sched_cb_inc(struct pmu *pmu); extern int perf_event_task_disable(void); extern int perf_event_task_enable(void); + +extern void perf_ctx_resched(struct perf_cpu_context *cpuctx, + struct perf_event_context *task_ctx, + enum event_type_t event_type); + extern int perf_event_refresh(struct perf_event *event, int
refresh); extern void perf_event_update_userpage(struct perf_event *event); extern int perf_event_release_kernel(struct perf_event *event); diff --git a/kernel/events/core.c b/kernel/events/core.c index dfc4bab0b02b..30474064ec22 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -354,15 +354,6 @@ static void event_function_local(struct perf_event *event, event_f func, void *d (PERF_SAMPLE_BRANCH_KERNEL |\ PERF_SAMPLE_BRANCH_HV) -enum event_type_t { - EVENT_FLEXIBLE = 0x1, - EVENT_PINNED = 0x2, - EVENT_TIME = 0x4, - /* see ctx_resched() for details */ - EVENT_CPU = 0x8, - EVENT_ALL = EVENT_FLEXIBLE | EVENT_PINNED, -}; - /* * perf_sched_events : >0 events exist * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu @@ -2477,6 +2468,15 @@ static void ctx_resched(struct perf_cpu_context *cpuctx, perf_pmu_enable(cpuctx->ctx.pmu); } +void perf_ctx_resched(struct perf_cpu_context *cpuctx, + struct perf_event_context *task_ctx, + enum event_type_t event_type) +{ + perf_ctx_lock(cpuctx, task_ctx); + ctx_resched(cpuctx, task_ctx, event_type); + perf_ctx_unlock(cpuctx, task_ctx); +} + /* * Cross CPU call to install and enable a performance event * -- 2.21.0.392.gf8f6787159e-goog ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function 2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian @ 2019-04-15 15:45 ` Peter Zijlstra 2019-04-15 15:53 ` Peter Zijlstra 2019-04-16 11:33 ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian 1 sibling, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2019-04-15 15:45 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote: > This patch adds perf_ctx_resched(), a global function that can be called > to force rescheduling of events based on event types. The function locks > both cpuctx and task_ctx internally. This will be used by a subsequent patch. I took out the task_ctx and event_type arguments. --- Subject: perf/core: Add perf_ctx_resched() as global function From: Stephane Eranian <eranian@google.com> Date: Mon, 8 Apr 2019 10:32:51 -0700 This patch adds perf_ctx_resched(), a global function that can be called to force rescheduling of events. The function locks both cpuctx and task_ctx internally. This will be used by a subsequent patch.
Cc: mingo@elte.hu Cc: ak@linux.intel.com Cc: jolsa@redhat.com Cc: tglx@linutronix.de Cc: kan.liang@intel.com Cc: nelson.dsouza@intel.com Cc: tonyj@suse.com Signed-off-by: Stephane Eranian <eranian@google.com> [peterz: simplified calling convention] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com --- include/linux/perf_event.h | 3 +++ kernel/events/core.c | 9 +++++++++ 2 files changed, 12 insertions(+) --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu extern void perf_sched_cb_inc(struct pmu *pmu); extern int perf_event_task_disable(void); extern int perf_event_task_enable(void); + +extern void perf_ctx_resched(struct perf_cpu_context *cpuctx); + extern int perf_event_refresh(struct perf_event *event, int refresh); extern void perf_event_update_userpage(struct perf_event *event); extern int perf_event_release_kernel(struct perf_event *event); --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2478,6 +2478,15 @@ static void ctx_resched(struct perf_cpu_ perf_pmu_enable(cpuctx->ctx.pmu); } +void perf_ctx_resched(struct perf_cpu_context *cpuctx) +{ + struct perf_event_context *task_ctx = cpuctx->task_ctx; + + perf_ctx_lock(cpuctx, task_ctx); + ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU); + perf_ctx_unlock(cpuctx, task_ctx); +} + /* * Cross CPU call to install and enable a performance event * ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function 2019-04-15 15:45 ` Peter Zijlstra @ 2019-04-15 15:53 ` Peter Zijlstra 0 siblings, 0 replies; 13+ messages in thread From: Peter Zijlstra @ 2019-04-15 15:53 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj On Mon, Apr 15, 2019 at 05:45:37PM +0200, Peter Zijlstra wrote: > On Mon, Apr 08, 2019 at 10:32:51AM -0700, Stephane Eranian wrote: > > This patch add perf_ctx_resched() a global function that can be called > > to force rescheduling of events based on event types. The function locks > > both cpuctx and task_ctx internally. This will be used by a subsequent patch. > > I took out the task_ctx and event_type arguments. > +void perf_ctx_resched(struct perf_cpu_context *cpuctx) > +{ > + struct perf_event_context *task_ctx = cpuctx->task_ctx; > + > + perf_ctx_lock(cpuctx, task_ctx); > + ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU); > + perf_ctx_unlock(cpuctx, task_ctx); > +} Hmm... I might just rework the thing to: perf_pmu_resched(struct pmu *pmu); As that is what we want; esp. in the light of that whole core context rewrite that i'm still sitting on :/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip:perf/core] perf/core: Add perf_pmu_resched() as global function 2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian 2019-04-15 15:45 ` Peter Zijlstra @ 2019-04-16 11:33 ` tip-bot for Stephane Eranian 1 sibling, 0 replies; 13+ messages in thread From: tip-bot for Stephane Eranian @ 2019-04-16 11:33 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, tglx, jolsa, vincent.weaver, eranian, acme, hpa, mingo, torvalds, alexander.shishkin, peterz Commit-ID: c68d224e5ed15605e651e2482c6ffd95915ddf58 Gitweb: https://git.kernel.org/tip/c68d224e5ed15605e651e2482c6ffd95915ddf58 Author: Stephane Eranian <eranian@google.com> AuthorDate: Mon, 8 Apr 2019 10:32:51 -0700 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 16 Apr 2019 12:19:34 +0200 perf/core: Add perf_pmu_resched() as global function This patch adds perf_pmu_resched(), a global function that can be called to force rescheduling of events for a given PMU. The function locks both cpuctx and task_ctx internally. This will be used by a subsequent patch. Signed-off-by: Stephane Eranian <eranian@google.com> [ Simplified the calling convention.
] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Cc: nelson.dsouza@intel.com Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- include/linux/perf_event.h | 3 +++ kernel/events/core.c | 10 ++++++++++ 2 files changed, 13 insertions(+) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 085a95e2582a..f3864e1c5569 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -888,6 +888,9 @@ extern void perf_sched_cb_dec(struct pmu *pmu); extern void perf_sched_cb_inc(struct pmu *pmu); extern int perf_event_task_disable(void); extern int perf_event_task_enable(void); + +extern void perf_pmu_resched(struct pmu *pmu); + extern int perf_event_refresh(struct perf_event *event, int refresh); extern void perf_event_update_userpage(struct perf_event *event); extern int perf_event_release_kernel(struct perf_event *event); diff --git a/kernel/events/core.c b/kernel/events/core.c index 30a572e4c6f1..abbd4b3b96c2 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2478,6 +2478,16 @@ static void ctx_resched(struct perf_cpu_context *cpuctx, perf_pmu_enable(cpuctx->ctx.pmu); } +void perf_pmu_resched(struct pmu *pmu) +{ + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); + struct perf_event_context *task_ctx = cpuctx->task_ctx; + + perf_ctx_lock(cpuctx, task_ctx); + ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU); + perf_ctx_unlock(cpuctx, task_ctx); +} + /* * Cross CPU call to install and enable a performance event * ^ permalink raw reply related [flat|nested] 13+ 
messages in thread
* [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified 2019-04-08 17:32 [PATCH v2 0/3] perf/x86/intel: force reschedule on TFA changes Stephane Eranian 2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian @ 2019-04-08 17:32 ` Stephane Eranian 2019-04-15 15:57 ` Peter Zijlstra 2019-04-16 11:33 ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian 1 sibling, 2 replies; 13+ messages in thread From: Stephane Eranian @ 2019-04-08 17:32 UTC (permalink / raw) To: linux-kernel Cc: peterz, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj This patch provides a guarantee to the sysadmin that when TFA is disabled, no PMU event is using PMC3 when the echo command returns. Vice versa, when TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible multiplexing). $ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches 1.000123979 125,768,725,208 branches 1.000562520 125,631,000,456 branches 1.000942898 125,487,114,291 branches 1.001333316 125,323,363,620 branches 2.004721306 125,514,968,546 branches 2.005114560 125,511,110,861 branches 2.005482722 125,510,132,724 branches 2.005851245 125,508,967,086 branches 3.006323475 125,166,570,648 branches 3.006709247 125,165,650,056 branches 3.007086605 125,164,639,142 branches 3.007459298 125,164,402,912 branches 4.007922698 125,045,577,140 branches 4.008310775 125,046,804,324 branches 4.008670814 125,048,265,111 branches 4.009039251 125,048,677,611 branches 5.009503373 125,122,240,217 branches 5.009897067 125,122,450,517 branches Then on another connection, sysadmin does: $ echo 1 >/sys/devices/cpu/allow_tsx_force_abort Then perf stat adjusts the events immediately: 5.010286029 125,121,393,483 branches 5.010646308 125,120,556,786 branches 6.011113588 124,963,351,832 branches 6.011510331 124,964,267,566 branches 6.011889913 124,964,829,130 branches 6.012262996 124,965,841,156 branches 7.012708299 124,419,832,234 branches
[79.69%] 7.012847908 124,416,363,853 branches [79.73%] 7.013225462 124,400,723,712 branches [79.73%] 7.013598191 124,376,154,434 branches [79.70%] 8.014089834 124,250,862,693 branches [74.98%] 8.014481363 124,267,539,139 branches [74.94%] 8.014856006 124,259,519,786 branches [74.98%] 8.014980848 124,225,457,969 branches [75.04%] 9.015464576 124,204,235,423 branches [75.03%] 9.015858587 124,204,988,490 branches [75.04%] 9.016243680 124,220,092,486 branches [74.99%] 9.016620104 124,231,260,146 branches [74.94%] And vice versa if the sysadmin does: $ echo 0 >/sys/devices/cpu/allow_tsx_force_abort Events are again spread over the 4 counters: 10.017096277 124,276,230,565 branches [74.96%] 10.017237209 124,228,062,171 branches [75.03%] 10.017478637 124,178,780,626 branches [75.03%] 10.017853402 124,198,316,177 branches [75.03%] 11.018334423 124,602,418,933 branches [85.40%] 11.018722584 124,602,921,320 branches [85.42%] 11.019095621 124,603,956,093 branches [85.42%] 11.019467742 124,595,273,783 branches [85.42%] 12.019945736 125,110,114,864 branches 12.020330764 125,109,334,472 branches 12.020688740 125,109,818,865 branches 12.021054020 125,108,594,014 branches 13.021516774 125,109,164,018 branches 13.021903640 125,108,794,510 branches 13.022270770 125,107,756,978 branches 13.022630819 125,109,380,471 branches 14.023114989 125,133,140,817 branches 14.023501880 125,133,785,858 branches 14.023868339 125,133,852,700 branches Signed-off-by: Stephane Eranian <eranian@google.com> Change-Id: Ib443265edce31b93ca4d10fe7695c05d00a7178e --- arch/x86/events/core.c | 4 +++ arch/x86/events/intel/core.c | 53 ++++++++++++++++++++++++++++++++++-- arch/x86/events/perf_event.h | 1 + 3 files changed, 56 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 87b50f4be201..fdd106267fd2 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event) return event->pmu == &pmu;
} +struct pmu *x86_get_pmu(void) +{ + return &pmu; +} /* * Event scheduler state: * diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 1403c05e25e2..20698e84c388 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4156,6 +4156,53 @@ static ssize_t freeze_on_smi_store(struct device *cdev, return count; } +static void update_tfa_sched(void *ignored) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + struct pmu *pmu = x86_get_pmu(); + struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); + struct perf_event_context *task_ctx = cpuctx->task_ctx; + + /* + * check if PMC3 is used + * and if so force schedule out for all event types all contexts + */ + if (test_bit(3, cpuc->active_mask)) + perf_ctx_resched(cpuctx, task_ctx, EVENT_ALL|EVENT_CPU); +} + +static ssize_t show_sysctl_tfa(struct device *cdev, + struct device_attribute *attr, + char *buf) +{ + return snprintf(buf, 40, "%d\n", allow_tsx_force_abort); +} + +static ssize_t set_sysctl_tfa(struct device *cdev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + bool val; + ssize_t ret; + + ret = kstrtobool(buf, &val); + if (ret) + return ret; + + /* no change */ + if (val == allow_tsx_force_abort) + return count; + + allow_tsx_force_abort = val; + + get_online_cpus(); + on_each_cpu(update_tfa_sched, NULL, 1); + put_online_cpus(); + + return count; +} + + static DEVICE_ATTR_RW(freeze_on_smi); static ssize_t branches_show(struct device *cdev, @@ -4188,7 +4235,9 @@ static struct attribute *intel_pmu_caps_attrs[] = { NULL }; -static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort); +static DEVICE_ATTR(allow_tsx_force_abort, 0644, + show_sysctl_tfa, + set_sysctl_tfa); static struct attribute *intel_pmu_attrs[] = { &dev_attr_freeze_on_smi.attr, @@ -4697,7 +4746,7 @@ __init int intel_pmu_init(void) x86_pmu.get_event_constraints = tfa_get_event_constraints; x86_pmu.enable_all = 
intel_tfa_pmu_enable_all; x86_pmu.commit_scheduling = intel_tfa_commit_scheduling; - intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr; + intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr; } pr_cont("Skylake events, "); diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index b4d41829da4f..49ba8171ad7a 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \ .event_str_ht = ht, \ } +struct pmu *x86_get_pmu(void); extern struct x86_pmu x86_pmu __read_mostly; static inline bool x86_pmu_has_lbr_callstack(void) -- 2.21.0.392.gf8f6787159e-goog ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified 2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian @ 2019-04-15 15:57 ` Peter Zijlstra 2019-04-15 17:20 ` Stephane Eranian 2019-04-16 11:33 ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian 1 sibling, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2019-04-15 15:57 UTC (permalink / raw) To: Stephane Eranian Cc: linux-kernel, tglx, ak, kan.liang, mingo, nelson.dsouza, jolsa, tonyj On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote: > +static ssize_t set_sysctl_tfa(struct device *cdev, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + bool val; > + ssize_t ret; > + > + ret = kstrtobool(buf, &val); > + if (ret) > + return ret; > + > + /* no change */ > + if (val == allow_tsx_force_abort) > + return count; > + > + allow_tsx_force_abort = val; > + > + get_online_cpus(); > + on_each_cpu(update_tfa_sched, NULL, 1); > + put_online_cpus(); > + > + return count; > +} So we care about concurrent writing to that file? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified 2019-04-15 15:57 ` Peter Zijlstra @ 2019-04-15 17:20 ` Stephane Eranian 0 siblings, 0 replies; 13+ messages in thread From: Stephane Eranian @ 2019-04-15 17:20 UTC (permalink / raw) To: Peter Zijlstra Cc: LKML, Thomas Gleixner, Andi Kleen, Liang, Kan, mingo, nelson.dsouza, Jiri Olsa, tonyj On Mon, Apr 15, 2019 at 8:57 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Apr 08, 2019 at 10:32:52AM -0700, Stephane Eranian wrote: > > +static ssize_t set_sysctl_tfa(struct device *cdev, > > + struct device_attribute *attr, > > + const char *buf, size_t count) > > +{ > > + bool val; > > + ssize_t ret; > > + > > + ret = kstrtobool(buf, &val); > > + if (ret) > > + return ret; > > + > > + /* no change */ > > + if (val == allow_tsx_force_abort) > > + return count; > > + > > + allow_tsx_force_abort = val; > > + > > + get_online_cpus(); > > + on_each_cpu(update_tfa_sched, NULL, 1); > > + put_online_cpus(); > > + > > + return count; > > +} > > So we care about concurrent writing to that file? Not likely but we care about seeing the effects on event scheduling before the sysctl returns. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified 2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian 2019-04-15 15:57 ` Peter Zijlstra @ 2019-04-16 11:33 ` tip-bot for Stephane Eranian 2019-04-16 16:28 ` Vince Weaver 1 sibling, 1 reply; 13+ messages in thread From: tip-bot for Stephane Eranian @ 2019-04-16 11:33 UTC (permalink / raw) To: linux-tip-commits Cc: acme, vincent.weaver, eranian, mingo, alexander.shishkin, peterz, tglx, torvalds, linux-kernel, jolsa, hpa Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574 Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574 Author: Stephane Eranian <eranian@google.com> AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 16 Apr 2019 12:19:35 +0200 perf/x86/intel: Force resched when TFA sysctl is modified This patch provides a guarantee to the sysadmin that when TFA is disabled, no PMU event is using PMC3 when the echo command returns. Vice versa, when TFA is enabled, the PMU can use PMC3 immediately (to eliminate possible multiplexing).
$ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches 1.000123979 125,768,725,208 branches 1.000562520 125,631,000,456 branches 1.000942898 125,487,114,291 branches 1.001333316 125,323,363,620 branches 2.004721306 125,514,968,546 branches 2.005114560 125,511,110,861 branches 2.005482722 125,510,132,724 branches 2.005851245 125,508,967,086 branches 3.006323475 125,166,570,648 branches 3.006709247 125,165,650,056 branches 3.007086605 125,164,639,142 branches 3.007459298 125,164,402,912 branches 4.007922698 125,045,577,140 branches 4.008310775 125,046,804,324 branches 4.008670814 125,048,265,111 branches 4.009039251 125,048,677,611 branches 5.009503373 125,122,240,217 branches 5.009897067 125,122,450,517 branches Then on another connection, sysadmin does: $ echo 1 >/sys/devices/cpu/allow_tsx_force_abort Then perf stat adjusts the events immediately: 5.010286029 125,121,393,483 branches 5.010646308 125,120,556,786 branches 6.011113588 124,963,351,832 branches 6.011510331 124,964,267,566 branches 6.011889913 124,964,829,130 branches 6.012262996 124,965,841,156 branches 7.012708299 124,419,832,234 branches [79.69%] 7.012847908 124,416,363,853 branches [79.73%] 7.013225462 124,400,723,712 branches [79.73%] 7.013598191 124,376,154,434 branches [79.70%] 8.014089834 124,250,862,693 branches [74.98%] 8.014481363 124,267,539,139 branches [74.94%] 8.014856006 124,259,519,786 branches [74.98%] 8.014980848 124,225,457,969 branches [75.04%] 9.015464576 124,204,235,423 branches [75.03%] 9.015858587 124,204,988,490 branches [75.04%] 9.016243680 124,220,092,486 branches [74.99%] 9.016620104 124,231,260,146 branches [74.94%] And vice versa if the sysadmin does: $ echo 0 >/sys/devices/cpu/allow_tsx_force_abort Events are again spread over the 4 counters: 10.017096277 124,276,230,565 branches [74.96%] 10.017237209 124,228,062,171 branches [75.03%] 10.017478637 124,178,780,626 branches [75.03%] 10.017853402 124,198,316,177 branches [75.03%] 11.018334423 124,602,418,933
branches [85.40%] 11.018722584 124,602,921,320 branches [85.42%] 11.019095621 124,603,956,093 branches [85.42%] 11.019467742 124,595,273,783 branches [85.42%] 12.019945736 125,110,114,864 branches 12.020330764 125,109,334,472 branches 12.020688740 125,109,818,865 branches 12.021054020 125,108,594,014 branches 13.021516774 125,109,164,018 branches 13.021903640 125,108,794,510 branches 13.022270770 125,107,756,978 branches 13.022630819 125,109,380,471 branches 14.023114989 125,133,140,817 branches 14.023501880 125,133,785,858 branches 14.023868339 125,133,852,700 branches Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Cc: nelson.dsouza@intel.com Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190408173252.37932-3-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> --- arch/x86/events/core.c | 4 ++++ arch/x86/events/intel/core.c | 50 ++++++++++++++++++++++++++++++++++++++++++-- arch/x86/events/perf_event.h | 1 + 3 files changed, 53 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 87b50f4be201..fdd106267fd2 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -661,6 +661,10 @@ static inline int is_x86_event(struct perf_event *event) return event->pmu == &pmu; } +struct pmu *x86_get_pmu(void) +{ + return &pmu; +} /* * Event scheduler state: * diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 1bb59c4c59f2..8265b5026a19 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4156,6 +4156,50 @@ done: return count; } +static void 
update_tfa_sched(void *ignored) +{ + struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events); + + /* + * check if PMC3 is used + * and if so force schedule out for all event types all contexts + */ + if (test_bit(3, cpuc->active_mask)) + perf_pmu_resched(x86_get_pmu()); +} + +static ssize_t show_sysctl_tfa(struct device *cdev, + struct device_attribute *attr, + char *buf) +{ + return snprintf(buf, 40, "%d\n", allow_tsx_force_abort); +} + +static ssize_t set_sysctl_tfa(struct device *cdev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + bool val; + ssize_t ret; + + ret = kstrtobool(buf, &val); + if (ret) + return ret; + + /* no change */ + if (val == allow_tsx_force_abort) + return count; + + allow_tsx_force_abort = val; + + get_online_cpus(); + on_each_cpu(update_tfa_sched, NULL, 1); + put_online_cpus(); + + return count; +} + + static DEVICE_ATTR_RW(freeze_on_smi); static ssize_t branches_show(struct device *cdev, @@ -4188,7 +4232,9 @@ static struct attribute *intel_pmu_caps_attrs[] = { NULL }; -static DEVICE_BOOL_ATTR(allow_tsx_force_abort, 0644, allow_tsx_force_abort); +static DEVICE_ATTR(allow_tsx_force_abort, 0644, + show_sysctl_tfa, + set_sysctl_tfa); static struct attribute *intel_pmu_attrs[] = { &dev_attr_freeze_on_smi.attr, @@ -4697,7 +4743,7 @@ __init int intel_pmu_init(void) x86_pmu.get_event_constraints = tfa_get_event_constraints; x86_pmu.enable_all = intel_tfa_pmu_enable_all; x86_pmu.commit_scheduling = intel_tfa_commit_scheduling; - intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr.attr; + intel_pmu_attrs[1] = &dev_attr_allow_tsx_force_abort.attr; } pr_cont("Skylake events, "); diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index e544d83ea4b4..9e474a5f3b86 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -713,6 +713,7 @@ static struct perf_pmu_events_ht_attr event_attr_##v = { \ .event_str_ht = ht, \ } +struct pmu *x86_get_pmu(void); extern struct x86_pmu 
x86_pmu __read_mostly; static inline bool x86_pmu_has_lbr_callstack(void) ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified 2019-04-16 11:33 ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian @ 2019-04-16 16:28 ` Vince Weaver 2019-04-16 17:41 ` Peter Zijlstra 2019-04-17 6:06 ` Ingo Molnar 0 siblings, 2 replies; 13+ messages in thread From: Vince Weaver @ 2019-04-16 16:28 UTC (permalink / raw) To: vincent.weaver, mingo, eranian, peterz, alexander.shishkin, acme, hpa, torvalds, tglx, linux-kernel, jolsa Cc: linux-tip-commits On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote: > Commit-ID: f447e4eb3ad1e60d173ca997fcb2ef2a66f12574 > Gitweb: https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574 > Author: Stephane Eranian <eranian@google.com> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700 > Committer: Ingo Molnar <mingo@kernel.org> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200 > > perf/x86/intel: Force resched when TFA sysctl is modified What's TFA? Tuna-fish-alarm? Nowhere in the commit or in the code does it ever say what a TFA is or why we'd want to resched when it is modified. Vince ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-16 16:28 ` Vince Weaver
@ 2019-04-16 17:41 ` Peter Zijlstra
  2019-04-17  6:06 ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread

From: Peter Zijlstra @ 2019-04-16 17:41 UTC (permalink / raw)
To: Vince Weaver
Cc: mingo, eranian, alexander.shishkin, acme, hpa, torvalds, tglx,
    linux-kernel, jolsa, linux-tip-commits

On Tue, Apr 16, 2019 at 12:28:00PM -0400, Vince Weaver wrote:
> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
>
> > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author:     Stephane Eranian <eranian@google.com>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> >
> > perf/x86/intel: Force resched when TFA sysctl is modified
>
> What's TFA? Tuna-fish-alarm? Nowhere in the commit or in the code does
> it ever say what a TFA is or why we'd want to resched when it is modified.

See commit:

  400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")

    Author: Peter Zijlstra (Intel) <peterz@infradead.org>
    Date:   Tue Mar 5 22:23:18 2019 +0100

    perf/x86/intel: Implement support for TSX Force Abort

    Skylake (and later) will receive a microcode update to address a TSX
    errata. This microcode will, on execution of a TSX instruction
    (speculative or not), use (clobber) PMC3. This update will also
    provide a new MSR to change this behaviour, along with a CPUID bit
    to enumerate the presence of this new MSR.

    When the MSR gets set, the microcode will no longer use PMC3 but
    will Force Abort every TSX transaction (upon executing COMMIT).

    When TSX Force Abort (TFA) is allowed (default), the MSR gets set
    when PMC3 gets scheduled and cleared when, after scheduling, PMC3
    is unused.

    When TFA is not allowed, clear PMC3 from all constraints such that
    it will not get used.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-16 16:28 ` Vince Weaver
  2019-04-16 17:41 ` Peter Zijlstra
@ 2019-04-17  6:06 ` Ingo Molnar
  2019-04-18 21:50   ` Stephane Eranian
  1 sibling, 1 reply; 13+ messages in thread

From: Ingo Molnar @ 2019-04-17 6:06 UTC (permalink / raw)
To: Vince Weaver
Cc: eranian, peterz, alexander.shishkin, acme, hpa, torvalds, tglx,
    linux-kernel, jolsa, linux-tip-commits

* Vince Weaver <vincent.weaver@maine.edu> wrote:

> On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
>
> > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > Author:     Stephane Eranian <eranian@google.com>
> > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> >
> > perf/x86/intel: Force resched when TFA sysctl is modified
>
> What's TFA? Tuna-fish-alarm?

Heh, I wish! :-)

> [...] Nowhere in the commit or in the code does it ever say what a TFA
> is or why we'd want to resched when it is modified.

Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general
dislike of random acronyms ...

Peter and I usually fix such changelog context omissions, but this one
slipped through. :-/

The commit is too deep down perf/core already to rebase it just for the
changelog, but if we are going to rebase it for some functional reason
I'll take care of it next time around.

TFA. (Thanks For your Assistance. :-)

	Ingo
* Re: [tip:perf/core] perf/x86/intel: Force resched when TFA sysctl is modified
  2019-04-17  6:06 ` Ingo Molnar
@ 2019-04-18 21:50 ` Stephane Eranian
  0 siblings, 0 replies; 13+ messages in thread

From: Stephane Eranian @ 2019-04-18 21:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Vince Weaver, Peter Zijlstra, Alexander Shishkin,
    Arnaldo Carvalho de Melo, H. Peter Anvin, Linus Torvalds,
    Thomas Gleixner, LKML, Jiri Olsa, linux-tip-commits

Vince,

On Tue, Apr 16, 2019 at 11:06 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> * Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> > On Tue, 16 Apr 2019, tip-bot for Stephane Eranian wrote:
> >
> > > Commit-ID:  f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Gitweb:     https://git.kernel.org/tip/f447e4eb3ad1e60d173ca997fcb2ef2a66f12574
> > > Author:     Stephane Eranian <eranian@google.com>
> > > AuthorDate: Mon, 8 Apr 2019 10:32:52 -0700
> > > Committer:  Ingo Molnar <mingo@kernel.org>
> > > CommitDate: Tue, 16 Apr 2019 12:19:35 +0200
> > >
> > > perf/x86/intel: Force resched when TFA sysctl is modified
> >
> > What's TFA? Tuna-fish-alarm?
>
> Heh, I wish! :-)

Sorry about the confusion. I was just trying to mimic the function names
that Peter used in the code. Hard to fit the whole sysctl name in the
title, anyway.

> > [...] Nowhere in the commit or in the code does it ever say what a TFA
> > is or why we'd want to resched when it is modified.
>
> Yeah, it's the TSX-Force-Abort acronym - Intel has numbed our general
> dislike of random acronyms ...
>
> Peter and I usually fix such changelog context omissions, but this one
> slipped through. :-/
>
> The commit is too deep down perf/core already to rebase it just for the
> changelog, but if we are going to rebase it for some functional reason
> I'll take care of it next time around.
>
> TFA. (Thanks For your Assistance. :-)
>
> Ingo
end of thread, other threads:[~2019-04-18 21:50 UTC | newest]

Thread overview: 13+ messages:
2019-04-08 17:32 [PATCH v2 0/3] perf/x86/intel: force reschedule on TFA changes Stephane Eranian
2019-04-08 17:32 ` [PATCH v2 1/2] perf/core: add perf_ctx_resched() as global function Stephane Eranian
2019-04-15 15:45   ` Peter Zijlstra
2019-04-15 15:53     ` Peter Zijlstra
2019-04-16 11:33   ` [tip:perf/core] perf/core: Add perf_pmu_resched() " tip-bot for Stephane Eranian
2019-04-08 17:32 ` [PATCH v2 2/2] perf/x86/intel: force resched when TFA sysctl is modified Stephane Eranian
2019-04-15 15:57   ` Peter Zijlstra
2019-04-15 17:20     ` Stephane Eranian
2019-04-16 11:33   ` [tip:perf/core] perf/x86/intel: Force " tip-bot for Stephane Eranian
2019-04-16 16:28     ` Vince Weaver
2019-04-16 17:41       ` Peter Zijlstra
2019-04-17  6:06       ` Ingo Molnar
2019-04-18 21:50         ` Stephane Eranian