linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	tonyj@suse.com, nelson.dsouza@intel.com
Subject: Re: [PATCH 1/8] perf/x86/intel: Fix memory corruption
Date: Tue, 19 Mar 2019 19:20:41 +0100	[thread overview]
Message-ID: <20190319182041.GO5996@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <CABPqkBQapZFCyJNRjP-GUjfOy62=rJU=Fv=cYPxPTnVFBkqtVA@mail.gmail.com>

On Tue, Mar 19, 2019 at 10:52:01AM -0700, Stephane Eranian wrote:
> > Not quite; the control on its own doesn't directly write the MSR. And
> > even when the work-around is allowed, we'll not set the MSR unless there
> > is also demand for PMC3.
> >
> Trying to understand this better here. When the workaround is enabled
> (tfa=0), you lose PMC3 and transactions operate normally.

> When it is disabled (tfa=1), transactions are all aborted and PMC3 is
> available.

Right, but we don't expose tfa.

> If you are saying that when there is a PMU event requesting PMC3, then
> you need PMC3 avail, so you set the MSR so that tfa=1 forcing all
> transactions to abort.

Right, so when allow_tfa=1 (default), we only set tfa=1 when PMC3 is
requested.

This has the advantage that,

  TSX only workload -> works
  perf 4 counteres -> works

Only when you need both of them, do you get 'trouble'.

> But in that case, you are modifying the execution of the workload when
> you are monitoring it, assuming it uses TSX.

We assume you are not in fact using TSX, not a lot of code does. If you
do use TSX a lot, and you don't want to interfere, you have to set
allow_tfa=0 and live with one counter less.

Any which way around you turn this stone, it sucks.

> You want lowest overhead and no modifications to how the workload
> operates, otherwise how representative is the data you are collecting?

Sure; but there are no good choices here. This 'fix' will break
something. We figured TSX+4-counter-perf was the least common scenario.

We konw of people that rely on 4 counter being present; you want to
explain to them how when doing an update their program suddently doesn't
work anymore?

Or you want to default to tfa=1; but then you have to explain to those
people relying on TSX why their workload stopped working.

> I understand that there is no impact on apps not using TSX, well,
> except on context switch where you have to toggle that MSR.

There is no additional code in the context switch; only the perf event
scheduling code prods at the MSR.

> But for workloads using TSX, there is potentially an impact.

Yes, well, if you're a TSX _and_ perf user, you now have an extra knob
to play with; that's not something I can do anything about. We're forced
to make a choice here.

> > Yeah, meh. You're admin, you can 'fix' it. In practise I don't expect
> > most people to care about the knob, and the few people that do, should
> > be able to make it work.
> 
> I don't understand how this can work reliably.

> You have a knob to toggle that MSR.

No, we don't have this knob.

> Then, you have another one inside perf_events

Only this knob exists allow_tfa.

> and then the sysadmin has to make sure nobody (incl. NMI watchdog) is
> using the PMU when this all happens.

You're very unlucky if the watchdog runs on PMC3, normally it runs on
Fixed1 or something. Esp early after boot. (Remember, we schedule fixed
counters first, and then general purpose counters, with a preference for
lower counters).

Anyway, you can trivially switch it off if you want.

> How can this be a practical solution? Am I missing something here?

It works just fine; it is unfortunate that we have this interaction but
that's not something we can do anything about. We're forced to deal with
this.

But if you're a TSX+perf user, have your boot scripts do:

  echo 0 > /sys/bus/event_source/devices/cpu/allow_tsx_force_abort

and you'll not use PMC3 and TSX will be 'awesome'. If you don't give a
crap about TSX (most people), just boot and be happy.

If you do care about TSX+perf and want to dynamically toggle for some
reason, you just have to be a little careful.

  reply	other threads:[~2019-03-19 18:20 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-14 13:01 [RFC][PATCH 0/8] perf/x86: event scheduling cleanups Peter Zijlstra
2019-03-14 13:01 ` [PATCH 1/8] perf/x86/intel: Fix memory corruption Peter Zijlstra
2019-03-15 11:29   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2019-03-19  6:29   ` [PATCH 1/8] " Stephane Eranian
2019-03-19 11:05     ` Peter Zijlstra
2019-03-19 17:52       ` Stephane Eranian
2019-03-19 18:20         ` Peter Zijlstra [this message]
2019-03-20 20:47           ` Stephane Eranian
2019-03-20 20:52             ` Stephane Eranian
2019-03-20 22:22             ` Peter Zijlstra
2019-03-21 12:38               ` Peter Zijlstra
2019-03-21 16:45                 ` Thomas Gleixner
2019-03-21 17:10                   ` Peter Zijlstra
2019-03-21 17:17                     ` Thomas Gleixner
2019-03-21 18:20                       ` Peter Zijlstra
2019-03-21 19:42                         ` Tony Jones
2019-03-21 19:47                           ` DSouza, Nelson
2019-03-21 20:07                             ` Peter Zijlstra
2019-03-21 23:16                               ` DSouza, Nelson
2019-03-22 22:14                                 ` DSouza, Nelson
2019-03-21 17:23                   ` Stephane Eranian
2019-03-21 17:51                     ` Thomas Gleixner
2019-03-22 19:04                       ` Stephane Eranian
2019-04-03  7:32                         ` Peter Zijlstra
2019-04-03 10:40                 ` [tip:perf/urgent] perf/x86/intel: Initialize TFA MSR tip-bot for Peter Zijlstra
2019-04-03 11:30                   ` Thomas Gleixner
2019-04-03 12:23                     ` Vince Weaver
2019-03-14 13:01 ` [RFC][PATCH 2/8] perf/x86/intel: Simplify intel_tfa_commit_scheduling() Peter Zijlstra
2019-03-14 13:01 ` [RFC][PATCH 3/8] perf/x86: Simplify x86_pmu.get_constraints() interface Peter Zijlstra
2019-03-19 21:21   ` Stephane Eranian
2019-03-14 13:01 ` [RFC][PATCH 4/8] perf/x86: Remove PERF_X86_EVENT_COMMITTED Peter Zijlstra
2019-03-19 20:48   ` Stephane Eranian
2019-03-19 21:00     ` Peter Zijlstra
2019-03-20 13:14       ` Peter Zijlstra
2019-03-20 12:23     ` Peter Zijlstra
2019-03-14 13:01 ` [RFC][PATCH 5/8] perf/x86/intel: Optimize intel_get_excl_constraints() Peter Zijlstra
2019-03-19 23:43   ` Stephane Eranian
2019-03-14 13:01 ` [RFC][PATCH 6/8] perf/x86: Clear ->event_constraint[] on put Peter Zijlstra
2019-03-19 21:50   ` Stephane Eranian
2019-03-20 12:25     ` Peter Zijlstra
2019-03-14 13:01 ` [RFC][PATCH 7/8] perf/x86: Optimize x86_schedule_events() Peter Zijlstra
2019-03-19 23:55   ` Stephane Eranian
2019-03-20 13:11     ` Peter Zijlstra
2019-03-20 19:30       ` Stephane Eranian
2019-03-14 13:01 ` [RFC][PATCH 8/8] perf/x86: Add sanity checks to x86_schedule_events() Peter Zijlstra
2019-03-15  7:15 ` [RFC][PATCH 0/8] perf/x86: event scheduling cleanups Stephane Eranian
2019-03-15  7:15   ` Stephane Eranian
2019-03-15  8:01     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190319182041.GO5996@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=nelson.dsouza@intel.com \
    --cc=tonyj@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).