From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Nadav Amit <namit@vmware.com>
Cc: Ingo Molnar <mingo@redhat.com>, Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"H . Peter Anvin " <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, Nadav Amit <nadav.amit@gmail.com>,
	x86@kernel.org, Borislav Petkov <bp@alien8.de>,
	David Woodhouse <dwmw@amazon.co.uk>
Subject: Re: [RFC PATCH 0/5] x86: dynamic indirect call promotion
Date: Wed, 28 Nov 2018 10:08:49 -0600
Message-ID: <20181128160849.epmoto4o5jaxxxol@treble>
In-Reply-To: <20181018005420.82993-1-namit@vmware.com>

On Wed, Oct 17, 2018 at 05:54:15PM -0700, Nadav Amit wrote:
> This RFC introduces indirect call promotion in runtime, which for the
> matter of simplification (and branding) will be called here "relpolines"
> (relative call + trampoline). Relpolines are mainly intended as a way
> of reducing retpoline overheads due to Spectre v2.
> 
> Unlike indirect call promotion through profile guided optimization, the
> proposed approach does not require a profiling stage, works well with
> modules whose address is unknown and can adapt to changing workloads.
> 
> The main idea is simple: for every indirect call, we inject a piece of
> code with fast- and slow-path calls. The fast path is used if the target
> matches the expected (hot) target. The slow-path uses a retpoline.
> During training, the slow-path is set to call a function that saves the
> call source and target in a hash-table and keeps count of call
> frequency. The most common target is then patched into the hot path.
> 
> The patching is done on-the-fly by patching the conditional branch
> (opcode and offset) that is used to compare the target to the hot
> target. This allows directing all cores to the fast-path while patching
> the slow-path, and vice-versa. Patching follows 2 more rules: (1) Only
> patch a single byte when the code might be executed by any core. (2)
> When patching more than one byte, ensure that all cores do not run the
> to-be-patched-code by preventing this code from being preempted, and
> using synchronize_sched() after patching the branch that jumps over this
> code.
> 
> Changing all the indirect calls to use relpolines is done using assembly
> macro magic. There are alternative solutions, but this one is
> relatively simple and transparent. There is also logic to retrain the
> software predictor, but the policy it uses may need to be refined.
> 
> Eventually the results are not bad (2 VCPU VM, throughput reported):
> 
>               base    relpoline
>               ----    ---------
> nginx         22898   25178 (+10%)
> redis-ycsb    24523   25486 (+4%)
> dbench        2144    2103 (+2%)
> 
> When retpolines are disabled, and if retraining is off, performance
> benefits are up to 2% (nginx), but are much less impressive.
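
For anyone following along, the fast-/slow-path idea quoted above can be
modeled in plain userspace C. This is only a rough sketch under stated
assumptions, not the code from the patches: struct call_site, site_call(),
site_record() and site_promote() are hypothetical names, a tiny fixed array
stands in for the hash table, and ordinary function-pointer calls stand in
for both the patched direct call and the retpoline.

```c
#include <stddef.h>

typedef int (*handler_t)(int);

#define NSEEN 4

/* Hypothetical model of one indirect call site (names are illustrative). */
struct call_site {
	handler_t hot;              /* promoted target; NULL while training */
	handler_t seen[NSEEN];      /* targets observed during training */
	unsigned int count[NSEEN];  /* call count per observed target */
	unsigned int fast_hits;
	unsigned int slow_hits;
};

/* Two example targets, only here so the model can be exercised. */
static int add_one(int x)   { return x + 1; }
static int times_two(int x) { return x * 2; }

/* Training: remember the target and bump its call count. */
static void site_record(struct call_site *s, handler_t target)
{
	for (size_t i = 0; i < NSEEN; i++) {
		if (s->seen[i] == target || s->seen[i] == NULL) {
			s->seen[i] = target;
			s->count[i]++;
			return;
		}
	}
	/* Table full: this model simply drops the sample. */
}

/* Models "patching" the most common target into the fast path. */
static void site_promote(struct call_site *s)
{
	size_t best = 0;

	for (size_t i = 1; i < NSEEN; i++)
		if (s->count[i] > s->count[best])
			best = i;
	s->hot = s->seen[best];
}

static int site_call(struct call_site *s, handler_t target, int arg)
{
	if (s->hot != NULL && target == s->hot) {
		s->fast_hits++;
		return s->hot(arg);   /* stands in for the direct call */
	}
	s->slow_hits++;
	if (s->hot == NULL)
		site_record(s, target);  /* only record while training */
	return target(arg);           /* stands in for the retpoline */
}
```

The model leaves out everything that makes the real thing hard: live code
patching, preemption, and cross-CPU synchronization.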

Hi Nadav,

Peter pointed me to these patches during a discussion about retpoline
profiling.  Personally, I think this is brilliant.  This could help
networking and filesystem intensive workloads a lot.

Some high-level comments:

- "Relpoline" looks confusingly a lot like "retpoline".  How about
  "optpoline"?  To avoid confusing myself I will hereafter refer to it
  as such :-)

- Instead of patching one byte at a time, is there a reason why
  text_poke_bp() can't be used?  That would greatly simplify the
  patching process, as everything could be patched in a single step.

- In many cases, a single direct call may not be sufficient, as there
  could be for example multiple tasks using different network protocols
  which need different callbacks for the same call site.

- I'm not sure about the periodic retraining logic; it seems a bit
  nondeterministic and bursty.

So I'd propose the following changes:

- In the optpoline, reserve space for multiple (5 or so) comparisons and
  direct calls.  Maybe the number of reserved cmp/jne/call slots can be
  tweaked by the caller somehow.  Or maybe it could grow as needed.
  Starting out, they would just be NOPs.

- Instead of the temporary learning mode, add permanent tracking to
  detect a direct call "miss" -- i.e., when none of the existing direct
  calls are applicable and the retpoline will be used.

- In the case of a miss (or N misses), it could trigger a direct call
  patching operation to be run later (workqueue or syscall exit).  If
  all the direct call slots are full, it could patch the least recently
  modified one.  If this causes thrashing (>x changes over y time), it
  could increase the number of direct call slots using a trampoline.
  Even if there were several slots, CPU branch prediction would
  presumably help make it much faster than a basic retpoline.
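
To make the proposal above a bit more concrete, here is a rough userspace
model in C. It is a sketch under assumptions, not an implementation:
struct optpoline, optpoline_call(), optpoline_patch(), NSLOTS and
MISS_THRESHOLD are all made-up names and values, the deferred patch is just
a function the caller invokes later (where the real thing would use a
workqueue or syscall exit), and misses are counted per call site rather
than per target to keep it short. Empty slots model the initial NOPs.

```c
#include <stddef.h>

#define NSLOTS         3   /* hypothetical number of cmp/jne/call slots */
#define MISS_THRESHOLD 2   /* hypothetical "N misses" before patching */

typedef int (*handler_t)(int);

struct optpoline {
	handler_t slot[NSLOTS];          /* NULL models a NOP slot */
	unsigned long modified[NSLOTS];  /* logical time of last patch */
	unsigned long clock;
	handler_t miss_target;           /* target of the most recent miss */
	unsigned int misses;
	int patch_pending;
};

/* Example targets, only here so the model can be exercised. */
static int h_a(int x) { return x + 1; }
static int h_b(int x) { return x + 2; }
static int h_c(int x) { return x + 3; }
static int h_d(int x) { return x + 4; }

static int optpoline_call(struct optpoline *o, handler_t target, int arg)
{
	/* Chain of compares; a match stands in for a direct call. */
	for (size_t i = 0; i < NSLOTS; i++)
		if (o->slot[i] == target)
			return o->slot[i](arg);

	/* Permanent miss tracking: no slot matched. */
	o->miss_target = target;
	if (++o->misses >= MISS_THRESHOLD)
		o->patch_pending = 1;
	return target(arg);              /* stands in for the retpoline */
}

/* Deferred patching: fill a free slot, else the least recently modified. */
static void optpoline_patch(struct optpoline *o)
{
	size_t victim = 0;

	if (!o->patch_pending)
		return;
	for (size_t i = 0; i < NSLOTS; i++) {
		if (o->slot[i] == NULL) {
			victim = i;
			break;
		}
		if (o->modified[i] < o->modified[victim])
			victim = i;
	}
	o->slot[victim] = o->miss_target;
	o->modified[victim] = ++o->clock;
	o->misses = 0;
	o->patch_pending = 0;
}
```

Thrash detection and growing the slot count via a trampoline are left out
of the model.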

Thoughts?

-- 
Josh
