From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
x86@kernel.org, Linus Torvalds <torvalds@linux-foundation.org>,
Tim Chen <tim.c.chen@linux.intel.com>,
Josh Poimboeuf <jpoimboe@kernel.org>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Johannes Wikner <kwikner@ethz.ch>,
Alyssa Milburn <alyssa.milburn@linux.intel.com>,
Jann Horn <jannh@google.com>, "H.J. Lu" <hjl.tools@gmail.com>,
Joao Moreira <joao.moreira@intel.com>,
Joseph Nuzman <joseph.nuzman@intel.com>,
Steven Rostedt <rostedt@goodmis.org>,
Juergen Gross <jgross@suse.com>,
Masami Hiramatsu <mhiramat@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Eric Dumazet <edumazet@google.com>
Subject: [PATCH v3 00/59] x86/retbleed: Call depth tracking mitigation
Date: Thu, 15 Sep 2022 13:10:39 +0200 [thread overview]
Message-ID: <20220915111039.092790446@infradead.org> (raw)
Hi!
Previous postings:
v2: https://lkml.kernel.org/r/20220902130625.217071627@infradead.org
v1: https://lkml.kernel.org/r/20220716230344.239749011@linutronix.de
Changes since v2 are minimal; I reworked the alignment thing per Linus'
request (patch #8) and collected a few tags.
Barring great objections I'm hoping to merge this soon so we can all get on
with other things.
--- text from v2 ---
This version is significantly different from the last in that it no longer
makes use of external call thunks allocated from the module space. Instead
every function gets aligned to 16 bytes and gets 16 bytes of (pre-symbol)
padding. (This padding will also come in handy for other things, like the
kCFI/FineIBT work.)
Prior to these patches function alignment is basically non-existent, as such
any instruction fetch for the first instructions of a function will have (on
average) half the fetch window filled with whatever comes before. By pushing
the alignment up to 16 bytes this improves matters for chips that happen to
have a 16 byte i-fetch window size (Intel) while not making matters worse for
chips that have a larger 32 byte i-fetch window (AMD Zen). In fact, it improves
the worst case for Zen from 31 bytes of garbage to 16 bytes of garbage.
As such the first many patches of the series fix up lots of alignment quirks.
The second big difference is the introduction of struct pcpu_hot. Because the
compiler managed to place two adjacent (in code) DEFINE_PER_CPU() variables in
random cachelines (it is absolutely free to do so) the introduction of the
per-cpu x86_call_depth variable sometimes introduced significant additional
cache pressure, while other times it would sit nicely in the same line with
preempt_count and not show up at all.
In order to alleviate this problem; introduce struct pcpu_hot and collect a
number of hot per-cpu variables in a way the compiler can't mess up.
Since these changes are 'unconditional', Mel was gracious enough to help test
this on his test setup across all the relevant uarchs (very much including both
Intel and AMD machines) and found that while these changes cause some very
small wins and losses across the board it is mostly noise.
Aside from these changes; the core of the depth tracking is still the same.
- objtool creates a list of functions and a list of function call sites.
- for every function the padding is overwritten with the call accounting
thunk; for every call site the call target is adjusted to point to this
thunk.
- the retbleed return thunk mechanism is used for a custom return thunk
that includes return accounting and does RSB stuffing when required.
This ensures no new compiler is required and avoids almost all overhead for
non affected machines. This new option can still be selected using:
"retbleed=stuff"
on the kernel command line.
As a refresher; the theory behind call depth tracking is:
The Return-Stack-Buffer (RSB) is a 16 deep stack that is filled on every call.
On the return path speculation will "pop" an entry and takes that as the return
target. Once the RSB is empty, the CPU falls back to other predictors, e.g. the
Branch History Buffer, which can be mistrained by user space and misguides the
(return) speculation path to a disclosure gadget of your choice -- as described
in the retbleed paper.
Call depth tracking is designed to break this speculation path by stuffing
speculation trap calls into the RSB whenver the RSB is running low. This way
the speculation stalls and never falls back to other predictors.
The assumption is that stuffing at the 12th return is sufficient to break the
speculation before it hits the underflow and the fallback to the other
predictors. Testing confirms that it works. Johannes, one of the retbleed
researchers, tried to attack this approach and confirmed that it brings the
signal to noise ratio down to the crystal ball level.
Excerpts from IBRS vs stuff from Mel's testing:
perfsyscall
6.0.0-rc1 6.0.0-rc1
tglx-mit-spectre-ibrs tglx-mit-spectre-retpoline-retstuff
Duration User 136.16 69.10
Duration System 100.50 33.04
Duration Elapsed 237.20 102.65
That's a massive improvement with a major reduction in system CPU usage.
Kernel compilation is variable. Skylake-X was modest with 2-18% gain depending
on degree of parallelisation.
Git checkouts are roughly 14% faster on Skylake-X
Network test were localhost only so are limited but even so, the gain is
large. Skylake-X again;
Netperf-TCP
6.0.0-rc1 6.0.0-rc1
tglx-mit-spectre-ibrs tglx-mit-spectre-retpoline-retstuff
Hmean send-64 241.39 ( 0.00%) 298.00 * 23.45%*
Hmean send-128 489.55 ( 0.00%) 610.46 * 24.70%*
Hmean send-256 990.85 ( 0.00%) 1201.73 * 21.28%*
Hmean send-1024 4051.84 ( 0.00%) 5006.19 * 23.55%*
Hmean send-2048 7924.75 ( 0.00%) 9777.14 * 23.37%*
Hmean send-3312 12319.98 ( 0.00%) 15210.07 * 23.46%*
Hmean send-4096 14770.62 ( 0.00%) 17941.32 * 21.47%*
Hmean send-8192 26302.00 ( 0.00%) 30170.04 * 14.71%*
Hmean send-16384 42449.51 ( 0.00%) 48036.45 * 13.16%*
While this is UDP_STREAM, TCP_STREAM is similarly impressive.
FIO measurements done by Tim Chen:
read (kIOPs) Mean stdev mitigations=off retbleed=off CPU util
================================================================================
mitigations=off 357.33 3.79 0.00% 6.14% 98.93%
retbleed=off 336.67 6.43 -5.78% 0.00% 99.01%
retbleed=ibrs 242.00 0.00 -32.28% -28.12% 99.41%
retbleed=stuff (pad) 314.33 1.53 -12.03% -6.63% 99.31%
read/write Baseline Baseline
70/30 (kIOPs) Mean stdev mitigations=off retbleed=off CPU util
================================================================================
mitigations=off 349.00 5.29 0.00% 9.06% 96.66%
retbleed=off 320.00 5.05 -8.31% 0.00% 95.54%
retbleed=ibrs 238.60 0.17 -31.63% -25.44% 98.18%
retbleed=stuff (pad) 293.37 0.81 -15.94% -8.32% 97.71%
Baseline Baseline
write (kIOPs) Mean stdev mitigations=off retbleed=off CPU util
================================================================================
mitigations=off 296.33 8.08 0.00% 6.21% 93.96%
retbleed=off 279.00 2.65 -5.85% 0.00% 93.63%
retbleed=ibrs 230.33 0.58 -22.27% -17.44% 95.92%
retbleed=stuff (pad) 266.67 1.53 -10.01% -4.42% 94.75%
---
The patches can also be found in git here:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git call-depth-tracking
next reply other threads:[~2022-09-15 11:42 UTC|newest]
Thread overview: 138+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-15 11:10 Peter Zijlstra [this message]
2022-09-15 11:10 ` [PATCH v3 01/59] x86/paravirt: Ensure proper alignment Peter Zijlstra
2022-09-21 11:08 ` [tip: x86/paravirt] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 02/59] x86/cpu: Remove segment load from switch_to_new_gdt() Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 03/59] x86/cpu: Get rid of redundant switch_to_new_gdt() invocations Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 04/59] x86/cpu: Re-enable stackprotector Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 05/59] x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc() Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 06/59] x86/vdso: Ensure all kernel code is seen by objtool Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 07/59] x86: Sanitize linker script Peter Zijlstra
2022-10-07 16:03 ` Borislav Petkov
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 08/59] arch: Introduce CONFIG_FUNCTION_ALIGNMENT Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:10 ` [PATCH v3 09/59] x86/asm: Differentiate between code and function alignment Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 10/59] x86/error_inject: Align function properly Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:10 ` [PATCH v3 11/59] x86/paravirt: Properly align PV functions Peter Zijlstra
2022-09-15 14:34 ` Juergen Gross
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 12/59] x86/entry: Align SYM_CODE_START() variants Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 13/59] crypto: x86/camellia: Remove redundant alignments Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 14/59] crypto: x86/cast5: " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 15/59] crypto: x86/crct10dif-pcl: " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 16/59] crypto: x86/serpent: " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 17/59] crypto: x86/sha1: Remove custom alignments Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 18/59] crypto: x86/sha256: " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 19/59] crypto: x86/sm[34]: Remove redundant alignments Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:10 ` [PATCH v3 20/59] crypto: twofish: " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 21/59] crypto: x86/poly1305: Remove custom function alignment Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 22/59] x86: Put hot per CPU variables into a struct Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 23/59] x86/percpu: Move preempt_count next to current_task Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 24/59] x86/percpu: Move cpu_number " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 25/59] x86/percpu: Move current_top_of_stack " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 26/59] x86/percpu: Move irq_stack variables " Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 27/59] x86/softirq: Move softirq pending next to current task Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 28/59] objtool: Allow !PC relative relocations Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 29/59] objtool: Track init section Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 30/59] objtool: Add .call_sites section Peter Zijlstra
2022-10-17 14:54 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 31/59] objtool: Add --hacks=skylake Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 32/59] objtool: Allow STT_NOTYPE -> STT_FUNC+0 tail-calls Peter Zijlstra
2022-09-22 5:27 ` Pawan Gupta
2022-09-22 10:29 ` Peter Zijlstra
2022-09-22 10:47 ` Peter Zijlstra
2022-09-22 13:15 ` Peter Zijlstra
2022-09-23 14:35 ` Peter Zijlstra
2022-09-23 17:36 ` Pawan Gupta
2022-09-15 11:11 ` [PATCH v3 33/59] objtool: Fix find_{symbol,func}_containing() Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 34/59] objtool: Allow symbol range comparisons for IBT/ENDBR Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 35/59] x86/entry: Make sync_regs() invocation a tail call Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 36/59] ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 37/59] x86/putuser: Provide room for padding Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 38/59] x86/Kconfig: Add CONFIG_CALL_THUNKS Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 39/59] x86/Kconfig: Introduce function padding Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 40/59] x86/retbleed: Add X86_FEATURE_CALL_DEPTH Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 41/59] x86/alternatives: Provide text_poke_copy_locked() Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 42/59] x86/entry: Make some entry symbols global Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 43/59] x86/paravirt: Make struct paravirt_call_site unconditionally available Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 44/59] x86/callthunks: Add call patching for call depth tracking Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 45/59] x86/modules: Add call patching Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 46/59] x86/returnthunk: Allow different return thunks Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 47/59] x86/asm: Provide ALTERNATIVE_3 Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 48/59] x86/retbleed: Add SKL return thunk Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-10-20 23:10 ` [PATCH v3 48/59] " Nathan Chancellor
2022-10-21 9:53 ` Peter Zijlstra
2022-10-21 15:21 ` Nathan Chancellor
2022-11-03 22:53 ` KVM vs AMD: " Andrew Cooper
2022-11-04 12:44 ` Peter Zijlstra
2022-11-04 15:29 ` Andrew Cooper
2022-11-04 15:32 ` Nathan Chancellor
2022-11-07 9:37 ` Paolo Bonzini
2022-09-15 11:11 ` [PATCH v3 49/59] x86/retpoline: Add SKL retthunk retpolines Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 50/59] x86/retbleed: Add SKL call thunk Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 51/59] x86/calldepth: Add ret/call counting for debug Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2022-09-15 11:11 ` [PATCH v3 52/59] static_call: Add call depth tracking support Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 53/59] kallsyms: Take callthunks into account Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 54/59] x86/orc: Make it callthunk aware Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 55/59] x86/bpf: Emit call depth accounting if required Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
2023-01-05 21:49 ` [PATCH v3 55/59] " Joan Bruguera
2022-09-15 11:11 ` [PATCH v3 56/59] x86/ftrace: Remove ftrace_epilogue() Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-10-20 15:17 ` [tip: x86/urgent] " tip-bot2 for Peter Zijlstra
2022-12-09 15:41 ` [PATCH v3 56/59] " Steven Rostedt
2022-09-15 11:11 ` [PATCH v3 57/59] x86/ftrace: Rebalance RSB Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 58/59] x86/ftrace: Make it call depth tracking aware Peter Zijlstra
2022-09-21 10:19 ` [PATCH v3.1 " Peter Zijlstra
2022-09-21 18:45 ` Pawan Gupta
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Peter Zijlstra
2022-09-15 11:11 ` [PATCH v3 59/59] x86/retbleed: Add call depth tracking mitigation Peter Zijlstra
2022-10-17 14:53 ` [tip: x86/core] " tip-bot2 for Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220915111039.092790446@infradead.org \
--to=peterz@infradead.org \
--cc=Andrew.Cooper3@citrix.com \
--cc=alyssa.milburn@linux.intel.com \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=edumazet@google.com \
--cc=hjl.tools@gmail.com \
--cc=jannh@google.com \
--cc=jgross@suse.com \
--cc=joao.moreira@intel.com \
--cc=joseph.nuzman@intel.com \
--cc=jpoimboe@kernel.org \
--cc=kprateek.nayak@amd.com \
--cc=kwikner@ethz.ch \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.