* [RFC PATCH v2 0/2] x86/speculation: Add finer control for when to issue IBPB
@ 2021-04-29  8:44 Anand K Mistry
  2021-04-29  8:44 ` [RFC PATCH v2 1/2] x86/speculation: Allow per-process control of " Anand K Mistry
  2021-04-29  8:44 ` [RFC PATCH v2 2/2] selftests: Benchmark for the cost of disabling IB speculation Anand K Mistry
  0 siblings, 2 replies; 5+ messages in thread
From: Anand K Mistry @ 2021-04-29  8:44 UTC
  To: x86
  Cc: joelaf, asteinhauser, bp, tglx, Anand K Mistry, Andy Lutomirski,
	Ben Segall, Catalin Marinas, Chang S. Bae,
	Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann,
	Fenghua Yu, Gabriel Krisman Bertazi, H. Peter Anvin, Ingo Molnar,
	Jay Lang, Jens Axboe, Juri Lelli, Kees Cook, Lai Jiangshan,
	Mel Gorman, Mike Rapoport, Oleg Nesterov, Peter Collingbourne,
	Peter Zijlstra, Shuah Khan, Steven Rostedt, Tony Luck,
	Vincent Guittot, linux-kernel, linux-kselftest


Documentation/admin-guide/hw-vuln/spectre.rst documents that disabling
indirect branch speculation for a user-space process adds overhead and
causes it to run slower. The performance hit varies by CPU, but on the
AMD A4-9120C and A6-9220C CPUs, a simple ping-pong using pipes between
two processes runs ~10x slower when IB speculation is disabled.

Patch 2, included in this RFC but not intended for commit, is a simple
program that demonstrates this issue. Running on an A4-9120C with IB
speculation enabled, each process ping-pong takes ~7us:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1936300, iter/sec: 135383, us/iter: 7

But when IB speculation is disabled, that number increases
significantly:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 16384, t: 1500518, iter/sec: 10918, us/iter: 91
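
For reference, a process can disable IB speculation for itself with the
existing prctl() speculation control interface. The sketch below assumes
this is what the 'd' argument of the test program does, which the text
above does not spell out. The PR_* constants come from linux/prctl.h; the
ib_spec_*() helper names are invented here.

```c
#include <sys/prctl.h>

/*
 * Ask the kernel to disable indirect branch speculation for the
 * calling task. Returns 0 on success, -1 with errno set otherwise
 * (e.g. when per-task control is not available in the current
 * mitigation mode).
 */
static int ib_spec_disable(void)
{
	return prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
		     PR_SPEC_DISABLE, 0, 0);
}

/* Query the current state: a PR_SPEC_* bitmask, or -1 on error. */
static int ib_spec_query(void)
{
	return prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
		     0, 0, 0);
}

/* Map a PR_GET_SPECULATION_CTRL result to a short description. */
static const char *ib_spec_describe(int state)
{
	if (state < 0)
		return "error";
	if (state & PR_SPEC_FORCE_DISABLE)
		return "force-disabled";
	if (state & PR_SPEC_DISABLE)
		return "disabled";
	if (state & PR_SPEC_ENABLE)
		return "enabled";
	return "not affected";
}
```

Calling ib_spec_disable() before the ping-pong loop and printing
ib_spec_describe(ib_spec_query()) confirms whether the setting took
effect on a given machine.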

Although this test is a worst-case scenario, we can also consider a real
situation: an audio server (e.g. PulseAudio). If we imagine a low-latency
capture, with 10ms packets and a concurrent task on the same CPU (e.g.
video encoding, for a video call), the audio server will preempt the
concurrent task at a rate of 100Hz. At 91us overhead per preemption
(switching to and from the audio process), that's 100 * 91us = 9.1ms per
second, or 0.9% overhead for one process doing preemption. In real-world
testing (on an A4-9120C), I've seen 9% of CPU used by IBPB when doing a
2-person video call.

With this patch, the number of IBPBs issued can be reduced to the
minimum necessary, only when there's a potential attacker->victim
process switch.

Running on the same A4-9120C device, this patch reduces the performance
hit of IBPB by ~half, as expected:
localhost ~ # taskset 1 /usr/local/bin/test ds
...
iters: 32768, t: 1824043, iter/sec: 17964, us/iter: 55

Note that CPUs from multiple vendors experience a performance hit due to
IBPB. I also tested an Intel i3-8130U, which sees a noticeable (~2x)
increase in process switch time due to IBPB.
IB spec enabled:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1210821us, iter/sec: 216501, us/iter: 4

IB spec disabled:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 131072, t: 1257583us, iter/sec: 104225, us/iter: 9

Open questions:
- There is a significant number of task flags, which now reaches the
  limit of a 'long' on 32-bit systems. Should the 'mode' flags be
  stored somewhere else?
- Having x86-specific flags in linux/sched.h feels wrong. However, this
  is the mechanism for doing atomic flag updates. Is there an alternate
  approach?

Open tasks:
- Documentation
- Naming


Changes in v2:
- Make flag per-process using prctl().

Anand K Mistry (2):
  x86/speculation: Allow per-process control of when to issue IBPB
  selftests: Benchmark for the cost of disabling IB speculation

 arch/x86/include/asm/thread_info.h            |   4 +
 arch/x86/kernel/cpu/bugs.c                    |  56 +++++++++
 arch/x86/kernel/process.c                     |  10 ++
 arch/x86/mm/tlb.c                             |  51 ++++++--
 include/linux/sched.h                         |  10 ++
 include/uapi/linux/prctl.h                    |   5 +
 .../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
 7 files changed, 236 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c

-- 
2.31.1.498.g6c1eba8ee3d-goog


Thread overview: 5+ messages
2021-04-29  8:44 [RFC PATCH v2 0/2] x86/speculation: Add finer control for when to issue IBPB Anand K Mistry
2021-04-29  8:44 ` [RFC PATCH v2 1/2] x86/speculation: Allow per-process control of " Anand K Mistry
2021-05-03  8:48   ` Thomas Gleixner
2021-05-11  8:39     ` Anand K. Mistry
2021-04-29  8:44 ` [RFC PATCH v2 2/2] selftests: Benchmark for the cost of disabling IB speculation Anand K Mistry
