KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v9 00/27] x86: load FPU registers on return to userland
@ 2019-04-03 16:41 Sebastian Andrzej Siewior
  2019-04-03 16:41 ` [PATCH 01/27] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
                   ` (29 more replies)
  0 siblings, 30 replies; 83+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-03 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, Andy Lutomirski, Paolo Bonzini, Radim Krčmář,
	kvm, Jason A. Donenfeld, Rik van Riel, Dave Hansen

This is a refurbished series originally started by by Rik van Riel. The
goal is load the FPU registers on return to userland and not on every
context switch. By this optimisation we can:
- avoid loading the registers if the task stays in kernel and does
  not return to userland
- make kernel_fpu_begin() cheaper: it only saves the registers on the
  first invocation. The second invocation does not need save them again.

To access the FPU registers in kernel we need:
- disable preemption to avoid that the scheduler switches tasks. By
  doing so it would set TIF_NEED_FPU_LOAD and the FPU registers would be
  not valid.
- disable BH because the softirq might use kernel_fpu_begin() and then
  set TIF_NEED_FPU_LOAD instead loading the FPU registers on completion.

v8…v9:
Rebased on top of v5.1-rc3 plus:
- using the proper register pointer in __fpu__restore_sig() for the
  fsave case.

- dropped a WARN_ON() from switch_fpu_finish() if the PKRU state is
  missing. On a preemptible kernel this warning will trigger if the
  task gets preempted in fpstate_init() during memset() and
  fpstate_init_xstate().

- Adding a fastpath to __fpu__restore_sig() and
  copy_fpstate_to_sigframe() save/restore FPU state to/from userland.
  Thomas Gleixner asked for this. The series first does the slowpath and
  then the last few patches add/enable the fast path. This ensures that
  the slow works correctly (during a bisect) and then it should be
  rarely used.

Some numbers generated by 
     taskset 2 lat_sig -P 1 -W 64 -N 5000 catch

v5.1-rc3
 catch 64
 Signal handler overhead: 1.7531 microseconds
 Signal handler overhead: 1.7547 microseconds
 Signal handler overhead: 1.7362 microseconds
 catch 32
 Signal handler overhead: 2.4044 microseconds
 Signal handler overhead: 2.3732 microseconds
 Signal handler overhead: 2.3979 microseconds

up to x86/fpu: Defer FPU state load until return to userspace
 catch 64
 Signal handler overhead: 2.7556 microseconds
 Signal handler overhead: 2.7878 microseconds
 Signal handler overhead: 2.7664 microseconds
 catch 32
 Signal handler overhead: 2.5961 microseconds
 Signal handler overhead: 2.5916 microseconds
 Signal handler overhead: 2.6091 microseconds

up to x86/fpu: Add a fastpath to copy_fpstate_to_sigframe()
 catch 64
 Signal handler overhead: 1.8573 microseconds
 Signal handler overhead: 1.8469 microseconds
 Signal handler overhead: 1.8554 microseconds
 catch 32
 Signal handler overhead: 2.4875 microseconds
 Signal handler overhead: 2.4822 microseconds
 Signal handler overhead: 2.4806 microseconds
 ----
complete series
 catch 64
 Signal handler overhead: 1.8463 microseconds
 Signal handler overhead: 1.8431 microseconds
 Signal handler overhead: 1.8609 microseconds
 catch 32
 Signal handler overhead: 2.4907 microseconds
 Signal handler overhead: 2.4771 microseconds
 Signal handler overhead: 2.4939 microseconds

v7…v8:
Rebased on top of v5.1-rc1. And then:
- Remove a WARN_ON() in switch_fpu_finish. Turns out it can trigger
  during a preemption while the xstate is initialized.

- Provide __read_pkru_ins() and __write_pkru_ins which are symmetrical.
  __write_pkru() does the "write only if the value is different from
  current" check.

- The kernel threads now load `init_pkru_value' instead of `0' for the
  PKRU value.

- The PKRU value is also written into its xsave area of init_fpstate.

v6…v7:
Rebased on top of v5.0-rc7 and addressed Borislav Petkov's review.

v5…v6:
Rebased on top of v5.0-rc1. Integrated a few fixes which I noticed while
looking over the patches, dropped the first few patches which were
already applied.

v4…v5:
Rebased on top of a fix, noticed a problem with XSAVES and then redid
the restore on sig return (patch #26 to #28).

I don't like very much the sig save+restore thing that we are doing. It
has been always like that. I *think* that this is just because we have
nowhere to stash the FPU state while we are handling the signal. We
could add another fpu->state for the signal handler and avoid the thing.
Debian code-search revealed that `criu' is using it (and I didn't
figure out why). Nothing else (that is packaged in Debian). Maybe we
could get rid of this and if `criu' would then use a dedicated interface
for its needs rather the signal interface that happen to do what it
wants :)

v3…v4:
It has been suggested to remove the `initialized' member of the struct
fpu because it should not required be needed with lazy-FPU-restore and
would make the review easier. This is the first part of the series, the
second is basically the rebase of the v3 queue. As a result, the
diffstat became negative (which wasn't the case in previous version) :)
I tried to incorporate all the review comments that came up, some of
them were "outdated" after the removal of the `initialized' member. I'm
sorry should I missed any.

v1…v3:
v2 was never posted. I followed the idea to completely decouple PKRU
from xstate. This didn't quite work and made a few things complicated. 
One obvious required fixup is copy_fpstate_to_sigframe() where the PKRU
state needs to be fiddled into xstate. This required another
xfeatures_mask so that the sanity checks were performed and
xstate_offsets would be computed. Additionally ptrace also reads/sets
xstate in order to get/set the register and PKRU is one of them. So this
would need some fiddle, too.
In v3 I dropped that decouple idea. I also learned that the wrpkru
instruction is not privileged and so caching it in kernel does not work.
Instead I keep PKRU in xstate area and load it at context switch time
while the remaining registers are deferred (until return to userland).
The offset of PKRU within xstate is enumerated at boot time so why not
use it.

The following changes since commit 79a3aaa7b82e3106be97842dedfd8429248896e6:

  Linux 5.1-rc3 (2019-03-31 14:39:29 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git x86_fpu_rtu_v9

for you to fetch changes up to c5265b05d167b3be19c787ebb913dc138485fa25:

  x86/pkeys: add PKRU value to init_fpstate (2019-04-03 18:19:10 +0200)

----------------------------------------------------------------
Rik van Riel (5):
      x86/fpu: Add (__)make_fpregs_active helpers
      x86/fpu: Eager switch PKRU state
      x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
      x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
      x86/fpu: Defer FPU state load until return to userspace

Sebastian Andrzej Siewior (22):
      x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
      x86/fpu: Remove fpu__restore()
      x86/fpu: Remove preempt_disable() in fpu__clear()
      x86/fpu: Always init the `state' in fpu__clear()
      x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
      x86/fpu: Don't save fxregs for ia32 frames in copy_fpstate_to_sigframe()
      x86/fpu: Remove fpu->initialized
      x86/fpu: Remove user_fpu_begin()
      x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
      x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature number instead of mask
      x86/pkru: Provide .*_pkru_ins() functions
      x86/fpu: Only write PKRU if it is different from current
      x86/pkeys: Don't check if PKRU is zero before writting it
      x86/entry: Add TIF_NEED_FPU_LOAD
      x86/fpu: Update xstate's PKRU value on write_pkru()
      x86/fpu: Inline copy_user_to_fpregs_zeroing()
      x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory
      x86/fpu: Merge the two code paths in __fpu__restore_sig()
      x86/fpu: Add a fastpath to __fpu__restore_sig()
      x86/fpu: Add a fastpath to copy_fpstate_to_sigframe()
      x86/fpu: Restore FPU register in copy_fpstate_to_sigframe() in order to use the fastpath
      x86/pkeys: add PKRU value to init_fpstate

 Documentation/preempt-locking.txt    |   1 -
 arch/x86/entry/common.c              |   8 ++
 arch/x86/ia32/ia32_signal.c          |  17 ++-
 arch/x86/include/asm/fpu/api.h       |  31 ++++++
 arch/x86/include/asm/fpu/internal.h  | 130 ++++++++++++++++------
 arch/x86/include/asm/fpu/signal.h    |   2 +-
 arch/x86/include/asm/fpu/types.h     |   9 --
 arch/x86/include/asm/fpu/xstate.h    |   5 +-
 arch/x86/include/asm/pgtable.h       |  28 ++++-
 arch/x86/include/asm/special_insns.h |  18 +++-
 arch/x86/include/asm/thread_info.h   |   2 +
 arch/x86/include/asm/trace/fpu.h     |   8 +-
 arch/x86/kernel/cpu/common.c         |   5 +
 arch/x86/kernel/fpu/core.c           | 194 ++++++++++++++++-----------------
 arch/x86/kernel/fpu/init.c           |   2 -
 arch/x86/kernel/fpu/regset.c         |  24 ++---
 arch/x86/kernel/fpu/signal.c         | 202 +++++++++++++++++++++--------------
 arch/x86/kernel/fpu/xstate.c         |  42 ++++----
 arch/x86/kernel/process.c            |   2 +-
 arch/x86/kernel/process_32.c         |  11 +-
 arch/x86/kernel/process_64.c         |  11 +-
 arch/x86/kernel/signal.c             |  17 ++-
 arch/x86/kernel/traps.c              |   2 +-
 arch/x86/kvm/vmx/vmx.c               |   2 +-
 arch/x86/kvm/x86.c                   |  48 +++++----
 arch/x86/math-emu/fpu_entry.c        |   3 -
 arch/x86/mm/mpx.c                    |   6 +-
 arch/x86/mm/pkeys.c                  |  21 ++--
 28 files changed, 496 insertions(+), 355 deletions(-)

Sebastian

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, back to index

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-03 16:41 [PATCH v9 00/27] x86: load FPU registers on return to userland Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 01/27] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig() Sebastian Andrzej Siewior
2019-04-13 20:46   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 02/27] x86/fpu: Remove fpu__restore() Sebastian Andrzej Siewior
2019-04-13 20:47   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 03/27] x86/fpu: Remove preempt_disable() in fpu__clear() Sebastian Andrzej Siewior
2019-04-13 20:48   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 04/27] x86/fpu: Always init the `state' " Sebastian Andrzej Siewior
2019-04-13 20:48   ` [tip:x86/fpu] x86/fpu: Always init the state " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 05/27] x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2019-04-13 20:49   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 06/27] x86/fpu: Don't save fxregs for ia32 frames " Sebastian Andrzej Siewior
2019-04-13 20:50   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 07/27] x86/fpu: Remove fpu->initialized Sebastian Andrzej Siewior
2019-04-13 20:50   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 08/27] x86/fpu: Remove user_fpu_begin() Sebastian Andrzej Siewior
2019-04-13 20:51   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 09/27] x86/fpu: Add (__)make_fpregs_active helpers Sebastian Andrzej Siewior
2019-04-13 20:52   ` [tip:x86/fpu] x86/fpu: Add an __fpregs_load_activate() internal helper tip-bot for Rik van Riel
2019-04-03 16:41 ` [PATCH 10/27] x86/fpu: Make __raw_xsave_addr() use feature number instead of mask Sebastian Andrzej Siewior
2019-04-13 20:52   ` [tip:x86/fpu] x86/fpu: Make __raw_xsave_addr() use a " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 11/27] x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use " Sebastian Andrzej Siewior
2019-04-13 20:53   ` [tip:x86/fpu] x86/fpu: Use a feature number instead of mask in two more helpers tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 12/27] x86/pkru: Provide .*_pkru_ins() functions Sebastian Andrzej Siewior
2019-04-10 16:36   ` Borislav Petkov
2019-04-10 16:52     ` Borislav Petkov
2019-04-10 21:25       ` Sebastian Andrzej Siewior
2019-04-10 21:29         ` Dave Hansen
2019-04-11 13:24           ` Borislav Petkov
2019-04-13 20:54   ` [tip:x86/fpu] x86/pkeys: Provide *pkru() helpers tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 13/27] x86/fpu: Only write PKRU if it is different from current Sebastian Andrzej Siewior
2019-04-13 20:55   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 14/27] x86/pkeys: Don't check if PKRU is zero before writting it Sebastian Andrzej Siewior
2019-04-13 20:55   ` [tip:x86/fpu] x86/pkeys: Don't check if PKRU is zero before writing it tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 15/27] x86/fpu: Eager switch PKRU state Sebastian Andrzej Siewior
2019-04-13 20:56   ` [tip:x86/fpu] " tip-bot for Rik van Riel
2019-04-03 16:41 ` [PATCH 16/27] x86/entry: Add TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2019-04-13 20:57   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 17/27] x86/fpu: Always store the registers in copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2019-04-13 20:57   ` [tip:x86/fpu] " tip-bot for Rik van Riel
2019-04-03 16:41 ` [PATCH 18/27] x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD Sebastian Andrzej Siewior
2019-04-13 20:58   ` [tip:x86/fpu] " tip-bot for Rik van Riel
2019-04-03 16:41 ` [PATCH 19/27] x86/fpu: Update xstate's PKRU value on write_pkru() Sebastian Andrzej Siewior
2019-04-08 18:14   ` Dave Hansen
2019-04-08 20:03     ` Sebastian Andrzej Siewior
2019-04-13 20:59   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 20/27] x86/fpu: Inline copy_user_to_fpregs_zeroing() Sebastian Andrzej Siewior
2019-04-13 21:00   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 21/27] x86/fpu: Let __fpu__restore_sig() restore the !32bit+fxsr frame from kernel memory Sebastian Andrzej Siewior
2019-04-13 21:00   ` [tip:x86/fpu] x86/fpu: Restore from kernel memory on the 64-bit path too tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 22/27] x86/fpu: Merge the two code paths in __fpu__restore_sig() Sebastian Andrzej Siewior
2019-04-13 21:01   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 23/27] x86/fpu: Defer FPU state load until return to userspace Sebastian Andrzej Siewior
2019-04-12 14:36   ` Borislav Petkov
2019-04-12 15:24     ` Sebastian Andrzej Siewior
2019-04-12 16:22       ` Borislav Petkov
2019-04-12 16:37         ` Sebastian Andrzej Siewior
2019-04-12 16:48           ` Borislav Petkov
2019-04-12 17:19             ` Sebastian Andrzej Siewior
2019-04-12 17:29               ` Borislav Petkov
2019-04-15  9:14                 ` Sebastian Andrzej Siewior
2019-04-13 21:02   ` [tip:x86/fpu] " tip-bot for Rik van Riel
2019-04-03 16:41 ` [PATCH 24/27] x86/fpu: Add a fastpath to __fpu__restore_sig() Sebastian Andrzej Siewior
2019-04-08 17:05   ` Thomas Gleixner
2019-04-08 20:02     ` Sebastian Andrzej Siewior
2019-04-09  7:27       ` Thomas Gleixner
2019-04-12 17:17   ` Borislav Petkov
2019-04-12 17:27     ` Sebastian Andrzej Siewior
2019-04-13 21:02   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 25/27] x86/fpu: Add a fastpath to copy_fpstate_to_sigframe() Sebastian Andrzej Siewior
2019-04-13 21:03   ` [tip:x86/fpu] " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 26/27] x86/fpu: Restore FPU register in copy_fpstate_to_sigframe() in order to use the fastpath Sebastian Andrzej Siewior
2019-04-13 21:04   ` [tip:x86/fpu] x86/fpu: Restore regs " tip-bot for Sebastian Andrzej Siewior
2019-04-03 16:41 ` [PATCH 27/27] x86/pkeys: add PKRU value to init_fpstate Sebastian Andrzej Siewior
2019-04-13 21:05   ` [tip:x86/fpu] x86/pkeys: Add " tip-bot for Sebastian Andrzej Siewior
2019-04-04 14:01 ` [PATCH v9 00/27] x86: load FPU registers on return to userland David Laight
2019-04-04 14:14   ` 'Sebastian Andrzej Siewior'
2019-04-04 14:26     ` Andy Lutomirski
2019-04-04 14:31       ` Sebastian Andrzej Siewior
2019-04-04 15:10       ` David Laight
2019-04-08 17:08 ` Thomas Gleixner
2019-04-12 18:30 ` Borislav Petkov
2019-04-15  8:58   ` Sebastian Andrzej Siewior

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git