[V2,00/15] x86: Enable FSGSBASE instructions

Message ID 1527789525-8857-1-git-send-email-chang.seok.bae@intel.com
Series
  • x86: Enable FSGSBASE instructions

Message

Bae, Chang Seok May 31, 2018, 5:58 p.m. UTC
FSGSBASE is a 64-bit instruction set that allows reading and writing
the FS/GS base from any privilege level. Although the instructions
were introduced with Ivy Bridge, the enabling effort has dragged on
for quite a long time [2,3,4] for various reasons. After extended
discussions [1], this patchset proposes new ABIs for customizing the
FS/GS base (separately from its selector).

An FSGSBASE-enabled VM can run on hosts with either hardware
virtualization or software emulation. KVM advertises FSGSBASE when the
physical CPU supports it, and the instructions are emulated in
QEMU/TCG [5]. In a pool of mixed systems, the VMM may disable FSGSBASE
to allow seamless VM migration [6].
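A hedged sketch of the two separate checks involved:
CPUID.(EAX=07H, ECX=0):EBX bit 0 enumerates FSGSBASE in hardware (and
inside a guest it reflects whatever the VMM chose to advertise), while
only a kernel-provided capability says whether CR4.FSGSBASE was
actually set. As above, the AT_HWCAP2 bit value is assumed from later
mainline; cpu_has_fsgsbase() and kernel_allows_fsgsbase() are
hypothetical names, and __get_cpuid_count() is the GCC/Clang <cpuid.h>
helper.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid_count(), GCC/Clang */
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP2 */

/* Assumed value (later mainline <asm/hwcap2.h>). */
#ifndef HWCAP2_FSGSBASE
#define HWCAP2_FSGSBASE (1UL << 1)
#endif

/* Hardware (or VMM-advertised) support: CPUID.(EAX=07H,ECX=0):EBX[0]. */
static int cpu_has_fsgsbase(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 if leaf 7 is not supported. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & 1u) != 0;
}

/* Kernel opt-in: can only be set when the CPUID bit is set as well. */
static int kernel_allows_fsgsbase(void)
{
    return (getauxval(AT_HWCAP2) & HWCAP2_FSGSBASE) != 0;
}
```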

Two major benefits are expected. The kernel gains context-switch
performance by skipping the MSR write for the GS base, and user-level
programs (such as Java-based ones) can avoid system calls when editing
the FS/GS base.

Changes when FSGSBASE is enabled:
(1) In context switch, a thread's FS/GS base is preserved regardless
of its selector, based on the discussion [1].
(2) Consequently, a ptracer should expect the FS/GS index and base
values to diverge. There was a controversial debate about the
resulting backward incompatibility, which concerns GDB more than
other toolchains [7,8]. As a baseline version, the current patchset
does not include backward-compatibility support.
(3) On paranoid entry, the GS base is set to the per-CPU base, and
the original base is restored on exit.
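The difference behind (1) and (2) can be shown with a toy user-space
model of the GS save/restore in context switch. It is not kernel code,
and every name in it is hypothetical. Without FSGSBASE, a nonzero
selector implies the base came from the descriptor, so the old path
drops the base and reloads it from the selector. With FSGSBASE, user
space may have written a base that differs from the selector's, so the
live base is saved (RDGSBASE) and written back (WRGSBASE) after the
selector is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Toy descriptor table: selector index (sel >> 3) -> base. */
#define NDESC 4
static const uint64_t desc_base[NDESC] = { 0, 0x1000, 0x2000, 0x3000 };

struct cpu    { uint16_t gs; uint64_t gsbase; };      /* live state  */
struct thread { uint16_t gsindex; uint64_t gsbase; }; /* saved state */

static uint64_t selector_base(uint16_t sel) { return desc_base[(sel >> 3) % NDESC]; }

/* Legacy: a nonzero selector means "base comes from the descriptor". */
static void save_legacy(const struct cpu *c, struct thread *t)
{
    t->gsindex = c->gs;
    if (c->gs != 0)
        t->gsbase = 0;      /* recomputed from the selector on restore */
    /* else: keep the base the kernel recorded at arch_prctl() time */
}

static void load_legacy(struct cpu *c, const struct thread *t)
{
    c->gs = t->gsindex;
    c->gsbase = t->gsindex ? selector_base(t->gsindex) : t->gsbase;
}

/* FSGSBASE: save the live base regardless of the selector. */
static void save_fsgsbase(const struct cpu *c, struct thread *t)
{
    t->gsindex = c->gs;
    t->gsbase = c->gsbase;                  /* models RDGSBASE */
}

static void load_fsgsbase(struct cpu *c, const struct thread *t)
{
    c->gs = t->gsindex;                     /* selector load sets a base... */
    c->gsbase = selector_base(t->gsindex);
    c->gsbase = t->gsbase;                  /* ...then WRGSBASE overrides it */
}
```

In a scenario where a thread loads selector 0x8 (base 0x1000) and then
writes 0xbeef with WRGSBASE, the legacy path restores 0x1000, while the
FSGSBASE path restores selector 0x8 with base 0xbeef. That pairing is
exactly the index/base divergence a ptracer has to expect under (2).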

Updates from V1 [9]:
* Adopt (3) instead of comparing the current and kernel GS bases.
* Per (2), ptracer backward-compatibility patches are not included.
* With (3), the CPU number is stored as early as possible (before any
IST stack can be hit) rather than at vDSO initialization.
* Include FSGSBASE documentation and enumerate the capability for
user space.
* Add the TAINT_INSECURE flag and clean up the vDSO CPU-number
initialization.

[1] Recent discussion on LKML:
https://marc.info/?t=150147053700001&r=1&w=2
[2] Andy Lutomirski’s rebase work:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fsgsbase
[3] Patch set shown in year 2016:
https://marc.info/?t=145857711900001&r=1&w=2
[4] First patch set:
https://lkml.org/lkml/2015/4/10/573
[5] QEMU with FSGSBASE emulation:
https://github.com/qemu/qemu/blob/026aaf47c02b79036feb830206cfebb2a726510d/target/i386/translate.c#L8186
[6] 5-level EPT:
http://lkml.kernel.org/r/9ddf602b-6c8b-8c1e-ab46-07ed12366593@redhat.com
[7] RR/FSGSBASE:
https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
[8] CRIU/FSGSBASE:
https://lists.openvz.org/pipermail/criu/2018-March/040654.html
[9] V1:
https://lkml.org/lkml/2018/3/19/1699

Andi Kleen (3):
  x86/fsgsbase/64: Add intrinsics/macros for FSGSBASE instructions
  x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
  x86/fsgsbase/64: Add documentation for FSGSBASE

Andy Lutomirski (4):
  x86/fsgsbase/64: Make ptrace read FS/GS base accurately
  x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
  x86/fsgsbase/64: Preserve FS/GS state in __switch_to if FSGSBASE is on
  x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit

Chang S. Bae (8):
  x86/fsgsbase/64: Introduce FS/GS base helper functions
  x86/fsgsbase/64: Use FS/GS base helpers in core dump
  x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to
  x86/vdso: Move out the CPU number store
  taint: Add taint for insecure
  x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
  x86/fsgsbase/64: When copying a thread, use FSGSBASE if enabled
  x86/fsgsbase/64: Use per-CPU base as GS base on paranoid_entry

 Documentation/admin-guide/kernel-parameters.txt |   2 +
 Documentation/sysctl/kernel.txt                 |   1 +
 Documentation/x86/entry_64.txt                  |   9 +
 Documentation/x86/fsgs.txt                      | 104 +++++++++
 arch/x86/entry/entry_64.S                       |  74 +++++--
 arch/x86/entry/vdso/vgetcpu.c                   |   2 +-
 arch/x86/entry/vdso/vma.c                       |  38 +---
 arch/x86/include/asm/elf.h                      |   6 +-
 arch/x86/include/asm/fsgsbase.h                 | 171 +++++++++++++++
 arch/x86/include/asm/inst.h                     |  15 ++
 arch/x86/include/asm/segment.h                  |   4 +
 arch/x86/include/asm/vgtod.h                    |   2 -
 arch/x86/include/uapi/asm/hwcap2.h              |   3 +
 arch/x86/kernel/cpu/common.c                    |  39 ++++
 arch/x86/kernel/process_64.c                    | 274 ++++++++++++++++++++----
 arch/x86/kernel/ptrace.c                        |  28 +--
 arch/x86/kernel/setup_percpu.c                  |  30 ++-
 include/linux/kernel.h                          |   3 +-
 kernel/panic.c                                  |   1 +
 19 files changed, 681 insertions(+), 125 deletions(-)
 create mode 100644 Documentation/x86/fsgs.txt
 create mode 100644 arch/x86/include/asm/fsgsbase.h

Comments

Andy Lutomirski May 31, 2018, 8:37 p.m. UTC | #1
On Thu, May 31, 2018 at 10:58 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> FSGSBASE is 64-bit instruction set to allow read/write
> FS/GS base from any privilege. As introduced from
> Ivybridge, enabling effort has been revolving quite long
> [2,3,4] for various reasons. After extended discussions [1],
> this patchset is proposed to introduce new ABIs of
> customizing FS/GS base (separate from its selector).


Thanks!

I have three general comments:

1. Can you try and generate a new version of patches 1-5 quickly?  I
think it would be nice to get them merged this cycle.

2. I spoke to hpa, and he said that, after further investigation of
how gdb works, a command like 'p $gs = 0x7' results in
PTRACE_POKEUSER.  He further suggested that it would therefore be
reasonable to have POKEUSER on gs refresh gsindex (assuming the poked
value is nonzero, sigh) and to make PTRACE_SETREGS iterate over the
registers in reverse order so that it behaves sanely.  Is this indeed
the case?
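A toy model (hypothetical names, not kernel code) of why the iteration
order matters: in struct user_regs_struct from <sys/user.h>, fs_base
and gs_base come before ds, es, fs and gs. If a POKEUSER of a nonzero
gs refreshes the base from its descriptor, writing the fields in
declaration order lets the gs write clobber the gs_base just supplied,
while writing them in reverse keeps it. The toy LDT has only slot 0,
with base 1, matching the selftest plan below.

```c
#include <assert.h>
#include <stdint.h>

/* Toy LDT: only slot 0 exists, with base 1 (selector 0x7 = idx 0, TI=1, RPL=3). */
static uint64_t ldt_base(uint16_t sel) { return (sel >> 3) == 0 ? 1 : 0; }

struct tracee { uint16_t gs; uint64_t gsbase; };

/* Suggested POKEUSER on gs: a nonzero selector refreshes the base. */
static void poke_gs(struct tracee *t, uint16_t sel)
{
    t->gs = sel;
    if (sel != 0)
        t->gsbase = ldt_base(sel);
}

static void poke_gsbase(struct tracee *t, uint64_t base) { t->gsbase = base; }

/* Declaration order: gs_base first, then gs, which clobbers it. */
static void setregs_forward(struct tracee *t, uint16_t gs, uint64_t gsbase)
{
    poke_gsbase(t, gsbase);
    poke_gs(t, gs);
}

/* Reverse order: gs first, then the explicit gs_base wins. */
static void setregs_reverse(struct tracee *t, uint16_t gs, uint64_t gsbase)
{
    poke_gs(t, gs);
    poke_gsbase(t, gsbase);
}
```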

3. The ptrace behavior is sufficiently subtle that I think it needs a
test case.  Can you add a new selftest (or extend the existing
fsgsbase selftest) to do something like this:

 - Create an LDT entry in slot zero with base == 1.
 - Read out the hwcap bit indicating whether we have the new instructions on.
 - MOV 0x7 to %gs and use ptrace to read gsbase.  Confirm that the result is 1.
 - MOV 0x7 to %gs, do wrgsbase to change the base to 2 (if supported),
and use ptrace to read gsbase.  Confirm that the result is 2.
 - Same as the previous test, but with 0x0 instead of 0x7.
 - Allocate a TLS segment with base == 3.  Load it into %gs. Use
ptrace to read gsbase.  Confirm that the result is 3.
 - Use ptrace to toggle %gs (using POKEUSER) back and forth between
0x0, 0x7, and the TLS segment.  In each case, immediately use ptrace
to read the base and confirm that you get the expected result.  Then
resume the tracee and read the base directly, confirming that you get
the expected result.
 - Use PTRACE_SETREGS to load gs = 0, gsbase = 4.  Confirm that
GETREGS returns those values back and confirm that they are in fact
loaded into the tracee.
 - Use PTRACE_SETREGS to load gs = 0x7, gsbase = 4.  Confirm that
GETREGS returns those values back and confirm that they have the
expected values (which will depend on the hwcap bit).  Also confirm
that the expected values are loaded into the tracee.

Does this seem reasonable?  The mov_ss_trap testcase has a nice bit of
code you can borrow to invoke ptrace operations on yourself.
Bae, Chang Seok May 31, 2018, 9:11 p.m. UTC | #2
> 1. Can you try and generate a new version of patches 1-5 quickly?  I think it would be nice to get them merged this cycle.

Okay, let me first reply with what I can do now. I will submit revised patches as soon as I can.