Kernel-hardening archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/4] Paravirtualized Control Register pinning
@ 2020-06-17 19:07 John Andersen
  2020-06-17 19:07 ` [PATCH 1/4] X86: Update mmu_cr4_features during feature identification John Andersen
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: John Andersen @ 2020-06-17 19:07 UTC (permalink / raw)
  To: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, john.s.andersen, arjan, caoj.fnst, bhe, nivedita,
	keescook, dan.j.williams, eric.auger, aaronlewis, peterx,
	makarandsonare, linux-doc, linux-kernel, kvm, linux-kselftest,
	kernel-hardening

The paravirtualized CR pinning patchset is a strengthened version of
existing control registers pinning for paravritualized guests. It
protects KVM guests from ROP based attacks which attempt to disable key
security features. Virtualized Linux guests such as Kata Containers, AWS
Lambda, and Chromos Termina will get this protection enabled by default
when they update their kernel / configs. Using virtualization allows us
to provide a stronger version of a proven exploit mitigation technique.

We’ve patched KVM to create 6 new KVM specific MSRs used to query which
bits may be pinned, and to set which bits are pinned high or low in
control registers 0 and 4. Linux guest support was added so that
non-kexec guests will be able to take advantage of this strengthened
protection by default. A plan for enabling guests with kexec is proposed
in this cover letter. As part of that plan, we add a command line flag
that allows users to opt-in to the protection on boot if they have kexec
built into their kernel, effectively opting out of kexec support.
Hibernation and suspend to ram were enabled by updating the location
where bits in control register 4 were saved to and restored from. The
work also includes minor patches for QEMU to ensure reboot works by
clearing the added MSRs and exposing the new CPUID feature bit. There is
one SMM related selftest added in this patchset and another patch for
kvm-unit-tests that will be sent separately.

Thank you to Sean and Drew who reviewed v2, to Boris, Paolo, Andy, and
Liran who reviewed v1, and to Sean, Dave, Kristen, and Rick who've
provided feedback throughout. I appreciate your time spent reviewing and
feedback.

Here are the previous RFC versions of this patchset for reference

RFC v2: https://lkml.org/lkml/2020/2/18/1162
RFC v1: https://lkml.org/lkml/2019/12/24/380



=== High level overview of the changes ===

- A CPUID feature bit as well as MSRs were added to KVM. Guests can use
  the CPUID feature bit to determine if MSRs are available. Reading the
  first 2 MSRs returns the bits which may be pinned for CR0/4
  respectively. The next 4 MSRs are writeable and allow the guest and
  host userspace to set which bits are pinned low or pinned high for
  CR0/4.

- Hibernation and suspend-to-RAM are supported. This was done by
  updating mmu_cr4_features on feature identification of the boot CPU.
  As such, mmu_cr4_features is no longer read only after init.

- CPU hotplug is supported. Pinning is per vCPU. When running as a guest
  pinning is requested after CPU identification for non-boot CPUs. The
  boot CPU requests pinning a directly after existing pinning is setup.

- Nested virtualization is supported. SVM / VMX restore pinned bits on
  VM-Exit if they had been unset in the host VMCB / VMCS.

- As suggested by Sean, unpinning of pinned bits on return from SMM due
  to modification of SMRAM will cause an unhandleable emulation fault
  resulting in termination of the guest.

- kexec support is still pending, since the plan is a bit long it's been
  moved to the end of the cover letter. It talks about the decision to
  make a command line parameter, why we opt in to pinning (and
  effectively out of kexec). Being that those changes wouldn't be
  localized to KVM (and this patchset is on top of kvm/next).

- As Paolo requested, a patch will be sent immediately following this
  patchset for kvm-unit-tests with the unit tests for general
  functionality. selftests are included for SMM specific functionality.



=== Chanages since RFCv2 ===

- Related to Drew's comments

  - Used linux/stringify.h in selftests

  - Added comments on why we don't use GUEST_* due to SMM trickiness.
    We opt not to use GUEST_SYNC() because we also have to make a sync call
    from SMM. As such, the address of the ucall struct we'd need to pass isn't
    something we can put into the machine code in a maintainable way. At least
    so far as I could tell.

- Related to Sean's comments

  - Allowed pinning bits high or low rather than just high

  - Cleaner code path for guest and host when writing to MSRs

    - Didn't use read modify write behavior due to that requiring changes to
      selftest save restore code which made me suspect that there might be
      issues with other VMMs. The issue was because we read CR0/4 in the RWM
      operation on the MSR, we must do KVM_SET_REGS before KVM_SET_MSRS, we also
      had to call KVM_SET_SREGS and add checks for if we're in SMM or not. This
      made it a bit more messy overall so I went with the first approach Sean
      suggested where we just have pin high/low semantics.

  - If the guest writes values to the allowed MSRs that are not the correct
    value the wrmsr fails.

  - If SMRAM modification would result in unpinning bits we bail with
    X86EMUL_UNHANDLEABLE

  - Added silent restoration of pinned bits for SVM and VMX when they may have
    been modified in VMCB / VMCS. This didn't seem like a place were we'd want
    to inject a fault, please let me know if we should.



=== Description of changes and rational ===

Paravirtualized Control Register pinning is a strengthened version of
existing protections on the Write Protect, Supervisor Mode Execution /
Access Protection, and User-Mode Instruction Prevention bits. The
existing protections prevent native_write_cr*() functions from writing
values which disable those bits. This patchset prevents any guest
writes to control registers from disabling pinned bits, not just writes
from native_write_cr*(). This stops attackers within the guest from
using ROP to disable protection bits.

https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/

The protection is implemented by adding MSRs to KVM which contain the
bits that are allowed to be pinned, and the bits which are pinned. The
guest or userspace can enable bit pinning by reading MSRs to check
which bits are allowed to be pinned, and then writing MSRs to set which
bits they want pinned.

Other hypervisors such as HyperV have implemented similar protections
for Control Registers and MSRs; which security researchers have found
effective.

https://www.abatchy.com/2018/01/kernel-exploitation-4

We add a CR pin feature bit to the KVM cpuid, read only MSRs which
guests use to identify which bits they may request be pinned, and CR
pinned low/high MSRs which contain the pinned bits. Guests can request
that KVM pin bits within control register 0 or 4 via the CR pinned MSRs.
Writes to the MSRs fail if they include bits that aren't allowed to be
pinned. Host userspace may clear or modify pinned bits at any time. Once
pinned bits are set, the guest may pin more allowed bits, but may never
clear pinned bits.

In the event that the guest vCPU attempts to disable any of the pinned
bits, the vCPU that issued the write is sent a general protection
fault, and the register is left unchanged.

When running with KVM guest support and paravirtualized CR pinning
enabled, paravirtualized and existing pinning are setup at the same
point on the boot CPU. Non-boot CPUs setup pinning upon identification.

Pinning is not active when running in SMM. Entering SMM disables pinned
bits. Writes to control registers within SMM would therefore trigger
general protection faults if pinning was enforced. Upon exit from SMM,
SMRAM is modified to ensure the values of CR0/4 that will be restored
contain the correct values for pinned bits. CR0/4 values are then
restored from SMRAM as usual.

When running with nested virtualization, should pinned bits be cleared
from host VMCS / VMCB, on VM-Exit, they will be silently restored.

Should userspace expose the CR pining CPUID feature bit, it must zero
CR pinned MSRs on reboot. If it does not, it runs the risk of having
the guest enable pinning and subsequently cause general protection
faults on next boot due to early boot code setting control registers to
values which do not contain the pinned bits.

Hibernation to disk and suspend-to-RAM are supported. identify_cpu was
updated to ensure SMEP/SMAP/UMIP are present in mmu_cr4_features. This
is necessary to ensure protections stay active during hibernation image
restoration.

Guests using the kexec system call currently do not support
paravirtualized control register pinning. This is due to early boot
code writing known good values to control registers, these values do
not contain the protected bits. This is due to CPU feature
identification being done at a later time, when the kernel properly
checks if it can enable protections. As such, the pv_cr_pin command
line option has been added which instructs the kernel to disable kexec
in favor of enabling paravirtualized control register pinning.
crashkernel is also disabled when the pv_cr_pin parameter is specified
due to its reliance on kexec.

When we make kexec compatible, we will still need a way for a kernel
with support to know if the kernel it is attempting to load has
support. If a kernel with this enabled attempts to kexec a kernel where
this is not supported, it would trigger a fault almost immediately.

Liran suggested adding a section to the built image acting as a flag to
signify support for being kexec'd by a kernel with pinning enabled.
Should that approach be implemented, it is likely that the command line
flag (pv_cr_pin) would still be desired for some deprecation period. We
wouldn't want the default behavior to change from being able to kexec
older kernels to not being able to, as this might break some users
workflows. Since we require that the user opt-in to break kexec we've
held off on attempting to fix kexec in this patchset. This way no one
sees any behavior they are not explicitly opting in to.

Security conscious kernel configurations disable kexec already, per
KSPP guidelines. Projects such as Kata Containers, AWS Lambda, ChromeOS
Termina, and others using KVM to virtualize Linux will benefit from
this protection without the need to specify pv_cr_pin on the command
line.

Pinning of sensitive CR bits has already been implemented to protect
against exploits directly calling native_write_cr*(). The current
protection cannot stop ROP attacks which jump directly to a MOV CR
instruction. Guests running with paravirtualized CR pinning are now
protected against the use of ROP to disable CR bits. The same bits that
are being pinned natively may be pinned via the CR pinned MSRs. These
bits are WP in CR0, and SMEP, SMAP, and UMIP in CR4.

Future patches could implement similar MSRs to protect bits in MSRs.
The NXE bit of the EFER MSR is a prime candidate.



=== Plan for kexec support ===

Andy's suggestion of a boot option has been incorporated as the
pv_cr_pin command line option. Boris mentioned that short-term
solutions become immutable. However, for the reasons outlined below
we need a way for the user to opt-in to pinning over kexec if both
are compiled in, and the command line parameter seems to be a good
way to do that. Liran's proposed solution of a flag within the ELF
would allow us to identify which kernels have support is assumed to
be implemented in the following scenarios.

We then have the following cases (without the addition of pv_cr_pin):


- Kernel running without pinning enabled kexecs kernel with pinning.

  - Loaded kernel has kexec

    - Do not enable pinning

  - Loaded kernel lacks kexec

    - Enable pinning

- Kernel running with pinning enabled kexecs kernel with pinning (as
  identified by ELF addition).

  - Okay

- Kernel running with pinning enabled kexecs kernel without pinning
  (as identified by lack of ELF addition).

  - User is presented with an error saying that they may not kexec
    a kernel without pinning support.


With the addition of pv_cr_pin we have the following situations:


- Kernel running without pinning enabled kexecs kernel with pinning.

  - Loaded kernel has kexec

    - pv_cr_pin command line parameter present for new kernel

      - Enable pinning

    - pv_cr_pin command line parameter not present for new kernel

      - Do not enable pinning

  - Loaded kernel lacks kexec

    - Enable pinning

- Kernel running with pinning enabled kexecs kernel with pinning (as
  identified by ELF addition).

  - Okay

- Kernel running with kexec and pinning enabled (opt-in via pv_cr_pin)
  kexecs kernel without pinning (as identified by lack of ELF addition).

  - User is presented with an error saying that they have opted
    into pinning support and may not kexec a kernel without pinning
    support.


Without the command line parameter I'm not sure how we could preserve
users workflows which might rely on kexecing older kernels (ones
which wouldn't have support). I see the benefit here being that users
have to opt-in to the possibility of breaking their workflow, via
their addition of the pv_cr_pin command line flag. Which could of
course also be called nokexec. A deprecation period could then be
chosen where eventually pinning takes preference over kexec and users
are presented with the error if they try to kexec an older kernel.
Input on this would be much appreciated, as well as if this is the
best way to handle things or if there's another way that would be
preferred. This is just what we were able to come up with to ensure
users didn't get anything broken they didn't agree to have broken.


Thanks,
John

John Andersen (4):
  X86: Update mmu_cr4_features during feature identification
  KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  selftests: kvm: add test for CR pinning with SMM
  X86: Use KVM CR pin MSRs

 .../admin-guide/kernel-parameters.txt         |  11 +
 Documentation/virt/kvm/msr.rst                |  53 +++++
 arch/x86/Kconfig                              |  10 +
 arch/x86/include/asm/kvm_host.h               |   7 +
 arch/x86/include/asm/kvm_para.h               |  28 +++
 arch/x86/include/uapi/asm/kvm_para.h          |   7 +
 arch/x86/kernel/cpu/common.c                  |  11 +-
 arch/x86/kernel/kvm.c                         |  39 ++++
 arch/x86/kernel/setup.c                       |  12 +-
 arch/x86/kvm/cpuid.c                          |   3 +-
 arch/x86/kvm/emulate.c                        |   3 +-
 arch/x86/kvm/kvm_emulate.h                    |   2 +-
 arch/x86/kvm/svm/nested.c                     |  11 +-
 arch/x86/kvm/vmx/nested.c                     |  10 +-
 arch/x86/kvm/x86.c                            | 106 ++++++++-
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |  13 ++
 .../selftests/kvm/x86_64/smm_cr_pin_test.c    | 207 ++++++++++++++++++
 19 files changed, 521 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c


base-commit: 49b3deaad3452217d62dbd78da8df24eb0c7e169
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/4] X86: Update mmu_cr4_features during feature identification
  2020-06-17 19:07 [PATCH 0/4] Paravirtualized Control Register pinning John Andersen
@ 2020-06-17 19:07 ` John Andersen
  2020-06-18 14:09   ` Dave Hansen
  2020-06-17 19:07 ` [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning John Andersen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: John Andersen @ 2020-06-17 19:07 UTC (permalink / raw)
  To: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, john.s.andersen, arjan, caoj.fnst, bhe, nivedita,
	keescook, dan.j.williams, eric.auger, aaronlewis, peterx,
	makarandsonare, linux-doc, linux-kernel, kvm, linux-kselftest,
	kernel-hardening

In identify_cpu when setting up SMEP/SMAP/UMIP call
cr4_set_bits_and_update_boot instead of cr4_set_bits. This ensures that
mmu_cr4_features contains those bits, and does not disable those
protections when in hibernation asm.

setup_arch updates mmu_cr4_features to save what identified features are
supported for later use in hibernation asm when cr4 needs to be modified
to toggle PGE. cr4 writes happen in restore_image and restore_registers.
setup_arch occurs before identify_cpu, this leads to mmu_cr4_features
not containing some of the cr4 features which were enabled via
identify_cpu when hibernation asm is executed.

On CPU bringup when cr4_set_bits_and_update_boot is called
mmu_cr4_features will now be written to. For the boot CPU,
the __ro_after_init on mmu_cr4_features does not cause a fault. However,
__ro_after_init was removed due to it triggering faults on non-boot
CPUs.

Signed-off-by: John Andersen <john.s.andersen@intel.com>
---
 arch/x86/kernel/cpu/common.c | 6 +++---
 arch/x86/kernel/setup.c      | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d07809286b95..921e67086a00 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -297,7 +297,7 @@ __setup("nosmep", setup_disable_smep);
 static __always_inline void setup_smep(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_SMEP))
-		cr4_set_bits(X86_CR4_SMEP);
+		cr4_set_bits_and_update_boot(X86_CR4_SMEP);
 }
 
 static __init int setup_disable_smap(char *arg)
@@ -316,7 +316,7 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
 
 	if (cpu_has(c, X86_FEATURE_SMAP)) {
 #ifdef CONFIG_X86_SMAP
-		cr4_set_bits(X86_CR4_SMAP);
+		cr4_set_bits_and_update_boot(X86_CR4_SMAP);
 #else
 		cr4_clear_bits(X86_CR4_SMAP);
 #endif
@@ -333,7 +333,7 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_UMIP))
 		goto out;
 
-	cr4_set_bits(X86_CR4_UMIP);
+	cr4_set_bits_and_update_boot(X86_CR4_UMIP);
 
 	pr_info_once("x86/cpu: User Mode Instruction Prevention (UMIP) activated\n");
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a3767e74c758..d9c678b37a9b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -138,9 +138,9 @@ EXPORT_SYMBOL(boot_cpu_data);
 
 
 #if !defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
-__visible unsigned long mmu_cr4_features __ro_after_init;
+__visible unsigned long mmu_cr4_features;
 #else
-__visible unsigned long mmu_cr4_features __ro_after_init = X86_CR4_PAE;
+__visible unsigned long mmu_cr4_features = X86_CR4_PAE;
 #endif
 
 /* Boot loader ID and version as integers, for the benefit of proc_dointvec */
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-06-17 19:07 [PATCH 0/4] Paravirtualized Control Register pinning John Andersen
  2020-06-17 19:07 ` [PATCH 1/4] X86: Update mmu_cr4_features during feature identification John Andersen
@ 2020-06-17 19:07 ` John Andersen
  2020-06-18 14:18   ` Dave Hansen
  2020-06-17 19:07 ` [PATCH 3/4] selftests: kvm: add test for CR pinning with SMM John Andersen
  2020-06-17 19:07 ` [PATCH 4/4] X86: Use KVM CR pin MSRs John Andersen
  3 siblings, 1 reply; 30+ messages in thread
From: John Andersen @ 2020-06-17 19:07 UTC (permalink / raw)
  To: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, john.s.andersen, arjan, caoj.fnst, bhe, nivedita,
	keescook, dan.j.williams, eric.auger, aaronlewis, peterx,
	makarandsonare, linux-doc, linux-kernel, kvm, linux-kselftest,
	kernel-hardening

Add a CR pin feature bit to the KVM cpuid. Add read only MSRs to KVM
which guests use to identify which bits they may request be pinned. Add
CR pinned MSRs to KVM. Allow guests to request that KVM pin certain
bits within control register 0 or 4 via the CR pinned MSRs. Writes to
the MSRs fail if they include bits which aren't allowed. Host userspace
may clear or modify pinned bits at any time. Once pinned bits are set,
the guest may pin additional allowed bits, but not clear any. Clear
pinning on vCPU reset.

In the event that the guest vCPU attempts to disable any of the pinned
bits, send that vCPU a general protection fault, and leave the register
unchanged. Clear pinning on vCPU reset to avoid faulting non-boot
CPUs when they are disabled and then re-enabled, which is done when
hibernating.

Pinning is not active when running in SMM. Entering SMM disables pinned
bits. Writes to control registers within SMM would therefore trigger
general protection faults if pinning was enforced. Upon exit from SMM,
SMRAM is modified to ensure the values of CR0/4 that will be restored
contain the correct values for pinned bits. CR0/4 values are then
restored from SMRAM as usual.

When running with nested virtualization, should pinned bits be cleared
from host VMCS / VMCB, on VM-Exit, they will be silently restored.

Should userspace expose the CR pinning CPUID feature bit, it must zero
CR pinned MSRs on reboot. If it does not, it runs the risk of having the
guest enable pinning and subsequently cause general protection faults on
next boot due to early boot code setting control registers to values
which do not contain the pinned bits. Userspace is responsible for
migrating the contents of the CR* pinned MSRs. If userspace fails to
migrate the MSRs the protection will no longer be active.

Pinning of sensitive CR bits has already been implemented to protect
against exploits directly calling native_write_cr*(). The current
protection cannot stop ROP attacks which jump directly to a MOV CR
instruction.

https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/

Guests running with paravirtualized CR pinning can now be protected
against the use of ROP to disable CR bits. The same bits that are being
pinned natively may be pinned via the CR pinned MSRs. These bits are WP
in CR0, and SMEP, SMAP, and UMIP in CR4.

Other hypervisors such as HyperV have implemented similar protections
for Control Registers and MSRs; which security researchers have found
effective.

https://www.abatchy.com/2018/01/kernel-exploitation-4

Future work could implement similar MSRs to protect bits elsewhere, such
as MSRs. The NXE bit of the EFER MSR is a prime candidate.

Changes for QEMU are required to expose the CR pin cpuid feature bit. As
well as clear the MSRs on reboot and enable migration.

https://github.com/qemu/qemu/commit/1b26f03653669c97dd8729f9f59be561d68e2b2d
https://github.com/qemu/qemu/commit/3af36d57457892c3088ee88de759d4f258c159a7

Signed-off-by: John Andersen <john.s.andersen@intel.com>
---
 Documentation/virt/kvm/msr.rst       |  53 ++++++++++++++
 arch/x86/include/asm/kvm_host.h      |   7 ++
 arch/x86/include/uapi/asm/kvm_para.h |   7 ++
 arch/x86/kvm/cpuid.c                 |   3 +-
 arch/x86/kvm/emulate.c               |   3 +-
 arch/x86/kvm/kvm_emulate.h           |   2 +-
 arch/x86/kvm/svm/nested.c            |  11 ++-
 arch/x86/kvm/vmx/nested.c            |  10 ++-
 arch/x86/kvm/x86.c                   | 106 ++++++++++++++++++++++++++-
 9 files changed, 193 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index e37a14c323d2..9fa43c4a5895 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -376,3 +376,56 @@ data:
 	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
 	and check if there are more notifications pending. The MSR is available
 	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
+
+MSR_KVM_CR0_PIN_ALLOWED:
+	0x4b564d08
+MSR_KVM_CR4_PIN_ALLOWED:
+	0x4b564d09
+
+	Read only registers informing the guest which bits may be pinned for
+	each control register respectively via the CR pinned MSRs.
+
+data:
+	Bits which may be pinned.
+
+	Attempting to pin bits other than these will result in a failure when
+	writing to the respective CR pinned MSR.
+
+	Bits which are allowed to be pinned are WP for CR0 and SMEP, SMAP, and
+	UMIP for CR4.
+
+MSR_KVM_CR0_PINNED_LOW:
+	0x4b564d0a
+MSR_KVM_CR0_PINNED_HIGH:
+	0x4b564d0b
+MSR_KVM_CR4_PINNED_LOW:
+	0x4b564d0c
+MSR_KVM_CR4_PINNED_HIGH:
+	0x4b564d0d
+
+	Used to configure pinned bits in control registers
+
+data:
+	Bits to be pinned.
+
+	Fails if data contains bits which are not allowed to be pinned. Or if
+	attempting to pin bits high that are already pinned low, or vice versa.
+	Bits which are allowed to be pinned can be found by reading the CR pin
+	allowed MSRs.
+
+	The MSRs are read/write for host userspace, and write-only for the
+	guest.
+
+	Once set to a non-zero value, the guest cannot clear any of the bits
+	that have been pinned. The guest can pin more bits, so long as those
+	bits appear in the allowed MSR, and are not already pinned to the
+	opposite value.
+
+	Host userspace may clear or change pinned bits at any point. Host
+	userspace must clear pinned bits on reboot.
+
+	The MSR enables bit pinning for control registers. Pinning is active
+	when the guest is not in SMM. If an exit from SMM results in pinned
+	bits becoming unpinned. The guest will exit. If the guest attempts to
+	write values to cr* where bits differ from pinned bits, the write will
+	fail and the guest will be sent a general protection fault.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8998e97457f..962cb48535d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -565,6 +565,11 @@ struct kvm_vcpu_hv {
 	cpumask_t tlb_flush;
 };
 
+struct kvm_vcpu_cr_pinning {
+	unsigned long high;
+	unsigned long low;
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -576,10 +581,12 @@ struct kvm_vcpu_arch {
 
 	unsigned long cr0;
 	unsigned long cr0_guest_owned_bits;
+	struct kvm_vcpu_cr_pinning cr0_pinned;
 	unsigned long cr2;
 	unsigned long cr3;
 	unsigned long cr4;
 	unsigned long cr4_guest_owned_bits;
+	struct kvm_vcpu_cr_pinning cr4_pinned;
 	unsigned long cr8;
 	u32 host_pkru;
 	u32 pkru;
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..91241e0d9691 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -32,6 +32,7 @@
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
 #define KVM_FEATURE_ASYNC_PF_INT	14
+#define KVM_FEATURE_CR_PIN		15
 
 #define KVM_HINTS_REALTIME      0
 
@@ -53,6 +54,12 @@
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
+#define MSR_KVM_CR0_PIN_ALLOWED	0x4b564d08
+#define MSR_KVM_CR4_PIN_ALLOWED	0x4b564d09
+#define MSR_KVM_CR0_PINNED_LOW	0x4b564d0a
+#define MSR_KVM_CR0_PINNED_HIGH	0x4b564d0b
+#define MSR_KVM_CR4_PINNED_LOW	0x4b564d0c
+#define MSR_KVM_CR4_PINNED_HIGH	0x4b564d0d
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8a294f9747aa..bb0ed645324d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
-			     (1 << KVM_FEATURE_ASYNC_PF_INT);
+			     (1 << KVM_FEATURE_ASYNC_PF_INT) |
+			     (1 << KVM_FEATURE_CR_PIN);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d0e2825ae617..95780422765b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2685,7 +2685,8 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 	}
 
-	ctxt->ops->post_leave_smm(ctxt);
+	if (ctxt->ops->post_leave_smm(ctxt))
+		return X86EMUL_UNHANDLEABLE;
 
 	return X86EMUL_CONTINUE;
 }
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 43c93ffa76ed..e92dd7605e48 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -232,7 +232,7 @@ struct x86_emulate_ops {
 	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
 	int (*pre_leave_smm)(struct x86_emulate_ctxt *ctxt,
 			     const char *smstate);
-	void (*post_leave_smm)(struct x86_emulate_ctxt *ctxt);
+	int (*post_leave_smm)(struct x86_emulate_ctxt *ctxt);
 	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
 };
 
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 6bceafb19108..245bdff4b052 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -583,8 +583,15 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vmcb->save.idtr = hsave->save.idtr;
 	kvm_set_rflags(&svm->vcpu, hsave->save.rflags);
 	svm_set_efer(&svm->vcpu, hsave->save.efer);
-	svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
-	svm_set_cr4(&svm->vcpu, hsave->save.cr4);
+	svm_set_cr0(&svm->vcpu,
+		    (hsave->save.cr0 |
+		    X86_CR0_PE |
+		    svm->vcpu.arch.cr0_pinned.high) &
+		    ~svm->vcpu.arch.cr0_pinned.low);
+	svm_set_cr4(&svm->vcpu,
+		    (hsave->save.cr4 |
+		    svm->vcpu.arch.cr4_pinned.high) &
+		    ~svm->vcpu.arch.cr4_pinned.low);
 	if (npt_enabled) {
 		svm->vmcb->save.cr3 = hsave->save.cr3;
 		svm->vcpu.arch.cr3 = hsave->save.cr3;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index adb11b504d5c..a12bac57b374 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4110,11 +4110,17 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 	 * (KVM doesn't change it);
 	 */
 	vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS;
-	vmx_set_cr0(vcpu, vmcs12->host_cr0);
+	vmx_set_cr0(vcpu,
+		    (vmcs12->host_cr0 |
+		     vcpu->arch.cr0_pinned.high) &
+		    ~vcpu->arch.cr0_pinned.low);
 
 	/* Same as above - no reason to call set_cr4_guest_host_mask().  */
 	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
-	vmx_set_cr4(vcpu, vmcs12->host_cr4);
+	vmx_set_cr4(vcpu,
+		   (vmcs12->host_cr4 |
+		    vcpu->arch.cr4_pinned.high) &
+		    ~vcpu->arch.cr4_pinned.low);
 
 	nested_ept_uninit_mmu_context(vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00c88c2f34e4..940de9a968bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -772,6 +772,9 @@ bool pdptrs_changed(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(pdptrs_changed);
 
+#define KVM_CR0_PIN_ALLOWED	(X86_CR0_WP)
+#define KVM_CR4_PIN_ALLOWED	(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)
+
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	unsigned long old_cr0 = kvm_read_cr0(vcpu);
@@ -792,6 +795,11 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 	if ((cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PE))
 		return 1;
 
+	if (!is_smm(vcpu) && !is_guest_mode(vcpu) &&
+	    (((cr0 ^ vcpu->arch.cr0_pinned.high) & vcpu->arch.cr0_pinned.high) ||
+	    ((~cr0 ^ vcpu->arch.cr0_pinned.low) & vcpu->arch.cr0_pinned.low)))
+		return 1;
+
 	if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
 #ifdef CONFIG_X86_64
 		if ((vcpu->arch.efer & EFER_LME)) {
@@ -972,6 +980,11 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 	if (kvm_valid_cr4(vcpu, cr4))
 		return 1;
 
+	if (!is_smm(vcpu) && !is_guest_mode(vcpu) &&
+	    (((cr4 ^ vcpu->arch.cr4_pinned.high) & vcpu->arch.cr4_pinned.high) ||
+	    ((~cr4 ^ vcpu->arch.cr4_pinned.low) & vcpu->arch.cr4_pinned.low)))
+		return 1;
+
 	if (is_long_mode(vcpu)) {
 		if (!(cr4 & X86_CR4_PAE))
 			return 1;
@@ -1291,6 +1304,12 @@ static const u32 emulated_msrs_all[] = {
 
 	MSR_K7_HWCR,
 	MSR_KVM_POLL_CONTROL,
+	MSR_KVM_CR0_PIN_ALLOWED,
+	MSR_KVM_CR4_PIN_ALLOWED,
+	MSR_KVM_CR0_PINNED_LOW,
+	MSR_KVM_CR0_PINNED_HIGH,
+	MSR_KVM_CR4_PINNED_LOW,
+	MSR_KVM_CR4_PINNED_HIGH,
 };
 
 static u32 emulated_msrs[ARRAY_SIZE(emulated_msrs_all)];
@@ -2986,6 +3005,54 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_kvm_poll_control = data;
 		break;
 
+	case MSR_KVM_CR0_PIN_ALLOWED:
+		return (data != KVM_CR0_PIN_ALLOWED);
+	case MSR_KVM_CR4_PIN_ALLOWED:
+		return (data != KVM_CR4_PIN_ALLOWED);
+	case MSR_KVM_CR0_PINNED_LOW:
+		if ((data & ~KVM_CR0_PIN_ALLOWED) ||
+		    (data & vcpu->arch.cr0_pinned.high))
+			return 1;
+
+		if (!msr_info->host_initiated &&
+		    (~data & vcpu->arch.cr0_pinned.low))
+			return 1;
+
+		vcpu->arch.cr0_pinned.low = data;
+		break;
+	case MSR_KVM_CR0_PINNED_HIGH:
+		if ((data & ~KVM_CR0_PIN_ALLOWED) ||
+		    (data & vcpu->arch.cr0_pinned.low))
+			return 1;
+
+		if (!msr_info->host_initiated &&
+		    (~data & vcpu->arch.cr0_pinned.high))
+			return 1;
+
+		vcpu->arch.cr0_pinned.high = data;
+		break;
+	case MSR_KVM_CR4_PINNED_LOW:
+		if ((data & ~KVM_CR4_PIN_ALLOWED) ||
+		    (data & vcpu->arch.cr4_pinned.high))
+			return 1;
+
+		if (!msr_info->host_initiated &&
+		    (~data & vcpu->arch.cr4_pinned.low))
+			return 1;
+
+		vcpu->arch.cr4_pinned.low = data;
+		break;
+	case MSR_KVM_CR4_PINNED_HIGH:
+		if ((data & ~KVM_CR4_PIN_ALLOWED) ||
+		    (data & vcpu->arch.cr4_pinned.low))
+			return 1;
+
+		if (!msr_info->host_initiated &&
+		    (~data & vcpu->arch.cr4_pinned.high))
+			return 1;
+
+		vcpu->arch.cr4_pinned.high = data;
+		break;
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3250,6 +3317,24 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_POLL_CONTROL:
 		msr_info->data = vcpu->arch.msr_kvm_poll_control;
 		break;
+	case MSR_KVM_CR0_PIN_ALLOWED:
+		msr_info->data = KVM_CR0_PIN_ALLOWED;
+		break;
+	case MSR_KVM_CR4_PIN_ALLOWED:
+		msr_info->data = KVM_CR4_PIN_ALLOWED;
+		break;
+	case MSR_KVM_CR0_PINNED_LOW:
+		msr_info->data = vcpu->arch.cr0_pinned.low;
+		break;
+	case MSR_KVM_CR0_PINNED_HIGH:
+		msr_info->data = vcpu->arch.cr0_pinned.high;
+		break;
+	case MSR_KVM_CR4_PINNED_LOW:
+		msr_info->data = vcpu->arch.cr4_pinned.low;
+		break;
+	case MSR_KVM_CR4_PINNED_HIGH:
+		msr_info->data = vcpu->arch.cr4_pinned.high;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -6414,9 +6499,23 @@ static int emulator_pre_leave_smm(struct x86_emulate_ctxt *ctxt,
 	return kvm_x86_ops.pre_leave_smm(emul_to_vcpu(ctxt), smstate);
 }
 
-static void emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt)
+static int emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt)
 {
-	kvm_smm_changed(emul_to_vcpu(ctxt));
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+	unsigned long cr0 = kvm_read_cr0(vcpu);
+	unsigned long cr4 = kvm_read_cr4(vcpu);
+
+	if (((cr0 ^ vcpu->arch.cr0_pinned.high) & vcpu->arch.cr0_pinned.high) ||
+	    ((~cr0 ^ vcpu->arch.cr0_pinned.low) & vcpu->arch.cr0_pinned.low))
+		return 1;
+
+	if (((cr4 ^ vcpu->arch.cr4_pinned.high) & vcpu->arch.cr4_pinned.high) ||
+	    ((~cr4 ^ vcpu->arch.cr4_pinned.low) & vcpu->arch.cr4_pinned.low))
+		return 1;
+
+	kvm_smm_changed(vcpu);
+
+	return 0;
 }
 
 static int emulator_set_xcr(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr)
@@ -9640,6 +9739,9 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vcpu->arch.ia32_xss = 0;
 
+	memset(&vcpu->arch.cr0_pinned, 0, sizeof(vcpu->arch.cr0_pinned));
+	memset(&vcpu->arch.cr4_pinned, 0, sizeof(vcpu->arch.cr4_pinned));
+
 	kvm_x86_ops.vcpu_reset(vcpu, init_event);
 }
 
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 3/4] selftests: kvm: add test for CR pinning with SMM
  2020-06-17 19:07 [PATCH 0/4] Paravirtualized Control Register pinning John Andersen
  2020-06-17 19:07 ` [PATCH 1/4] X86: Update mmu_cr4_features during feature identification John Andersen
  2020-06-17 19:07 ` [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning John Andersen
@ 2020-06-17 19:07 ` John Andersen
  2020-06-17 19:07 ` [PATCH 4/4] X86: Use KVM CR pin MSRs John Andersen
  3 siblings, 0 replies; 30+ messages in thread
From: John Andersen @ 2020-06-17 19:07 UTC (permalink / raw)
  To: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, john.s.andersen, arjan, caoj.fnst, bhe, nivedita,
	keescook, dan.j.williams, eric.auger, aaronlewis, peterx,
	makarandsonare, linux-doc, linux-kernel, kvm, linux-kselftest,
	kernel-hardening

Check that paravirtualized control register pinning blocks modifications
of pinned CR values stored in SMRAM on exit from SMM.

Signed-off-by: John Andersen <john.s.andersen@intel.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/x86_64/processor.h  |  13 ++
 .../selftests/kvm/x86_64/smm_cr_pin_test.c    | 207 ++++++++++++++++++
 4 files changed, 222 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 452787152748..c0666c3efcbb 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -10,6 +10,7 @@
 /x86_64/platform_info_test
 /x86_64/set_sregs_test
 /x86_64/smm_test
+/x86_64/smm_cr_pin_test
 /x86_64/state_test
 /x86_64/vmx_preemption_timer_test
 /x86_64/svm_vmcall_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4a166588d99f..c5c205637f38 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -45,6 +45,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/platform_info_test
 TEST_GEN_PROGS_x86_64 += x86_64/set_sregs_test
 TEST_GEN_PROGS_x86_64 += x86_64/smm_test
+TEST_GEN_PROGS_x86_64 += x86_64/smm_cr_pin_test
 TEST_GEN_PROGS_x86_64 += x86_64/state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test
 TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..8a2da0449772 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -200,6 +200,11 @@ static inline uint64_t get_cr0(void)
 	return cr0;
 }
 
+static inline void set_cr0(uint64_t val)
+{
+	__asm__ __volatile__("mov %0, %%cr0" : : "r" (val) : "memory");
+}
+
 static inline uint64_t get_cr3(void)
 {
 	uint64_t cr3;
@@ -383,4 +388,12 @@ void kvm_get_cpu_address_width(unsigned int *pa_bits, unsigned int *va_bits);
 /* VMX_EPT_VPID_CAP bits */
 #define VMX_EPT_VPID_CAP_AD_BITS       (1ULL << 21)
 
+/* KVM MSRs */
+#define MSR_KVM_CR0_PIN_ALLOWED	0x4b564d08
+#define MSR_KVM_CR4_PIN_ALLOWED	0x4b564d09
+#define MSR_KVM_CR0_PINNED_LOW	0x4b564d0a
+#define MSR_KVM_CR0_PINNED_HIGH	0x4b564d0b
+#define MSR_KVM_CR4_PINNED_LOW	0x4b564d0c
+#define MSR_KVM_CR4_PINNED_HIGH	0x4b564d0d
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c
new file mode 100644
index 000000000000..a32f577ca1e5
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/smm_cr_pin_test.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Tests for control register pinning not being affected by SMRAM writes.
+ */
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <linux/stringify.h>
+
+#include "test_util.h"
+
+#include "kvm_util.h"
+
+#include "processor.h"
+
+#define VCPU_ID	      1
+
+#define PAGE_SIZE  4096
+
+#define SMRAM_SIZE 65536
+#define SMRAM_MEMSLOT ((1 << 16) | 1)
+#define SMRAM_PAGES (SMRAM_SIZE / PAGE_SIZE)
+#define SMRAM_GPA 0x1000000
+#define SMRAM_STAGE_SUCCESS 0xfe
+#define SMRAM_STAGE_FAILURE 0xfd
+
+#define XSTR(s) __stringify(s)
+
+#define SYNC_PORT 0xe
+#define DONE 0xff
+
+#define CR0_PINNED X86_CR0_WP
+#define CR4_PINNED (X86_CR4_SMAP | X86_CR4_UMIP)
+#define CR4_ALL (CR4_PINNED | X86_CR4_SMEP)
+
+/*
+ * This is compiled as normal 64-bit code, however, SMI handler is executed
+ * in real-address mode. To stay simple we're limiting ourselves to a mode
+ * independent subset of asm here.
+ * SMI handler always report back fixed stage SMRAM_STAGE_SUCCESS.
+ */
+uint8_t smi_handler_success[] = {
+	0xb0, SMRAM_STAGE_SUCCESS,    /* mov $SMRAM_STAGE_SUCCESS, %al */
+	0xe4, SYNC_PORT,              /* in $SYNC_PORT, %al */
+	0x0f, 0xaa,                   /* rsm */
+};
+uint8_t smi_handler_fault[] = {
+	0xb0, SMRAM_STAGE_FAILURE,    /* mov $SMRAM_STAGE_FAILURE, %al */
+	0xe4, SYNC_PORT,              /* in $SYNC_PORT, %al */
+	0x0f, 0xaa,                   /* rsm */
+};
+
+/* We opt not to use GUEST_SYNC() here because we also have to make a sync call
+ * from SMM. As such, the address of the ucall struct we'd need to pass isn't
+ * something we can put into the above machine code in a maintainable way
+ */
+static inline void sync_with_host(uint64_t phase)
+{
+	asm volatile("in $" XSTR(SYNC_PORT)", %%al\n"
+		     : "+a" (phase));
+}
+
+void self_smi(void)
+{
+	wrmsr(APIC_BASE_MSR + (APIC_ICR >> 4),
+	      APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+}
+
+void guest_code(void)
+{
+	uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
+
+	sync_with_host(1);
+
+	wrmsr(MSR_IA32_APICBASE, apicbase | X2APIC_ENABLE);
+
+	sync_with_host(2);
+
+	set_cr0(get_cr0() | CR0_PINNED);
+
+	wrmsr(MSR_KVM_CR0_PINNED_HIGH, CR0_PINNED);
+
+	sync_with_host(3);
+
+	set_cr4(get_cr4() | CR4_PINNED);
+
+	sync_with_host(4);
+
+	/* Pin SMEP low */
+	wrmsr(MSR_KVM_CR4_PINNED_HIGH, CR4_PINNED);
+
+	sync_with_host(5);
+
+	self_smi();
+
+	sync_with_host(DONE);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_regs regs;
+	struct kvm_sregs sregs;
+	struct kvm_vm *vm;
+	struct kvm_run *run;
+	struct kvm_x86_state *state;
+	int failure, stage, stage_reported;
+	u64 *cr;
+
+	for (failure = 0; failure <= 1; failure++) {
+		stage_reported = 0;
+
+		/* Create VM */
+		vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+		vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+
+		run = vcpu_state(vm, VCPU_ID);
+
+		vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
+					    SMRAM_MEMSLOT, SMRAM_PAGES, 0);
+		TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)
+			    == SMRAM_GPA, "could not allocate guest physical addresses?");
+
+		memset(addr_gpa2hva(vm, SMRAM_GPA), 0x0, SMRAM_SIZE);
+		if (failure) {
+			memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler_fault,
+			       sizeof(smi_handler_fault));
+		} else {
+			memcpy(addr_gpa2hva(vm, SMRAM_GPA) + 0x8000, smi_handler_success,
+			       sizeof(smi_handler_success));
+		}
+		vcpu_set_msr(vm, VCPU_ID, MSR_IA32_SMBASE, SMRAM_GPA);
+
+		for (stage = 1;; stage++) {
+			_vcpu_run(vm, VCPU_ID);
+
+			memset(&regs, 0, sizeof(regs));
+			vcpu_regs_get(vm, VCPU_ID, &regs);
+
+			memset(&sregs, 0, sizeof(sregs));
+			vcpu_sregs_get(vm, VCPU_ID, &sregs);
+
+			/* stage_reported is currrently the last stage reported */
+			if (failure && stage_reported == SMRAM_STAGE_FAILURE) {
+				/* Ensure that we exit on smram modification of CR4 */
+				TEST_ASSERT(run->exit_reason == KVM_EXIT_INTERNAL_ERROR &&
+					    run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION,
+					    "Stage %d: unexpected exit reason: %u, suberror: %u (%s),\n",
+					    stage, run->exit_reason, run->internal.suberror,
+					    exit_reason_str(run->exit_reason));
+				if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR)
+					goto done;
+			} else {
+				TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+					    "Stage %d: unexpected exit reason: %u (%s),\n",
+					    stage, run->exit_reason,
+					    exit_reason_str(run->exit_reason));
+			}
+
+			stage_reported = regs.rax & 0xff;
+
+			if (stage_reported == DONE) {
+				TEST_ASSERT((sregs.cr0 & CR0_PINNED) == CR0_PINNED,
+					    "Unexpected cr0. Bits missing: %llx",
+					    sregs.cr0 ^ (CR0_PINNED | sregs.cr0));
+				TEST_ASSERT((sregs.cr4 & CR4_ALL) == CR4_PINNED,
+					    "Unexpected cr4. Bits missing: %llx, cr4: %llx",
+					    sregs.cr4 ^ (CR4_ALL | sregs.cr4),
+					    sregs.cr4);
+				goto done;
+			}
+
+			TEST_ASSERT(stage_reported == stage ||
+				    stage_reported == SMRAM_STAGE_SUCCESS ||
+				    stage_reported == SMRAM_STAGE_FAILURE,
+				    "Unexpected stage: #%x, got %x",
+				    stage, stage_reported);
+
+			/* Within SMM modify CR0/4 to not contain pinned bits. */
+			if (stage_reported == SMRAM_STAGE_FAILURE) {
+				cr = (u64 *)(addr_gpa2hva(vm, SMRAM_GPA + 0x8000 + 0x7f58));
+				*cr &= ~CR0_PINNED;
+
+				cr = (u64 *)(addr_gpa2hva(vm, SMRAM_GPA + 0x8000 + 0x7f48));
+				/* Unset pinned, set one that was pinned low */
+				*cr &= ~CR4_PINNED;
+				*cr |= X86_CR4_SMEP;
+			}
+
+			state = vcpu_save_state(vm, VCPU_ID);
+			kvm_vm_release(vm);
+			kvm_vm_restart(vm, O_RDWR);
+			vm_vcpu_add(vm, VCPU_ID);
+			vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid());
+			vcpu_load_state(vm, VCPU_ID, state);
+			run = vcpu_state(vm, VCPU_ID);
+			free(state);
+		}
+
+done:
+		kvm_vm_free(vm);
+	}
+}
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-17 19:07 [PATCH 0/4] Paravirtualized Control Register pinning John Andersen
                   ` (2 preceding siblings ...)
  2020-06-17 19:07 ` [PATCH 3/4] selftests: kvm: add test for CR pinning with SMM John Andersen
@ 2020-06-17 19:07 ` John Andersen
  2020-06-18 14:41   ` Dave Hansen
  2020-06-20  5:13   ` Andy Lutomirski
  3 siblings, 2 replies; 30+ messages in thread
From: John Andersen @ 2020-06-17 19:07 UTC (permalink / raw)
  To: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, john.s.andersen, arjan, caoj.fnst, bhe, nivedita,
	keescook, dan.j.williams, eric.auger, aaronlewis, peterx,
	makarandsonare, linux-doc, linux-kernel, kvm, linux-kselftest,
	kernel-hardening

Strengthen existing control register pinning when running
paravirtualized under KVM. Check which bits KVM supports pinning for
each control register and only pin supported bits which are already
pinned via the existing native protection. Write to KVM CR0/4 pinned
MSRs to enable pinning.

Initiate KVM assisted pinning directly following the setup of native
pinning on boot CPU. For non-boot CPUs initiate paravirtualized pinning
on CPU identification.

Identification of non-boot CPUs takes place after the boot CPU has setup
native CR pinning. Therefore, non-boot CPUs access pinned bits setup by
the boot CPU and request that those be pinned. All CPUs request
paravirtualized pinning of the same bits which are already pinned
natively.

Guests using the kexec system call currently do not support
paravirtualized control register pinning. This is due to early boot
code writing known good values to control registers, these values do
not contain the protected bits. This is due to CPU feature
identification being done at a later time, when the kernel properly
checks if it can enable protections. As such, the pv_cr_pin command line
option has been added which instructs the kernel to disable kexec in
favor of enabling paravirtualized control register pinning. crashkernel
is also disabled when the pv_cr_pin parameter is specified due to its
reliance on kexec.

When we fix kexec, we will still need a way for a kernel with support to
know if the kernel it is attempting to load has support. If a kernel
with this enabled attempts to kexec a kernel where this is not
supported, it would trigger a fault almost immediately.

Liran suggested adding a section to the built image acting as a flag to
signify support for being kexec'd by a kernel with pinning enabled.
Should that approach be implemented, it is likely that the command line
flag (pv_cr_pin) would still be desired for some deprecation period. We
wouldn't want the default behavior to change from being able to kexec
older kernels to not being able to, as this might break some users
workflows.

Signed-off-by: John Andersen <john.s.andersen@intel.com>
---
 .../admin-guide/kernel-parameters.txt         | 11 ++++++
 arch/x86/Kconfig                              | 10 +++++
 arch/x86/include/asm/kvm_para.h               | 28 +++++++++++++
 arch/x86/kernel/cpu/common.c                  |  5 +++
 arch/x86/kernel/kvm.c                         | 39 +++++++++++++++++++
 arch/x86/kernel/setup.c                       |  8 ++++
 6 files changed, 101 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89386f6f3ab6..54fb2b5ab8fc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3926,6 +3926,17 @@
 			[KNL] Number of legacy pty's. Overwrites compiled-in
 			default number.
 
+	pv_cr_pin	[SECURITY,X86]
+			Enable paravirtualized control register pinning. When
+			running paravirutalized under KVM, request that KVM not
+			allow the guest to disable kernel protection features
+			set in CPU control registers. Specifying this option
+			will disable kexec (and crashkernel). If kexec support
+			has not been compiled into the kernel and host KVM
+			supports paravirtualized control register pinning, it
+			will be active by default without the need to specify
+			this parameter.
+
 	quiet		[KNL] Disable most log messages
 
 	r128=		[HW,DRM]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67f6a40b5e93..bc0b27483001 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -800,6 +800,7 @@ config KVM_GUEST
 	bool "KVM Guest support (including kvmclock)"
 	depends on PARAVIRT
 	select PARAVIRT_CLOCK
+	select PARAVIRT_CR_PIN
 	select ARCH_CPUIDLE_HALTPOLL
 	default y
 	---help---
@@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING
 config PARAVIRT_CLOCK
 	bool
 
+config PARAVIRT_CR_PIN
+       bool "Paravirtual bit pinning for CR0 and CR4"
+       depends on KVM_GUEST
+       help
+         Select this option to have the virtualised guest request that the
+         hypervisor disallow it from disabling protections set in control
+         registers. The hypervisor will prevent exploits from disabling
+         features such as SMEP, SMAP, UMIP, and WP.
+
 config JAILHOUSE_GUEST
 	bool "Jailhouse non-root cell support"
 	depends on X86_64 && PCI
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 57fd1966c4ea..f021531e98dc 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -112,6 +112,23 @@ static inline void kvm_spinlock_init(void)
 }
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
+#ifdef CONFIG_PARAVIRT_CR_PIN
+void __init kvm_paravirt_cr_pinning_init(void);
+void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+				   unsigned long cr4_pinned_bits);
+#else
+static inline void kvm_paravirt_cr_pinning_init(void)
+{
+	return;
+}
+
+static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+						 unsigned long cr4_pinned_bits)
+{
+	return;
+}
+#endif /* CONFIG_PARAVIRT_CR_PIN */
+
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait_schedule(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
@@ -145,6 +162,17 @@ static inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 {
 	return false;
 }
+
+static inline void kvm_paravirt_cr_pinning_init(void)
+{
+	return;
+}
+
+static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+						 unsigned long cr4_pinned_bits)
+{
+	return;
+}
 #endif
 
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 921e67086a00..ee17223b1fa8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -21,6 +21,7 @@
 #include <linux/smp.h>
 #include <linux/io.h>
 #include <linux/syscore_ops.h>
+#include <linux/kvm_para.h>
 
 #include <asm/stackprotector.h>
 #include <asm/perf_event.h>
@@ -416,6 +417,8 @@ static void __init setup_cr_pinning(void)
 	mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP);
 	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask;
 	static_key_enable(&cr_pinning.key);
+
+	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
 }
 
 /*
@@ -1551,6 +1554,8 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 	mtrr_ap_init();
 	validate_apic_and_package_id(c);
 	x86_spec_ctrl_setup_ap();
+
+	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
 }
 
 static __init int setup_noclflush(char *arg)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 7e6403a8d861..def913b86a99 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -23,6 +23,8 @@
 #include <linux/kprobes.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/init.h>
+#include <linux/kexec.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -33,6 +35,7 @@
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
 #include <asm/cpuidle_haltpoll.h>
+#include <asm/cmdline.h>
 
 DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
 
@@ -723,6 +726,7 @@ static void __init kvm_apic_init(void)
 static void __init kvm_init_platform(void)
 {
 	kvmclock_init();
+	kvm_paravirt_cr_pinning_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
@@ -877,6 +881,41 @@ void __init kvm_spinlock_init(void)
 
 #endif	/* CONFIG_PARAVIRT_SPINLOCKS */
 
+#ifdef CONFIG_PARAVIRT_CR_PIN
+static int kvm_paravirt_cr_pinning_enabled __ro_after_init;
+
+void __init kvm_paravirt_cr_pinning_init(void)
+{
+#ifdef CONFIG_KEXEC_CORE
+	if (!cmdline_find_option_bool(boot_command_line, "pv_cr_pin"))
+		return;
+
+	/* Paravirtualized CR pinning is currently incompatible with kexec */
+	kexec_load_disabled = 1;
+#endif
+
+	kvm_paravirt_cr_pinning_enabled = 1;
+}
+
+void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
+				   unsigned long cr4_pinned_bits)
+{
+	u64 mask;
+
+	if (!kvm_paravirt_cr_pinning_enabled)
+		return;
+
+	if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN))
+		return;
+
+	rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, mask);
+	wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_pinned_bits & mask);
+
+	rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, mask);
+	wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_pinned_bits & mask);
+}
+#endif
+
 #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
 
 static void kvm_disable_host_haltpoll(void *i)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d9c678b37a9b..ed3bcc85d40d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -27,6 +27,9 @@
 #include <asm/apic.h>
 #include <asm/bios_ebda.h>
 #include <asm/bugs.h>
+#include <asm/kasan.h>
+#include <asm/cmdline.h>
+
 #include <asm/cpu.h>
 #include <asm/efi.h>
 #include <asm/gart.h>
@@ -502,6 +505,11 @@ static void __init reserve_crashkernel(void)
 		return;
 	}
 
+	if (cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) {
+		pr_info("Ignoring crashkernel since pv_cr_pin present in cmdline\n");
+		return;
+	}
+
 	/* 0 means: find the address automatically */
 	if (!crash_base) {
 		/*
-- 
2.21.0


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] X86: Update mmu_cr4_features during feature identification
  2020-06-17 19:07 ` [PATCH 1/4] X86: Update mmu_cr4_features during feature identification John Andersen
@ 2020-06-18 14:09   ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2020-06-18 14:09 UTC (permalink / raw)
  To: John Andersen, corbet, pbonzini, tglx, mingo, bp, x86, hpa,
	shuah, sean.j.christopherson, liran.alon, drjones,
	rick.p.edgecombe, kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, arjan, caoj.fnst, bhe, nivedita, keescook,
	dan.j.williams, eric.auger, aaronlewis, peterx, makarandsonare,
	linux-doc, linux-kernel, kvm, linux-kselftest, kernel-hardening

Overall this looks pretty good.  For all the maintainers on cc, we try
to do internal reviews of these before they're submitted.  This one got
missed, sorry about that.

On 6/17/20 12:07 PM, John Andersen wrote:
> In identify_cpu when setting up SMEP/SMAP/UMIP call
> cr4_set_bits_and_update_boot instead of cr4_set_bits. This ensures that
> mmu_cr4_features contains those bits, and does not disable those
> protections when in hibernation asm.

When I'm writing comments, I try to use parenthesis for functions(),
which leaves variable_names plain.

I also try not to dive directly into the function names.  This
description assumes that the reader knows the subtle difference between
cr4_set_bits_and_update_boot() and of cr4_set_bits().  A sentence or two
of background here can save a reviewer a dive into the source code.

> setup_arch updates mmu_cr4_features to save what identified features are
> supported for later use in hibernation asm when cr4 needs to be modified
> to toggle PGE. cr4 writes happen in restore_image and restore_registers.
> setup_arch occurs before identify_cpu, this leads to mmu_cr4_features
> not containing some of the cr4 features which were enabled via
> identify_cpu when hibernation asm is executed.

This fails to address the bigger picture.  I assume you end up wanting
this because without it hibernation is not compatible with CR pinning.
Shouldn't that be mentioned?

I also wonder why we even need two classes of cr4_set_bits().  Are there
features we *want* to disable before entering the hibernation assembly?

For instance, why not leave MCE enabled in there?  What about PCIDs or
OSPKE?  Does it hurt?

> On CPU bringup when cr4_set_bits_and_update_boot is called 
> mmu_cr4_features will now be written to. For the boot CPU, the
> __ro_after_init on mmu_cr4_features does not cause a fault. However, 
> __ro_after_init was removed due to it triggering faults on non-boot 
> CPUs
Before this patch, cr4_set_bits_and_update_boot() was only ever called
during init.  But, after this patch, it gets called later in boot and
causes problems.  We're surely not making _real_ updates to it, right?
In that case the writes are superfluous and we would be better off just
not writing to it (and retaining __ro_after_init) rather than allowing
superfluous writes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-06-17 19:07 ` [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning John Andersen
@ 2020-06-18 14:18   ` Dave Hansen
  2020-06-18 14:43     ` Andersen, John
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-06-18 14:18 UTC (permalink / raw)
  To: John Andersen, corbet, pbonzini, tglx, mingo, bp, x86, hpa,
	shuah, sean.j.christopherson, liran.alon, drjones,
	rick.p.edgecombe, kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, arjan, caoj.fnst, bhe, nivedita, keescook,
	dan.j.williams, eric.auger, aaronlewis, peterx, makarandsonare,
	linux-doc, linux-kernel, kvm, linux-kselftest, kernel-hardening

On 6/17/20 12:07 PM, John Andersen wrote:
> +#define KVM_CR0_PIN_ALLOWED	(X86_CR0_WP)
> +#define KVM_CR4_PIN_ALLOWED	(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)

Why *is* there an allowed set?  Why don't we just allow everything?

Shouldn't we also pin any unknown bits?  The CR4.FSGSBASE bit is an
example of something that showed up CPUs without Linux knowing about it.
 If set, it causes problems.  This set couldn't have helped FSGSBASE
because it is not in the allowed set.

Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
write to kernel memory.  Linux won't set it, but an attacker would go
after it, first thing.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-17 19:07 ` [PATCH 4/4] X86: Use KVM CR pin MSRs John Andersen
@ 2020-06-18 14:41   ` Dave Hansen
  2020-06-18 15:26     ` Andersen, John
  2020-06-20  5:13   ` Andy Lutomirski
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-06-18 14:41 UTC (permalink / raw)
  To: John Andersen, corbet, pbonzini, tglx, mingo, bp, x86, hpa,
	shuah, sean.j.christopherson, liran.alon, drjones,
	rick.p.edgecombe, kristen
  Cc: vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, arjan, caoj.fnst, bhe, nivedita, keescook,
	dan.j.williams, eric.auger, aaronlewis, peterx, makarandsonare,
	linux-doc, linux-kernel, kvm, linux-kselftest, kernel-hardening

On 6/17/20 12:07 PM, John Andersen wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 89386f6f3ab6..54fb2b5ab8fc 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3926,6 +3926,17 @@
>  			[KNL] Number of legacy pty's. Overwrites compiled-in
>  			default number.
>  
> +	pv_cr_pin	[SECURITY,X86]
> +			Enable paravirtualized control register pinning. When
> +			running paravirutalized under KVM, request that KVM not
> +			allow the guest to disable kernel protection features
> +			set in CPU control registers. Specifying this option
> +			will disable kexec (and crashkernel). If kexec support
> +			has not been compiled into the kernel and host KVM
> +			supports paravirtualized control register pinning, it
> +			will be active by default without the need to specify
> +			this parameter.

I'm writing this last in my review.  I guess I should have read this
first.  You'll see later in my review how this confused me.  This
behavior needs to be documented elsewhere.  Code comments would be best.

Let's say kexec is config'd off.  This feature is enabled by default and
crashes the kernel in early boot.  I have no way to disable this fancy
new feature.  Is that what we want?

I also think that instead of having to *enable* this explicitly when
kexec is present, maybe we should have a "disable_kexec" parameter.  If
kexec is configured out or disabled on the command-line, then you can
turn CR pinning on.

If someone fails to kexec() because of this feature, there's no way in
hell they'll ever track down "pv_cr_pin" on the command-line as the
cause.  The might have a chance of finding disable_kexec, though.

Wouldn't it also be nice to add a single printk() the first time a kexec
fails because of this feature being present?

>  	quiet		[KNL] Disable most log messages
>  
>  	r128=		[HW,DRM]
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 67f6a40b5e93..bc0b27483001 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -800,6 +800,7 @@ config KVM_GUEST
>  	bool "KVM Guest support (including kvmclock)"
>  	depends on PARAVIRT
>  	select PARAVIRT_CLOCK
> +	select PARAVIRT_CR_PIN
>  	select ARCH_CPUIDLE_HALTPOLL
>  	default y
>  	---help---
> @@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING
>  config PARAVIRT_CLOCK
>  	bool
>  
> +config PARAVIRT_CR_PIN
> +       bool "Paravirtual bit pinning for CR0 and CR4"
> +       depends on KVM_GUEST
> +       help
> +         Select this option to have the virtualised guest request that the
> +         hypervisor disallow it from disabling protections set in control
> +         registers. The hypervisor will prevent exploits from disabling
> +         features such as SMEP, SMAP, UMIP, and WP.

I'm confused.  Does this add support for ""Paravirtual bit pinning", or
actually tell the guest to request pinning by default?

It says "Select this option to have the virtualised guest request...",
which makes it sound like it affects the default rather than the
availability of the option.


> +#ifdef CONFIG_PARAVIRT_CR_PIN
> +void __init kvm_paravirt_cr_pinning_init(void);
> +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> +				   unsigned long cr4_pinned_bits);
> +#else
> +static inline void kvm_paravirt_cr_pinning_init(void)
> +{
> +	return;
> +}
> +
> +static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> +						 unsigned long cr4_pinned_bits)
> +{
> +	return;
> +}
> +#endif /* CONFIG_PARAVIRT_CR_PIN */

For stuff like this that isn't the least bit performance sensitive, I
usually don't bother with header stubs.  Just do the function
declaration and then check the config option in the .c code.  It saves
#ifdef noise in the header.

>  #endif /* _ASM_X86_KVM_PARA_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 921e67086a00..ee17223b1fa8 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -21,6 +21,7 @@
>  #include <linux/smp.h>
>  #include <linux/io.h>
>  #include <linux/syscore_ops.h>
> +#include <linux/kvm_para.h>
>  
>  #include <asm/stackprotector.h>
>  #include <asm/perf_event.h>
> @@ -416,6 +417,8 @@ static void __init setup_cr_pinning(void)
>  	mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP);
>  	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask;
>  	static_key_enable(&cr_pinning.key);
> +
> +	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
>  }
>  
>  /*
> @@ -1551,6 +1554,8 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
>  	mtrr_ap_init();
>  	validate_apic_and_package_id(c);
>  	x86_spec_ctrl_setup_ap();
> +
> +	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
>  }

WP looks like it get special handling here.  But, why it is special goes
unmentioned in the changelog or comments.

Why is it special?

> +#ifdef CONFIG_PARAVIRT_CR_PIN
> +static int kvm_paravirt_cr_pinning_enabled __ro_after_init;
> +
> +void __init kvm_paravirt_cr_pinning_init(void)
> +{
> +#ifdef CONFIG_KEXEC_CORE
> +	if (!cmdline_find_option_bool(boot_command_line, "pv_cr_pin"))
> +		return;
> +
> +	/* Paravirtualized CR pinning is currently incompatible with kexec */
> +	kexec_load_disabled = 1;
> +#endif
> +
> +	kvm_paravirt_cr_pinning_enabled = 1;
> +}

This is why we don't like #ifdefs in .c files.  The CONFIG_KEXEC_CORE
one really makes this unreadable.

This is really confusing because it says, if "CONFIG_KEXEC_CORE" is off,
don't bother with looking for "pv_cr_pin" on the command-line before
setting kvm_paravirt_cr_pinning_enabled=1.  That doesn't make any sense
to me.

> +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> +				   unsigned long cr4_pinned_bits)
> +{
> +	u64 mask;
> +
> +	if (!kvm_paravirt_cr_pinning_enabled)
> +		return;
> +
> +	if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN))
> +		return;

So, if we compiled this whole mess in and got the new command-line
parameter and we got all the way here and the host doesn't support it,
we silently return?

Seems like it would at least deserve a pr_info().

> +	rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, mask);
> +	wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_pinned_bits & mask);
> +
> +	rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, mask);
> +	wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_pinned_bits & mask);
> +}
> +#endif
> +
>  #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
>  
>  static void kvm_disable_host_haltpoll(void *i)
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d9c678b37a9b..ed3bcc85d40d 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -27,6 +27,9 @@
>  #include <asm/apic.h>
>  #include <asm/bios_ebda.h>
>  #include <asm/bugs.h>
> +#include <asm/kasan.h>
> +#include <asm/cmdline.h>
> +
>  #include <asm/cpu.h>
>  #include <asm/efi.h>
>  #include <asm/gart.h>
> @@ -502,6 +505,11 @@ static void __init reserve_crashkernel(void)
>  		return;
>  	}
>  
> +	if (cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) {
> +		pr_info("Ignoring crashkernel since pv_cr_pin present in cmdline\n");
> +		return;
> +	}

Isn't it a bit mean to ignore crashkernel if the kernel has
CONFIG_PARAVIRT_CR_PIN=n?



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-06-18 14:18   ` Dave Hansen
@ 2020-06-18 14:43     ` Andersen, John
  2020-06-18 14:51       ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-06-18 14:43 UTC (permalink / raw)
  To: Dave Hansen
  Cc: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen, vkuznets, wanpengli, jmattson, joro, mchehab+huawei,
	gregkh, paulmck, pawan.kumar.gupta, jgross, mike.kravetz,
	oneukum, luto, peterz, fenghua.yu, reinette.chatre,
	vineela.tummalapalli, dave.hansen, arjan, caoj.fnst, bhe,
	nivedita, keescook, dan.j.williams, eric.auger, aaronlewis,
	peterx, makarandsonare, linux-doc, linux-kernel, kvm,
	linux-kselftest, kernel-hardening

On Thu, Jun 18, 2020 at 07:18:09AM -0700, Dave Hansen wrote:
> On 6/17/20 12:07 PM, John Andersen wrote:
> > +#define KVM_CR0_PIN_ALLOWED	(X86_CR0_WP)
> > +#define KVM_CR4_PIN_ALLOWED	(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)
> 
> Why *is* there an allowed set?  Why don't we just allow everything?
> 
> Shouldn't we also pin any unknown bits?  The CR4.FSGSBASE bit is an
> example of something that showed up CPUs without Linux knowing about it.
>  If set, it causes problems.  This set couldn't have helped FSGSBASE
> because it is not in the allowed set.
> 
> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
> write to kernel memory.  Linux won't set it, but an attacker would go
> after it, first thing.

The allowed set came about because there were comments from internal review
where it was said that allowing the guest to pin TS and MP adds unnecessary
complexity.

Also because KVM always intercepts these bits via the CR0/4_GUEST_HOST_MASK. If
we allow setting of any bits, then we have to add some infrastructure for
modifying the mask when pinned bits are updated. I have a patch for that if we
want to go that route, but it doesn't account for the added complexity
mentioned above.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-06-18 14:43     ` Andersen, John
@ 2020-06-18 14:51       ` Dave Hansen
  2020-07-07 21:12         ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-06-18 14:51 UTC (permalink / raw)
  To: Andersen, John
  Cc: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen, vkuznets, wanpengli, jmattson, joro, mchehab+huawei,
	gregkh, paulmck, pawan.kumar.gupta, jgross, mike.kravetz,
	oneukum, luto, peterz, fenghua.yu, reinette.chatre,
	vineela.tummalapalli, dave.hansen, arjan, caoj.fnst, bhe,
	nivedita, keescook, dan.j.williams, eric.auger, aaronlewis,
	peterx, makarandsonare, linux-doc, linux-kernel, kvm,
	linux-kselftest, kernel-hardening

On 6/18/20 7:43 AM, Andersen, John wrote:
> On Thu, Jun 18, 2020 at 07:18:09AM -0700, Dave Hansen wrote:
>> On 6/17/20 12:07 PM, John Andersen wrote:
>>> +#define KVM_CR0_PIN_ALLOWED	(X86_CR0_WP)
>>> +#define KVM_CR4_PIN_ALLOWED	(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)
>>
>> Why *is* there an allowed set?  Why don't we just allow everything?
>>
>> Shouldn't we also pin any unknown bits?  The CR4.FSGSBASE bit is an
>> example of something that showed up CPUs without Linux knowing about it.
>>  If set, it causes problems.  This set couldn't have helped FSGSBASE
>> because it is not in the allowed set.
>>
>> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
>> write to kernel memory.  Linux won't set it, but an attacker would go
>> after it, first thing.
> 
> The allowed set came about because there were comments from internal review
> where it was said that allowing the guest to pin TS and MP adds unnecessary
> complexity.

That would have been a great design point to include in the changelog.
 Could you make sure it shows up in future versions.

> Also because KVM always intercepts these bits via the CR0/4_GUEST_HOST_MASK. If
> we allow setting of any bits, then we have to add some infrastructure for
> modifying the mask when pinned bits are updated. I have a patch for that if we
> want to go that route, but it doesn't account for the added complexity
> mentioned above.

Well, we have a current, known issue (FSGSBASE) which shows how dealing
with guest-unknown bits is required.  To me, that overrules complexity
arguments to a large degree.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-18 14:41   ` Dave Hansen
@ 2020-06-18 15:26     ` Andersen, John
  2020-06-18 15:38       ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-06-18 15:26 UTC (permalink / raw)
  To: Dave Hansen
  Cc: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen, vkuznets, wanpengli, jmattson, joro, mchehab+huawei,
	gregkh, paulmck, pawan.kumar.gupta, jgross, mike.kravetz,
	oneukum, luto, peterz, fenghua.yu, reinette.chatre,
	vineela.tummalapalli, dave.hansen, arjan, caoj.fnst, bhe,
	nivedita, keescook, dan.j.williams, eric.auger, aaronlewis,
	peterx, makarandsonare, linux-doc, linux-kernel, kvm,
	linux-kselftest, kernel-hardening

On Thu, Jun 18, 2020 at 07:41:04AM -0700, Dave Hansen wrote:
> On 6/17/20 12:07 PM, John Andersen wrote:
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 89386f6f3ab6..54fb2b5ab8fc 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3926,6 +3926,17 @@
> >  			[KNL] Number of legacy pty's. Overwrites compiled-in
> >  			default number.
> >  
> > +	pv_cr_pin	[SECURITY,X86]
> > +			Enable paravirtualized control register pinning. When
> > +			running paravirutalized under KVM, request that KVM not
> > +			allow the guest to disable kernel protection features
> > +			set in CPU control registers. Specifying this option
> > +			will disable kexec (and crashkernel). If kexec support
> > +			has not been compiled into the kernel and host KVM
> > +			supports paravirtualized control register pinning, it
> > +			will be active by default without the need to specify
> > +			this parameter.
> 
> I'm writing this last in my review.  I guess I should have read this
> first.  You'll see later in my review how this confused me.  This
> behavior needs to be documented elsewhere.  Code comments would be best.
> 

Will do. Sorry for the confusion.

> Let's say kexec is config'd off.  This feature is enabled by default and
> crashes the kernel in early boot.  I have no way to disable this fancy
> new feature.  Is that what we want?
> 
> I also think that instead of having to *enable* this explicitly when
> kexec is present, maybe we should have a "disable_kexec" parameter.  If
> kexec is configured out or disabled on the command-line, then you can
> turn CR pinning on.
> 
> If someone fails to kexec() because of this feature, there's no way in
> hell they'll ever track down "pv_cr_pin" on the command-line as the
> cause.  The might have a chance of finding disable_kexec, though.
> 
> Wouldn't it also be nice to add a single printk() the first time a kexec
> fails because of this feature being present?
> 

That sounds like a good plan. I'll change pv_cr_pin to disable_kexec, and add a
disable_pv_cr_pin option in case it's being on by default via the compile time
option breaks a users workflow at runtime.

In this case, I'm assuming we can do away with the kconfig option then.

Just have it enabled by default. If kexec is present, it's disabled by default,
unless kexec is disabled, in which case, pinning is enabled unless
disable_pv_cr_pin is set.

> >  	quiet		[KNL] Disable most log messages
> >  
> >  	r128=		[HW,DRM]
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 67f6a40b5e93..bc0b27483001 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -800,6 +800,7 @@ config KVM_GUEST
> >  	bool "KVM Guest support (including kvmclock)"
> >  	depends on PARAVIRT
> >  	select PARAVIRT_CLOCK
> > +	select PARAVIRT_CR_PIN
> >  	select ARCH_CPUIDLE_HALTPOLL
> >  	default y
> >  	---help---
> > @@ -835,6 +836,15 @@ config PARAVIRT_TIME_ACCOUNTING
> >  config PARAVIRT_CLOCK
> >  	bool
> >  
> > +config PARAVIRT_CR_PIN
> > +       bool "Paravirtual bit pinning for CR0 and CR4"
> > +       depends on KVM_GUEST
> > +       help
> > +         Select this option to have the virtualised guest request that the
> > +         hypervisor disallow it from disabling protections set in control
> > +         registers. The hypervisor will prevent exploits from disabling
> > +         features such as SMEP, SMAP, UMIP, and WP.
> 
> I'm confused.  Does this add support for ""Paravirtual bit pinning", or
> actually tell the guest to request pinning by default?
> 
> It says "Select this option to have the virtualised guest request...",
> which makes it sound like it affects the default rather than the
> availability of the option.
> 

How about this

Select this option to request protection of SMEP, SMAP, UMIP, and WP
control register bits when running paravirtualized under KVM. Protection will
be active provided the feature is available host side and kexec is disabled via
kconfig or the command line for the guest requesting protection.

> > +#ifdef CONFIG_PARAVIRT_CR_PIN
> > +void __init kvm_paravirt_cr_pinning_init(void);
> > +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> > +				   unsigned long cr4_pinned_bits);
> > +#else
> > +static inline void kvm_paravirt_cr_pinning_init(void)
> > +{
> > +	return;
> > +}
> > +
> > +static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> > +						 unsigned long cr4_pinned_bits)
> > +{
> > +	return;
> > +}
> > +#endif /* CONFIG_PARAVIRT_CR_PIN */
> 
> For stuff like this that isn't the least bit performance sensitive, I
> usually don't bother with header stubs.  Just do the function
> declaration and then check the config option in the .c code.  It saves
> #ifdef noise in the header.
> 

Sounds good

> >  #endif /* _ASM_X86_KVM_PARA_H */
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 921e67086a00..ee17223b1fa8 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -21,6 +21,7 @@
> >  #include <linux/smp.h>
> >  #include <linux/io.h>
> >  #include <linux/syscore_ops.h>
> > +#include <linux/kvm_para.h>
> >  
> >  #include <asm/stackprotector.h>
> >  #include <asm/perf_event.h>
> > @@ -416,6 +417,8 @@ static void __init setup_cr_pinning(void)
> >  	mask = (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP);
> >  	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & mask;
> >  	static_key_enable(&cr_pinning.key);
> > +
> > +	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
> >  }
> >  
> >  /*
> > @@ -1551,6 +1554,8 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
> >  	mtrr_ap_init();
> >  	validate_apic_and_package_id(c);
> >  	x86_spec_ctrl_setup_ap();
> > +
> > +	kvm_setup_paravirt_cr_pinning(X86_CR0_WP, cr4_pinned_bits);
> >  }
> 
> WP looks like it get special handling here.  But, why it is special goes
> unmentioned in the changelog or comments.
> 
> Why is it special?
> 

We're copying the behavior of native_write_cr0() which assumes we always have
WP and can set it / it always must be set.

With CR4 we leverage the fact that setup_cr_pinning() initializes
cr4_pinned_bits that contains bits enabled during feature identification
and masked with what native pinning cares about. This ensures we pin the same
bits with paravirtualized pinning that we're already pinning natively.

I'll mention it in the commit message for v2.

> > +#ifdef CONFIG_PARAVIRT_CR_PIN
> > +static int kvm_paravirt_cr_pinning_enabled __ro_after_init;
> > +
> > +void __init kvm_paravirt_cr_pinning_init(void)
> > +{
> > +#ifdef CONFIG_KEXEC_CORE
> > +	if (!cmdline_find_option_bool(boot_command_line, "pv_cr_pin"))
> > +		return;
> > +
> > +	/* Paravirtualized CR pinning is currently incompatible with kexec */
> > +	kexec_load_disabled = 1;
> > +#endif
> > +
> > +	kvm_paravirt_cr_pinning_enabled = 1;
> > +}
> 
> This is why we don't like #ifdefs in .c files.  The CONFIG_KEXEC_CORE
> one really makes this unreadable.
> 
> This is really confusing because it says, if "CONFIG_KEXEC_CORE" is off,
> don't bother with looking for "pv_cr_pin" on the command-line before
> setting kvm_paravirt_cr_pinning_enabled=1.  That doesn't make any sense
> to me.
> 

I think this will be clearer when we change the command like options to
disable_. I'll be sure to use IS_ENABLED next time, my bad I forgot here.

> > +void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
> > +				   unsigned long cr4_pinned_bits)
> > +{
> > +	u64 mask;
> > +
> > +	if (!kvm_paravirt_cr_pinning_enabled)
> > +		return;
> > +
> > +	if (!kvm_para_has_feature(KVM_FEATURE_CR_PIN))
> > +		return;
> 
> So, if we compiled this whole mess in and got the new command-line
> parameter and we got all the way here and the host doesn't support it,
> we silently return?
> 
> Seems like it would at least deserve a pr_info().
> 

Will do. I'll probably use a rate limited variant because this happens for each
CPU.

> > +	rdmsrl(MSR_KVM_CR0_PIN_ALLOWED, mask);
> > +	wrmsrl(MSR_KVM_CR0_PINNED_HIGH, cr0_pinned_bits & mask);
> > +
> > +	rdmsrl(MSR_KVM_CR4_PIN_ALLOWED, mask);
> > +	wrmsrl(MSR_KVM_CR4_PINNED_HIGH, cr4_pinned_bits & mask);
> > +}
> > +#endif
> > +
> >  #ifdef CONFIG_ARCH_CPUIDLE_HALTPOLL
> >  
> >  static void kvm_disable_host_haltpoll(void *i)
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index d9c678b37a9b..ed3bcc85d40d 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -27,6 +27,9 @@
> >  #include <asm/apic.h>
> >  #include <asm/bios_ebda.h>
> >  #include <asm/bugs.h>
> > +#include <asm/kasan.h>
> > +#include <asm/cmdline.h>
> > +
> >  #include <asm/cpu.h>
> >  #include <asm/efi.h>
> >  #include <asm/gart.h>
> > @@ -502,6 +505,11 @@ static void __init reserve_crashkernel(void)
> >  		return;
> >  	}
> >  
> > +	if (cmdline_find_option_bool(boot_command_line, "pv_cr_pin")) {
> > +		pr_info("Ignoring crashkernel since pv_cr_pin present in cmdline\n");
> > +		return;
> > +	}
> 
> Isn't it a bit mean to ignore crashkernel if the kernel has
> CONFIG_PARAVIRT_CR_PIN=n?
> 

Yes that is mean :) I will change this. There is of course no need to check for
this option if the config isn't enabled.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-18 15:26     ` Andersen, John
@ 2020-06-18 15:38       ` Dave Hansen
  2020-06-18 15:49         ` Andersen, John
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-06-18 15:38 UTC (permalink / raw)
  To: Andersen, John
  Cc: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen, vkuznets, wanpengli, jmattson, joro, mchehab+huawei,
	gregkh, paulmck, pawan.kumar.gupta, jgross, mike.kravetz,
	oneukum, luto, peterz, fenghua.yu, reinette.chatre,
	vineela.tummalapalli, dave.hansen, arjan, caoj.fnst, bhe,
	nivedita, keescook, dan.j.williams, eric.auger, aaronlewis,
	peterx, makarandsonare, linux-doc, linux-kernel, kvm,
	linux-kselftest, kernel-hardening

On 6/18/20 8:26 AM, Andersen, John wrote:
> On Thu, Jun 18, 2020 at 07:41:04AM -0700, Dave Hansen wrote:
>> Let's say kexec is config'd off.  This feature is enabled by default and
>> crashes the kernel in early boot.  I have no way to disable this fancy
>> new feature.  Is that what we want?
>>
>> I also think that instead of having to *enable* this explicitly when
>> kexec is present, maybe we should have a "disable_kexec" parameter.  If
>> kexec is configured out or disabled on the command-line, then you can
>> turn CR pinning on.
>>
>> If someone fails to kexec() because of this feature, there's no way in
>> hell they'll ever track down "pv_cr_pin" on the command-line as the
>> cause.  The might have a chance of finding disable_kexec, though.
>>
>> Wouldn't it also be nice to add a single printk() the first time a kexec
>> fails because of this feature being present?
> 
> That sounds like a good plan. I'll change pv_cr_pin to disable_kexec, and add a
> disable_pv_cr_pin option in case it's being on by default via the compile time
> option breaks a users workflow at runtime.
> 
> In this case, I'm assuming we can do away with the kconfig option then.
> 
> Just have it enabled by default. If kexec is present, it's disabled by default,
> unless kexec is disabled, in which case, pinning is enabled unless
> disable_pv_cr_pin is set.

Yes, that sounds good to me.

...
>>> +config PARAVIRT_CR_PIN
>>> +       bool "Paravirtual bit pinning for CR0 and CR4"
>>> +       depends on KVM_GUEST
>>> +       help
>>> +         Select this option to have the virtualised guest request that the
>>> +         hypervisor disallow it from disabling protections set in control
>>> +         registers. The hypervisor will prevent exploits from disabling
>>> +         features such as SMEP, SMAP, UMIP, and WP.
>>
>> I'm confused.  Does this add support for ""Paravirtual bit pinning", or
>> actually tell the guest to request pinning by default?
>>
>> It says "Select this option to have the virtualised guest request...",
>> which makes it sound like it affects the default rather than the
>> availability of the option.
> 
> How about this
> 
> Select this option to request protection of SMEP, SMAP, UMIP, and WP
> control register bits when running paravirtualized under KVM. Protection will
> be active provided the feature is available host side and kexec is disabled via
> kconfig or the command line for the guest requesting protection.

It still isn't very clear to me.

Let's pull the config option out of this patch.  Enable the feature by
default and do the command-line processing in this patch.

If you still think a Kconfig option is helpful, add it in a separate
patch calling out the deficiencies with the boot-time options.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-18 15:38       ` Dave Hansen
@ 2020-06-18 15:49         ` Andersen, John
  0 siblings, 0 replies; 30+ messages in thread
From: Andersen, John @ 2020-06-18 15:49 UTC (permalink / raw)
  To: Dave Hansen
  Cc: corbet, pbonzini, tglx, mingo, bp, x86, hpa, shuah,
	sean.j.christopherson, liran.alon, drjones, rick.p.edgecombe,
	kristen, vkuznets, wanpengli, jmattson, joro, mchehab+huawei,
	gregkh, paulmck, pawan.kumar.gupta, jgross, mike.kravetz,
	oneukum, luto, peterz, fenghua.yu, reinette.chatre,
	vineela.tummalapalli, dave.hansen, arjan, caoj.fnst, bhe,
	nivedita, keescook, dan.j.williams, eric.auger, aaronlewis,
	peterx, makarandsonare, linux-doc, linux-kernel, kvm,
	linux-kselftest, kernel-hardening

On Thu, Jun 18, 2020 at 08:38:06AM -0700, Dave Hansen wrote:
> On 6/18/20 8:26 AM, Andersen, John wrote:
> > On Thu, Jun 18, 2020 at 07:41:04AM -0700, Dave Hansen wrote:
> >>> +config PARAVIRT_CR_PIN
> >>> +       bool "Paravirtual bit pinning for CR0 and CR4"
> >>> +       depends on KVM_GUEST
> >>> +       help
> >>> +         Select this option to have the virtualised guest request that the
> >>> +         hypervisor disallow it from disabling protections set in control
> >>> +         registers. The hypervisor will prevent exploits from disabling
> >>> +         features such as SMEP, SMAP, UMIP, and WP.
> >>
> >> I'm confused.  Does this add support for ""Paravirtual bit pinning", or
> >> actually tell the guest to request pinning by default?
> >>
> >> It says "Select this option to have the virtualised guest request...",
> >> which makes it sound like it affects the default rather than the
> >> availability of the option.
> > 
> > How about this
> > 
> > Select this option to request protection of SMEP, SMAP, UMIP, and WP
> > control register bits when running paravirtualized under KVM. Protection will
> > be active provided the feature is available host side and kexec is disabled via
> > kconfig or the command line for the guest requesting protection.
> 
> It still isn't very clear to me.
> 
> Let's pull the config option out of this patch.  Enable the feature by
> default and do the command-line processing in this patch.
> 
> If you still think a Kconfig option is helpful, add it in a separate
> patch calling out the deficiencies with the boot-time options.

That's right we're going to pull it out anyway and just disable if the
disable_pv_cr_pin command line option is set. Oops. That solves that.

Thank you very much for your review Dave

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-17 19:07 ` [PATCH 4/4] X86: Use KVM CR pin MSRs John Andersen
  2020-06-18 14:41   ` Dave Hansen
@ 2020-06-20  5:13   ` Andy Lutomirski
  2020-06-23 20:03     ` Andersen, John
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2020-06-20  5:13 UTC (permalink / raw)
  To: John Andersen
  Cc: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Christopherson, Sean J, Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, mchehab+huawei, Greg KH,
	Paul E. McKenney, pawan.kumar.gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Andrew Lutomirski, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Arvind Sankar,
	Kees Cook, Dan Williams, eric.auger, aaronlewis, Peter Xu,
	makarandsonare, open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Wed, Jun 17, 2020 at 12:05 PM John Andersen
<john.s.andersen@intel.com> wrote:
> Guests using the kexec system call currently do not support
> paravirtualized control register pinning. This is due to early boot
> code writing known good values to control registers, these values do
> not contain the protected bits. This is due to CPU feature
> identification being done at a later time, when the kernel properly
> checks if it can enable protections. As such, the pv_cr_pin command line
> option has been added which instructs the kernel to disable kexec in
> favor of enabling paravirtualized control register pinning. crashkernel
> is also disabled when the pv_cr_pin parameter is specified due to its
> reliance on kexec.

Is there a plan for fixing this for real?  I'm wondering if there is a
sane weakening of this feature that still allows things like kexec.

What happens if a guest tries to reset?  For that matter, what happens
when a guest vCPU sends SIPI to another guest vCPU?  The target CPU
starts up in real mode, right?  There's no SMEP or SMAP in real mode,
and real mode has basically no security mitigations at all.

PCID is an odd case.  I see no good reason to pin it, and pinning PCID
on prevents use of 32-bit mode.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-20  5:13   ` Andy Lutomirski
@ 2020-06-23 20:03     ` Andersen, John
  2020-07-03 21:48       ` Andersen, John
  0 siblings, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-06-23 20:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Christopherson, Sean J, Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, mchehab+huawei, Greg KH,
	Paul E. McKenney, pawan.kumar.gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre,
	vineela.tummalapalli, Dave Hansen, Arjan van de Ven, caoj.fnst,
	Baoquan He, Arvind Sankar, Kees Cook, Dan Williams, eric.auger,
	aaronlewis, Peter Xu, makarandsonare, open list:DOCUMENTATION,
	LKML, kvm list, open list:KERNEL SELFTEST FRAMEWORK,
	Kernel Hardening

On Fri, Jun 19, 2020 at 10:13:25PM -0700, Andy Lutomirski wrote:
> On Wed, Jun 17, 2020 at 12:05 PM John Andersen
> <john.s.andersen@intel.com> wrote:
> > Guests using the kexec system call currently do not support
> > paravirtualized control register pinning. This is due to early boot
> > code writing known good values to control registers, these values do
> > not contain the protected bits. This is due to CPU feature
> > identification being done at a later time, when the kernel properly
> > checks if it can enable protections. As such, the pv_cr_pin command line
> > option has been added which instructs the kernel to disable kexec in
> > favor of enabling paravirtualized control register pinning. crashkernel
> > is also disabled when the pv_cr_pin parameter is specified due to its
> > reliance on kexec.
> 
> Is there a plan for fixing this for real?  I'm wondering if there is a
> sane weakening of this feature that still allows things like kexec.
> 

I'm pretty sure kexec can be fixed. I had it working at one point, I'm
currently in the process of revalidating this. The issue was though that
kexec only worked within the guest, not on the physical host, which I suspect
is related to the need for supervisor pages to be mapped, which seems to be
required before enabling SMAP (based on what I'd seen with the selftests and
unittests). I was also just blindly turning on the bits without checking for
support when I'd tried this, so that could have been the issue too.

I think most of the changes for just blindly enabling the bits were in
relocate_kernel, secondary_startup_64, and startup_32.

> What happens if a guest tries to reset?  For that matter, what happens
> when a guest vCPU sends SIPI to another guest vCPU?  The target CPU
> starts up in real mode, right?
>

In this case we hit kvm_vcpu_reset, where we clear pinning. Yes I believe it
starts up in real mode.

> There's no SMEP or SMAP in real mode, and real mode has basically no security
> mitigations at all.
> 

We'd thought about the switch to real mode being a case where we'd want to drop
pinning. However, we weren't sure how much weaker, if at all, it makes this
protection.

Unless someone knows, I'll probably need to do some digging into what an
exploit might look like that tries switching to real mode and switching back as
a way around this protection.

If we can use the switch to real mode as a drop pinning trigger then I think
that might just solve the kexec problem.

> PCID is an odd case.  I see no good reason to pin it, and pinning PCID
> on prevents use of 32-bit mode.

Maybe it makes sense to default to the values we have, but allow host userspace
to overwrite the allowed values, in case some other guest OS wants to do
something that Linux doesn't with PCID or other bits.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-06-23 20:03     ` Andersen, John
@ 2020-07-03 21:48       ` Andersen, John
  2020-07-04 15:11         ` Arvind Sankar
  0 siblings, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-07-03 21:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jonathan Corbet, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Christopherson, Sean J, Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, mchehab+huawei, Greg KH,
	Paul E. McKenney, pawan.kumar.gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre,
	vineela.tummalapalli, Dave Hansen, Arjan van de Ven, caoj.fnst,
	Baoquan He, Arvind Sankar, Kees Cook, Geremy Condra,
	Dan Williams, eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

> > Is there a plan for fixing this for real?  I'm wondering if there is a
> > sane weakening of this feature that still allows things like kexec.
> > 
> 
> I'm pretty sure kexec can be fixed. I had it working at one point, I'm
> currently in the process of revalidating this. The issue was though that
> kexec only worked within the guest, not on the physical host, which I suspect
> is related to the need for supervisor pages to be mapped, which seems to be
> required before enabling SMAP (based on what I'd seen with the selftests and
> unittests). I was also just blindly turning on the bits without checking for
> support when I'd tried this, so that could have been the issue too.
> 
> I think most of the changes for just blindly enabling the bits were in
> relocate_kernel, secondary_startup_64, and startup_32.
> 

So I have a naive fix for kexec which has only been tested to work under KVM.
When tested on a physical host, it did not boot when SMAP or UMIP were set.
Undoubtedly it's not the correct way to do this, as it skips CPU feature
identification, opting instead for blindly setting the bits. The physical host
I tested this on does not have UMIP so that's likely why it failed to boot when
UMIP gets set blindly. Within kvm-unit-tests, the test for SMAP maps memory as
supervisor pages before enabling SMAP. I suspect this is why setting SMAP
blindly causes the physical host not to boot.

Within trampoline_32bit_src() if I add more instructions I get an error
about "attempt to move .org backwards", which as I understand it means
there are only so many instructions allowed in each of those functions.

My suspicion is that someone with more knowledge of this area has a good
idea on how best to handle this. Feedback would be much appreciated.

> > There's no SMEP or SMAP in real mode, and real mode has basically no security
> > mitigations at all.
> > 
> 
> We'd thought about the switch to real mode being a case where we'd want to drop
> pinning. However, we weren't sure how much weaker, if at all, it makes this
> protection.
> 
> Unless someone knows, I'll probably need to do some digging into what an
> exploit might look like that tries switching to real mode and switching back as
> a way around this protection.
> 

TL;DR We probably shouldn't use the switch to real mode as a trigger to drop
pinning.

This protection assumes that the attacker is at the point where they have the
ability to write a payload for a ROP/JOP attack and gain control of execution.

For this case where we are going to switch to real mode we need to add an
assumption that the attacker has a write primitive that allows them to write
part of their payload to memory that will be addressable within 16 bit mode.

If the attacker has this write primitive, the attack becomes write payloads,
within the first stage, switch to real mode, use stage two within real mode via
JOP or just machine code (since there's we don't have to worry NX) to setup
protected mode and jump back into the kernel with protections disabled.

> > PCID is an odd case.  I see no good reason to pin it, and pinning PCID
> > on prevents use of 32-bit mode.
> 
> Maybe it makes sense to default to the values we have, but allow host userspace
> to overwrite the allowed values, in case some other guest OS wants to do
> something that Linux doesn't with PCID or other bits.

In the next version of this patchset I've made it so that the default allowed
values are WP, SMEP, SMAP, and UMIP. However, a write to the allowed MSR from
the host VMM (QEMU) can change which bits are allowed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] X86: Use KVM CR pin MSRs
  2020-07-03 21:48       ` Andersen, John
@ 2020-07-04 15:11         ` Arvind Sankar
  0 siblings, 0 replies; 30+ messages in thread
From: Arvind Sankar @ 2020-07-04 15:11 UTC (permalink / raw)
  To: Andersen, John
  Cc: Andy Lutomirski, Jonathan Corbet, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Christopherson, Sean J, Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, mchehab+huawei, Greg KH,
	Paul E. McKenney, pawan.kumar.gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre,
	vineela.tummalapalli, Dave Hansen, Arjan van de Ven, caoj.fnst,
	Baoquan He, Arvind Sankar, Kees Cook, Geremy Condra,
	Dan Williams, eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Fri, Jul 03, 2020 at 09:48:14PM +0000, Andersen, John wrote:
> > > Is there a plan for fixing this for real?  I'm wondering if there is a
> > > sane weakening of this feature that still allows things like kexec.
> > > 
> > 
> > I'm pretty sure kexec can be fixed. I had it working at one point, I'm
> > currently in the process of revalidating this. The issue was though that
> > kexec only worked within the guest, not on the physical host, which I suspect
> > is related to the need for supervisor pages to be mapped, which seems to be
> > required before enabling SMAP (based on what I'd seen with the selftests and
> > unittests). I was also just blindly turning on the bits without checking for
> > support when I'd tried this, so that could have been the issue too.
> > 
> > I think most of the changes for just blindly enabling the bits were in
> > relocate_kernel, secondary_startup_64, and startup_32.
> > 
> 
> So I have a naive fix for kexec which has only been tested to work under KVM.
> When tested on a physical host, it did not boot when SMAP or UMIP were set.
> Undoubtedly it's not the correct way to do this, as it skips CPU feature
> identification, opting instead for blindly setting the bits. The physical host
> I tested this on does not have UMIP so that's likely why it failed to boot when
> UMIP gets set blindly. Within kvm-unit-tests, the test for SMAP maps memory as
> supervisor pages before enabling SMAP. I suspect this is why setting SMAP
> blindly causes the physical host not to boot.
> 
> Within trampoline_32bit_src() if I add more instructions I get an error
> about "attempt to move .org backwards", which as I understand it means
> there are only so many instructions allowed in each of those functions.
> 
> My suspicion is that someone with more knowledge of this area has a good
> idea on how best to handle this. Feedback would be much appreciated.

You can simply increase the value of TRAMPOLINE_32BIT_CODE_SIZE in
pgtable.h, assuming you don't need a very large increase. There's one
page available for code + stack at present.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-06-18 14:51       ` Dave Hansen
@ 2020-07-07 21:12         ` Sean Christopherson
  2020-07-07 21:48           ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2020-07-07 21:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andersen, John, corbet, pbonzini, tglx, mingo, bp, x86, hpa,
	shuah, liran.alon, drjones, rick.p.edgecombe, kristen, vkuznets,
	wanpengli, jmattson, joro, mchehab+huawei, gregkh, paulmck,
	pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto, peterz,
	fenghua.yu, reinette.chatre, vineela.tummalapalli, dave.hansen,
	arjan, caoj.fnst, bhe, nivedita, keescook, dan.j.williams,
	eric.auger, aaronlewis, peterx, makarandsonare, linux-doc,
	linux-kernel, kvm, linux-kselftest, kernel-hardening

On Thu, Jun 18, 2020 at 07:51:10AM -0700, Dave Hansen wrote:
> On 6/18/20 7:43 AM, Andersen, John wrote:
> > On Thu, Jun 18, 2020 at 07:18:09AM -0700, Dave Hansen wrote:
> >> On 6/17/20 12:07 PM, John Andersen wrote:
> >>> +#define KVM_CR0_PIN_ALLOWED	(X86_CR0_WP)
> >>> +#define KVM_CR4_PIN_ALLOWED	(X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP)
> >>
> >> Why *is* there an allowed set?  Why don't we just allow everything?
> >>
> >> Shouldn't we also pin any unknown bits?  The CR4.FSGSBASE bit is an
> >> example of something that showed up CPUs without Linux knowing about it.
> >>  If set, it causes problems.  This set couldn't have helped FSGSBASE
> >> because it is not in the allowed set.
> >>
> >> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
> >> write to kernel memory.  Linux won't set it, but an attacker would go
> >> after it, first thing.

That's an orthogonal to pinning.  KVM never lets the guest set CR4 bits that
are unknown to KVM.  Supporting CR4.NO_MARBLES would require an explicit KVM
change to allow it to be set by the guest, and would also require a userspace
VMM to expose NO_MARBLES to the guest.

That being said, this series should supporting pinning as much as possible,
i.e. if the bit can be exposed to the guest and doesn't require special
handling in KVM, allow it to be pinned.  E.g. TS is a special case because
pinning would require additional emulator support and IMO isn't interesting
enough to justify the extra complexity.  At a glance, I don't see anything
that would prevent pinning FSGSBASE.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-07 21:12         ` Sean Christopherson
@ 2020-07-07 21:48           ` Dave Hansen
  2020-07-07 21:51             ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-07-07 21:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andersen, John, corbet, pbonzini, tglx, mingo, bp, x86, hpa,
	shuah, liran.alon, drjones, rick.p.edgecombe, kristen, vkuznets,
	wanpengli, jmattson, joro, mchehab+huawei, gregkh, paulmck,
	pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto, peterz,
	fenghua.yu, reinette.chatre, vineela.tummalapalli, dave.hansen,
	arjan, caoj.fnst, bhe, nivedita, keescook, dan.j.williams,
	eric.auger, aaronlewis, peterx, makarandsonare, linux-doc,
	linux-kernel, kvm, linux-kselftest, kernel-hardening

On 7/7/20 2:12 PM, Sean Christopherson wrote:
>>>> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
>>>> write to kernel memory.  Linux won't set it, but an attacker would go
>>>> after it, first thing.
> That's an orthogonal to pinning.  KVM never lets the guest set CR4 bits that
> are unknown to KVM.  Supporting CR4.NO_MARBLES would require an explicit KVM
> change to allow it to be set by the guest, and would also require a userspace
> VMM to expose NO_MARBLES to the guest.
> 
> That being said, this series should supporting pinning as much as possible,
> i.e. if the bit can be exposed to the guest and doesn't require special
> handling in KVM, allow it to be pinned.  E.g. TS is a special case because
> pinning would require additional emulator support and IMO isn't interesting
> enough to justify the extra complexity.  At a glance, I don't see anything
> that would prevent pinning FSGSBASE.

Thanks for filling in the KVM picture.

If we're supporting as much pinning as possible, can we also add
something to make it inconvenient for someone to both make a CR4 bit
known to KVM *and* ignore the pinning aspects?

We should really make folks think about it.  Something like:

#define KVM_CR4_KNOWN 0xff
#define KVM_CR4_PIN_ALLOWED 0xf0
#define KVM_CR4_PIN_NOT_ALLOWED 0x0f

BUILD_BUG_ON(KVM_CR4_KNOWN !=
             (KVM_CR4_PIN_ALLOWED|KVM_CR4_PIN_NOT_ALLOWED));

So someone *MUST* make an active declaration about new bits being pinned
or not?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-07 21:48           ` Dave Hansen
@ 2020-07-07 21:51             ` Paolo Bonzini
  2020-07-09 15:44               ` Andersen, John
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-07-07 21:51 UTC (permalink / raw)
  To: Dave Hansen, Sean Christopherson
  Cc: Andersen, John, corbet, tglx, mingo, bp, x86, hpa, shuah,
	liran.alon, drjones, rick.p.edgecombe, kristen, vkuznets,
	wanpengli, jmattson, joro, mchehab+huawei, gregkh, paulmck,
	pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto, peterz,
	fenghua.yu, reinette.chatre, vineela.tummalapalli, dave.hansen,
	arjan, caoj.fnst, bhe, nivedita, keescook, dan.j.williams,
	eric.auger, aaronlewis, peterx, makarandsonare, linux-doc,
	linux-kernel, kvm, linux-kselftest, kernel-hardening

On 07/07/20 23:48, Dave Hansen wrote:
> On 7/7/20 2:12 PM, Sean Christopherson wrote:
>>>>> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
>>>>> write to kernel memory.  Linux won't set it, but an attacker would go
>>>>> after it, first thing.
>> That's an orthogonal to pinning.  KVM never lets the guest set CR4 bits that
>> are unknown to KVM.  Supporting CR4.NO_MARBLES would require an explicit KVM
>> change to allow it to be set by the guest, and would also require a userspace
>> VMM to expose NO_MARBLES to the guest.
>>
>> That being said, this series should supporting pinning as much as possible,
>> i.e. if the bit can be exposed to the guest and doesn't require special
>> handling in KVM, allow it to be pinned.  E.g. TS is a special case because
>> pinning would require additional emulator support and IMO isn't interesting
>> enough to justify the extra complexity.  At a glance, I don't see anything
>> that would prevent pinning FSGSBASE.
> 
> Thanks for filling in the KVM picture.
> 
> If we're supporting as much pinning as possible, can we also add
> something to make it inconvenient for someone to both make a CR4 bit
> known to KVM *and* ignore the pinning aspects?
> 
> We should really make folks think about it.  Something like:
> 
> #define KVM_CR4_KNOWN 0xff
> #define KVM_CR4_PIN_ALLOWED 0xf0
> #define KVM_CR4_PIN_NOT_ALLOWED 0x0f
> 
> BUILD_BUG_ON(KVM_CR4_KNOWN !=
>              (KVM_CR4_PIN_ALLOWED|KVM_CR4_PIN_NOT_ALLOWED));
> 
> So someone *MUST* make an active declaration about new bits being pinned
> or not?

I would just make all unknown bits pinnable (or perhaps all CR4 bits in
general).

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-07 21:51             ` Paolo Bonzini
@ 2020-07-09 15:44               ` Andersen, John
  2020-07-09 15:56                 ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-07-09 15:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dave Hansen, Sean Christopherson, corbet, tglx, mingo, bp, x86,
	hpa, shuah, liran.alon, drjones, rick.p.edgecombe, kristen,
	vkuznets, wanpengli, jmattson, joro, mchehab+huawei, gregkh,
	paulmck, pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto,
	peterz, fenghua.yu, reinette.chatre, vineela.tummalapalli,
	dave.hansen, arjan, caoj.fnst, bhe, nivedita, keescook,
	dan.j.williams, eric.auger, aaronlewis, peterx, makarandsonare,
	linux-doc, linux-kernel, kvm, linux-kselftest, kernel-hardening

On Tue, Jul 07, 2020 at 11:51:54PM +0200, Paolo Bonzini wrote:
> On 07/07/20 23:48, Dave Hansen wrote:
> > On 7/7/20 2:12 PM, Sean Christopherson wrote:
> >>>>> Let's say Intel loses its marbles and adds a CR4 bit that lets userspace
> >>>>> write to kernel memory.  Linux won't set it, but an attacker would go
> >>>>> after it, first thing.
> >> That's an orthogonal to pinning.  KVM never lets the guest set CR4 bits that
> >> are unknown to KVM.  Supporting CR4.NO_MARBLES would require an explicit KVM
> >> change to allow it to be set by the guest, and would also require a userspace
> >> VMM to expose NO_MARBLES to the guest.
> >>
> >> That being said, this series should supporting pinning as much as possible,
> >> i.e. if the bit can be exposed to the guest and doesn't require special
> >> handling in KVM, allow it to be pinned.  E.g. TS is a special case because
> >> pinning would require additional emulator support and IMO isn't interesting
> >> enough to justify the extra complexity.  At a glance, I don't see anything
> >> that would prevent pinning FSGSBASE.
> > 
> > Thanks for filling in the KVM picture.
> > 
> > If we're supporting as much pinning as possible, can we also add
> > something to make it inconvenient for someone to both make a CR4 bit
> > known to KVM *and* ignore the pinning aspects?
> > 
> > We should really make folks think about it.  Something like:
> > 
> > #define KVM_CR4_KNOWN 0xff
> > #define KVM_CR4_PIN_ALLOWED 0xf0
> > #define KVM_CR4_PIN_NOT_ALLOWED 0x0f
> > 
> > BUILD_BUG_ON(KVM_CR4_KNOWN !=
> >              (KVM_CR4_PIN_ALLOWED|KVM_CR4_PIN_NOT_ALLOWED));
> > 
> > So someone *MUST* make an active declaration about new bits being pinned
> > or not?
> 
> I would just make all unknown bits pinnable (or perhaps all CR4 bits in
> general).
> 

Sounds good. I'll make it this way in the next revision. I'll do the same for
CR0 (unless I hear otherwise). I've added the last paragraph here under the
ALLOWED MSRs data section.

data:
        Bits which may be pinned.

        Attempting to pin bits other than these will result in a failure when
        writing to the respective CR pinned MSR.

        Bits which are allowed to be pinned default to WP for CR0 and SMEP,
        SMAP, and UMIP for CR4.

        The host VMM may modify the set of allowed bits. However, only the above
        have been tested to work. Allowing the guest to pin other bits may or
        may not be compatible with KVM.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 15:44               ` Andersen, John
@ 2020-07-09 15:56                 ` Dave Hansen
  2020-07-09 16:07                   ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2020-07-09 15:56 UTC (permalink / raw)
  To: Andersen, John, Paolo Bonzini
  Cc: Sean Christopherson, corbet, tglx, mingo, bp, x86, hpa, shuah,
	liran.alon, drjones, rick.p.edgecombe, kristen, vkuznets,
	wanpengli, jmattson, joro, mchehab+huawei, gregkh, paulmck,
	pawan.kumar.gupta, jgross, mike.kravetz, oneukum, luto, peterz,
	fenghua.yu, reinette.chatre, vineela.tummalapalli, dave.hansen,
	arjan, caoj.fnst, bhe, nivedita, keescook, dan.j.williams,
	eric.auger, aaronlewis, peterx, makarandsonare, linux-doc,
	linux-kernel, kvm, linux-kselftest, kernel-hardening

On 7/9/20 8:44 AM, Andersen, John wrote:
> 
>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
>         SMAP, and UMIP for CR4.

I think it also makes sense to have FSGSBASE in this set.

I know it hasn't been tested, but I think we should do the legwork to
test it.  If not in this set, can we agree that it's a logical next step?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 15:56                 ` Dave Hansen
@ 2020-07-09 16:07                   ` Andy Lutomirski
  2020-07-09 16:22                     ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2020-07-09 16:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andersen, John, Paolo Bonzini, Sean Christopherson,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Andrew Lutomirski, Peter Zijlstra,
	Fenghua Yu, reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Arvind Sankar,
	Kees Cook, Dan Williams, eric.auger, aaronlewis, Peter Xu,
	makarandsonare, open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 7/9/20 8:44 AM, Andersen, John wrote:
> >
> >         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> >         SMAP, and UMIP for CR4.
>
> I think it also makes sense to have FSGSBASE in this set.
>
> I know it hasn't been tested, but I think we should do the legwork to
> test it.  If not in this set, can we agree that it's a logical next step?

I have no objection to pinning FSGSBASE, but is there a clear
description of the threat model that this whole series is meant to
address?  The idea is to provide a degree of protection against an
attacker who is able to convince a guest kernel to write something
inappropriate to CR4, right?  How realistic is this?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 16:07                   ` Andy Lutomirski
@ 2020-07-09 16:22                     ` Dave Hansen
  2020-07-09 16:27                       ` Andy Lutomirski
  2020-07-09 23:37                       ` Kees Cook
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Hansen @ 2020-07-09 16:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andersen, John, Paolo Bonzini, Sean Christopherson,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Arvind Sankar,
	Kees Cook, Dan Williams, eric.auger, aaronlewis, Peter Xu,
	makarandsonare, open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 7/9/20 8:44 AM, Andersen, John wrote:
>>>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
>>>         SMAP, and UMIP for CR4.
>> I think it also makes sense to have FSGSBASE in this set.
>>
>> I know it hasn't been tested, but I think we should do the legwork to
>> test it.  If not in this set, can we agree that it's a logical next step?
> I have no objection to pinning FSGSBASE, but is there a clear
> description of the threat model that this whole series is meant to
> address?  The idea is to provide a degree of protection against an
> attacker who is able to convince a guest kernel to write something
> inappropriate to CR4, right?  How realistic is this?

If a quick search can find this:

> https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html

I'd pretty confident that the guys doing actual bad things have it in
their toolbox too.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 16:22                     ` Dave Hansen
@ 2020-07-09 16:27                       ` Andy Lutomirski
  2020-07-14  5:36                         ` Andersen, John, Arvind Sankar
  2020-07-14  5:39                         ` Andersen, John
  2020-07-09 23:37                       ` Kees Cook
  1 sibling, 2 replies; 30+ messages in thread
From: Andy Lutomirski @ 2020-07-09 16:27 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Andersen, John, Paolo Bonzini,
	Sean Christopherson, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab, Greg KH,
	Paul E. McKenney, Pawan Gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre,
	vineela.tummalapalli, Dave Hansen, Arjan van de Ven, caoj.fnst,
	Baoquan He, Arvind Sankar, Kees Cook, Dan Williams, eric.auger,
	aaronlewis, Peter Xu, makarandsonare, open list:DOCUMENTATION,
	LKML, kvm list, open list:KERNEL SELFTEST FRAMEWORK,
	Kernel Hardening

On Thu, Jul 9, 2020 at 9:22 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> > On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >> On 7/9/20 8:44 AM, Andersen, John wrote:
> >>>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> >>>         SMAP, and UMIP for CR4.
> >> I think it also makes sense to have FSGSBASE in this set.
> >>
> >> I know it hasn't been tested, but I think we should do the legwork to
> >> test it.  If not in this set, can we agree that it's a logical next step?
> > I have no objection to pinning FSGSBASE, but is there a clear
> > description of the threat model that this whole series is meant to
> > address?  The idea is to provide a degree of protection against an
> > attacker who is able to convince a guest kernel to write something
> > inappropriate to CR4, right?  How realistic is this?
>
> If a quick search can find this:
>
> > https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
>
> I'd pretty confident that the guys doing actual bad things have it in
> their toolbox too.
>

True, but we have the existing software CR4 pinning.  I suppose the
virtualization version is stronger.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 16:22                     ` Dave Hansen
  2020-07-09 16:27                       ` Andy Lutomirski
@ 2020-07-09 23:37                       ` Kees Cook
  1 sibling, 0 replies; 30+ messages in thread
From: Kees Cook @ 2020-07-09 23:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Andersen, John, Paolo Bonzini,
	Sean Christopherson, Jonathan Corbet, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Shuah Khan,
	Liran Alon, Andrew Jones, Rick Edgecombe,
	Kristen Carlson Accardi, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab, Greg KH,
	Paul E. McKenney, Pawan Gupta, Juergen Gross, Mike Kravetz,
	Oliver Neukum, Peter Zijlstra, Fenghua Yu, reinette.chatre,
	vineela.tummalapalli, Dave Hansen, Arjan van de Ven, caoj.fnst,
	Baoquan He, Arvind Sankar, Dan Williams, eric.auger, aaronlewis,
	Peter Xu, makarandsonare, open list:DOCUMENTATION, LKML,
	kvm list, open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Thu, Jul 09, 2020 at 09:22:09AM -0700, Dave Hansen wrote:
> On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> > On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >> On 7/9/20 8:44 AM, Andersen, John wrote:
> >>>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> >>>         SMAP, and UMIP for CR4.
> >> I think it also makes sense to have FSGSBASE in this set.
> >>
> >> I know it hasn't been tested, but I think we should do the legwork to
> >> test it.  If not in this set, can we agree that it's a logical next step?
> > I have no objection to pinning FSGSBASE, but is there a clear
> > description of the threat model that this whole series is meant to
> > address?  The idea is to provide a degree of protection against an
> > attacker who is able to convince a guest kernel to write something
> > inappropriate to CR4, right?  How realistic is this?
> 
> If a quick search can find this:
> 
> > https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
> 
> I'd pretty confident that the guys doing actual bad things have it in
> their toolbox too.

Right, it's common (see my commit log in 873d50d58f67), and having this
enforced by the hypervisor is WAY better since it'll block gadgets or
ROP.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 16:27                       ` Andy Lutomirski
@ 2020-07-14  5:36                         ` Andersen, John, Arvind Sankar
  2020-07-14  5:39                         ` Andersen, John
  1 sibling, 0 replies; 30+ messages in thread
From: Andersen, John, Arvind Sankar @ 2020-07-14  5:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dave Hansen, Paolo Bonzini, Sean Christopherson, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Kees Cook, Dan Williams,
	eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Thu, Jul 09, 2020 at 09:27:43AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 9, 2020 at 9:22 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> > > On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
> > >> On 7/9/20 8:44 AM, Andersen, John wrote:
> > >>>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> > >>>         SMAP, and UMIP for CR4.
> > >> I think it also makes sense to have FSGSBASE in this set.
> > >>
> > >> I know it hasn't been tested, but I think we should do the legwork to
> > >> test it.  If not in this set, can we agree that it's a logical next step?
> > > I have no objection to pinning FSGSBASE, but is there a clear
> > > description of the threat model that this whole series is meant to
> > > address?  The idea is to provide a degree of protection against an
> > > attacker who is able to convince a guest kernel to write something
> > > inappropriate to CR4, right?  How realistic is this?
> >
> > If a quick search can find this:
> >
> > > https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
> >
> > I'd pretty confident that the guys doing actual bad things have it in
> > their toolbox too.
> >
> 
> True, but we have the existing software CR4 pinning.  I suppose the
> virtualization version is stronger.
> 

Yes, as Kees said this will be stronger because it stops ROP and other gadget
based techniques which avoid the use of native_write_cr0/4().

With regards to what should be done in this patchset and what in other
patchsets. I have a fix for kexec thanks to Arvind's note about
TRAMPOLINE_32BIT_CODE_SIZE. The physical host boots fine now and the virtual
one can kexec fine.

What remains to be done on that front is to add some identifying information to
the kernel image to declare that it supports paravirtualized control register
pinning or not.

Liran suggested adding a section to the built image acting as a flag to signify
support for being kexec'd by a kernel with pinning enabled. If anyone has any
opinions on how they'd like to see this implemented please let me know.
Otherwise I'll just take a stab at it and you'll all see it hopefully in the
next version.

With regards to FSGSBASE, are we open to validating and adding that to the
DEFAULT set as a part of a separate patchset? This patchset is focused on
replicating the functionality we already have natively.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-09 16:27                       ` Andy Lutomirski
  2020-07-14  5:36                         ` Andersen, John, Arvind Sankar
@ 2020-07-14  5:39                         ` Andersen, John
  2020-07-15  4:41                           ` Sean Christopherson
  1 sibling, 1 reply; 30+ messages in thread
From: Andersen, John @ 2020-07-14  5:39 UTC (permalink / raw)
  To: Andy Lutomirski, Arvind Sankar
  Cc: Dave Hansen, Paolo Bonzini, Sean Christopherson, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML,
	H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Kees Cook, Dan Williams,
	eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Thu, Jul 09, 2020 at 09:27:43AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 9, 2020 at 9:22 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 7/9/20 9:07 AM, Andy Lutomirski wrote:
> > > On Thu, Jul 9, 2020 at 8:56 AM Dave Hansen <dave.hansen@intel.com> wrote:
> > >> On 7/9/20 8:44 AM, Andersen, John wrote:
> > >>>         Bits which are allowed to be pinned default to WP for CR0 and SMEP,
> > >>>         SMAP, and UMIP for CR4.
> > >> I think it also makes sense to have FSGSBASE in this set.
> > >>
> > >> I know it hasn't been tested, but I think we should do the legwork to
> > >> test it.  If not in this set, can we agree that it's a logical next step?
> > > I have no objection to pinning FSGSBASE, but is there a clear
> > > description of the threat model that this whole series is meant to
> > > address?  The idea is to provide a degree of protection against an
> > > attacker who is able to convince a guest kernel to write something
> > > inappropriate to CR4, right?  How realistic is this?
> >
> > If a quick search can find this:
> >
> > > https://googleprojectzero.blogspot.com/2017/05/exploiting-linux-kernel-via-packet.html
> >
> > I'd pretty confident that the guys doing actual bad things have it in
> > their toolbox too.
> >
> 
> True, but we have the existing software CR4 pinning.  I suppose the
> virtualization version is stronger.
> 

Yes, as Kees said this will be stronger because it stops ROP and other gadget
based techniques which avoid the use of native_write_cr0/4().

With regards to what should be done in this patchset and what in other
patchsets. I have a fix for kexec thanks to Arvind's note about
TRAMPOLINE_32BIT_CODE_SIZE. The physical host boots fine now and the virtual
one can kexec fine.

What remains to be done on that front is to add some identifying information to
the kernel image to declare that it supports paravirtualized control register
pinning or not.

Liran suggested adding a section to the built image acting as a flag to signify
support for being kexec'd by a kernel with pinning enabled. If anyone has any
opinions on how they'd like to see this implemented please let me know.
Otherwise I'll just take a stab at it and you'll all see it hopefully in the
next version.

With regards to FSGSBASE, are we open to validating and adding that to the
DEFAULT set as a part of a separate patchset? This patchset is focused on
replicating the functionality we already have natively.


(If anyone got this email twice, sorry I messed up the From: field the first
time around)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-14  5:39                         ` Andersen, John
@ 2020-07-15  4:41                           ` Sean Christopherson
  2020-07-15 19:58                             ` Andersen, John
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2020-07-15  4:41 UTC (permalink / raw)
  To: Andersen, John
  Cc: Andy Lutomirski, Arvind Sankar, Dave Hansen, Paolo Bonzini,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Kees Cook, Dan Williams,
	eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Tue, Jul 14, 2020 at 05:39:30AM +0000, Andersen, John wrote:
> With regards to FSGSBASE, are we open to validating and adding that to the
> DEFAULT set as a part of a separate patchset? This patchset is focused on
> replicating the functionality we already have natively.

Kees added FSGSBASE pinning in commit a13b9d0b97211 ("x86/cpu: Use pinning
mask for CR4 bits needing to be 0"), so I believe it's a done deal already.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning
  2020-07-15  4:41                           ` Sean Christopherson
@ 2020-07-15 19:58                             ` Andersen, John
  0 siblings, 0 replies; 30+ messages in thread
From: Andersen, John @ 2020-07-15 19:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Andy Lutomirski, Arvind Sankar, Dave Hansen, Paolo Bonzini,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	X86 ML, H. Peter Anvin, Shuah Khan, Liran Alon, Andrew Jones,
	Rick Edgecombe, Kristen Carlson Accardi, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Mauro Carvalho Chehab,
	Greg KH, Paul E. McKenney, Pawan Gupta, Juergen Gross,
	Mike Kravetz, Oliver Neukum, Peter Zijlstra, Fenghua Yu,
	reinette.chatre, vineela.tummalapalli, Dave Hansen,
	Arjan van de Ven, caoj.fnst, Baoquan He, Kees Cook, Dan Williams,
	eric.auger, aaronlewis, Peter Xu, makarandsonare,
	open list:DOCUMENTATION, LKML, kvm list,
	open list:KERNEL SELFTEST FRAMEWORK, Kernel Hardening

On Tue, Jul 14, 2020 at 09:41:29PM -0700, Sean Christopherson wrote:
> On Tue, Jul 14, 2020 at 05:39:30AM +0000, Andersen, John wrote:
> > With regards to FSGSBASE, are we open to validating and adding that to the
> > DEFAULT set as a part of a separate patchset? This patchset is focused on
> > replicating the functionality we already have natively.
> 
> Kees added FSGSBASE pinning in commit a13b9d0b97211 ("x86/cpu: Use pinning
> mask for CR4 bits needing to be 0"), so I believe it's a done deal already.

Ah my bad. Thanks, I'll look into it.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, back to index

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-17 19:07 [PATCH 0/4] Paravirtualized Control Register pinning John Andersen
2020-06-17 19:07 ` [PATCH 1/4] X86: Update mmu_cr4_features during feature identification John Andersen
2020-06-18 14:09   ` Dave Hansen
2020-06-17 19:07 ` [PATCH 2/4] KVM: x86: Introduce paravirt feature CR0/CR4 pinning John Andersen
2020-06-18 14:18   ` Dave Hansen
2020-06-18 14:43     ` Andersen, John
2020-06-18 14:51       ` Dave Hansen
2020-07-07 21:12         ` Sean Christopherson
2020-07-07 21:48           ` Dave Hansen
2020-07-07 21:51             ` Paolo Bonzini
2020-07-09 15:44               ` Andersen, John
2020-07-09 15:56                 ` Dave Hansen
2020-07-09 16:07                   ` Andy Lutomirski
2020-07-09 16:22                     ` Dave Hansen
2020-07-09 16:27                       ` Andy Lutomirski
2020-07-14  5:36                         ` Andersen, John, Arvind Sankar
2020-07-14  5:39                         ` Andersen, John
2020-07-15  4:41                           ` Sean Christopherson
2020-07-15 19:58                             ` Andersen, John
2020-07-09 23:37                       ` Kees Cook
2020-06-17 19:07 ` [PATCH 3/4] selftests: kvm: add test for CR pinning with SMM John Andersen
2020-06-17 19:07 ` [PATCH 4/4] X86: Use KVM CR pin MSRs John Andersen
2020-06-18 14:41   ` Dave Hansen
2020-06-18 15:26     ` Andersen, John
2020-06-18 15:38       ` Dave Hansen
2020-06-18 15:49         ` Andersen, John
2020-06-20  5:13   ` Andy Lutomirski
2020-06-23 20:03     ` Andersen, John
2020-07-03 21:48       ` Andersen, John
2020-07-04 15:11         ` Arvind Sankar

Kernel-hardening archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernel-hardening/0 kernel-hardening/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernel-hardening kernel-hardening/ https://lore.kernel.org/kernel-hardening \
		kernel-hardening@lists.openwall.com
	public-inbox-index kernel-hardening

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/com.openwall.lists.kernel-hardening


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git