* [RFC PATCH 00/62] Linux as SEV-ES Guest Support
@ 2020-02-11 13:51 Joerg Roedel
  2020-02-11 13:51 ` [PATCH 01/62] KVM: SVM: Add GHCB definitions Joerg Roedel
                   ` (63 more replies)
  0 siblings, 64 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

Hi,

here is the first public post of the patch-set to enable Linux to run
under SEV-ES enabled hypervisors. The code is mostly feature-complete,
but there are still a couple of bugs to fix. Nevertheless, given the
size of the patch-set, I think it is about time to ask for initial
feedback on the changes that come with it. To make the code easier to
understand, here is a quick explanation of SEV-ES first.

This patch-set does not contain the hypervisor changes necessary to run
SEV-ES enabled KVM guests. Those patches will be sent out separately
when they are ready.

What is SEV-ES
==============

SEV-ES is an acronym for 'Secure Encrypted Virtualization - Encrypted
State' and denotes a hardware feature of AMD processors which hides the
register state of vCPUs from the hypervisor by encrypting it. The
hypervisor can't read or modify the guest's register state.

Most intercepts set by the hypervisor do not cause a #VMEXIT of the
guest anymore, but turn into a VMM Communication Exception (#VC
exception, vector 29) inside the guest. The error-code of this exception
is the intercept exit-code that caused the exception. The guest handles
the #VC exception by communicating with the hypervisor through a shared
data structure, the 'Guest-Hypervisor-Communication-Block' (GHCB). The
layout of that data-structure and the protocol are specified in [1].
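
As a rough sketch of that protocol (simplified; the ghcb_set_*()
accessors and the VMGEXIT() wrapper are introduced by the patches in
this series, and the function name here is made up for illustration),
handling an intercepted CPUID looks like this:

/*
 * Simplified #VC handling for an intercepted CPUID. The GHCB is
 * assumed to be already mapped unencrypted and registered with the
 * hypervisor; SVM_EXIT_CPUID is the intercept exit-code delivered
 * as the #VC error-code.
 */
static void handle_vc_cpuid(struct pt_regs *regs, struct ghcb *ghcb)
{
	/* Expose the registers CPUID consumes and mark them valid */
	ghcb_set_rax(ghcb, regs->ax);
	ghcb_set_rcx(ghcb, regs->cx);

	/* Tell the hypervisor which intercept to emulate */
	ghcb_set_sw_exit_code(ghcb, SVM_EXIT_CPUID);
	ghcb_set_sw_exit_info_1(ghcb, 0);
	ghcb_set_sw_exit_info_2(ghcb, 0);

	/* VMGEXIT (rep; vmmcall) hands control to the hypervisor */
	VMGEXIT();

	/* Copy the results back and skip over the CPUID instruction */
	regs->ax = ghcb->save.rax;
	regs->bx = ghcb->save.rbx;
	regs->cx = ghcb->save.rcx;
	regs->dx = ghcb->save.rdx;
	regs->ip += 2;
}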

A description of the SEV-ES hardware interface can be found in the AMD64
Architecture Programmer's Manual Volume 2, Section 15.35 [2].

Implementation Details
======================

SEV-ES guests will always boot via UEFI firmware and use the 64-bit EFI
entry point into the kernel. This implies that only 64-bit Linux x86
guests are supported.

Pre-Decompression Boot Code and Early Exception Support
-------------------------------------------------------

Intercepts that cause exceptions in the guest include instructions like
CPUID, RDMSR/WRMSR, the IOIO instructions and a couple more. Some of
them are executed very early during boot, which means that exception
handling needs to work that early. That is the reason large parts of
this patch-set enable support for early exceptions, first in the
pre-decompression boot-code and later also in the early boot-code of
the kernel image.

Besides adding exception support to the pre-decompression boot code,
these patches also implement a page-fault handler which creates the
identity-mapped page-table on demand (see the sketch below). One reason
for this change is to exercise the exception handling code in
non-SEV-ES guests too, so that it is less likely to break in the
future. The other reason is that SEV-ES guests need to set up their own
page-table to map the GHCB unencrypted. Without these patches the
pre-decompression code only uses its own page-table when KASLR is
enabled and used.
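
In concept, the on-demand mapping boils down to the handler below; the
full version, with sanity checks on the error code, is in patch 10/62:

void do_boot_page_fault(struct pt_regs *regs)
{
	unsigned long address = native_read_cr2();

	/*
	 * Any access to a not-yet-mapped address faults and ends up
	 * here - extend the identity mapping by the 2M region around
	 * the faulting address and return to retry the access.
	 */
	add_identity_map(address & PMD_MASK, PMD_SIZE);
}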

SIPI and INIT Handling
----------------------

The hypervisor also can't make changes to the guest register state,
which implies that it can't emulate SIPI and INIT messages. This means
that any reset of CPU register state needs to be done inside the guest.
Most of this is handled in the firmware, but the Linux kernel has to
set up an AP Jump Table to boot secondary processors. CPU
offline/online handling also needs special care, for which this
patch-set implements a shortcut: an offlined CPU does not go back to
real-mode when it is woken up again, but stays in long-mode and just
jumps back to the trampoline code.

NMI Special Handling
--------------------

The last thing that needs special handling with SEV-ES are NMIs.
Hypervisors usually start to intercept IRET instructions after an NMI
has been injected to find out when the NMI window is re-opened. But
handling IRET intercepts requires the hypervisor to access guest
register state, which is not possible with SEV-ES. The specification
under [1] solves this problem with an NMI_COMPLETE message sent by the
guest to the hypervisor, upon which the hypervisor re-opens the NMI
window for the guest.

This patch-set sends the NMI_COMPLETE message before the actual IRET,
while the kernel is still on a valid stack and kernel CR3. This
re-opens the NMI window a few instructions early, but that is fine
because NMI nesting is safe under x86-64 Linux. The alternative would
be to single-step over the IRET, but that would require more intrusive
changes to the entry code, because it does not handle entries from
kernel-mode while on the entry stack.
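
In code, the idea is roughly the following (a sketch only - the
exit-code name follows the GHCB specification [1], and
sev_es_get_ghcb()/VMGEXIT() stand for helpers introduced by this
series):

/*
 * Called from the NMI exit path right before the IRET, while the
 * kernel is still on a valid stack and kernel CR3.
 */
static void sev_es_nmi_complete(void)
{
	struct ghcb *ghcb = sev_es_get_ghcb();

	/* No register state needed - just signal NMI completion */
	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
	ghcb_set_sw_exit_info_1(ghcb, 0);
	ghcb_set_sw_exit_info_2(ghcb, 0);

	/* The hypervisor re-opens the NMI window on this exit */
	VMGEXIT();
}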

Besides the special handling above, the patch-set contains the handlers
for the #VC exception and all the exit-codes specified in [1].

Current State of the Patches
============================

The patch-set posted here can boot an SMP Linux guest under
SEV-ES-enabled KVM, and the guest survives some load-testing (kernel
compiles). The guest boots to the graphical desktop and is usable. But
there are still known bugs and issues:

	* Putting some NMI load on the guest will usually make it crash
	  within a minute
	* The handler for MMIO events needs more security checks when
	  walking the page-table
	* The MMIO handler also lacks emulation for the MOVS and REP MOVS
	  instructions as used by memcpy_toio() and memcpy_fromio().

More testing will likely uncover more bugs, but I think the patch-set
is ready for initial feedback. It has already grown pretty big, and
handling it becomes more and more painful.

So please review the parts of the patch-set that you find interesting
and let me know your feedback.

Thanks a lot,

       Joerg

[1] https://developer.amd.com/wp-content/resources/56421.pdf
[2] https://www.amd.com/system/files/TechDocs/24593.pdf

Doug Covelli (1):
  x86/vmware: Add VMware specific handling for VMMCALL under SEV-ES

Joerg Roedel (43):
  KVM: SVM: Add GHCB Accessor functions
  x86/traps: Move some definitions to <asm/trap_defs.h>
  x86/insn-decoder: Make inat-tables.c suitable for pre-decompression
    code
  x86/boot/compressed: Fix debug_puthex() parameter type
  x86/boot/compressed/64: Disable red-zone usage
  x86/boot/compressed/64: Add IDT Infrastructure
  x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c
  x86/boot/compressed/64: Add page-fault handler
  x86/boot/compressed/64: Always switch to own page-table
  x86/boot/compressed/64: Don't pre-map memory in KASLR code
  x86/boot/compressed/64: Change add_identity_map() to take start and
    end
  x86/boot/compressed/64: Add stage1 #VC handler
  x86/boot/compressed/64: Call set_sev_encryption_mask earlier
  x86/boot/compressed/64: Check return value of
    kernel_ident_mapping_init()
  x86/boot/compressed/64: Add function to map a page unencrypted
  x86/boot/compressed/64: Setup GHCB Based VC Exception handler
  x86/fpu: Move xgetbv()/xsetbv() into separate header
  x86/idt: Move IDT to data segment
  x86/idt: Split idt_data setup out of set_intr_gate()
  x86/head/64: Install boot GDT
  x86/head/64: Reload GDT after switch to virtual addresses
  x86/head/64: Load segment registers earlier
  x86/head/64: Switch to initial stack earlier
  x86/head/64: Load IDT earlier
  x86/head/64: Move early exception dispatch to C code
  x86/sev-es: Add SEV-ES Feature Detection
  x86/sev-es: Compile early handler code into kernel image
  x86/sev-es: Setup early #VC handler
  x86/sev-es: Setup GHCB based boot #VC handler
  x86/sev-es: Wire up existing #VC exit-code handlers
  x86/sev-es: Handle instruction fetches from user-space
  x86/sev-es: Harden runtime #VC handler for exceptions from user-space
  x86/sev-es: Filter exceptions not supported from user-space
  x86/sev-es: Handle RDTSCP Events
  x86/sev-es: Handle #AC Events
  x86/sev-es: Handle #DB Events
  x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES
  x86/realmode: Add SEV-ES specific trampoline entry point
  x86/head/64: Don't call verify_cpu() on starting APs
  x86/head/64: Rename start_cpu0
  x86/sev-es: Support CPU offline/online
  x86/cpufeature: Add SEV_ES_GUEST CPU Feature
  x86/sev-es: Add NMI state tracking

Tom Lendacky (18):
  KVM: SVM: Add GHCB definitions
  x86/cpufeatures: Add SEV-ES CPU feature
  x86/sev-es: Add support for handling IOIO exceptions
  x86/sev-es: Add CPUID handling to #VC handler
  x86/sev-es: Add handler for MMIO events
  x86/sev-es: Setup per-cpu GHCBs for the runtime handler
  x86/sev-es: Add Runtime #VC Exception Handler
  x86/sev-es: Handle MSR events
  x86/sev-es: Handle DR7 read/write events
  x86/sev-es: Handle WBINVD Events
  x86/sev-es: Handle RDTSC Events
  x86/sev-es: Handle RDPMC Events
  x86/sev-es: Handle INVD Events
  x86/sev-es: Handle MONITOR/MONITORX Events
  x86/sev-es: Handle MWAIT/MWAITX Events
  x86/sev-es: Handle VMMCALL Events
  x86/kvm: Add KVM specific VMMCALL handling under SEV-ES
  x86/realmode: Setup AP jump table

 arch/x86/Kconfig                           |   1 +
 arch/x86/boot/Makefile                     |   2 +-
 arch/x86/boot/compressed/Makefile          |   8 +-
 arch/x86/boot/compressed/head_64.S         |  41 ++
 arch/x86/boot/compressed/ident_map_64.c    | 320 +++++++++
 arch/x86/boot/compressed/idt_64.c          |  53 ++
 arch/x86/boot/compressed/idt_handlers_64.S |  78 +++
 arch/x86/boot/compressed/kaslr.c           |  36 +-
 arch/x86/boot/compressed/kaslr_64.c        | 156 -----
 arch/x86/boot/compressed/misc.h            |  34 +-
 arch/x86/boot/compressed/sev-es.c          | 148 ++++
 arch/x86/entry/entry_64.S                  |  52 ++
 arch/x86/include/asm/cpu.h                 |   2 +-
 arch/x86/include/asm/cpufeatures.h         |   2 +
 arch/x86/include/asm/desc.h                |   2 +
 arch/x86/include/asm/desc_defs.h           |   3 +
 arch/x86/include/asm/fpu/internal.h        |  29 +-
 arch/x86/include/asm/fpu/xcr.h             |  32 +
 arch/x86/include/asm/mem_encrypt.h         |   5 +
 arch/x86/include/asm/msr-index.h           |   3 +
 arch/x86/include/asm/processor.h           |   1 +
 arch/x86/include/asm/realmode.h            |   4 +
 arch/x86/include/asm/segment.h             |   2 +-
 arch/x86/include/asm/sev-es.h              | 119 ++++
 arch/x86/include/asm/svm.h                 | 103 +++
 arch/x86/include/asm/trap_defs.h           |  50 ++
 arch/x86/include/asm/traps.h               |  51 +-
 arch/x86/include/asm/x86_init.h            |  16 +-
 arch/x86/include/uapi/asm/svm.h            |  11 +
 arch/x86/kernel/Makefile                   |   1 +
 arch/x86/kernel/cpu/amd.c                  |  10 +-
 arch/x86/kernel/cpu/scattered.c            |   1 +
 arch/x86/kernel/cpu/vmware.c               |  48 +-
 arch/x86/kernel/head64.c                   |  49 ++
 arch/x86/kernel/head_32.S                  |   4 +-
 arch/x86/kernel/head_64.S                  | 162 +++--
 arch/x86/kernel/idt.c                      |  60 +-
 arch/x86/kernel/kvm.c                      |  35 +-
 arch/x86/kernel/nmi.c                      |   8 +
 arch/x86/kernel/sev-es-shared.c            | 721 ++++++++++++++++++++
 arch/x86/kernel/sev-es.c                   | 748 +++++++++++++++++++++
 arch/x86/kernel/smpboot.c                  |   4 +-
 arch/x86/kernel/traps.c                    |   3 +
 arch/x86/mm/extable.c                      |   1 +
 arch/x86/mm/mem_encrypt.c                  |  11 +-
 arch/x86/mm/mem_encrypt_identity.c         |   3 +
 arch/x86/realmode/init.c                   |  12 +
 arch/x86/realmode/rm/header.S              |   3 +
 arch/x86/realmode/rm/trampoline_64.S       |  20 +
 arch/x86/tools/gen-insn-attr-x86.awk       |  50 +-
 tools/arch/x86/tools/gen-insn-attr-x86.awk |  50 +-
 51 files changed, 3016 insertions(+), 352 deletions(-)
 create mode 100644 arch/x86/boot/compressed/ident_map_64.c
 create mode 100644 arch/x86/boot/compressed/idt_64.c
 create mode 100644 arch/x86/boot/compressed/idt_handlers_64.S
 delete mode 100644 arch/x86/boot/compressed/kaslr_64.c
 create mode 100644 arch/x86/boot/compressed/sev-es.c
 create mode 100644 arch/x86/include/asm/fpu/xcr.h
 create mode 100644 arch/x86/include/asm/sev-es.h
 create mode 100644 arch/x86/include/asm/trap_defs.h
 create mode 100644 arch/x86/kernel/sev-es-shared.c
 create mode 100644 arch/x86/kernel/sev-es.c

-- 
2.17.1



* [PATCH 01/62] KVM: SVM: Add GHCB definitions
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
@ 2020-02-11 13:51 ` Joerg Roedel
  2020-02-11 13:51 ` [PATCH 02/62] KVM: SVM: Add GHCB Accessor functions Joerg Roedel
                   ` (62 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Extend the vmcb_save_area with SEV-ES fields and add a new
'struct ghcb' which will be used for guest-hypervisor communication.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/svm.h | 42 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 6ece8561ba66..f36288c659b5 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -201,6 +201,48 @@ struct __attribute__ ((__packed__)) vmcb_save_area {
 	u64 br_to;
 	u64 last_excp_from;
 	u64 last_excp_to;
+
+	/*
+	 * The following part of the save area is valid only for
+	 * SEV-ES guests when referenced through the GHCB.
+	 */
+	u8 reserved_7[104];
+	u64 reserved_8;		/* rax already available at 0x01f8 */
+	u64 rcx;
+	u64 rdx;
+	u64 rbx;
+	u64 reserved_9;		/* rsp already available at 0x01d8 */
+	u64 rbp;
+	u64 rsi;
+	u64 rdi;
+	u64 r8;
+	u64 r9;
+	u64 r10;
+	u64 r11;
+	u64 r12;
+	u64 r13;
+	u64 r14;
+	u64 r15;
+	u8 reserved_10[16];
+	u64 sw_exit_code;
+	u64 sw_exit_info_1;
+	u64 sw_exit_info_2;
+	u64 sw_scratch;
+	u8 reserved_11[56];
+	u64 xcr0;
+	u8 valid_bitmap[16];
+	u64 x87_state_gpa;
+	u8 reserved_12[1016];
+};
+
+struct __attribute__ ((__packed__)) ghcb {
+	struct vmcb_save_area save;
+
+	u8 shared_buffer[2032];
+
+	u8 reserved_1[10];
+	u16 protocol_version;	/* negotiated SEV-ES/GHCB protocol version */
+	u32 ghcb_usage;
 };
 
 struct __attribute__ ((__packed__)) vmcb {
-- 
2.17.1
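
For reference, the layout above is sized so that a GHCB occupies
exactly one 4 KiB page: the extended save area covers offsets
0x000-0x7ff, the shared buffer 0x800-0xfef, and the protocol fields
fill the page up to 0xfff. A compile-time check along these lines
(illustrative only, not part of the patch) would make that explicit:

static void __maybe_unused ghcb_layout_checks(void)
{
	BUILD_BUG_ON(offsetof(struct ghcb, shared_buffer)    != 0x800);
	BUILD_BUG_ON(offsetof(struct ghcb, protocol_version) != 0xffa);
	BUILD_BUG_ON(sizeof(struct ghcb)                     != 4096);
}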



* [PATCH 02/62] KVM: SVM: Add GHCB Accessor functions
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
  2020-02-11 13:51 ` [PATCH 01/62] KVM: SVM: Add GHCB definitions Joerg Roedel
@ 2020-02-11 13:51 ` Joerg Roedel
  2020-02-11 13:51 ` [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature Joerg Roedel
                   ` (61 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Building a correct GHCB for the hypervisor requires setting valid bits
in the GHCB. Simplify that process by providing accessor functions to
set values and to update the valid bitmap.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/svm.h | 61 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index f36288c659b5..e4e9f6bacfaa 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -333,4 +333,65 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
+/* GHCB Accessor functions */
+
+#define DEFINE_GHCB_INDICES(field)					\
+	u16 idx = offsetof(struct vmcb_save_area, field) / 8;		\
+	u16 byte_idx  = idx / 8;					\
+	u16 bit_idx   = idx % 8;					\
+	BUILD_BUG_ON(byte_idx >= ARRAY_SIZE(ghcb->save.valid_bitmap));
+
+#define GHCB_SET_VALID(ghcb, field)					\
+	{								\
+		DEFINE_GHCB_INDICES(field)				\
+		(ghcb)->save.valid_bitmap[byte_idx] |= BIT(bit_idx);	\
+	}
+
+#define DEFINE_GHCB_SETTER(field)					\
+	static inline void						\
+	ghcb_set_##field(struct ghcb *ghcb, u64 value)			\
+	{								\
+		GHCB_SET_VALID(ghcb, field)				\
+		(ghcb)->save.field = value;				\
+	}
+
+#define DEFINE_GHCB_ACCESSORS(field)					\
+	static inline bool ghcb_is_valid_##field(const struct ghcb *ghcb)	\
+	{								\
+		DEFINE_GHCB_INDICES(field)				\
+		return !!((ghcb)->save.valid_bitmap[byte_idx]		\
+						& BIT(bit_idx));	\
+	}								\
+									\
+	static inline void						\
+	ghcb_set_##field(struct ghcb *ghcb, u64 value)			\
+	{								\
+		GHCB_SET_VALID(ghcb, field)				\
+		(ghcb)->save.field = value;				\
+	}
+
+DEFINE_GHCB_ACCESSORS(cpl)
+DEFINE_GHCB_ACCESSORS(rip)
+DEFINE_GHCB_ACCESSORS(rsp)
+DEFINE_GHCB_ACCESSORS(rax)
+DEFINE_GHCB_ACCESSORS(rcx)
+DEFINE_GHCB_ACCESSORS(rdx)
+DEFINE_GHCB_ACCESSORS(rbx)
+DEFINE_GHCB_ACCESSORS(rbp)
+DEFINE_GHCB_ACCESSORS(rsi)
+DEFINE_GHCB_ACCESSORS(rdi)
+DEFINE_GHCB_ACCESSORS(r8)
+DEFINE_GHCB_ACCESSORS(r9)
+DEFINE_GHCB_ACCESSORS(r10)
+DEFINE_GHCB_ACCESSORS(r11)
+DEFINE_GHCB_ACCESSORS(r12)
+DEFINE_GHCB_ACCESSORS(r13)
+DEFINE_GHCB_ACCESSORS(r14)
+DEFINE_GHCB_ACCESSORS(r15)
+DEFINE_GHCB_ACCESSORS(sw_exit_code)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_1)
+DEFINE_GHCB_ACCESSORS(sw_exit_info_2)
+DEFINE_GHCB_ACCESSORS(sw_scratch)
+DEFINE_GHCB_ACCESSORS(xcr0)
+
 #endif
-- 
2.17.1
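
To illustrate how these accessors are meant to be used (hypothetical
caller, not code from this patch; VMGEXIT() stands for the rep;
vmmcall wrapper introduced later in the series), an MSR read through
the GHCB could look like this:

static int ghcb_msr_read(struct ghcb *ghcb, unsigned int msr, u64 *value)
{
	ghcb_set_rcx(ghcb, msr);
	ghcb_set_sw_exit_code(ghcb, SVM_EXIT_MSR);
	ghcb_set_sw_exit_info_1(ghcb, 0);	/* 0 == RDMSR */
	ghcb_set_sw_exit_info_2(ghcb, 0);

	VMGEXIT();

	/* The valid bitmap tells us whether the hypervisor wrote results */
	if (!ghcb_is_valid_rax(ghcb) || !ghcb_is_valid_rdx(ghcb))
		return -EIO;

	*value = (ghcb->save.rdx << 32) | (ghcb->save.rax & 0xffffffffULL);
	return 0;
}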



* [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
  2020-02-11 13:51 ` [PATCH 01/62] KVM: SVM: Add GHCB definitions Joerg Roedel
  2020-02-11 13:51 ` [PATCH 02/62] KVM: SVM: Add GHCB Accessor functions Joerg Roedel
@ 2020-02-11 13:51 ` Joerg Roedel
  2020-02-13  6:51   ` Borislav Petkov
  2020-02-11 13:51 ` [PATCH 04/62] x86/traps: Move some definitions to <asm/trap_defs.h> Joerg Roedel
                   ` (60 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add CPU feature detection for Secure Encrypted Virtualization with
Encrypted State. This feature enhances SEV by also encrypting the
guest register state, making it inaccessible to the hypervisor.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/amd.c          | 3 ++-
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..26e4ee209f7b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SEV_ES		(11*32+ 6) /* AMD Secure Encrypted Virtualization - Encrypted State */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index ac83a0fef628..aad2223862ef 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -580,7 +580,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 	 *	      If BIOS has not enabled SME then don't advertise the
 	 *	      SME feature (set in scattered.c).
 	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-	 *            SEV feature (set in scattered.c).
+	 *            SEV and SEV_ES feature (set in scattered.c).
 	 *
 	 *   In all cases, since support for SME and SEV requires long mode,
 	 *   don't advertise the feature under CONFIG_X86_32.
@@ -611,6 +611,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 		setup_clear_cpu_cap(X86_FEATURE_SME);
 clear_sev:
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
+		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 62b137c3c97a..30f354989cf1 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -41,6 +41,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_SME,		CPUID_EAX,  0, 0x8000001f, 0 },
 	{ X86_FEATURE_SEV,		CPUID_EAX,  1, 0x8000001f, 0 },
+	{ X86_FEATURE_SEV_ES,		CPUID_EAX,  3, 0x8000001f, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.17.1
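
For reference, the new scattered.c entry corresponds to the following
check (a sketch of what the feature detection boils down to, not code
from the patch):

static bool sev_es_supported(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID 0x8000001f EAX: bit 0 = SME, bit 1 = SEV, bit 3 = SEV-ES */
	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);

	return !!(eax & BIT(3));
}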



* [PATCH 04/62] x86/traps: Move some definitions to <asm/trap_defs.h>
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (2 preceding siblings ...)
  2020-02-11 13:51 ` [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature Joerg Roedel
@ 2020-02-11 13:51 ` Joerg Roedel
  2020-02-11 13:51 ` [PATCH 05/62] x86/insn-decoder: Make inat-tables.c suitable for pre-decompression code Joerg Roedel
                   ` (59 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Move the definition of x86 trap vector numbers and the page-fault
error code bits to the new header file asm/trap_defs.h. This makes it
easier to include them into pre-decompression boot code. No functional
changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/trap_defs.h | 49 ++++++++++++++++++++++++++++++++
 arch/x86/include/asm/traps.h     | 44 +---------------------------
 2 files changed, 50 insertions(+), 43 deletions(-)
 create mode 100644 arch/x86/include/asm/trap_defs.h

diff --git a/arch/x86/include/asm/trap_defs.h b/arch/x86/include/asm/trap_defs.h
new file mode 100644
index 000000000000..488f82ac36da
--- /dev/null
+++ b/arch/x86/include/asm/trap_defs.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_TRAP_DEFS_H
+#define _ASM_X86_TRAP_DEFS_H
+
+/* Interrupts/Exceptions */
+enum {
+	X86_TRAP_DE = 0,	/*  0, Divide-by-zero */
+	X86_TRAP_DB,		/*  1, Debug */
+	X86_TRAP_NMI,		/*  2, Non-maskable Interrupt */
+	X86_TRAP_BP,		/*  3, Breakpoint */
+	X86_TRAP_OF,		/*  4, Overflow */
+	X86_TRAP_BR,		/*  5, Bound Range Exceeded */
+	X86_TRAP_UD,		/*  6, Invalid Opcode */
+	X86_TRAP_NM,		/*  7, Device Not Available */
+	X86_TRAP_DF,		/*  8, Double Fault */
+	X86_TRAP_OLD_MF,	/*  9, Coprocessor Segment Overrun */
+	X86_TRAP_TS,		/* 10, Invalid TSS */
+	X86_TRAP_NP,		/* 11, Segment Not Present */
+	X86_TRAP_SS,		/* 12, Stack Segment Fault */
+	X86_TRAP_GP,		/* 13, General Protection Fault */
+	X86_TRAP_PF,		/* 14, Page Fault */
+	X86_TRAP_SPURIOUS,	/* 15, Spurious Interrupt */
+	X86_TRAP_MF,		/* 16, x87 Floating-Point Exception */
+	X86_TRAP_AC,		/* 17, Alignment Check */
+	X86_TRAP_MC,		/* 18, Machine Check */
+	X86_TRAP_XF,		/* 19, SIMD Floating-Point Exception */
+	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
+};
+
+/*
+ * Page fault error code bits:
+ *
+ *   bit 0 ==	 0: no page found	1: protection fault
+ *   bit 1 ==	 0: read access		1: write access
+ *   bit 2 ==	 0: kernel-mode access	1: user-mode access
+ *   bit 3 ==				1: use of reserved bit detected
+ *   bit 4 ==				1: fault was an instruction fetch
+ *   bit 5 ==				1: protection keys block access
+ */
+enum x86_pf_error_code {
+	X86_PF_PROT	=		1 << 0,
+	X86_PF_WRITE	=		1 << 1,
+	X86_PF_USER	=		1 << 2,
+	X86_PF_RSVD	=		1 << 3,
+	X86_PF_INSTR	=		1 << 4,
+	X86_PF_PK	=		1 << 5,
+};
+
+#endif /* _ASM_X86_TRAP_DEFS_H */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index ffa0dc8a535e..2aa786484bb1 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -5,6 +5,7 @@
 #include <linux/context_tracking_state.h>
 #include <linux/kprobes.h>
 
+#include <asm/trap_defs.h>
 #include <asm/debugreg.h>
 #include <asm/siginfo.h>			/* TRAP_TRACE, ... */
 
@@ -132,47 +133,4 @@ void __noreturn handle_stack_overflow(const char *message,
 				      unsigned long fault_address);
 #endif
 
-/* Interrupts/Exceptions */
-enum {
-	X86_TRAP_DE = 0,	/*  0, Divide-by-zero */
-	X86_TRAP_DB,		/*  1, Debug */
-	X86_TRAP_NMI,		/*  2, Non-maskable Interrupt */
-	X86_TRAP_BP,		/*  3, Breakpoint */
-	X86_TRAP_OF,		/*  4, Overflow */
-	X86_TRAP_BR,		/*  5, Bound Range Exceeded */
-	X86_TRAP_UD,		/*  6, Invalid Opcode */
-	X86_TRAP_NM,		/*  7, Device Not Available */
-	X86_TRAP_DF,		/*  8, Double Fault */
-	X86_TRAP_OLD_MF,	/*  9, Coprocessor Segment Overrun */
-	X86_TRAP_TS,		/* 10, Invalid TSS */
-	X86_TRAP_NP,		/* 11, Segment Not Present */
-	X86_TRAP_SS,		/* 12, Stack Segment Fault */
-	X86_TRAP_GP,		/* 13, General Protection Fault */
-	X86_TRAP_PF,		/* 14, Page Fault */
-	X86_TRAP_SPURIOUS,	/* 15, Spurious Interrupt */
-	X86_TRAP_MF,		/* 16, x87 Floating-Point Exception */
-	X86_TRAP_AC,		/* 17, Alignment Check */
-	X86_TRAP_MC,		/* 18, Machine Check */
-	X86_TRAP_XF,		/* 19, SIMD Floating-Point Exception */
-	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
-};
-
-/*
- * Page fault error code bits:
- *
- *   bit 0 ==	 0: no page found	1: protection fault
- *   bit 1 ==	 0: read access		1: write access
- *   bit 2 ==	 0: kernel-mode access	1: user-mode access
- *   bit 3 ==				1: use of reserved bit detected
- *   bit 4 ==				1: fault was an instruction fetch
- *   bit 5 ==				1: protection keys block access
- */
-enum x86_pf_error_code {
-	X86_PF_PROT	=		1 << 0,
-	X86_PF_WRITE	=		1 << 1,
-	X86_PF_USER	=		1 << 2,
-	X86_PF_RSVD	=		1 << 3,
-	X86_PF_INSTR	=		1 << 4,
-	X86_PF_PK	=		1 << 5,
-};
 #endif /* _ASM_X86_TRAPS_H */
-- 
2.17.1



* [PATCH 05/62] x86/insn-decoder: Make inat-tables.c suitable for pre-decompression code
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (3 preceding siblings ...)
  2020-02-11 13:51 ` [PATCH 04/62] x86/traps: Move some definitions to <asm/trap_defs.h> Joerg Roedel
@ 2020-02-11 13:51 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 06/62] x86/boot/compressed: Fix debug_puthex() parameter type Joerg Roedel
                   ` (58 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:51 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The inat-tables.c file has some arrays in it that contain pointers to
other arrays. These pointers need to be relocated when the kernel
image is moved to a different location.

The pre-decompression boot-code has no support for applying ELF
relocations, so initialize these arrays at runtime in the
pre-decompression code to make sure all pointers are correctly
initialized.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/tools/gen-insn-attr-x86.awk       | 50 +++++++++++++++++++++-
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 50 +++++++++++++++++++++-
 2 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..af38469afd14 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -362,6 +362,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 END {
 	if (awkchecked != "")
 		exit 1
+
+	print "#ifndef __BOOT_COMPRESSED\n"
+
 	# print escape opcode map's array
 	print "/* Escape opcode map array */"
 	print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
@@ -388,6 +391,51 @@ END {
 		for (j = 0; j < max_lprefix; j++)
 			if (atable[i,j])
 				print "	["i"]["j"] = "atable[i,j]","
-	print "};"
+	print "};\n"
+
+	print "#else /* !__BOOT_COMPRESSED */\n"
+
+	print "/* Escape opcode map array */"
+	print "static const insn_attr_t *inat_escape_tables[INAT_ESC_MAX + 1]" \
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "/* Group opcode map array */"
+	print "static const insn_attr_t *inat_group_tables[INAT_GRP_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "/* AVX opcode map array */"
+	print "static const insn_attr_t *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "static void inat_init_tables(void)"
+	print "{"
+
+	# print escape opcode map's array
+	print "\t/* Print Escape opcode map array */"
+	for (i = 0; i < geid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (etable[i,j])
+				print "\tinat_escape_tables["i"]["j"] = "etable[i,j]";"
+	print ""
+
+	# print group opcode map's array
+	print "\t/* Print Group opcode map array */"
+	for (i = 0; i < ggid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (gtable[i,j])
+				print "\tinat_group_tables["i"]["j"] = "gtable[i,j]";"
+	print ""
+	# print AVX opcode map's array
+	print "\t/* Print AVX opcode map array */"
+	for (i = 0; i < gaid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (atable[i,j])
+				print "\tinat_avx_tables["i"]["j"] = "atable[i,j]";"
+
+	print "}"
+	print "#endif"
 }
 
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..af38469afd14 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -362,6 +362,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 END {
 	if (awkchecked != "")
 		exit 1
+
+	print "#ifndef __BOOT_COMPRESSED\n"
+
 	# print escape opcode map's array
 	print "/* Escape opcode map array */"
 	print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
@@ -388,6 +391,51 @@ END {
 		for (j = 0; j < max_lprefix; j++)
 			if (atable[i,j])
 				print "	["i"]["j"] = "atable[i,j]","
-	print "};"
+	print "};\n"
+
+	print "#else /* !__BOOT_COMPRESSED */\n"
+
+	print "/* Escape opcode map array */"
+	print "static const insn_attr_t *inat_escape_tables[INAT_ESC_MAX + 1]" \
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "/* Group opcode map array */"
+	print "static const insn_attr_t *inat_group_tables[INAT_GRP_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "/* AVX opcode map array */"
+	print "static const insn_attr_t *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1];"
+	print ""
+
+	print "static void inat_init_tables(void)"
+	print "{"
+
+	# print escape opcode map's array
+	print "\t/* Print Escape opcode map array */"
+	for (i = 0; i < geid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (etable[i,j])
+				print "\tinat_escape_tables["i"]["j"] = "etable[i,j]";"
+	print ""
+
+	# print group opcode map's array
+	print "\t/* Print Group opcode map array */"
+	for (i = 0; i < ggid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (gtable[i,j])
+				print "\tinat_group_tables["i"]["j"] = "gtable[i,j]";"
+	print ""
+	# print AVX opcode map's array
+	print "\t/* Print AVX opcode map array */"
+	for (i = 0; i < gaid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (atable[i,j])
+				print "\tinat_avx_tables["i"]["j"] = "atable[i,j]";"
+
+	print "}"
+	print "#endif"
 }
 
-- 
2.17.1
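
To make the effect of the change more concrete, the generated
inat-tables.c now has the following shape (illustrative excerpt -
table identifiers abbreviated, entries elided):

#ifndef __BOOT_COMPRESSED

/* Kernel proper: const tables with initializers, fixed up by ELF
 * relocations during boot */
const insn_attr_t * const
inat_escape_tables[INAT_ESC_MAX + 1][INAT_LSTPFX_MAX + 1] = {
	[1][0] = inat_escape_table_1,
	/* ... */
};

#else /* !__BOOT_COMPRESSED */

/* Pre-decompression code: empty tables, filled at runtime so that
 * the embedded pointers need no ELF relocations */
static const insn_attr_t *
inat_escape_tables[INAT_ESC_MAX + 1][INAT_LSTPFX_MAX + 1];

static void inat_init_tables(void)
{
	inat_escape_tables[1][0] = inat_escape_table_1;
	/* ... */
}

#endif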



* [PATCH 06/62] x86/boot/compressed: Fix debug_puthex() parameter type
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (4 preceding siblings ...)
  2020-02-11 13:51 ` [PATCH 05/62] x86/insn-decoder: Make inat-tables.c suitable for pre-decompression code Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage Joerg Roedel
                   ` (57 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

In the CONFIG_X86_VERBOSE_BOOTUP=Y case the debug_puthex() macro just
turns into __puthex(), which takes 'unsigned long' as its parameter.
But in the CONFIG_X86_VERBOSE_BOOTUP=N case it is a function which
takes 'const char *', causing compile warnings when the function is
used. Fix the parameter type to get rid of the warnings.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/misc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index c8181392f70d..726e264410ff 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -59,7 +59,7 @@ void __puthex(unsigned long value);
 
 static inline void debug_putstr(const char *s)
 { }
-static inline void debug_puthex(const char *s)
+static inline void debug_puthex(unsigned long value)
 { }
 #define debug_putaddr(x) /* */
 
-- 
2.17.1



* [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (5 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 06/62] x86/boot/compressed: Fix debug_puthex() parameter type Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:13   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure Joerg Roedel
                   ` (56 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The x86-64 ABI defines a red-zone on the stack:

  The 128-byte area beyond the location pointed to by %rsp is
  considered to be reserved and shall not be modified by signal or
  interrupt handlers. Therefore, functions may use this area for
  temporary data that is not needed across function calls. In
  particular, leaf functions may use this area for their entire stack
  frame, rather than adjusting the stack pointer in the prologue and
  epilogue. This area is known as the red zone.

This is not compatible with exception handling, so disable it for the
pre-decompression boot code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/Makefile            | 2 +-
 arch/x86/boot/compressed/Makefile | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 012b82fc8617..8f55e4ce1ccc 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -65,7 +65,7 @@ clean-files += cpustr.h
 
 # ---------------------------------------------------------------------------
 
-KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP
+KBUILD_CFLAGS	:= $(REALMODE_CFLAGS) -D_SETUP -mno-red-zone
 KBUILD_AFLAGS	:= $(KBUILD_CFLAGS) -D__ASSEMBLY__
 KBUILD_CFLAGS	+= $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
 GCOV_PROFILE := n
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 26050ae0b27e..e186cc0b628d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -30,7 +30,7 @@ KBUILD_CFLAGS := -m$(BITS) -O2
 KBUILD_CFLAGS += -fno-strict-aliasing $(call cc-option, -fPIE, -fPIC)
 KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
 cflags-$(CONFIG_X86_32) := -march=i386
-cflags-$(CONFIG_X86_64) := -mcmodel=small
+cflags-$(CONFIG_X86_64) := -mcmodel=small -mno-red-zone
 KBUILD_CFLAGS += $(cflags-y)
 KBUILD_CFLAGS += -mno-mmx -mno-sse
 KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
@@ -87,7 +87,7 @@ endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 
-$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
+$(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar
 
 vmlinux-objs-$(CONFIG_EFI_STUB) += $(obj)/eboot.o \
 	$(objtree)/drivers/firmware/efi/libstub/lib.a
-- 
2.17.1



* [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (6 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:18   ` Andy Lutomirski
  2020-02-14 19:40   ` Andi Kleen
  2020-02-11 13:52 ` [PATCH 09/62] x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c Joerg Roedel
                   ` (55 subsequent siblings)
  63 siblings, 2 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add the code needed to set up an IDT in the early pre-decompression
boot-code. The IDT is loaded first in startup_64, which is after
EfiExitBootServices() has been called, and it is reloaded later, when
the kernel image has been relocated to the end of the decompression
area.

This allows setting up different IDT handlers before and after the
relocation.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/Makefile          |  1 +
 arch/x86/boot/compressed/head_64.S         | 34 +++++++++++
 arch/x86/boot/compressed/idt_64.c          | 43 +++++++++++++
 arch/x86/boot/compressed/idt_handlers_64.S | 71 ++++++++++++++++++++++
 arch/x86/boot/compressed/misc.h            |  5 ++
 arch/x86/include/asm/desc_defs.h           |  3 +
 6 files changed, 157 insertions(+)
 create mode 100644 arch/x86/boot/compressed/idt_64.c
 create mode 100644 arch/x86/boot/compressed/idt_handlers_64.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e186cc0b628d..54d63526e856 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -81,6 +81,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
 	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
+	vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
 endif
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 1f1f6c8139b3..d27a9ce1bcb0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -33,6 +33,7 @@
 #include <asm/processor-flags.h>
 #include <asm/asm-offsets.h>
 #include <asm/bootparam.h>
+#include <asm/desc_defs.h>
 #include "pgtable.h"
 
 /*
@@ -358,6 +359,10 @@ SYM_CODE_START(startup_64)
 	movq	%rax, gdt64+2(%rip)
 	lgdt	gdt64(%rip)
 
+	pushq	%rsi
+	call	load_stage1_idt
+	popq	%rsi
+
 	/*
 	 * paging_prepare() sets up the trampoline and checks if we need to
 	 * enable 5-level paging.
@@ -465,6 +470,16 @@ SYM_FUNC_END_ALIAS(efi_stub_entry)
 	.text
 SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 
+/*
+ * Reload GDT after relocation - The GDT at the non-relocated position
+ * might be overwritten soon by the in-place decompression, so reload
+ * GDT at the relocated address. The GDT is referenced by exception
+ * handling and needs to be set up correctly.
+ */
+	leaq	gdt(%rip), %rax
+	movq	%rax, gdt64+2(%rip)
+	lgdt	gdt64(%rip)
+
 /*
  * Clear BSS (stack is currently empty)
  */
@@ -475,6 +490,13 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	shrq	$3, %rcx
 	rep	stosq
 
+/*
+ * Load stage2 IDT
+ */
+	pushq	%rsi
+	call	load_stage2_idt
+	popq	%rsi
+
 /*
  * Do the extraction, and jump to the new kernel..
  */
@@ -628,6 +650,18 @@ SYM_DATA_START_LOCAL(gdt)
 	.quad   0x0000000000000000	/* TS continued */
 SYM_DATA_END_LABEL(gdt, SYM_L_LOCAL, gdt_end)
 
+SYM_DATA_START(boot_idt_desc)
+	.word	boot_idt_end - boot_idt
+	.quad	0
+SYM_DATA_END(boot_idt_desc)
+	.balign 8
+SYM_DATA_START(boot_idt)
+	.rept	BOOT_IDT_ENTRIES
+	.quad	0
+	.quad	0
+	.endr
+SYM_DATA_END_LABEL(boot_idt, SYM_L_GLOBAL, boot_idt_end)
+
 #ifdef CONFIG_EFI_MIXED
 SYM_DATA_LOCAL(efi32_boot_args, .long 0, 0)
 SYM_DATA(efi_is64, .byte 1)
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
new file mode 100644
index 000000000000..46ecea671b90
--- /dev/null
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/trap_defs.h>
+#include <asm/segment.h>
+#include "misc.h"
+
+static void set_idt_entry(int vector, void (*handler)(void))
+{
+	unsigned long address = (unsigned long)handler;
+	gate_desc entry;
+
+	memset(&entry, 0, sizeof(entry));
+
+	entry.offset_low    = (u16)(address & 0xffff);
+	entry.segment       = __KERNEL_CS;
+	entry.bits.type     = GATE_TRAP;
+	entry.bits.p        = 1;
+	entry.offset_middle = (u16)((address >> 16) & 0xffff);
+	entry.offset_high   = (u32)(address >> 32);
+
+	memcpy(&boot_idt[vector], &entry, sizeof(entry));
+}
+
+/* Have this here so we don't need to include <asm/desc.h> */
+static void load_boot_idt(const struct desc_ptr *dtr)
+{
+	asm volatile("lidt %0"::"m" (*dtr));
+}
+
+/* Setup IDT before kernel jumping to  .Lrelocated */
+void load_stage1_idt(void)
+{
+	boot_idt_desc.address = (unsigned long)boot_idt;
+
+	load_boot_idt(&boot_idt_desc);
+}
+
+/* Setup IDT after kernel jumping to  .Lrelocated */
+void load_stage2_idt(void)
+{
+	boot_idt_desc.address = (unsigned long)boot_idt;
+
+	load_boot_idt(&boot_idt_desc);
+}
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
new file mode 100644
index 000000000000..0b2b6cf747d2
--- /dev/null
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Early IDT handler entry points
+ *
+ * Copyright (C) 2019 SUSE
+ *
+ * Author: Joerg Roedel <jroedel@suse.de>
+ */
+
+#include <asm/segment.h>
+
+.macro EXCEPTION_HANDLER name function error_code=0
+SYM_FUNC_START(\name)
+
+	/* Build pt_regs */
+	.if \error_code == 0
+	pushq   $0
+	.endif
+
+	pushq   %rdi
+	pushq   %rsi
+	pushq   %rdx
+	pushq   %rcx
+	pushq   %rax
+	pushq   %r8
+	pushq   %r9
+	pushq   %r10
+	pushq   %r11
+	pushq   %rbx
+	pushq   %rbp
+	pushq   %r12
+	pushq   %r13
+	pushq   %r14
+	pushq   %r15
+
+	/* Call handler with pt_regs */
+	movq    %rsp, %rdi
+	call    \function
+
+	/* Restore regs */
+	popq    %r15
+	popq    %r14
+	popq    %r13
+	popq    %r12
+	popq    %rbp
+	popq    %rbx
+	popq    %r11
+	popq    %r10
+	popq    %r9
+	popq    %r8
+	popq    %rax
+	popq    %rcx
+	popq    %rdx
+	popq    %rsi
+	popq    %rdi
+
+	/* Remove error code and return */
+	addq    $8, %rsp
+
+	/*
+	 * Make sure we return to __KERNEL_CS - the CS selector on
+	 * the IRET frame might still be from an old BIOS GDT
+	 */
+	movq	$__KERNEL_CS, 8(%rsp)
+
+	iretq
+SYM_FUNC_END(\name)
+	.endm
+
+	.text
+	.code64
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 726e264410ff..062ae3ae6930 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -23,6 +23,7 @@
 #include <asm/page.h>
 #include <asm/boot.h>
 #include <asm/bootparam.h>
+#include <asm/desc_defs.h>
 
 #define BOOT_CTYPE_H
 #include <linux/acpi.h>
@@ -133,4 +134,8 @@ int count_immovable_mem_regions(void);
 static inline int count_immovable_mem_regions(void) { return 0; }
 #endif
 
+/* idt_64.c */
+extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
+extern struct desc_ptr boot_idt_desc;
+
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
index a91f3b6e4f2a..5621fb3f2d1a 100644
--- a/arch/x86/include/asm/desc_defs.h
+++ b/arch/x86/include/asm/desc_defs.h
@@ -109,6 +109,9 @@ struct desc_ptr {
 
 #endif /* !__ASSEMBLY__ */
 
+/* Boot IDT definitions */
+#define	BOOT_IDT_ENTRIES	32
+
 /* Access rights as returned by LAR */
 #define AR_TYPE_RODATA		(0 * (1 << 9))
 #define AR_TYPE_RWDATA		(1 * (1 << 9))
-- 
2.17.1



* [PATCH 09/62] x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (7 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 10/62] x86/boot/compressed/64: Add page-fault handler Joerg Roedel
                   ` (54 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The file contains only code related to identity-mapped page-tables.
Rename the file and always compile it in.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/Makefile                       | 2 +-
 arch/x86/boot/compressed/{kaslr_64.c => ident_map_64.c} | 9 +++++++++
 arch/x86/boot/compressed/kaslr.c                        | 9 ---------
 arch/x86/boot/compressed/misc.h                         | 8 ++++++++
 4 files changed, 18 insertions(+), 10 deletions(-)
 rename arch/x86/boot/compressed/{kaslr_64.c => ident_map_64.c} (95%)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 54d63526e856..e6b3e0fc48de 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -80,7 +80,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
-	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
+	vmlinux-objs-y += $(obj)/ident_map_64.o
 	vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
diff --git a/arch/x86/boot/compressed/kaslr_64.c b/arch/x86/boot/compressed/ident_map_64.c
similarity index 95%
rename from arch/x86/boot/compressed/kaslr_64.c
rename to arch/x86/boot/compressed/ident_map_64.c
index 748456c365f4..3a2115582920 100644
--- a/arch/x86/boot/compressed/kaslr_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -29,6 +29,15 @@
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
 #include "../../mm/ident_map.c"
 
+#ifdef CONFIG_X86_5LEVEL
+unsigned int __pgtable_l5_enabled;
+unsigned int pgdir_shift = 39;
+unsigned int ptrs_per_p4d = 1;
+#endif
+
+/* Used by PAGE_KERN* macros: */
+pteval_t __default_kernel_pte_mask __read_mostly = ~0;
+
 /* Used by pgtable.h asm code to force instruction serialization. */
 unsigned long __force_order;
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..7c61a8c5b9cf 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -43,17 +43,8 @@
 #define STATIC
 #include <linux/decompress/mm.h>
 
-#ifdef CONFIG_X86_5LEVEL
-unsigned int __pgtable_l5_enabled;
-unsigned int pgdir_shift __ro_after_init = 39;
-unsigned int ptrs_per_p4d __ro_after_init = 1;
-#endif
-
 extern unsigned long get_cmd_line_ptr(void);
 
-/* Used by PAGE_KERN* macros: */
-pteval_t __default_kernel_pte_mask __read_mostly = ~0;
-
 /* Simplified build-specific string for starting entropy. */
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
 		LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 062ae3ae6930..3a030a878d53 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -134,6 +134,14 @@ int count_immovable_mem_regions(void);
 static inline int count_immovable_mem_regions(void) { return 0; }
 #endif
 
+/* ident_map_64.c */
+#ifdef CONFIG_X86_5LEVEL
+extern unsigned int __pgtable_l5_enabled, pgdir_shift, ptrs_per_p4d;
+#endif
+
+/* Used by PAGE_KERN* macros: */
+extern pteval_t __default_kernel_pte_mask;
+
 /* idt_64.c */
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
-- 
2.17.1



* [PATCH 10/62] x86/boot/compressed/64: Add page-fault handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (8 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 09/62] x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 11/62] x86/boot/compressed/64: Always switch to own page-table Joerg Roedel
                   ` (53 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Install a page-fault handler to add identity mappings for addresses
that are not yet mapped. Also do some sanity checking of the error
code.

This makes non-SEV-ES machines use the exception handling
infrastructure in the pre-decompression boot code too, making it less
likely to break in the future.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/ident_map_64.c    | 38 ++++++++++++++++++++++
 arch/x86/boot/compressed/idt_64.c          |  2 ++
 arch/x86/boot/compressed/idt_handlers_64.S |  2 ++
 arch/x86/boot/compressed/misc.h            |  6 ++++
 4 files changed, 48 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 3a2115582920..0865d181b85d 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -19,11 +19,13 @@
 /* No PAGE_TABLE_ISOLATION support needed either: */
 #undef CONFIG_PAGE_TABLE_ISOLATION
 
+#include "error.h"
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
 #include <asm/init.h>
 #include <asm/pgtable.h>
+#include <asm/trap_defs.h>
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -163,3 +165,39 @@ void finalize_identity_maps(void)
 {
 	write_cr3(top_level_pgt);
 }
+
+static void pf_error(unsigned long error_code, unsigned long address,
+		     struct pt_regs *regs)
+{
+	error_putstr("Unexpected page-fault:");
+	error_putstr("\nError Code: ");
+	error_puthex(error_code);
+	error_putstr("\nCR2: 0x");
+	error_puthex(address);
+	error_putstr("\nRIP relative to _head: 0x");
+	error_puthex(regs->ip - (unsigned long)_head);
+	error_putstr("\n");
+
+	error("Stopping.\n");
+}
+
+void do_boot_page_fault(struct pt_regs *regs)
+{
+	unsigned long address = native_read_cr2();
+	unsigned long error_code = regs->orig_ax;
+
+	/*
+	 * Check for unexpected error codes. Unexpected are:
+	 *	- Faults on present pages
+	 *	- User faults
+	 *	- Reserved bits set
+	 */
+	if (error_code & (X86_PF_PROT | X86_PF_USER | X86_PF_RSVD))
+		pf_error(error_code, address, regs);
+
+	/*
+	 * Error code is sane - now identity map the 2M region around
+	 * the faulting address.
+	 */
+	add_identity_map(address & PMD_MASK, PMD_SIZE);
+}
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 46ecea671b90..84ba57d9d436 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -39,5 +39,7 @@ void load_stage2_idt(void)
 {
 	boot_idt_desc.address = (unsigned long)boot_idt;
 
+	set_idt_entry(X86_TRAP_PF, boot_pf_handler);
+
 	load_boot_idt(&boot_idt_desc);
 }
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index 0b2b6cf747d2..f7f1ea66dcbf 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -69,3 +69,5 @@ SYM_FUNC_END(\name)
 
 	.text
 	.code64
+
+EXCEPTION_HANDLER	boot_pf_handler do_boot_page_fault error_code=1
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 3a030a878d53..eff4ed0b1cea 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -37,6 +37,9 @@
 #define memptr unsigned
 #endif
 
+/* boot/compressed/vmlinux start and end markers */
+extern char _head[], _end[];
+
 /* misc.c */
 extern memptr free_mem_ptr;
 extern memptr free_mem_end_ptr;
@@ -146,4 +149,7 @@ extern pteval_t __default_kernel_pte_mask;
 extern gate_desc boot_idt[BOOT_IDT_ENTRIES];
 extern struct desc_ptr boot_idt_desc;
 
+/* IDT Entry Points */
+void boot_pf_handler(void);
+
 #endif /* BOOT_COMPRESSED_MISC_H */
-- 
2.17.1



* [PATCH 11/62] x86/boot/compressed/64: Always switch to own page-table
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (9 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 10/62] x86/boot/compressed/64: Add page-fault handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 12/62] x86/boot/compressed/64: Don't pre-map memory in KASLR code Joerg Roedel
                   ` (52 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

When booted through startup_64 the kernel keeps running on the EFI
page-table until the KASLR code sets up its own page-table. Without
KASLR the pre-decompression boot code never switches off the EFI
page-table. Change that by unconditionally switching to our own
page-table once the kernel is relocated.

This makes sure we can make changes to the mapping when necessary, for
example to map pages unencrypted in SEV and SEV-ES guests.

Also remove the debug_putstr() calls in initialize_identity_maps()
because the function now runs before console_init() is called.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/head_64.S      |  3 +-
 arch/x86/boot/compressed/ident_map_64.c | 51 +++++++++++++++----------
 arch/x86/boot/compressed/kaslr.c        |  3 --
 3 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d27a9ce1bcb0..5164d2e8631a 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -491,10 +491,11 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	rep	stosq
 
 /*
- * Load stage2 IDT
+ * Load stage2 IDT and switch to our own page-table
  */
 	pushq	%rsi
 	call	load_stage2_idt
+	call	initialize_identity_maps
 	popq	%rsi
 
 /*
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 0865d181b85d..6a3890caaa19 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -88,9 +88,31 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
  */
 static struct x86_mapping_info mapping_info;
 
+/*
+ * Adds the specified range to what will become the new identity mappings.
+ * Once all ranges have been added, the new mapping is activated by calling
+ * finalize_identity_maps() below.
+ */
+void add_identity_map(unsigned long start, unsigned long size)
+{
+	unsigned long end = start + size;
+
+	/* Align boundary to 2M. */
+	start = round_down(start, PMD_SIZE);
+	end = round_up(end, PMD_SIZE);
+	if (start >= end)
+		return;
+
+	/* Build the mapping. */
+	kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
+				  start, end);
+}
+
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
+	unsigned long start, size;
+
 	/* If running as an SEV guest, the encryption mask is required. */
 	set_sev_encryption_mask();
 
@@ -123,37 +145,24 @@ void initialize_identity_maps(void)
 	 */
 	top_level_pgt = read_cr3_pa();
 	if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
-		debug_putstr("booted via startup_32()\n");
 		pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
 		pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
 		memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
 	} else {
-		debug_putstr("booted via startup_64()\n");
 		pgt_data.pgt_buf = _pgtable;
 		pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
 		memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
 		top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
 	}
-}
 
-/*
- * Adds the specified range to what will become the new identity mappings.
- * Once all ranges have been added, the new mapping is activated by calling
- * finalize_identity_maps() below.
- */
-void add_identity_map(unsigned long start, unsigned long size)
-{
-	unsigned long end = start + size;
-
-	/* Align boundary to 2M. */
-	start = round_down(start, PMD_SIZE);
-	end = round_up(end, PMD_SIZE);
-	if (start >= end)
-		return;
-
-	/* Build the mapping. */
-	kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
-				  start, end);
+	/*
+	 * New page-table is set up - map the kernel image and load it
+	 * into cr3.
+	 */
+	start = (unsigned long)_head;
+	size  = _end - _head;
+	add_identity_map(start, size);
+	write_cr3(top_level_pgt);
 }
 
 /*
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 7c61a8c5b9cf..856dc1c9bb0d 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -903,9 +903,6 @@ void choose_random_location(unsigned long input,
 
 	boot_params->hdr.loadflags |= KASLR_FLAG;
 
-	/* Prepare to add new identity pagetables on demand. */
-	initialize_identity_maps();
-
 	/* Record the various known unsafe memory ranges. */
 	mem_avoid_init(input, input_size, *output);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 12/62] x86/boot/compressed/64: Don't pre-map memory in KASLR code
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (10 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 11/62] x86/boot/compressed/64: Always switch to own page-table Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 13/62] x86/boot/compressed/64: Change add_identity_map() to take start and end Joerg Roedel
                   ` (51 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

With the page-fault handler in place, the identity mapping can be built
on demand. So remove the code which manually creates the mappings and
unexport/remove the functions used for it.
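
A minimal, self-contained C model (not kernel code) of the on-demand
policy this patch relies on: every page fault maps the whole 2M region
around the faulting address, so nearby accesses don't fault again. The
addresses are made up:

#include <stdio.h>

#define PMD_SIZE	(2UL << 20)
#define PMD_MASK	(~(PMD_SIZE - 1))

/* One-slot stand-in for the identity page-table */
static unsigned long mapped_start, mapped_end;

static void boot_page_fault(unsigned long addr)
{
	mapped_start = addr & PMD_MASK;
	mapped_end   = mapped_start + PMD_SIZE;
	printf("#PF at %#lx -> map [%#lx, %#lx)\n",
	       addr, mapped_start, mapped_end);
}

static void access_address(unsigned long addr)
{
	if (addr < mapped_start || addr >= mapped_end)
		boot_page_fault(addr);	/* would be a real #PF */
}

int main(void)
{
	access_address(0x1234567);	/* faults once */
	access_address(0x13fffff);	/* same 2M region - no fault */
	return 0;
}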

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/ident_map_64.c | 16 ++--------------
 arch/x86/boot/compressed/kaslr.c        | 24 +-----------------------
 arch/x86/boot/compressed/misc.h         | 10 ----------
 3 files changed, 3 insertions(+), 47 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 6a3890caaa19..ab7a3d9705c0 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -89,11 +89,9 @@ phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
 static struct x86_mapping_info mapping_info;
 
 /*
- * Adds the specified range to what will become the new identity mappings.
- * Once all ranges have been added, the new mapping is activated by calling
- * finalize_identity_maps() below.
+ * Adds the specified range to the identity mappings.
  */
-void add_identity_map(unsigned long start, unsigned long size)
+static void add_identity_map(unsigned long start, unsigned long size)
 {
 	unsigned long end = start + size;
 
@@ -165,16 +163,6 @@ void initialize_identity_maps(void)
 	write_cr3(top_level_pgt);
 }
 
-/*
- * This switches the page tables to the new level4 that has been built
- * via calls to add_identity_map() above. If booted via startup_32(),
- * this is effectively a no-op.
- */
-void finalize_identity_maps(void)
-{
-	write_cr3(top_level_pgt);
-}
-
 static void pf_error(unsigned long error_code, unsigned long address,
 		     struct pt_regs *regs)
 {
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 856dc1c9bb0d..c466fb738de0 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -399,8 +399,6 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 	 */
 	mem_avoid[MEM_AVOID_ZO_RANGE].start = input;
 	mem_avoid[MEM_AVOID_ZO_RANGE].size = (output + init_size) - input;
-	add_identity_map(mem_avoid[MEM_AVOID_ZO_RANGE].start,
-			 mem_avoid[MEM_AVOID_ZO_RANGE].size);
 
 	/* Avoid initrd. */
 	initrd_start  = (u64)boot_params->ext_ramdisk_image << 32;
@@ -420,14 +418,10 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 		;
 	mem_avoid[MEM_AVOID_CMDLINE].start = cmd_line;
 	mem_avoid[MEM_AVOID_CMDLINE].size = cmd_line_size;
-	add_identity_map(mem_avoid[MEM_AVOID_CMDLINE].start,
-			 mem_avoid[MEM_AVOID_CMDLINE].size);
 
 	/* Avoid boot parameters. */
 	mem_avoid[MEM_AVOID_BOOTPARAMS].start = (unsigned long)boot_params;
 	mem_avoid[MEM_AVOID_BOOTPARAMS].size = sizeof(*boot_params);
-	add_identity_map(mem_avoid[MEM_AVOID_BOOTPARAMS].start,
-			 mem_avoid[MEM_AVOID_BOOTPARAMS].size);
 
 	/* We don't need to set a mapping for setup_data. */
 
@@ -436,11 +430,6 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
 
 	/* Enumerate the immovable memory regions */
 	num_immovable_mem = count_immovable_mem_regions();
-
-#ifdef CONFIG_X86_VERBOSE_BOOTUP
-	/* Make sure video RAM can be used. */
-	add_identity_map(0, PMD_SIZE);
-#endif
 }
 
 /*
@@ -919,19 +908,8 @@ void choose_random_location(unsigned long input,
 		warn("Physical KASLR disabled: no suitable memory region!");
 	} else {
 		/* Update the new physical address location. */
-		if (*output != random_addr) {
-			add_identity_map(random_addr, output_size);
+		if (*output != random_addr)
 			*output = random_addr;
-		}
-
-		/*
-		 * This loads the identity mapping page table.
-		 * This should only be done if a new physical address
-		 * is found for the kernel, otherwise we should keep
-		 * the old page table to make it be like the "nokaslr"
-		 * case.
-		 */
-		finalize_identity_maps();
 	}
 
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index eff4ed0b1cea..4e5bc688f467 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -98,17 +98,7 @@ static inline void choose_random_location(unsigned long input,
 #endif
 
 #ifdef CONFIG_X86_64
-void initialize_identity_maps(void);
-void add_identity_map(unsigned long start, unsigned long size);
-void finalize_identity_maps(void);
 extern unsigned char _pgtable[];
-#else
-static inline void initialize_identity_maps(void)
-{ }
-static inline void add_identity_map(unsigned long start, unsigned long size)
-{ }
-static inline void finalize_identity_maps(void)
-{ }
 #endif
 
 #ifdef CONFIG_EARLY_PRINTK
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 13/62] x86/boot/compressed/64: Change add_identity_map() to take start and end
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (11 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 12/62] x86/boot/compressed/64: Don't pre-map memory in KASLR code Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler Joerg Roedel
                   ` (50 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Changing the function to take start and end as parameters instead of
start and size simplifies the callers, which don't need to calculate
the size if they already have start and end.
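
In effect the kernel-image mapping in initialize_identity_maps() goes
from computing a size to passing the link-time boundaries directly;
before and after this patch:

	/* start/size */
	add_identity_map((unsigned long)_head, _end - _head);

	/* start/end */
	add_identity_map((unsigned long)_head, (unsigned long)_end);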

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/ident_map_64.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index ab7a3d9705c0..ba5b88189220 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -91,10 +91,8 @@ static struct x86_mapping_info mapping_info;
 /*
  * Adds the specified range to the identity mappings.
  */
-static void add_identity_map(unsigned long start, unsigned long size)
+static void add_identity_map(unsigned long start, unsigned long end)
 {
-	unsigned long end = start + size;
-
 	/* Align boundary to 2M. */
 	start = round_down(start, PMD_SIZE);
 	end = round_up(end, PMD_SIZE);
@@ -109,8 +107,6 @@ static void add_identity_map(unsigned long start, unsigned long size)
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
-	unsigned long start, size;
-
 	/* If running as an SEV guest, the encryption mask is required. */
 	set_sev_encryption_mask();
 
@@ -157,9 +153,7 @@ void initialize_identity_maps(void)
 	 * New page-table is set up - map the kernel image and load it
 	 * into cr3.
 	 */
-	start = (unsigned long)_head;
-	size  = _end - _head;
-	add_identity_map(start, size);
+	add_identity_map((unsigned long)_head, (unsigned long)_end);
 	write_cr3(top_level_pgt);
 }
 
@@ -180,7 +174,8 @@ static void pf_error(unsigned long error_code, unsigned long address,
 
 void do_boot_page_fault(struct pt_regs *regs)
 {
-	unsigned long address = native_read_cr2();
+	unsigned long address = native_read_cr2() & PMD_MASK;
+	unsigned long end = address + PMD_SIZE;
 	unsigned long error_code = regs->orig_ax;
 
 	/*
@@ -196,5 +191,5 @@ void do_boot_page_fault(struct pt_regs *regs)
 	 * Error code is sane - now identity map the 2M region around
 	 * the faulting address.
 	 */
-	add_identity_map(address & PMD_MASK, PMD_SIZE);
+	add_identity_map(address, end);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (12 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 13/62] x86/boot/compressed/64: Change add_identity_map() to take start and end Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:23   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 15/62] x86/boot/compressed/64: Call set_sev_encryption_mask earlier Joerg Roedel
                   ` (49 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add the first handler for #VC exceptions. At stage 1 there is no GHCB
yet because we might still be on the EFI page-table and thus can't map
memory unencrypted.

The stage 1 handler is limited to the MSR-based protocol for talking to
the hypervisor and only supports the CPUID exit-code, but that is
enough to get to stage 2.
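
Everything in the MSR protocol travels through the 64-bit GHCB MSR
(0xC0010130). A stand-alone sketch of the CPUID request/response
encoding, using the constants this patch adds to asm/sev-es.h; the
response value is made up and a 64-bit build is assumed:

#include <stdio.h>

#define GHCB_SEV_CPUID_REQ	0x004UL
#define GHCB_SEV_CPUID_RESP	0x005UL
#define GHCB_CPUID_REQ_EAX	0
#define GHCB_CPUID_REQ(fn, reg)	(GHCB_SEV_CPUID_REQ | \
				 (((unsigned long)(reg) & 3) << 30) | \
				 (((unsigned long)(fn)) << 32))
#define GHCB_SEV_GHCB_RESP_CODE(v)	((v) & 0xfff)

int main(void)
{
	/* Ask for EAX of CPUID leaf 0x1 */
	unsigned long req = GHCB_CPUID_REQ(0x1, GHCB_CPUID_REQ_EAX);

	printf("wrmsr(0xC0010130) <- %#lx, then VMGEXIT\n", req);

	/* Made-up reply: response code in bits 0-11, EAX in bits 32-63 */
	unsigned long resp = (0x306f2UL << 32) | GHCB_SEV_CPUID_RESP;

	if (GHCB_SEV_GHCB_RESP_CODE(resp) == GHCB_SEV_CPUID_RESP)
		printf("EAX = %#lx\n", resp >> 32);
	return 0;
}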

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/Makefile          |  1 +
 arch/x86/boot/compressed/idt_64.c          |  4 ++
 arch/x86/boot/compressed/idt_handlers_64.S |  4 ++
 arch/x86/boot/compressed/misc.h            |  1 +
 arch/x86/boot/compressed/sev-es.c          | 42 ++++++++++++++
 arch/x86/include/asm/msr-index.h           |  1 +
 arch/x86/include/asm/sev-es.h              | 45 +++++++++++++++
 arch/x86/include/asm/trap_defs.h           |  1 +
 arch/x86/kernel/sev-es-shared.c            | 66 ++++++++++++++++++++++
 9 files changed, 165 insertions(+)
 create mode 100644 arch/x86/boot/compressed/sev-es.c
 create mode 100644 arch/x86/include/asm/sev-es.h
 create mode 100644 arch/x86/kernel/sev-es-shared.c

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e6b3e0fc48de..583678c78e1b 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -84,6 +84,7 @@ ifdef CONFIG_X86_64
 	vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
+	vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 84ba57d9d436..bdd20dfd1fd0 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -31,6 +31,10 @@ void load_stage1_idt(void)
 {
 	boot_idt_desc.address = (unsigned long)boot_idt;
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	set_idt_entry(X86_TRAP_VC, boot_stage1_vc_handler);
+#endif
+
 	load_boot_idt(&boot_idt_desc);
 }
 
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index f7f1ea66dcbf..330eb4e5c8b3 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -71,3 +71,7 @@ SYM_FUNC_END(\name)
 	.code64
 
 EXCEPTION_HANDLER	boot_pf_handler do_boot_page_fault error_code=1
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+EXCEPTION_HANDLER	boot_stage1_vc_handler no_ghcb_vc_handler error_code=1
+#endif
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 4e5bc688f467..0e3508c5c15c 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -141,5 +141,6 @@ extern struct desc_ptr boot_idt_desc;
 
 /* IDT Entry Points */
 void boot_pf_handler(void);
+void boot_stage1_vc_handler(void);
 
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
new file mode 100644
index 000000000000..8d13121a8cf2
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel <jroedel@suse.de>
+ */
+
+#include <linux/kernel.h>
+
+#include <asm/sev-es.h>
+#include <asm/msr-index.h>
+#include <asm/ptrace.h>
+#include <asm/svm.h>
+
+#include "misc.h"
+
+static inline u64 read_ghcb_msr(void)
+{
+	unsigned long low, high;
+
+	asm volatile("rdmsr\n" : "=a" (low), "=d" (high) :
+			"c" (MSR_AMD64_SEV_ES_GHCB));
+
+	return ((high << 32) | low);
+}
+
+static inline void write_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = val & 0xffffffffUL;
+	high = val >> 32;
+
+	asm volatile("wrmsr\n" : : "c" (MSR_AMD64_SEV_ES_GHCB),
+			"a"(low), "d" (high) : "memory");
+}
+
+#undef __init
+#define __init
+
+/* Include code for early handlers */
+#include "../../kernel/sev-es-shared.c"
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..b6139b70db54 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -432,6 +432,7 @@
 #define MSR_AMD64_IBSBRTARGET		0xc001103b
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_SEV_ES_GHCB		0xc0010130
 #define MSR_AMD64_SEV			0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT	0
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
new file mode 100644
index 000000000000..f524b40aef07
--- /dev/null
+++ b/arch/x86/include/asm/sev-es.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel <jroedel@suse.de>
+ */
+
+#ifndef __ASM_ENCRYPTED_STATE_H
+#define __ASM_ENCRYPTED_STATE_H
+
+#include <linux/types.h>
+
+#define GHCB_SEV_CPUID_REQ	0x004UL
+#define		GHCB_CPUID_REQ_EAX	0
+#define		GHCB_CPUID_REQ_EBX	1
+#define		GHCB_CPUID_REQ_ECX	2
+#define		GHCB_CPUID_REQ_EDX	3
+#define		GHCB_CPUID_REQ(fn, reg) (GHCB_SEV_CPUID_REQ | \
+					(((unsigned long)reg & 3) << 30) | \
+					(((unsigned long)fn) << 32))
+
+#define GHCB_SEV_CPUID_RESP	0x005UL
+#define GHCB_SEV_TERMINATE	0x100UL
+
+#define	GHCB_SEV_GHCB_RESP_CODE(v)	((v) & 0xfff)
+#define	VMGEXIT()			{ asm volatile("rep; vmmcall\n\r"); }
+
+static inline u64 lower_bits(u64 val, unsigned int bits)
+{
+	u64 mask = (1ULL << bits) - 1;
+
+	return (val & mask);
+}
+
+static inline u64 copy_lower_bits(u64 out, u64 in, unsigned int bits)
+{
+	u64 mask = (1ULL << bits) - 1;
+
+	out &= ~mask;
+	out |= lower_bits(in, bits);
+
+	return out;
+}
+
+#endif
diff --git a/arch/x86/include/asm/trap_defs.h b/arch/x86/include/asm/trap_defs.h
index 488f82ac36da..af45d65f0458 100644
--- a/arch/x86/include/asm/trap_defs.h
+++ b/arch/x86/include/asm/trap_defs.h
@@ -24,6 +24,7 @@ enum {
 	X86_TRAP_AC,		/* 17, Alignment Check */
 	X86_TRAP_MC,		/* 18, Machine Check */
 	X86_TRAP_XF,		/* 19, SIMD Floating-Point Exception */
+	X86_TRAP_VC = 29,	/* 29, VMM Communication Exception */
 	X86_TRAP_IRET = 32,	/* 32, IRET Exception */
 };
 
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
new file mode 100644
index 000000000000..7edf2dfac71f
--- /dev/null
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel <jroedel@suse.de>
+ *
+ * This file is not compiled stand-alone. It contains code shared
+ * between the pre-decompression boot code and the running Linux kernel
+ * and is included directly into both code-bases.
+ */
+
+/*
+ * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
+ * page yet, so it only supports the MSR based communication with the
+ * hypervisor and only the CPUID exit-code.
+ */
+void __init no_ghcb_vc_handler(struct pt_regs *regs)
+{
+	unsigned int fn = lower_bits(regs->ax, 32);
+	unsigned long exit_code = regs->orig_ax;
+	unsigned long val;
+
+	/* Only CPUID is supported via MSR protocol */
+	if (exit_code != SVM_EXIT_CPUID)
+		goto fail;
+
+	write_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EAX));
+	VMGEXIT();
+	val = read_ghcb_msr();
+	if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SEV_CPUID_RESP)
+		goto fail;
+	regs->ax = copy_lower_bits(regs->ax, val >> 32, 32);
+
+	write_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EBX));
+	VMGEXIT();
+	val = read_ghcb_msr();
+	if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SEV_CPUID_RESP)
+		goto fail;
+	regs->bx = copy_lower_bits(regs->bx, val >> 32, 32);
+
+	write_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_ECX));
+	VMGEXIT();
+	val = read_ghcb_msr();
+	if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SEV_CPUID_RESP)
+		goto fail;
+	regs->cx = copy_lower_bits(regs->cx, val >> 32, 32);
+
+	write_ghcb_msr(GHCB_CPUID_REQ(fn, GHCB_CPUID_REQ_EDX));
+	VMGEXIT();
+	val = read_ghcb_msr();
+	if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SEV_CPUID_RESP)
+		goto fail;
+	regs->dx = copy_lower_bits(regs->dx, val >> 32, 32);
+
+	regs->ip += 2;
+
+	return;
+
+fail:
+	write_ghcb_msr(GHCB_SEV_TERMINATE);
+	VMGEXIT();
+
+	/* Shouldn't get here - if we do halt the machine */
+	while (true)
+		asm volatile("hlt\n");
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 15/62] x86/boot/compressed/64: Call set_sev_encryption_mask earlier
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (13 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 16/62] x86/boot/compressed/64: Check return value of kernel_ident_mapping_init() Joerg Roedel
                   ` (48 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Call set_sev_encryption_mask() while the stage 1 #VC handler is still
active, because the stage 2 handler needs our own page-table to be set
up, and calling set_sev_encryption_mask() is a prerequisite for that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/head_64.S      | 8 +++++++-
 arch/x86/boot/compressed/ident_map_64.c | 3 ---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 5164d2e8631a..5557f899b22b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -491,9 +491,15 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	rep	stosq
 
 /*
- * Load stage2 IDT and switch to our own page-table
+ * If running as an SEV guest, the encryption mask is required in the
+ * page-table setup code below. When the guest also has SEV-ES enabled,
+ * set_sev_encryption_mask() will cause #VC exceptions, but the stage2
+ * handler can't map its GHCB because the page-table is not set up yet.
+ * So set up the encryption mask here, while the stage1 #VC handler is
+ * still active. Then load the stage2 IDT and switch to our own page-table.
  */
 	pushq	%rsi
+	call	set_sev_encryption_mask
 	call	load_stage2_idt
 	call	initialize_identity_maps
 	popq	%rsi
diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index ba5b88189220..5b720736a789 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -107,9 +107,6 @@ static void add_identity_map(unsigned long start, unsigned long end)
 /* Locates and clears a region for a new top level page table. */
 void initialize_identity_maps(void)
 {
-	/* If running as an SEV guest, the encryption mask is required. */
-	set_sev_encryption_mask();
-
 	/* Exclude the encryption mask from __PHYSICAL_MASK */
 	physical_mask &= ~sme_me_mask;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 16/62] x86/boot/compressed/64: Check return value of kernel_ident_mapping_init()
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (14 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 15/62] x86/boot/compressed/64: Call set_sev_encryption_mask earlier Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 17/62] x86/boot/compressed/64: Add function to map a page unencrypted Joerg Roedel
                   ` (47 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The function can fail to create an identity mapping, so check for that
and bail out if it happens.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/ident_map_64.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index 5b720736a789..feb180cced28 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -93,6 +93,8 @@ static struct x86_mapping_info mapping_info;
  */
 static void add_identity_map(unsigned long start, unsigned long end)
 {
+	int ret;
+
 	/* Align boundary to 2M. */
 	start = round_down(start, PMD_SIZE);
 	end = round_up(end, PMD_SIZE);
@@ -100,8 +102,9 @@ static void add_identity_map(unsigned long start, unsigned long end)
 		return;
 
 	/* Build the mapping. */
-	kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
-				  start, end);
+	ret = kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, start, end);
+	if (ret)
+		error("Error: kernel_ident_mapping_init() failed\n");
 }
 
 /* Locates and clears a region for a new top level page table. */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 17/62] x86/boot/compressed/64: Add function to map a page unencrypted
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (15 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 16/62] x86/boot/compressed/64: Check return value of kernel_ident_mapping_init() Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler Joerg Roedel
                   ` (46 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

This function is needed to map the GHCB for SEV-ES guests. The GHCB is
used for communication with the hypervisor, so its content must not be
encrypted.
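
Conceptually this means clearing the C-bit from the page's PTE and then
flushing the TLB and the affected cache-lines. A tiny stand-alone model
of the PTE part; the C-bit position is CPU-specific (reported via CPUID
0x8000001F), so bit 47 below is only an assumption for the sketch:

#include <stdio.h>

int main(void)
{
	unsigned long sme_me_mask = 1UL << 47;	/* assumed C-bit position */
	unsigned long pte = 0x123000UL | sme_me_mask | 0x63; /* frame|flags */

	pte &= ~sme_me_mask;	/* what pte_clear_flags(pte, _PAGE_ENC) does */
	printf("decrypted pte: %#lx\n", pte);
	return 0;
}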

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/ident_map_64.c | 125 ++++++++++++++++++++++++
 arch/x86/boot/compressed/misc.h         |   1 +
 2 files changed, 126 insertions(+)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index feb180cced28..04a5ff4bda66 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -26,6 +26,7 @@
 #include <asm/init.h>
 #include <asm/pgtable.h>
 #include <asm/trap_defs.h>
+#include <asm/cmpxchg.h>
 /* Use the static base for this part of the boot process */
 #undef __PAGE_OFFSET
 #define __PAGE_OFFSET __PAGE_OFFSET_BASE
@@ -157,6 +158,130 @@ void initialize_identity_maps(void)
 	write_cr3(top_level_pgt);
 }
 
+static pte_t *split_large_pmd(struct x86_mapping_info *info,
+			      pmd_t *pmdp, unsigned long __address)
+{
+	unsigned long page_flags;
+	unsigned long address;
+	pte_t *pte;
+	pmd_t pmd;
+	int i;
+
+	pte = (pte_t *)info->alloc_pgt_page(info->context);
+	if (!pte)
+		return NULL;
+
+	address     = __address & PMD_MASK;
+	/* No large page - clear PSE flag */
+	page_flags  = info->page_flag & ~_PAGE_PSE;
+
+	/* Populate the PTEs */
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		set_pte(&pte[i], __pte(address | page_flags));
+		address += PAGE_SIZE;
+	}
+
+	/*
+	 * Ideally we need to clear the large PMD first and do a TLB
+	 * flush before we write the new PMD. But the 2M range of the
+	 * PMD might contain the code we execute and/or the stack
+	 * we are on, so we can't do that. But that should be safe here
+	 * because we are going from large to small mappings and we are
+	 * also the only user of the page-table, so there is no chance
+	 * of a TLB multihit.
+	 */
+	pmd = __pmd((unsigned long)pte | info->kernpg_flag);
+	set_pmd(pmdp, pmd);
+	/* Flush TLB to establish the new PMD */
+	write_cr3(top_level_pgt);
+
+	return pte + pte_index(__address);
+}
+
+static void clflush_page(unsigned long address)
+{
+	unsigned int flush_size;
+	char *cl, *start, *end;
+
+	/*
+	 * Hardcode cl-size to 64 - CPUID can't be used here because that might
+	 * cause another #VC exception and the GHCB is not ready to use yet.
+	 */
+	flush_size = 64;
+	start      = (char *)(address & PAGE_MASK);
+	end        = start + PAGE_SIZE;
+
+	/*
+	 * First make sure there are no pending writes on the cache-lines to
+	 * flush.
+	 */
+	asm volatile("mfence" : : : "memory");
+
+	for (cl = start; cl != end; cl += flush_size)
+		clflush(cl);
+}
+
+static int __set_page_decrypted(struct x86_mapping_info *info,
+				unsigned long address)
+{
+	unsigned long scratch, *target;
+	pgd_t *pgdp = (pgd_t *)top_level_pgt;
+	p4d_t *p4dp;
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ptep, pte;
+
+	/*
+	 * First make sure there is a PMD mapping for 'address'.
+	 * It should already exist, but keep things generic.
+	 *
+	 * To map the page just read from it and fault it in if there is no
+	 * mapping yet. add_identity_map() can't be called here because that
+	 * would unconditionally map the address on PMD level, destroying any
+	 * PTE-level mappings that might already exist.  Also do something
+	 * useless with 'scratch' so the access won't be optimized away.
+	 */
+	target = (unsigned long *)address;
+	scratch = *target;
+	arch_cmpxchg(target, scratch, scratch);
+
+	/*
+	 * The page is mapped at least with PMD size - so skip checks and walk
+	 * directly to the PMD.
+	 */
+	p4dp = p4d_offset(pgdp, address);
+	pudp = pud_offset(p4dp, address);
+	pmdp = pmd_offset(pudp, address);
+
+	if (pmd_large(*pmdp))
+		ptep = split_large_pmd(info, pmdp, address);
+	else
+		ptep = pte_offset_kernel(pmdp, address);
+
+	if (!ptep)
+		return -ENOMEM;
+
+	/* Clear encryption flag and write new pte */
+	pte = pte_clear_flags(*ptep, _PAGE_ENC);
+	set_pte(ptep, pte);
+
+	/* Flush TLB to map the page unencrypted */
+	write_cr3(top_level_pgt);
+
+	/*
+	 * Changing the encryption attributes of a page requires flushing it
+	 * from the caches.
+	 */
+	clflush_page(address);
+
+	return 0;
+}
+
+int set_page_decrypted(unsigned long address)
+{
+	return __set_page_decrypted(&mapping_info, address);
+}
+
 static void pf_error(unsigned long error_code, unsigned long address,
 		     struct pt_regs *regs)
 {
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 0e3508c5c15c..42f68a858a35 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -98,6 +98,7 @@ static inline void choose_random_location(unsigned long input,
 #endif
 
 #ifdef CONFIG_X86_64
+extern int set_page_decrypted(unsigned long address);
 extern unsigned char _pgtable[];
 #endif
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (16 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 17/62] x86/boot/compressed/64: Add function to map a page unencrypted Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:25   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions Joerg Roedel
                   ` (45 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Install an exception handler for the #VC exception that uses a GHCB.
Also add the infrastructure for handling different exit-codes by
decoding the instruction that caused the exception, plus error handling.
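
A key detail in ghcb_hv_call() below is how the hypervisor reports
problems: sw_exit_info_1 == 1 means sw_exit_info_2 carries
event-injection data. A compilable model of just that check, with the
SVM_EVTINJ_* encoding as defined in asm/svm.h and a made-up reply:

#include <stdio.h>
#include <stdint.h>

#define SVM_EVTINJ_VEC_MASK	0xffULL
#define SVM_EVTINJ_TYPE_MASK	(7ULL << 8)
#define SVM_EVTINJ_TYPE_EXEPT	(3ULL << 8)
#define SVM_EVTINJ_VALID	(1ULL << 31)
#define SVM_EVTINJ_VALID_ERR	(1ULL << 11)
#define X86_TRAP_GP		13

int main(void)
{
	/* Made-up reply: hypervisor injected a #GP with error code 0x10 */
	uint64_t exit_info_1 = 1;
	uint64_t exit_info_2 = (0x10ULL << 32) | SVM_EVTINJ_VALID |
			       SVM_EVTINJ_VALID_ERR | SVM_EVTINJ_TYPE_EXEPT |
			       X86_TRAP_GP;

	if ((exit_info_1 & 0xffffffff) == 1 &&
	    (exit_info_2 & SVM_EVTINJ_VALID) &&
	    (exit_info_2 & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)
		printf("vector %llu, error code %#llx\n",
		       (unsigned long long)(exit_info_2 & SVM_EVTINJ_VEC_MASK),
		       (unsigned long long)(exit_info_2 >> 32));
	return 0;
}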

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/Kconfig                           |   1 +
 arch/x86/boot/compressed/idt_64.c          |   4 +
 arch/x86/boot/compressed/idt_handlers_64.S |   1 +
 arch/x86/boot/compressed/misc.h            |   1 +
 arch/x86/boot/compressed/sev-es.c          |  91 +++++++++++
 arch/x86/include/asm/sev-es.h              |  33 ++++
 arch/x86/include/uapi/asm/svm.h            |   1 +
 arch/x86/kernel/sev-es-shared.c            | 171 +++++++++++++++++++++
 8 files changed, 303 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index beea77046f9b..c12347492589 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1526,6 +1526,7 @@ config AMD_MEM_ENCRYPT
 	select DYNAMIC_PHYSICAL_MASK
 	select ARCH_USE_MEMREMAP_PROT
 	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+	select INSTRUCTION_DECODER
 	---help---
 	  Say yes to enable support for the encryption of system memory.
 	  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index bdd20dfd1fd0..eebb2f857dac 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -45,5 +45,9 @@ void load_stage2_idt(void)
 
 	set_idt_entry(X86_TRAP_PF, boot_pf_handler);
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	set_idt_entry(X86_TRAP_VC, boot_stage2_vc_handler);
+#endif
+
 	load_boot_idt(&boot_idt_desc);
 }
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index 330eb4e5c8b3..3c71a11beee0 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -74,4 +74,5 @@ EXCEPTION_HANDLER	boot_pf_handler do_boot_page_fault error_code=1
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 EXCEPTION_HANDLER	boot_stage1_vc_handler no_ghcb_vc_handler error_code=1
+EXCEPTION_HANDLER	boot_stage2_vc_handler boot_vc_handler error_code=1
 #endif
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 42f68a858a35..567d71ab5ed9 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -143,5 +143,6 @@ extern struct desc_ptr boot_idt_desc;
 /* IDT Entry Points */
 void boot_pf_handler(void);
 void boot_stage1_vc_handler(void);
+void boot_stage2_vc_handler(void);
 
 #endif /* BOOT_COMPRESSED_MISC_H */
diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 8d13121a8cf2..02fb6f57128b 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -8,12 +8,16 @@
 #include <linux/kernel.h>
 
 #include <asm/sev-es.h>
+#include <asm/trap_defs.h>
 #include <asm/msr-index.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
 
 #include "misc.h"
 
+struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
+struct ghcb *boot_ghcb;
+
 static inline u64 read_ghcb_msr(void)
 {
 	unsigned long low, high;
@@ -35,8 +39,95 @@ static inline void write_ghcb_msr(u64 val)
 			"a"(low), "d" (high) : "memory");
 }
 
+static enum es_result es_fetch_insn_byte(struct es_em_ctxt *ctxt,
+					 unsigned int offset,
+					 char *buffer)
+{
+	char *rip = (char *)ctxt->regs->ip;
+
+	buffer[offset] = rip[offset];
+
+	return ES_OK;
+}
+
+static enum es_result es_write_mem(struct es_em_ctxt *ctxt,
+				   void *dst, char *buf, size_t size)
+{
+	memcpy(dst, buf, size);
+
+	return ES_OK;
+}
+
+static enum es_result es_read_mem(struct es_em_ctxt *ctxt,
+				  void *src, char *buf, size_t size)
+{
+	memcpy(buf, src, size);
+
+	return ES_OK;
+}
+
 #undef __init
+#undef __pa
 #define __init
+#define __pa(x)	((unsigned long)(x))
+
+#define __BOOT_COMPRESSED
+
+/* Basic instruction decoding support needed */
+#include "../../lib/inat.c"
+#include "../../lib/insn.c"
 
 /* Include code for early handlers */
 #include "../../kernel/sev-es-shared.c"
+
+static bool setup_ghcb(void)
+{
+	if (!sev_es_negotiate_protocol())
+		terminate(GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED);
+
+	if (set_page_decrypted((unsigned long)&boot_ghcb_page))
+		return false;
+
+	/* Page is now mapped decrypted, clear it */
+	memset(&boot_ghcb_page, 0, sizeof(boot_ghcb_page));
+
+	boot_ghcb = &boot_ghcb_page;
+
+	/* Initialize lookup tables for the instruction decoder */
+	inat_init_tables();
+
+	return true;
+}
+
+void boot_vc_handler(struct pt_regs *regs)
+{
+	unsigned long exit_code = regs->orig_ax;
+	struct es_em_ctxt ctxt;
+	enum es_result result;
+
+	if (!boot_ghcb && !setup_ghcb())
+		terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+	ghcb_invalidate(boot_ghcb);
+	result = init_em_ctxt(&ctxt, regs, exit_code);
+	if (result != ES_OK)
+		goto finish;
+
+	switch (exit_code) {
+	default:
+		result = ES_UNSUPPORTED;
+		break;
+	}
+
+finish:
+	if (result == ES_OK) {
+		finish_insn(&ctxt);
+	} else if (result != ES_RETRY) {
+		/*
+		 * For now, just halt the machine. That makes debugging
+		 * easier; later this can just call terminate() here.
+		 */
+		while (true)
+			asm volatile("hlt\n");
+	}
+}
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index f524b40aef07..512d3ccb9832 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -9,7 +9,14 @@
 #define __ASM_ENCRYPTED_STATE_H
 
 #include <linux/types.h>
+#include <asm/insn.h>
 
+#define GHCB_SEV_INFO		0x001UL
+#define GHCB_SEV_INFO_REQ	0x002UL
+#define		GHCB_INFO(v)		((v) & 0xfffUL)
+#define		GHCB_PROTO_MAX(v)	(((v) >> 48) & 0xffffUL)
+#define		GHCB_PROTO_MIN(v)	(((v) >> 32) & 0xffffUL)
+#define		GHCB_PROTO_OUR		0x0001UL
 #define GHCB_SEV_CPUID_REQ	0x004UL
 #define		GHCB_CPUID_REQ_EAX	0
 #define		GHCB_CPUID_REQ_EBX	1
@@ -21,10 +28,36 @@
 
 #define GHCB_SEV_CPUID_RESP	0x005UL
 #define GHCB_SEV_TERMINATE	0x100UL
+#define		GHCB_SEV_ES_REASON_GENERAL_REQUEST	0
+#define		GHCB_SEV_ES_REASON_PROTOCOL_UNSUPPORTED	1
 
 #define	GHCB_SEV_GHCB_RESP_CODE(v)	((v) & 0xfff)
 #define	VMGEXIT()			{ asm volatile("rep; vmmcall\n\r"); }
 
+enum es_result {
+	ES_OK,			/* All good */
+	ES_UNSUPPORTED,		/* Requested operation not supported */
+	ES_VMM_ERROR,		/* Unexpected state from the VMM */
+	ES_DECODE_FAILED,	/* Instruction decoding failed */
+	ES_EXCEPTION,		/* Instruction caused exception */
+	ES_RETRY,		/* Retry instruction emulation */
+};
+
+struct es_fault_info {
+	unsigned long vector;
+	unsigned long error_code;
+	unsigned long cr2;
+};
+
+struct pt_regs;
+
+/* ES instruction emulation context */
+struct es_em_ctxt {
+	struct pt_regs *regs;
+	struct insn insn;
+	struct es_fault_info fi;
+};
+
 static inline u64 lower_bits(u64 val, unsigned int bits)
 {
 	u64 mask = (1ULL << bits) - 1;
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 2e8a30f06c74..c68d1618c9b0 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -29,6 +29,7 @@
 #define SVM_EXIT_WRITE_DR6     0x036
 #define SVM_EXIT_WRITE_DR7     0x037
 #define SVM_EXIT_EXCP_BASE     0x040
+#define SVM_EXIT_LAST_EXCP     0x05f
 #define SVM_EXIT_INTR          0x060
 #define SVM_EXIT_NMI           0x061
 #define SVM_EXIT_SMI           0x062
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 7edf2dfac71f..f83292c54ab7 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -9,6 +9,135 @@
  * and is included directly into both code-bases.
  */
 
+static void terminate(unsigned int reason)
+{
+	/* Request Guest Termination from the Hypervisor */
+	write_ghcb_msr(GHCB_SEV_TERMINATE);
+	VMGEXIT();
+
+	while (true)
+		asm volatile("hlt\n" : : : "memory");
+}
+
+static bool sev_es_negotiate_protocol(void)
+{
+	u64 val;
+
+	/* Do the GHCB protocol version negotiation */
+	write_ghcb_msr(GHCB_SEV_INFO_REQ);
+	VMGEXIT();
+	val = read_ghcb_msr();
+
+	if (GHCB_INFO(val) != GHCB_SEV_INFO)
+		return false;
+
+	if (GHCB_PROTO_MAX(val) < GHCB_PROTO_OUR ||
+	    GHCB_PROTO_MIN(val) > GHCB_PROTO_OUR)
+		return false;
+
+	return true;
+}
+
+static void ghcb_invalidate(struct ghcb *ghcb)
+{
+	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
+}
+
+static bool valid_cs(struct pt_regs *regs)
+{
+	return (regs->cs == __KERNEL_CS) || (regs->cs == __USER_CS);
+}
+
+static enum es_result decode_insn(struct es_em_ctxt *ctxt)
+{
+	char buffer[MAX_INSN_SIZE];
+	enum es_result ret;
+	unsigned int i;
+
+	if (!valid_cs(ctxt->regs))
+		return ES_UNSUPPORTED;
+
+	/* Fetch instruction */
+	for (i = 0; i < MAX_INSN_SIZE; i++) {
+		ret = es_fetch_insn_byte(ctxt, i, buffer);
+		if (ret != ES_OK)
+			break;
+	}
+
+	insn_init(&ctxt->insn, buffer, i, 1);
+	insn_get_length(&ctxt->insn);
+
+	if (ret != ES_EXCEPTION)
+		ret = ctxt->insn.immediate.got ? ES_OK : ES_DECODE_FAILED;
+
+	return ret;
+}
+
+static bool decoding_needed(unsigned long exit_code)
+{
+	/* Exceptions don't require instruction decoding */
+	return !(exit_code >= SVM_EXIT_EXCP_BASE &&
+		 exit_code <= SVM_EXIT_LAST_EXCP);
+}
+
+static enum es_result init_em_ctxt(struct es_em_ctxt *ctxt,
+				   struct pt_regs *regs,
+				   unsigned long exit_code)
+{
+	enum es_result ret = ES_OK;
+
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->regs = regs;
+
+	if (decoding_needed(exit_code))
+		ret = decode_insn(ctxt);
+
+	return ret;
+}
+
+static void finish_insn(struct es_em_ctxt *ctxt)
+{
+	ctxt->regs->ip += ctxt->insn.length;
+}
+
+static enum es_result ghcb_hv_call(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+				   u64 exit_code, u64 exit_info_1,
+				   u64 exit_info_2)
+{
+	enum es_result ret;
+
+	ghcb_set_sw_exit_code(ghcb, exit_code);
+	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
+	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
+
+	write_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	if ((ghcb->save.sw_exit_info_1 & 0xffffffff) == 1) {
+		u64 info;
+		unsigned long v;
+
+		info = ghcb->save.sw_exit_info_2;
+		v = info & SVM_EVTINJ_VEC_MASK;
+
+		/* Check if exception information from hypervisor is sane. */
+		if ((info & SVM_EVTINJ_VALID) &&
+		    ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+		    ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+			ctxt->fi.vector = v;
+			if (info & SVM_EVTINJ_VALID_ERR)
+				ctxt->fi.error_code = info >> 32;
+			ret = ES_EXCEPTION;
+		} else {
+			ret = ES_VMM_ERROR;
+		}
+	} else {
+		ret = ES_OK;
+	}
+
+	return ret;
+}
+
 /*
  * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
  * page yet, so it only supports the MSR based communication with the
@@ -64,3 +193,45 @@ void __init no_ghcb_vc_handler(struct pt_regs *regs)
 	while (true)
 		asm volatile("hlt\n");
 }
+
+static enum es_result insn_string_read(struct es_em_ctxt *ctxt,
+				       void *src, char *buf,
+				       unsigned int data_size,
+				       unsigned int count,
+				       bool backwards)
+{
+	int i, b = backwards ? -1 : 1;
+	enum es_result ret = ES_OK;
+
+	for (i = 0; i < count; i++) {
+		void *s = src + (i * data_size * b);
+		char *d = buf + (i * data_size);
+
+		ret = es_read_mem(ctxt, s, d, data_size);
+		if (ret != ES_OK)
+			break;
+	}
+
+	return ret;
+}
+
+static enum es_result insn_string_write(struct es_em_ctxt *ctxt,
+					void *dst, char *buf,
+					unsigned int data_size,
+					unsigned int count,
+					bool backwards)
+{
+	int i, s = backwards ? -1 : 1;
+	enum es_result ret = ES_OK;
+
+	for (i = 0; i < count; i++) {
+		void *d = dst + (i * data_size * s);
+		char *b = buf + (i * data_size);
+
+		ret = es_write_mem(ctxt, d, b, data_size);
+		if (ret != ES_OK)
+			break;
+	}
+
+	return ret;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (17 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:28   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 20/62] x86/fpu: Move xgetbv()/xsetbv() into separate header Joerg Roedel
                   ` (44 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add support for decoding and handling #VC exceptions for IOIO events.
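
As a worked example, the exit_info_1 that ioio_exitinfo() below builds
for "outb %al, $0x80" in 64-bit code (opcode 0xe6: OUT immediate, byte
data, 8-byte address size) can be reproduced stand-alone:

#include <stdio.h>

/* IOIO encoding bits, as defined in sev-es-shared.c below */
#define IOIO_TYPE_OUT	0
#define IOIO_DATA_8	(1UL << 4)
#define IOIO_ADDR_64	(1UL << 9)

int main(void)
{
	unsigned long port = 0x80;
	unsigned long exit_info_1 = IOIO_TYPE_OUT | IOIO_DATA_8 |
				    IOIO_ADDR_64 | (port << 16);

	printf("exit_info_1 = %#lx\n", exit_info_1);	/* 0x800210 */
	return 0;
}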

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapted code to #VC handling framework ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/sev-es.c |   3 +
 arch/x86/kernel/sev-es-shared.c   | 214 ++++++++++++++++++++++++++++++
 2 files changed, 217 insertions(+)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 02fb6f57128b..b2a2d068dc12 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -114,6 +114,9 @@ void boot_vc_handler(struct pt_regs *regs)
 		goto finish;
 
 	switch (exit_code) {
+	case SVM_EXIT_IOIO:
+		result = handle_ioio(boot_ghcb, &ctxt);
+		break;
 	default:
 		result = ES_UNSUPPORTED;
 		break;
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index f83292c54ab7..bd21a79da084 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -235,3 +235,217 @@ static enum es_result insn_string_write(struct es_em_ctxt *ctxt,
 
 	return ret;
 }
+
+#define IOIO_TYPE_STR  BIT(2)
+#define IOIO_TYPE_IN   1
+#define IOIO_TYPE_INS  (IOIO_TYPE_IN | IOIO_TYPE_STR)
+#define IOIO_TYPE_OUT  0
+#define IOIO_TYPE_OUTS (IOIO_TYPE_OUT | IOIO_TYPE_STR)
+
+#define IOIO_REP       BIT(3)
+
+#define IOIO_ADDR_64   BIT(9)
+#define IOIO_ADDR_32   BIT(8)
+#define IOIO_ADDR_16   BIT(7)
+
+#define IOIO_DATA_32   BIT(6)
+#define IOIO_DATA_16   BIT(5)
+#define IOIO_DATA_8    BIT(4)
+
+#define IOIO_SEG_ES    (0 << 10)
+#define IOIO_SEG_DS    (3 << 10)
+
+static bool insn_repmode(struct insn *insn)
+{
+	unsigned int i;
+
+	for (i = 0; i < insn->prefixes.nbytes; i++) {
+		switch (insn->prefixes.bytes[i]) {
+		case 0xf2:
+		case 0xf3:
+			return true;
+		}
+	}
+
+	return false;
+}
+
+
+static enum es_result ioio_exitinfo(struct es_em_ctxt *ctxt, u64 *exitinfo)
+{
+	struct insn *insn = &ctxt->insn;
+	*exitinfo = 0;
+
+	switch (insn->opcode.bytes[0]) {
+	/* INS opcodes */
+	case 0x6c:
+	case 0x6d:
+		*exitinfo |= IOIO_TYPE_INS;
+		*exitinfo |= IOIO_SEG_ES;
+		*exitinfo |= (ctxt->regs->dx & 0xffff) << 16;
+		break;
+
+	/* OUTS opcodes */
+	case 0x6e:
+	case 0x6f:
+		*exitinfo |= IOIO_TYPE_OUTS;
+		*exitinfo |= IOIO_SEG_DS;
+		*exitinfo |= (ctxt->regs->dx & 0xffff) << 16;
+		break;
+
+	/* IN immediate opcodes */
+	case 0xe4:
+	case 0xe5:
+		*exitinfo |= IOIO_TYPE_IN;
+		*exitinfo |= insn->immediate.value << 16;
+		break;
+
+	/* OUT immediate opcodes */
+	case 0xe6:
+	case 0xe7:
+		*exitinfo |= IOIO_TYPE_OUT;
+		*exitinfo |= insn->immediate.value << 16;
+		break;
+
+	/* IN register opcodes */
+	case 0xec:
+	case 0xed:
+		*exitinfo |= IOIO_TYPE_IN;
+		*exitinfo |= (ctxt->regs->dx & 0xffff) << 16;
+		break;
+
+	/* OUT register opcodes */
+	case 0xee:
+	case 0xef:
+		*exitinfo |= IOIO_TYPE_OUT;
+		*exitinfo |= (ctxt->regs->dx & 0xffff) << 16;
+		break;
+
+	default:
+		return ES_DECODE_FAILED;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0x6c:
+	case 0x6e:
+	case 0xe4:
+	case 0xe6:
+	case 0xec:
+	case 0xee:
+		/* Single byte opcodes */
+		*exitinfo |= IOIO_DATA_8;
+		break;
+	default:
+		/* Length determined by instruction parsing */
+		*exitinfo |= (insn->opnd_bytes == 2) ? IOIO_DATA_16
+						     : IOIO_DATA_32;
+	}
+	switch (insn->addr_bytes) {
+	case 2:
+		*exitinfo |= IOIO_ADDR_16;
+		break;
+	case 4:
+		*exitinfo |= IOIO_ADDR_32;
+		break;
+	case 8:
+		*exitinfo |= IOIO_ADDR_64;
+		break;
+	}
+
+	if (insn_repmode(insn))
+		*exitinfo |= IOIO_REP;
+
+	return ES_OK;
+}
+
+static enum es_result handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	struct pt_regs *regs = ctxt->regs;
+	u64 exit_info_1, exit_info_2;
+	enum es_result ret;
+
+	ret = ioio_exitinfo(ctxt, &exit_info_1);
+	if (ret != ES_OK)
+		return ret;
+
+	if (exit_info_1 & IOIO_TYPE_STR) {
+		bool df = !!(regs->flags & X86_EFLAGS_DF);
+		unsigned int io_bytes, exit_bytes;
+		unsigned int ghcb_count, op_count;
+		u64 sw_scratch;
+
+		/*
+		 * For the string variants with rep prefix, the number of in/out
+		 * operations per #VC exception is limited so that the kernel
+		 * has a chance to take interrupts and re-schedule while the
+		 * instruction is emulated.
+		 */
+		io_bytes   = (exit_info_1 >> 4) & 0x7;
+		ghcb_count = sizeof(ghcb->shared_buffer) / io_bytes;
+
+		op_count    = (exit_info_1 & IOIO_REP) ? regs->cx : 1;
+		exit_info_2 = min(op_count, ghcb_count);
+		exit_bytes  = exit_info_2 * io_bytes;
+
+		if (!(exit_info_1 & IOIO_TYPE_IN)) {
+			ret = insn_string_read(ctxt, (void *)regs->si,
+					       ghcb->shared_buffer, io_bytes,
+					       exit_info_2, df);
+			if (ret)
+				return ret;
+		}
+
+		sw_scratch = __pa(ghcb) + offsetof(struct ghcb, shared_buffer);
+		ghcb_set_sw_scratch(ghcb, sw_scratch);
+		ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_IOIO,
+				   exit_info_1, exit_info_2);
+		if (ret != ES_OK)
+			return ret;
+
+		/* Everything went well, write back results */
+		if (exit_info_1 & IOIO_TYPE_IN) {
+			ret = insn_string_write(ctxt, (void *)regs->di,
+						ghcb->shared_buffer, io_bytes,
+						exit_info_2, df);
+			if (ret)
+				return ret;
+
+			if (df)
+				regs->di -= exit_bytes;
+			else
+				regs->di += exit_bytes;
+		} else {
+			if (df)
+				regs->si -= exit_bytes;
+			else
+				regs->si += exit_bytes;
+		}
+
+		if (exit_info_1 & IOIO_REP)
+			regs->cx -= exit_info_2;
+
+		ret = regs->cx ? ES_RETRY : ES_OK;
+
+	} else {
+		int bits = (exit_info_1 & 0x70) >> 1;
+		u64 rax = 0;
+
+		if (!(exit_info_1 & IOIO_TYPE_IN))
+			rax = lower_bits(regs->ax, bits);
+
+		ghcb_set_rax(ghcb, rax);
+
+		ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_IOIO, exit_info_1, 0);
+		if (ret != ES_OK)
+			return ret;
+
+		if (exit_info_1 & IOIO_TYPE_IN) {
+			if (!ghcb_is_valid_rax(ghcb))
+				return ES_VMM_ERROR;
+			regs->ax = copy_lower_bits(regs->ax, ghcb->save.rax,
+						   bits);
+		}
+	}
+
+	return ret;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 20/62] x86/fpu: Move xgetbv()/xsetbv() into separate header
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (18 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 21/62] x86/sev-es: Add CPUID handling to #VC handler Joerg Roedel
                   ` (43 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The xgetbv() function is needed in pre-decompression boot code, but
asm/fpu/internal.h can't be included there directly. Doing so opens
the door to include-hell due to various include-magic in
boot/compressed/misc.h.

Avoid that by moving xgetbv()/xsetbv() to a separate header file and
including that instead.
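
A minimal user-space sketch of what the moved helper does; it runs on
x86-64 systems where the OS has set CR4.OSXSAVE, and would fault
otherwise:

#include <stdio.h>
#include <stdint.h>

static inline uint64_t xgetbv(uint32_t index)
{
	uint32_t eax, edx;

	asm volatile(".byte 0x0f,0x01,0xd0" /* xgetbv */
		     : "=a" (eax), "=d" (edx)
		     : "c" (index));
	return eax + ((uint64_t)edx << 32);
}

int main(void)
{
	/* index 0 == XCR_XFEATURE_ENABLED_MASK, i.e. XCR0 */
	printf("XCR0 = %#llx\n", (unsigned long long)xgetbv(0));
	return 0;
}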

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/fpu/internal.h | 29 +-------------------------
 arch/x86/include/asm/fpu/xcr.h      | 32 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 28 deletions(-)
 create mode 100644 arch/x86/include/asm/fpu/xcr.h

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 44c48e34d799..795fc049988e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -19,6 +19,7 @@
 #include <asm/user.h>
 #include <asm/fpu/api.h>
 #include <asm/fpu/xstate.h>
+#include <asm/fpu/xcr.h>
 #include <asm/cpufeature.h>
 #include <asm/trace/fpu.h>
 
@@ -614,32 +615,4 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
 	}
 	__write_pkru(pkru_val);
 }
-
-/*
- * MXCSR and XCR definitions:
- */
-
-extern unsigned int mxcsr_feature_mask;
-
-#define XCR_XFEATURE_ENABLED_MASK	0x00000000
-
-static inline u64 xgetbv(u32 index)
-{
-	u32 eax, edx;
-
-	asm volatile(".byte 0x0f,0x01,0xd0" /* xgetbv */
-		     : "=a" (eax), "=d" (edx)
-		     : "c" (index));
-	return eax + ((u64)edx << 32);
-}
-
-static inline void xsetbv(u32 index, u64 value)
-{
-	u32 eax = value;
-	u32 edx = value >> 32;
-
-	asm volatile(".byte 0x0f,0x01,0xd1" /* xsetbv */
-		     : : "a" (eax), "d" (edx), "c" (index));
-}
-
 #endif /* _ASM_X86_FPU_INTERNAL_H */
diff --git a/arch/x86/include/asm/fpu/xcr.h b/arch/x86/include/asm/fpu/xcr.h
new file mode 100644
index 000000000000..91ee45712737
--- /dev/null
+++ b/arch/x86/include/asm/fpu/xcr.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_FPU_XCR_H
+#define _ASM_X86_FPU_XCR_H
+
+/*
+ * MXCSR and XCR definitions:
+ */
+
+extern unsigned int mxcsr_feature_mask;
+
+#define XCR_XFEATURE_ENABLED_MASK	0x00000000
+
+static inline u64 xgetbv(u32 index)
+{
+	u32 eax, edx;
+
+	asm volatile(".byte 0x0f,0x01,0xd0" /* xgetbv */
+		     : "=a" (eax), "=d" (edx)
+		     : "c" (index));
+	return eax + ((u64)edx << 32);
+}
+
+static inline void xsetbv(u32 index, u64 value)
+{
+	u32 eax = value;
+	u32 edx = value >> 32;
+
+	asm volatile(".byte 0x0f,0x01,0xd1" /* xsetbv */
+		     : : "a" (eax), "d" (edx), "c" (index));
+}
+
+#endif /* _ASM_X86_FPU_XCR_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 21/62] x86/sev-es: Add CPUID handling to #VC handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (19 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 20/62] x86/fpu: Move xgetbv()/xsetbv() into separate header Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 22/62] x86/sev-es: Add handler for MMIO events Joerg Roedel
                   ` (42 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Handle #VC exceptions caused by CPUID instructions. These happen in the
early boot code, for example when the KASLR code uses CPUID to check for
RDTSC support.
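
One subtlety in the handler below is the XCR0 value reported to the
hypervisor: XGETBV faults when CR4.OSXSAVE (bit 18) is clear, so the
code falls back to XCR0's architectural reset value of 1 (x87 state
only). A stand-alone model of just that decision, with made-up inputs:

#include <stdio.h>
#include <stdint.h>

#define X86_CR4_OSXSAVE	(1UL << 18)

/* Mirrors the XCR0 selection in handle_cpuid() */
static uint64_t xcr0_for_ghcb(unsigned long cr4, uint64_t live_xcr0)
{
	return (cr4 & X86_CR4_OSXSAVE) ? live_xcr0 : 1;
}

int main(void)
{
	uint64_t live = 0x7;	/* made-up XCR0: x87, SSE, AVX */

	printf("%llu\n", (unsigned long long)xcr0_for_ghcb(0, live));
	printf("%llu\n",
	       (unsigned long long)xcr0_for_ghcb(X86_CR4_OSXSAVE, live));
	return 0;
}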

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling framework ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/boot/compressed/sev-es.c |  4 ++++
 arch/x86/kernel/sev-es-shared.c   | 34 +++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index b2a2d068dc12..270a23c05f53 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -10,6 +10,7 @@
 #include <asm/sev-es.h>
 #include <asm/trap_defs.h>
 #include <asm/msr-index.h>
+#include <asm/fpu/xcr.h>
 #include <asm/ptrace.h>
 #include <asm/svm.h>
 
@@ -117,6 +118,9 @@ void boot_vc_handler(struct pt_regs *regs)
 	case SVM_EXIT_IOIO:
 		result = handle_ioio(boot_ghcb, &ctxt);
 		break;
+	case SVM_EXIT_CPUID:
+		result = handle_cpuid(boot_ghcb, &ctxt);
+		break;
 	default:
 		result = ES_UNSUPPORTED;
 		break;
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index bd21a79da084..0f422e3b2077 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -449,3 +449,37 @@ static enum es_result handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 
 	return ret;
 }
+
+static enum es_result handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	struct pt_regs *regs = ctxt->regs;
+	u32 cr4 = native_read_cr4();
+	enum es_result ret;
+
+	ghcb_set_rax(ghcb, regs->ax);
+	ghcb_set_rcx(ghcb, regs->cx);
+
+	if (cr4 & X86_CR4_OSXSAVE)
+		/* Safe to read xcr0 */
+		ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
+	else
+		/* xgetbv will cause #GP - use reset value for xcr0 */
+		ghcb_set_xcr0(ghcb, 1);
+
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!(ghcb_is_valid_rax(ghcb) &&
+	      ghcb_is_valid_rbx(ghcb) &&
+	      ghcb_is_valid_rcx(ghcb) &&
+	      ghcb_is_valid_rdx(ghcb)))
+		return ES_VMM_ERROR;
+
+	regs->ax = ghcb->save.rax;
+	regs->bx = ghcb->save.rbx;
+	regs->cx = ghcb->save.rcx;
+	regs->dx = ghcb->save.rdx;
+
+	return ES_OK;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 22/62] x86/sev-es: Add handler for MMIO events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (20 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 21/62] x86/sev-es: Add CPUID handling to #VC handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 23/62] x86/idt: Move IDT to data segment Joerg Roedel
                   ` (41 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add a handler for #VC exceptions caused by MMIO intercepts. These
intercepts arrive as nested page faults (NPF) on pages which have
reserved bits set in their page-table entries.

TODO:
	- Add return values of helper functions
	- Check permissions on page-table walks
	- Check data segments

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to VC handling framework ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
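
A short illustration (editor's sketch; the address is made up): an
ordinary MMIO read such as

	void __iomem *base = ioremap(0xfed00000, PAGE_SIZE); /* hypothetical device */
	u32 val = readl(base + 0x10);

compiles to a plain MOV (opcode 0x8b). Under SEV-ES that access raises
#VC with exit code SVM_EXIT_NPF; handle_mmio() below decodes the MOV,
and the data is transferred through ghcb->shared_buffer via the
SVM_VMGEXIT_MMIO_READ event.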
---
 arch/x86/boot/compressed/sev-es.c |   8 +
 arch/x86/include/uapi/asm/svm.h   |   5 +
 arch/x86/kernel/sev-es-shared.c   | 236 ++++++++++++++++++++++++++++++
 3 files changed, 249 insertions(+)

diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
index 270a23c05f53..55a78b42a2f2 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -67,6 +67,11 @@ static enum es_result es_read_mem(struct es_em_ctxt *ctxt,
 	return ES_OK;
 }
 
+/* The pre-decompression code runs identity-mapped, so VA == PA here */
+static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
+{
+	return (phys_addr_t)vaddr;
+}
+
 #undef __init
 #undef __pa
 #define __init
@@ -121,6 +126,9 @@ void boot_vc_handler(struct pt_regs *regs)
 	case SVM_EXIT_CPUID:
 		result = handle_cpuid(boot_ghcb, &ctxt);
 		break;
+	case SVM_EXIT_NPF:
+		result = handle_mmio(boot_ghcb, &ctxt);
+		break;
 	default:
 		result = ES_UNSUPPORTED;
 		break;
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index c68d1618c9b0..8f36ae021a7f 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -81,6 +81,11 @@
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
 
+/* SEV-ES software-defined VMGEXIT events */
+#define SVM_VMGEXIT_MMIO_READ			0x80000001
+#define SVM_VMGEXIT_MMIO_WRITE			0x80000002
+#define SVM_VMGEXIT_UNSUPPORTED_EVENT		0x8000ffff
+
 #define SVM_EXIT_ERR           -1
 
 #define SVM_EXIT_REASONS \
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 0f422e3b2077..14693eff9614 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -483,3 +483,239 @@ static enum es_result handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 
 	return ES_OK;
 }
+
+/* Map from x86 register index to pt_regs offset */
+static unsigned long *register_from_idx(struct pt_regs *regs, u8 reg)
+{
+	/* Encodings 0-15: ax, cx, dx, bx, sp, bp, si, di, r8-r15 */
+	static int reg2pt_regs[] = {
+		10, 11, 12, 5, 19, 4, 13, 14, 9, 8, 7, 6, 3, 2, 1, 0
+	};
+	unsigned long *regs_array = (unsigned long *)regs;
+
+	if (WARN_ONCE(reg > 15, "register index is not valid: %#hhx\n", reg))
+		return NULL;
+
+	return &regs_array[reg2pt_regs[reg]];
+}
+
+static u64 insn_get_eff_addr(struct es_em_ctxt *ctxt)
+{
+	u64 effective_addr;
+	u8 mod, rm;
+
+	if (!ctxt->insn.modrm.nbytes)
+		return 0;
+
+	if (insn_rip_relative(&ctxt->insn))
+		return ctxt->regs->ip + ctxt->insn.displacement.value;
+
+	mod = X86_MODRM_MOD(ctxt->insn.modrm.value);
+	rm = X86_MODRM_RM(ctxt->insn.modrm.value);
+
+	if (ctxt->insn.rex_prefix.nbytes &&
+	    X86_REX_B(ctxt->insn.rex_prefix.value))
+		rm |= 0x8;
+
+	if (mod == 3)
+		return *register_from_idx(ctxt->regs, rm);
+
+	switch (mod) {
+	case 1:
+	case 2:
+		effective_addr = ctxt->insn.displacement.value;
+		break;
+	default:
+		effective_addr = 0;
+	}
+
+	if (ctxt->insn.sib.nbytes) {
+		u8 scale, index, base;
+
+		scale = X86_SIB_SCALE(ctxt->insn.sib.value);
+		index = X86_SIB_INDEX(ctxt->insn.sib.value);
+		base = X86_SIB_BASE(ctxt->insn.sib.value);
+		if (ctxt->insn.rex_prefix.nbytes &&
+		    X86_REX_X(ctxt->insn.rex_prefix.value))
+			index |= 0x8;
+		if (ctxt->insn.rex_prefix.nbytes &&
+		    X86_REX_B(ctxt->insn.rex_prefix.value))
+			base |= 0x8;
+
+		if (index != 4)
+			effective_addr += (*register_from_idx(ctxt->regs, index)
+					   << scale);
+
+		if ((base != 5) || mod)
+			effective_addr += *register_from_idx(ctxt->regs, base);
+		else
+			effective_addr += ctxt->insn.displacement.value;
+	} else {
+		effective_addr += *register_from_idx(ctxt->regs, rm);
+	}
+
+	return effective_addr;
+}
+
+static unsigned long *insn_get_reg(struct es_em_ctxt *ctxt)
+{
+	u8 reg;
+
+	if (!ctxt->insn.modrm.nbytes)
+		return NULL;
+
+	reg = X86_MODRM_REG(ctxt->insn.modrm.value);
+	if (ctxt->insn.rex_prefix.nbytes &&
+	    X86_REX_R(ctxt->insn.rex_prefix.value))
+		reg |= 0x8;
+
+	return register_from_idx(ctxt->regs, reg);
+}
+
+static enum es_result do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+			      unsigned int bytes, bool read)
+{
+	u64 exit_code, exit_info_1, exit_info_2;
+	unsigned long ghcb_pa = __pa(ghcb);
+
+	/* Register-direct addressing mode not supported with MMIO */
+	if (X86_MODRM_MOD(ctxt->insn.modrm.value) == 3)
+		return ES_UNSUPPORTED;
+
+	exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE;
+
+	exit_info_1 = insn_get_eff_addr(ctxt);
+	exit_info_1 = es_slow_virt_to_phys(ghcb, exit_info_1);
+	exit_info_2 = bytes;    /* Can never be greater than 8 */
+
+	ghcb->save.sw_scratch = ghcb_pa + offsetof(struct ghcb, shared_buffer);
+
+	return ghcb_hv_call(ghcb, ctxt, exit_code, exit_info_1, exit_info_2);
+}
+
+static enum es_result handle_mmio_twobyte_ops(struct ghcb *ghcb,
+					      struct es_em_ctxt *ctxt)
+{
+	struct insn *insn = &ctxt->insn;
+	unsigned int bytes = 0;
+	enum es_result ret;
+	int sign_byte;
+	long *reg_data;
+
+	switch (insn->opcode.bytes[1]) {
+		/* MMIO Read w/ zero-extension */
+	case 0xb6:
+		bytes = 1;
+		/* Fallthrough */
+	case 0xb7:
+		if (!bytes)
+			bytes = 2;
+
+		ret = do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		/* Zero extend based on operand size */
+		reg_data = insn_get_reg(ctxt);
+		memset(reg_data, 0, insn->opnd_bytes);
+
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+
+		/* MMIO Read w/ sign-extension */
+	case 0xbe:
+		bytes = 1;
+		/* Fallthrough */
+	case 0xbf:
+		if (!bytes)
+			bytes = 2;
+
+		ret = do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		/* Sign extend based on operand size */
+		reg_data = insn_get_reg(ctxt);
+		if (bytes == 1) {
+			u8 *val = (u8 *)ghcb->shared_buffer;
+
+			sign_byte = (*val & 0x80) ? 0xff : 0x00;
+		} else {
+			u16 *val = (u16 *)ghcb->shared_buffer;
+
+			sign_byte = (*val & 0x8000) ? 0xff : 0x00;
+		}
+		memset(reg_data, sign_byte, insn->opnd_bytes);
+
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+
+	default:
+		ret = ES_UNSUPPORTED;
+	}
+
+	return ret;
+}
+
+static enum es_result handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	struct insn *insn = &ctxt->insn;
+	unsigned int bytes = 0;
+	enum es_result ret;
+	long *reg_data;
+
+	switch (insn->opcode.bytes[0]) {
+	/* MMIO Write */
+	case 0x88:
+		bytes = 1;
+		/* Fallthrough */
+	case 0x89:
+		if (!bytes)
+			bytes = insn->opnd_bytes;
+
+		reg_data = insn_get_reg(ctxt);
+		memcpy(ghcb->shared_buffer, reg_data, bytes);
+
+		ret = do_mmio(ghcb, ctxt, bytes, false);
+		break;
+
+	/* MMIO Write of an immediate value */
+	case 0xc6:
+		bytes = 1;
+		/* Fallthrough */
+	case 0xc7:
+		if (!bytes)
+			bytes = insn->opnd_bytes;
+
+		memcpy(ghcb->shared_buffer, insn->immediate1.bytes, bytes);
+
+		ret = do_mmio(ghcb, ctxt, bytes, false);
+		break;
+
+		/* MMIO Read */
+	case 0x8a:
+		bytes = 1;
+		/* Fallthrough */
+	case 0x8b:
+		if (!bytes)
+			bytes = insn->opnd_bytes;
+
+		ret = do_mmio(ghcb, ctxt, bytes, true);
+		if (ret)
+			break;
+
+		reg_data = insn_get_reg(ctxt);
+		if (bytes == 4)
+			*reg_data = 0;  /* Zero-extend for 32-bit operation */
+
+		memcpy(reg_data, ghcb->shared_buffer, bytes);
+		break;
+
+		/* Two-Byte Opcodes */
+	case 0x0f:
+		ret = handle_mmio_twobyte_ops(ghcb, ctxt);
+		break;
+	default:
+		ret = ES_UNSUPPORTED;
+	}
+
+	return ret;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (21 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 22/62] x86/sev-es: Add handler for MMIO events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:41   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 24/62] x86/idt: Split idt_data setup out of set_intr_gate() Joerg Roedel
                   ` (40 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

With SEV-ES, exception handling is needed very early, even before the
kernel has cleared the BSS section. To prevent the currently used IDT
from being zeroed along with it, move the IDT to the data section.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/idt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 87ef69a72c52..7f81c1294847 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -166,7 +166,7 @@ static const __initconst struct idt_data dbg_idts[] = {
 #endif
 
 /* Must be page-aligned because the real IDT is used in a fixmap. */
-gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
+gate_desc idt_table[IDT_ENTRIES] __page_aligned_data;
 
 struct desc_ptr idt_descr __ro_after_init = {
 	.size		= (IDT_ENTRIES * 2 * sizeof(unsigned long)) - 1,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 24/62] x86/idt: Split idt_data setup out of set_intr_gate()
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (22 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 23/62] x86/idt: Move IDT to data segment Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 25/62] x86/head/64: Install boot GDT Joerg Roedel
                   ` (39 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The code to set up idt_data is needed for early exception handling,
but set_intr_gate() can't be used that early because it has pv-ops in
its code path which don't work at that point in boot.

Split the idt_data initialization out of set_intr_gate() so that it
can be used separately.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
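
This enables constructions like the following, which a later patch in
this series uses to install early handlers without going through
pv-ops (idt, vector and handler are supplied by the caller):

	struct idt_data data;
	gate_desc desc;

	init_idt_data(&data, vector, handler);
	idt_init_desc(&desc, &data);
	native_write_idt_entry(idt, vector, &desc);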
---
 arch/x86/kernel/idt.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7f81c1294847..7d8fa631dca9 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -227,18 +227,24 @@ idt_setup_from_table(gate_desc *idt, const struct idt_data *t, int size, bool sy
 	}
 }
 
+static void init_idt_data(struct idt_data *data, unsigned int n,
+			  const void *addr)
+{
+	BUG_ON(n > 0xFF);
+
+	memset(data, 0, sizeof(*data));
+	data->vector	= n;
+	data->addr	= addr;
+	data->segment	= __KERNEL_CS;
+	data->bits.type	= GATE_INTERRUPT;
+	data->bits.p	= 1;
+}
+
 static void set_intr_gate(unsigned int n, const void *addr)
 {
 	struct idt_data data;
 
-	BUG_ON(n > 0xFF);
-
-	memset(&data, 0, sizeof(data));
-	data.vector	= n;
-	data.addr	= addr;
-	data.segment	= __KERNEL_CS;
-	data.bits.type	= GATE_INTERRUPT;
-	data.bits.p	= 1;
+	init_idt_data(&data, n, addr);
 
 	idt_setup_from_table(idt_table, &data, 1, false);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 25/62] x86/head/64: Install boot GDT
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (23 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 24/62] x86/idt: Split idt_data setup out of set_intr_gate() Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:29   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 26/62] x86/head/64: Reload GDT after switch to virtual addresses Joerg Roedel
                   ` (38 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Handling exceptions during boot requires a working GDT. The kernel GDT
is not yet ready for use, so install a temporary boot GDT.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
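
For reference, a sketch of how the __KERNEL_CS quad below decodes (the
bit layout follows the AMD64 APM; this is editor annotation, not
kernel code):

	u64 kernel_cs = 0xffffULL	  /* limit[15:0]  = 0xffff      */
		      | (0x00ULL << 16)	  /* base[23:0]   = 0           */
		      | (0x9aULL << 40)	  /* P=1 DPL=0 S=1 type=exec/rd */
		      | (0x0fULL << 48)	  /* limit[19:16] = 0xf         */
		      | (0x0aULL << 52)	  /* G=1 (4K gran), L=1 (64-bit)*/
		      | (0x00ULL << 56);  /* base[31:24]  = 0           */
					  /* == 0x00af9a000000ffff      */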
---
 arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4bbc770af632..5a3cde971cb7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -72,6 +72,20 @@ SYM_CODE_START_NOALIGN(startup_64)
 	/* Set up the stack for verify_cpu(), similar to initial_stack below */
 	leaq	(__end_init_task - SIZEOF_PTREGS)(%rip), %rsp
 
+	/* Setup boot GDT descriptor and load boot GDT */
+	leaq	boot_gdt(%rip), %rax
+	movq	%rax, boot_gdt_base(%rip)
+	lgdt	boot_gdt_descr(%rip)
+
+	/* GDT loaded - switch to __KERNEL_CS so IRET works reliably */
+	pushq	$__KERNEL_CS
+	leaq	.Lon_kernel_cs(%rip), %rax
+	pushq	%rax
+	lretq
+
+.Lon_kernel_cs:
+	UNWIND_HINT_EMPTY
+
 	/* Sanitize CPU configuration */
 	call verify_cpu
 
@@ -480,6 +494,18 @@ SYM_DATA_LOCAL(early_gdt_descr_base,	.quad INIT_PER_CPU_VAR(gdt_page))
 SYM_DATA(phys_base, .quad 0x0)
 EXPORT_SYMBOL(phys_base)
 
+/* Boot GDT used when kernel addresses are not mapped yet */
+SYM_DATA_LOCAL(boot_gdt_descr,		.word boot_gdt_end - boot_gdt - 1)
+SYM_DATA_LOCAL(boot_gdt_base,		.quad 0)
+SYM_DATA_START(boot_gdt)
+	.quad	0
+	.quad   0x00cf9a000000ffff      /* __KERNEL32_CS */
+	.quad   0x00af9a000000ffff      /* __KERNEL_CS */
+	.quad   0x00cf92000000ffff      /* __KERNEL_DS */
+	.quad   0x0080890000000000      /* TS descriptor */
+	.quad   0x0000000000000000      /* TS continued */
+SYM_DATA_END_LABEL(boot_gdt, SYM_L_LOCAL, boot_gdt_end)
+
 #include "../../x86/xen/xen-head.S"
 
 	__PAGE_ALIGNED_BSS
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 26/62] x86/head/64: Reload GDT after switch to virtual addresses
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (24 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 25/62] x86/head/64: Install boot GDT Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 27/62] x86/head/64: Load segment registers earlier Joerg Roedel
                   ` (37 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Reload the GDT after switching to virtual addresses to make sure it will
not go away when the lower mappings are removed.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head_64.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 5a3cde971cb7..a3a9383e8dd6 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -157,6 +157,11 @@ SYM_CODE_START(secondary_startup_64)
 1:
 	UNWIND_HINT_EMPTY
 
+	/* Setup boot GDT descriptor and load boot GDT */
+	leaq	boot_gdt(%rip), %rax
+	movq	%rax, boot_gdt_base(%rip)
+	lgdt	boot_gdt_descr(%rip)
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 27/62] x86/head/64: Load segment registers earlier
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (25 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 26/62] x86/head/64: Reload GDT after switch to virtual addresses Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 28/62] x86/head/64: Switch to initial stack earlier Joerg Roedel
                   ` (36 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Make sure segments are properly set up before setting up an IDT and
doing anything that might cause a #VC exception. This is later needed
for early exception handling.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head_64.S | 52 +++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index a3a9383e8dd6..36f2f30ad200 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -162,6 +162,32 @@ SYM_CODE_START(secondary_startup_64)
 	movq	%rax, boot_gdt_base(%rip)
 	lgdt	boot_gdt_descr(%rip)
 
+	/* set up data segments */
+	xorl %eax,%eax
+	movl %eax,%ds
+	movl %eax,%ss
+	movl %eax,%es
+
+	/*
+	 * We don't really need to load %fs or %gs, but load them anyway
+	 * to kill any stale realmode selectors.  This allows execution
+	 * under VT hardware.
+	 */
+	movl %eax,%fs
+	movl %eax,%gs
+
+	/* Set up %gs.
+	 *
+	 * The base of %gs always points to fixed_percpu_data. If the
+	 * stack protector canary is enabled, it is located at %gs:40.
+	 * Note that, on SMP, the boot cpu uses init data section until
+	 * the per cpu areas are set up.
+	 */
+	movl	$MSR_GS_BASE,%ecx
+	movl	initial_gs(%rip),%eax
+	movl	initial_gs+4(%rip),%edx
+	wrmsr
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
@@ -197,32 +223,6 @@ SYM_CODE_START(secondary_startup_64)
 	 */
 	lgdt	early_gdt_descr(%rip)
 
-	/* set up data segments */
-	xorl %eax,%eax
-	movl %eax,%ds
-	movl %eax,%ss
-	movl %eax,%es
-
-	/*
-	 * We don't really need to load %fs or %gs, but load them anyway
-	 * to kill any stale realmode selectors.  This allows execution
-	 * under VT hardware.
-	 */
-	movl %eax,%fs
-	movl %eax,%gs
-
-	/* Set up %gs.
-	 *
-	 * The base of %gs always points to fixed_percpu_data. If the
-	 * stack protector canary is enabled, it is located at %gs:40.
-	 * Note that, on SMP, the boot cpu uses init data section until
-	 * the per cpu areas are set up.
-	 */
-	movl	$MSR_GS_BASE,%ecx
-	movl	initial_gs(%rip),%eax
-	movl	initial_gs+4(%rip),%edx
-	wrmsr
-
 	/* rsi is pointer to real mode structure with interesting info.
 	   pass it to C */
 	movq	%rsi, %rdi
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 28/62] x86/head/64: Switch to initial stack earlier
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (26 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 27/62] x86/head/64: Load segment registers earlier Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 29/62] x86/head/64: Load IDT earlier Joerg Roedel
                   ` (35 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Make sure there is a stack once the kernel runs from virtual
addresses. At this stage any secondary CPU which boots will have lost
its stack because the kernel switched to a new page-table which does
not map the real-mode stack anymore.

This is also needed for handling early #VC exceptions caused by
instructions like CPUID.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head_64.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 36f2f30ad200..eefd6838b895 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -188,6 +188,12 @@ SYM_CODE_START(secondary_startup_64)
 	movl	initial_gs+4(%rip),%edx
 	wrmsr
 
+	/*
+	 * Setup a boot time stack - Any secondary CPU will have lost its stack
+	 * by now because the cr3-switch above unmaps the real-mode stack
+	 */
+	movq initial_stack(%rip), %rsp
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
@@ -208,9 +214,6 @@ SYM_CODE_START(secondary_startup_64)
 	/* Make changes effective */
 	movq	%rax, %cr0
 
-	/* Setup a boot time stack */
-	movq initial_stack(%rip), %rsp
-
 	/* zero EFLAGS after setting rsp */
 	pushq $0
 	popfq
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 29/62] x86/head/64: Load IDT earlier
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (27 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 28/62] x86/head/64: Switch to initial stack earlier Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 30/62] x86/head/64: Move early exception dispatch to C code Joerg Roedel
                   ` (34 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Load the IDT right after switching to virtual addresses in head_64.S
so that the kernel can handle #VC exceptions.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
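
The idt_table pointer has to be fixed up because this code still runs
on the identity mapping. For reference, the existing helper in
head64.c looks roughly like this (quoted for context, not added here):

	static void *fixup_pointer(void *ptr, unsigned long physaddr)
	{
		return ptr - (void *)_text + (void *)physaddr;
	}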
---
 arch/x86/include/asm/desc.h |  1 +
 arch/x86/kernel/head64.c    |  7 +++++++
 arch/x86/kernel/head_64.S   | 17 +++++++++++++++++
 arch/x86/kernel/idt.c       | 22 ++++++++++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 68a99d2a5f33..8a4c642ee2b3 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -440,6 +440,7 @@ extern void idt_setup_apic_and_irq_gates(void);
 extern void idt_setup_early_pf(void);
 extern void idt_setup_ist_traps(void);
 extern void idt_setup_debugidt_traps(void);
+extern void setup_early_handlers(gate_desc *idt);
 #else
 static inline void idt_setup_early_pf(void) { }
 static inline void idt_setup_ist_traps(void) { }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 206a4b6144c2..7cdfb7113811 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -489,3 +489,10 @@ void __init x86_64_start_reservations(char *real_mode_data)
 
 	start_kernel();
 }
+
+void __head early_idt_setup_early_handler(unsigned long physaddr)
+{
+	gate_desc *idt = fixup_pointer(idt_table, physaddr);
+
+	setup_early_handlers(idt);
+}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index eefd6838b895..0af79f783659 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -98,6 +98,20 @@ SYM_CODE_START_NOALIGN(startup_64)
 	leaq	_text(%rip), %rdi
 	pushq	%rsi
 	call	__startup_64
+	/* Save return value */
+	pushq	%rax
+
+	/*
+	 * Load IDT with early handlers - needed for SEV-ES
+	 * Do this here because this must only happen on the boot CPU
+	 * and the code below is shared with secondary CPU bringup.
+	 */
+	leaq	_text(%rip), %rdi
+	call	early_idt_setup_early_handler
+
+	/* Restore __startup_64 return value */
+	popq	%rax
+	/* Restore pointer to real_mode_data */
 	popq	%rsi
 
 	/* Form the CR3 value being sure to include the CR3 modifier */
@@ -194,6 +208,9 @@ SYM_CODE_START(secondary_startup_64)
 	 */
 	movq initial_stack(%rip), %rsp
 
+	/* Load IDT */
+	lidt	idt_descr(%rip)
+
 	/* Check if nx is implemented */
 	movl	$0x80000001, %eax
 	cpuid
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 7d8fa631dca9..84250c090596 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -347,6 +347,28 @@ void __init idt_setup_early_handler(void)
 	load_idt(&idt_descr);
 }
 
+#ifdef CONFIG_X86_64
+/*
+ * This function does the same as idt_setup_early_handler(), but is
+ * called directly from head_64.S before the kernel switches to virtual
+ * addresses.  PV-ops don't work at that point, so set_intr_gate() can't
+ * be used here.
+ */
+void __init setup_early_handlers(gate_desc *idt)
+{
+	int i;
+
+	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++) {
+		struct idt_data data;
+		gate_desc desc;
+
+		init_idt_data(&data, i, early_idt_handler_array[i]);
+		idt_init_desc(&desc, &data);
+		native_write_idt_entry(idt, i, &desc);
+	}
+}
+#endif
+
 /**
  * idt_invalidate - Invalidate interrupt descriptor table
  * @addr:	The virtual address of the 'invalid' IDT
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 30/62] x86/head/64: Move early exception dispatch to C code
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (28 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 29/62] x86/head/64: Load IDT earlier Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:44   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 31/62] x86/sev-es: Add SEV-ES Feature Detection Joerg Roedel
                   ` (33 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Move the assembly-coded dispatch between page-faults and all other
exceptions to C code to make it easier to maintain and extend.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
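
To illustrate the "easier to extend" point: once a later patch in this
series wires up #VC, the C dispatcher grows by just one switch case,
roughly:

	void __init early_exception(struct pt_regs *regs, int trapnr)
	{
		unsigned long cr2;
		int r;

		switch (trapnr) {
		case X86_TRAP_PF:
			cr2 = native_read_cr2();
			r = early_make_pgtable(cr2);
			break;
		case X86_TRAP_VC:	/* added by a later patch */
			r = boot_vc_exception(regs);
			break;
		default:
			r = 1;
		}

		if (r)
			early_fixup_exception(regs, trapnr);
	}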
---
 arch/x86/kernel/head64.c  | 20 ++++++++++++++++++++
 arch/x86/kernel/head_64.S | 11 +----------
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 7cdfb7113811..d83c62ebaa85 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -36,6 +36,8 @@
 #include <asm/microcode.h>
 #include <asm/kasan.h>
 #include <asm/fixmap.h>
+#include <asm/extable.h>
+#include <asm/trap_defs.h>
 
 /*
  * Manage page tables very early on.
@@ -377,6 +379,24 @@ int __init early_make_pgtable(unsigned long address)
 	return __early_make_pgtable(address, pmd);
 }
 
+void __init early_exception(struct pt_regs *regs, int trapnr)
+{
+	unsigned long cr2;
+	int r;
+
+	switch (trapnr) {
+	case X86_TRAP_PF:
+		cr2 = native_read_cr2();
+		r = early_make_pgtable(cr2);
+		break;
+	default:
+		r = 1;
+	}
+
+	if (r)
+		early_fixup_exception(regs, trapnr);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 0af79f783659..81cf6c5763ef 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -357,18 +357,9 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
 	pushq %r15				/* pt_regs->r15 */
 	UNWIND_HINT_REGS
 
-	cmpq $14,%rsi		/* Page fault? */
-	jnz 10f
-	GET_CR2_INTO(%rdi)	/* can clobber %rax if pv */
-	call early_make_pgtable
-	andl %eax,%eax
-	jz 20f			/* All good */
-
-10:
 	movq %rsp,%rdi		/* RDI = pt_regs; RSI is already trapnr */
-	call early_fixup_exception
+	call early_exception
 
-20:
 	decl early_recursion_flag(%rip)
 	jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(early_idt_handler_common)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 31/62] x86/sev-es: Add SEV-ES Feature Detection
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (29 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 30/62] x86/head/64: Move early exception dispatch to C code Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 32/62] x86/sev-es: Compile early handler code into kernel image Joerg Roedel
                   ` (32 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add the sev_es_active() function for checking whether SEV-ES is
enabled. Also cache the value of MSR_AMD64_SEV at boot to speed up
feature checks at runtime.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
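
A sketch of the resulting checks (editor's illustration; the MSR is
read once in sme_enable() and cached in sev_status):

	u64 msr = __rdmsr(MSR_AMD64_SEV);

	/* bit 0: SEV enabled, bit 1: SEV-ES enabled */
	bool sev_guest    = msr & MSR_AMD64_SEV_ENABLED;
	bool sev_es_guest = msr & MSR_AMD64_SEV_ES_ENABLED;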
---
 arch/x86/include/asm/mem_encrypt.h |  3 +++
 arch/x86/include/asm/msr-index.h   |  2 ++
 arch/x86/mm/mem_encrypt.c          | 11 ++++++++++-
 arch/x86/mm/mem_encrypt_identity.c |  3 +++
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 848ce43b9040..6f61bb93366a 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -19,6 +19,7 @@
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern u64 sme_me_mask;
+extern u64 sev_status;
 extern bool sev_enabled;
 
 void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
@@ -49,6 +50,7 @@ void __init mem_encrypt_free_decrypted_mem(void);
 
 bool sme_active(void);
 bool sev_active(void);
+bool sev_es_active(void);
 
 #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
 
@@ -71,6 +73,7 @@ static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
+static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 0; }
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b6139b70db54..1411c37b6cd9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -435,7 +435,9 @@
 #define MSR_AMD64_SEV_ES_GHCB		0xc0010130
 #define MSR_AMD64_SEV			0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT	0
+#define MSR_AMD64_SEV_ES_ENABLED_BIT	1
 #define MSR_AMD64_SEV_ENABLED		BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
+#define MSR_AMD64_SEV_ES_ENABLED	BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 
 #define MSR_AMD64_VIRT_SPEC_CTRL	0xc001011f
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a03614bd3e1a..a35fcba24866 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -38,7 +38,9 @@
  * section is later cleared.
  */
 u64 sme_me_mask __section(.data) = 0;
+u64 sev_status __section(.data) = 0;
 EXPORT_SYMBOL(sme_me_mask);
+EXPORT_SYMBOL(sev_status);
 DEFINE_STATIC_KEY_FALSE(sev_enable_key);
 EXPORT_SYMBOL_GPL(sev_enable_key);
 
@@ -347,9 +349,16 @@ bool sme_active(void)
 
 bool sev_active(void)
 {
-	return sme_me_mask && sev_enabled;
+	return !!(sev_status & MSR_AMD64_SEV_ENABLED);
 }
 
+bool sev_es_active(void)
+{
+	return !!(sev_status & MSR_AMD64_SEV_ES_ENABLED);
+}
+EXPORT_SYMBOL_GPL(sev_es_active);
+
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index e2b0e2ac07bb..68d75379e06a 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -540,6 +540,9 @@ void __init sme_enable(struct boot_params *bp)
 		if (!(msr & MSR_AMD64_SEV_ENABLED))
 			return;
 
+		/* Save SEV_STATUS to avoid reading MSR again */
+		sev_status = msr;
+
 		/* SEV state cannot be controlled by a command line option */
 		sme_me_mask = me_mask;
 		sev_enabled = true;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 32/62] x86/sev-es: Compile early handler code into kernel image
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (30 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 31/62] x86/sev-es: Add SEV-ES Feature Detection Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 33/62] x86/sev-es: Setup early #VC handler Joerg Roedel
                   ` (31 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add sev-es.c to the kernel build and include the code shared with the
pre-decompression stage, so that it is also built into the image of
the running kernel. Temporarily add __maybe_unused annotations to
avoid build warnings until the functions get used.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/Makefile        |  1 +
 arch/x86/kernel/sev-es-shared.c | 24 ++++----
 arch/x86/kernel/sev-es.c        | 98 +++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/kernel/sev-es.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9b294c13809a..b11bb52e2603 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,6 +143,7 @@ obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev-es.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 14693eff9614..ad2a6c964217 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -9,7 +9,7 @@
  * and is included directly into both code-bases.
  */
 
-static void terminate(unsigned int reason)
+static void __maybe_unused terminate(unsigned int reason)
 {
 	/* Request Guest Termination from Hypervisor */
 	write_ghcb_msr(GHCB_SEV_TERMINATE);
@@ -19,7 +19,7 @@ static void terminate(unsigned int reason)
 		asm volatile("hlt\n" : : : "memory");
 }
 
-static bool sev_es_negotiate_protocol(void)
+static bool __maybe_unused sev_es_negotiate_protocol(void)
 {
 	u64 val;
 
@@ -38,7 +38,7 @@ static bool sev_es_negotiate_protocol(void)
 	return true;
 }
 
-static void ghcb_invalidate(struct ghcb *ghcb)
+static void __maybe_unused ghcb_invalidate(struct ghcb *ghcb)
 {
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
@@ -80,9 +80,10 @@ static bool decoding_needed(unsigned long exit_code)
 		 exit_code <= SVM_EXIT_LAST_EXCP);
 }
 
-static enum es_result init_em_ctxt(struct es_em_ctxt *ctxt,
-				   struct pt_regs *regs,
-				   unsigned long exit_code)
+static enum es_result __maybe_unused
+init_em_ctxt(struct es_em_ctxt *ctxt,
+	     struct pt_regs *regs,
+	     unsigned long exit_code)
 {
 	enum es_result ret = ES_OK;
 
@@ -95,7 +96,7 @@ static enum es_result init_em_ctxt(struct es_em_ctxt *ctxt,
 	return ret;
 }
 
-static void finish_insn(struct es_em_ctxt *ctxt)
+static void __maybe_unused finish_insn(struct es_em_ctxt *ctxt)
 {
 	ctxt->regs->ip += ctxt->insn.length;
 }
@@ -358,7 +359,8 @@ static enum es_result ioio_exitinfo(struct es_em_ctxt *ctxt, u64 *exitinfo)
 	return ES_OK;
 }
 
-static enum es_result handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result __maybe_unused
+handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
 	u64 exit_info_1, exit_info_2;
@@ -450,7 +452,8 @@ static enum es_result handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ret;
 }
 
-static enum es_result handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result __maybe_unused
+handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
 	u32 cr4 = native_read_cr4();
@@ -656,7 +659,8 @@ static enum es_result handle_mmio_twobyte_ops(struct ghcb *ghcb,
 	return ret;
 }
 
-static enum es_result handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result __maybe_unused
+handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct insn *insn = &ctxt->insn;
 	unsigned int bytes = 0;
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
new file mode 100644
index 000000000000..33ab7fe8b6a0
--- /dev/null
+++ b/arch/x86/kernel/sev-es.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2019 SUSE
+ *
+ * Author: Joerg Roedel <jroedel@suse.de>
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+
+#include <asm/trap_defs.h>
+#include <asm/sev-es.h>
+#include <asm/fpu/internal.h>
+#include <asm/processor.h>
+#include <asm/svm.h>
+
+static inline u64 read_ghcb_msr(void)
+{
+	return native_read_msr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static inline void write_ghcb_msr(u64 val)
+{
+	u32 low, high;
+
+	low  = (u32)(val);
+	high = (u32)(val >> 32);
+
+	native_write_msr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+static bool check_kernel(struct pt_regs *regs)
+{
+	return regs->cs == __KERNEL_CS;
+}
+
+static enum es_result es_fetch_insn_byte(struct es_em_ctxt *ctxt,
+					 unsigned int offset,
+					 char *buffer)
+{
+	char *rip = (char *)ctxt->regs->ip;
+
+	/* More checks are needed when we boot to user-space */
+	if (!check_kernel(ctxt->regs))
+		return ES_UNSUPPORTED;
+
+	buffer[offset] = rip[offset];
+
+	return ES_OK;
+}
+
+static enum es_result es_write_mem(struct es_em_ctxt *ctxt,
+				   void *dst, char *buf, size_t size)
+{
+	/* More checks are needed when we boot to user-space */
+	if (!check_kernel(ctxt->regs))
+		return ES_UNSUPPORTED;
+
+	memcpy(dst, buf, size);
+
+	return ES_OK;
+}
+
+static enum es_result es_read_mem(struct es_em_ctxt *ctxt,
+				  void *src, char *buf, size_t size)
+{
+	/* More checks are needed when we boot to user-space */
+	if (!check_kernel(ctxt->regs))
+		return ES_UNSUPPORTED;
+
+	memcpy(buf, src, size);
+
+	return ES_OK;
+}
+
+static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
+{
+	unsigned long va = (unsigned long)vaddr;
+	unsigned int level;
+	phys_addr_t pa;
+	pgd_t *pgd;
+	pte_t *pte;
+
+	pgd = pgd_offset(current->active_mm, va);
+	pte = lookup_address_in_pgd(pgd, va, &level);
+	if (!pte)
+		return 0;
+
+	pa = (phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT;
+	pa |= va & ~page_level_mask(level);
+
+	return pa;
+}
+
+/* Include code shared with pre-decompression boot stage */
+#include "sev-es-shared.c"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 33/62] x86/sev-es: Setup early #VC handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (31 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 32/62] x86/sev-es: Compile early handler code into kernel image Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 34/62] x86/sev-es: Setup GHCB based boot " Joerg Roedel
                   ` (30 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Set up an early handler for #VC exceptions. There is no GHCB mapped
yet, so just re-use the no_ghcb_vc_handler(). It can only handle
CPUID exit-codes, but that should be enough to get the kernel through
verify_cpu() and __startup_64() until it runs on virtual addresses.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
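
Since no GHCB page is mapped at this point, the handler has to fall
back to the GHCB MSR protocol from the spec referenced in the cover
letter. A rough sketch of a CPUID request over that protocol (the
request/response values follow the GHCB spec; the helper name and
error handling here are illustrative, not taken from this patch):

	#define GHCB_MSR_CPUID_REQ	0x004ULL
	#define GHCB_MSR_CPUID_RESP	0x005ULL

	/* reg_idx: 0=EAX, 1=EBX, 2=ECX, 3=EDX */
	static u32 msr_proto_cpuid(u32 fn, u64 reg_idx)
	{
		u64 val;

		write_ghcb_msr(GHCB_MSR_CPUID_REQ |
			       (reg_idx << 30) | ((u64)fn << 32));
		VMGEXIT();	/* rep; vmmcall */
		val = read_ghcb_msr();

		if ((val & 0xfffULL) != GHCB_MSR_CPUID_RESP)
			terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

		return (u32)(val >> 32);
	}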
---
 arch/x86/include/asm/desc.h      |  1 +
 arch/x86/include/asm/processor.h |  1 +
 arch/x86/include/asm/sev-es.h    |  2 ++
 arch/x86/kernel/head64.c         | 17 ++++++++++++++++
 arch/x86/kernel/head_64.S        | 35 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/idt.c            | 10 +++++++++
 6 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index 8a4c642ee2b3..cc2db0325f9f 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -388,6 +388,7 @@ static inline void set_desc_limit(struct desc_struct *desc, unsigned long limit)
 
 void update_intr_gate(unsigned int n, const void *addr);
 void alloc_intr_gate(unsigned int n, const void *addr);
+void set_early_idt_handler(gate_desc *idt, int n, void *handler);
 
 extern unsigned long system_vectors[];
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 09705ccc393c..4622427d01d4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -768,6 +768,7 @@ extern int sysenter_setup(void);
 
 /* Defined in head.S */
 extern struct desc_ptr		early_gdt_descr;
+extern struct desc_ptr		early_idt_descr;
 
 extern void switch_to_new_gdt(int);
 extern void load_direct_gdt(int);
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 512d3ccb9832..caa29f75ce41 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -75,4 +75,6 @@ static inline u64 copy_lower_bits(u64 out, u64 in, unsigned int bits)
 	return out;
 }
 
+extern void early_vc_handler(void);
+
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index d83c62ebaa85..eab04ac260d4 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -38,6 +38,7 @@
 #include <asm/fixmap.h>
 #include <asm/extable.h>
 #include <asm/trap_defs.h>
+#include <asm/sev-es.h>
 
 /*
  * Manage page tables very early on.
@@ -516,3 +517,19 @@ void __head early_idt_setup_early_handler(unsigned long physaddr)
 
 	setup_early_handlers(idt);
 }
+
+void __head early_idt_setup(unsigned long physbase)
+{
+	gate_desc *idt = fixup_pointer(idt_table, physbase);
+	void __maybe_unused *handler;
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/* VMM Communication Exception */
+	handler = fixup_pointer(early_vc_handler, physbase);
+	set_early_idt_handler(idt, X86_TRAP_VC, handler);
+#endif
+
+	/* Initialize IDT descriptor and load IDT */
+	early_idt_descr.address = (unsigned long)idt;
+	native_load_idt(&early_idt_descr);
+}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 81cf6c5763ef..13ebf7d3af2c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -86,6 +86,12 @@ SYM_CODE_START_NOALIGN(startup_64)
 .Lon_kernel_cs:
 	UNWIND_HINT_EMPTY
 
+	/* Setup IDT - Needed for SEV-ES */
+	leaq	_text(%rip), %rdi
+	pushq	%rsi
+	call	early_idt_setup
+	popq	%rsi
+
 	/* Sanitize CPU configuration */
 	call verify_cpu
 
@@ -364,6 +370,32 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
 	jmp restore_regs_and_return_to_kernel
 SYM_CODE_END(early_idt_handler_common)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/*
+ * VC Exception handler used during very early boot. The
+ * early_idt_handler_array can't be used because it returns via the
+ * paravirtualized INTERRUPT_RETURN and pv-ops don't work that early.
+ */
+SYM_CODE_START_NOALIGN(early_vc_handler)
+	UNWIND_HINT_IRET_REGS offset=8
+
+	/* Build pt_regs */
+	PUSH_AND_CLEAR_REGS
+
+	/* Call C handler */
+	movq    %rsp, %rdi
+	call    no_ghcb_vc_handler
+
+	/* Unwind pt_regs */
+	POP_REGS
+
+	/* Remove Error Code */
+	addq    $8, %rsp
+
+	/* Pure iret required here - don't use INTERRUPT_RETURN */
+	iretq
+SYM_CODE_END(early_vc_handler)
+#endif
 
 #define SYM_DATA_START_PAGE_ALIGNED(name)			\
 	SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
@@ -505,6 +537,9 @@ SYM_DATA_END(level1_fixmap_pgt)
 SYM_DATA(early_gdt_descr,		.word GDT_ENTRIES*8-1)
 SYM_DATA_LOCAL(early_gdt_descr_base,	.quad INIT_PER_CPU_VAR(gdt_page))
 
+SYM_DATA(early_idt_descr,		.word NUM_EXCEPTION_VECTORS * 16 - 1)
+SYM_DATA_LOCAL(early_idt_descr_base,	.quad 0)
+
 	.align 16
 /* This must match the first entry in level2_kernel_pgt */
 SYM_DATA(phys_base, .quad 0x0)
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 84250c090596..1bfee6981e9b 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -393,3 +393,13 @@ void alloc_intr_gate(unsigned int n, const void *addr)
 	if (!test_and_set_bit(n, system_vectors))
 		set_intr_gate(n, addr);
 }
+
+void set_early_idt_handler(gate_desc *idt, int n, void *handler)
+{
+	struct idt_data data;
+	gate_desc desc;
+
+	init_idt_data(&data, n, handler);
+	idt_init_desc(&desc, &data);
+	native_write_idt_entry(idt, n, &desc);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 34/62] x86/sev-es: Setup GHCB based boot #VC handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (32 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 33/62] x86/sev-es: Setup early #VC handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler Joerg Roedel
                   ` (29 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add the infrastructure to handle #VC exceptions when the kernel runs
on virtual addresses and has a GHCB mapped. This handler will be used
until the runtime #VC handler takes over. Since #VC (vector 29) pushes
an error code, EXCEPTION_ERRCODE_MASK also grows bit 29
(0x00027d00 -> 0x20027d00).

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/segment.h  |   2 +-
 arch/x86/include/asm/sev-es.h   |   1 +
 arch/x86/kernel/head64.c        |   5 ++
 arch/x86/kernel/sev-es-shared.c |  15 ++---
 arch/x86/kernel/sev-es.c        | 116 ++++++++++++++++++++++++++++++++
 arch/x86/mm/extable.c           |   1 +
 6 files changed, 131 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 6669164abadc..5b648066504c 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -230,7 +230,7 @@
 #define NUM_EXCEPTION_VECTORS		32
 
 /* Bitmask of exception vectors which push an error code on the stack: */
-#define EXCEPTION_ERRCODE_MASK		0x00027d00
+#define EXCEPTION_ERRCODE_MASK		0x20027d00
 
 #define GDT_SIZE			(GDT_ENTRIES*8)
 #define GDT_ENTRY_TLS_ENTRIES		3
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index caa29f75ce41..a2d0c77dabc3 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -76,5 +76,6 @@ static inline u64 copy_lower_bits(u64 out, u64 in, unsigned int bits)
 }
 
 extern void early_vc_handler(void);
+extern int boot_vc_exception(struct pt_regs *regs);
 
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index eab04ac260d4..14e0699b2692 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -390,6 +390,11 @@ void __init early_exception(struct pt_regs *regs, int trapnr)
 		cr2 = native_read_cr2();
 		r = early_make_pgtable(cr2);
 		break;
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	case X86_TRAP_VC:
+		r = boot_vc_exception(regs);
+		break;
+#endif
 	default:
 		r = 1;
 	}
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index ad2a6c964217..57c29c91fe87 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -9,7 +9,7 @@
  * and is included directly into both code-bases.
  */
 
-static void __maybe_unused terminate(unsigned int reason)
+static void terminate(unsigned int reason)
 {
 	/* Request Guest Termination from Hypervisor */
 	write_ghcb_msr(GHCB_SEV_TERMINATE);
@@ -19,7 +19,7 @@ static void __maybe_unused terminate(unsigned int reason)
 		asm volatile("hlt\n" : : : "memory");
 }
 
-static bool __maybe_unused sev_es_negotiate_protocol(void)
+static bool sev_es_negotiate_protocol(void)
 {
 	u64 val;
 
@@ -38,7 +38,7 @@ static bool __maybe_unused sev_es_negotiate_protocol(void)
 	return true;
 }
 
-static void __maybe_unused ghcb_invalidate(struct ghcb *ghcb)
+static void ghcb_invalidate(struct ghcb *ghcb)
 {
 	memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
 }
@@ -80,10 +80,9 @@ static bool decoding_needed(unsigned long exit_code)
 		 exit_code <= SVM_EXIT_LAST_EXCP);
 }
 
-static enum es_result __maybe_unused
-init_em_ctxt(struct es_em_ctxt *ctxt,
-	     struct pt_regs *regs,
-	     unsigned long exit_code)
+static enum es_result init_em_ctxt(struct es_em_ctxt *ctxt,
+				   struct pt_regs *regs,
+				   unsigned long exit_code)
 {
 	enum es_result ret = ES_OK;
 
@@ -96,7 +95,7 @@ init_em_ctxt(struct es_em_ctxt *ctxt,
 	return ret;
 }
 
-static void __maybe_unused finish_insn(struct es_em_ctxt *ctxt)
+static void finish_insn(struct es_em_ctxt *ctxt)
 {
 	ctxt->regs->ip += ctxt->insn.length;
 }
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 33ab7fe8b6a0..0e0b28477627 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -7,15 +7,30 @@
  * Author: Joerg Roedel <jroedel@suse.de>
  */
 
+#include <linux/sched/debug.h>	/* For show_regs() */
 #include <linux/kernel.h>
+#include <linux/printk.h>
 #include <linux/mm.h>
 
 #include <asm/trap_defs.h>
 #include <asm/sev-es.h>
 #include <asm/fpu/internal.h>
 #include <asm/processor.h>
 #include <asm/svm.h>
 
+/* For early boot hypervisor communication in SEV-ES enabled guests */
+struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
+
+/*
+ * Needs to be in the .data section because we need it NULL before bss is
+ * cleared
+ */
+struct ghcb __initdata *boot_ghcb;
+
+/* Needed in early_forward_exception */
+extern void early_exception(struct pt_regs *regs, int trapnr);
+
 static inline u64 read_ghcb_msr(void)
 {
 	return native_read_msr(MSR_AMD64_SEV_ES_GHCB);
@@ -96,3 +111,104 @@ static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
 
 /* Include code shared with pre-decompression boot stage */
 #include "sev-es-shared.c"
+
+/*
+ * This function runs on the first #VC exception after the kernel
+ * switched to virtual addresses.
+ */
+static bool __init setup_ghcb(void)
+{
+	/* First make sure the hypervisor talks a supported protocol. */
+	if (!sev_es_negotiate_protocol())
+		return false;
+	/*
+	 * Clear the boot_ghcb. The first exception comes in before the bss
+	 * section is cleared.
+	 */
+	memset(&boot_ghcb_page, 0, PAGE_SIZE);
+
+	/* Alright - Make the boot-ghcb public */
+	boot_ghcb = &boot_ghcb_page;
+
+	return true;
+}
+
+static void __init early_forward_exception(struct es_em_ctxt *ctxt)
+{
+	int trapnr = ctxt->fi.vector;
+
+	if (trapnr == X86_TRAP_PF)
+		native_write_cr2(ctxt->fi.cr2);
+
+	ctxt->regs->orig_ax = ctxt->fi.error_code;
+	early_exception(ctxt->regs, trapnr);
+}
+
+static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
+		struct ghcb *ghcb,
+		unsigned long exit_code)
+{
+	enum es_result result;
+
+	switch (exit_code) {
+	default:
+		/*
+		 * Unexpected #VC exception
+		 */
+		result = ES_UNSUPPORTED;
+	}
+
+	return result;
+}
+
+int __init boot_vc_exception(struct pt_regs *regs)
+{
+	unsigned long exit_code = regs->orig_ax;
+	struct es_em_ctxt ctxt;
+	enum es_result result;
+
+	/* Do initial setup or terminate the guest */
+	if (unlikely(boot_ghcb == NULL && !setup_ghcb()))
+		terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+	ghcb_invalidate(boot_ghcb);
+	result = init_em_ctxt(&ctxt, regs, exit_code);
+
+	if (result == ES_OK)
+		result = handle_vc_exception(&ctxt, boot_ghcb, exit_code);
+
+	/* Done - now check the result */
+	switch (result) {
+	case ES_OK:
+		finish_insn(&ctxt);
+		break;
+	case ES_UNSUPPORTED:
+		early_printk("PANIC: Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
+				exit_code, regs->ip);
+		goto fail;
+	case ES_VMM_ERROR:
+		early_printk("PANIC: Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
+				exit_code, regs->ip);
+		goto fail;
+	case ES_DECODE_FAILED:
+		early_printk("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
+				exit_code, regs->ip);
+		goto fail;
+	case ES_EXCEPTION:
+		early_forward_exception(&ctxt);
+		break;
+	case ES_RETRY:
+		/* Nothing to do */
+		break;
+	default:
+		BUG();
+	}
+
+	return 0;
+
+fail:
+	show_regs(regs);
+
+	while (true)
+		halt();
+}
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 30bb0bd3b1b8..cd440a9cf422 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -5,6 +5,7 @@
 #include <xen/xen.h>
 
 #include <asm/fpu/internal.h>
+#include <asm/sev-es.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (33 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 34/62] x86/sev-es: Setup GHCB based boot " Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:46   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 36/62] x86/sev-es: Add Runtime #VC Exception Handler Joerg Roedel
                   ` (28 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

The runtime handler needs a GHCB per CPU. Set them up and map them
unencrypted.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/mem_encrypt.h |  2 ++
 arch/x86/kernel/sev-es.c           | 25 ++++++++++++++++++++++++-
 arch/x86/kernel/traps.c            |  3 +++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 6f61bb93366a..d48e7be9bb49 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -48,6 +48,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
 void __init mem_encrypt_init(void);
 void __init mem_encrypt_free_decrypted_mem(void);
 
+void __init encrypted_state_init_ghcbs(void);
 bool sme_active(void);
 bool sev_active(void);
 bool sev_es_active(void);
@@ -71,6 +72,7 @@ static inline void __init sme_early_init(void) { }
 static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
+static inline void encrypted_state_init_ghcbs(void) { }
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 0e0b28477627..9a5530857db7 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -8,8 +8,11 @@
  */
 
 #include <linux/sched/debug.h>	/* For show_regs() */
-#include <linux/kernel.h>
+#include <linux/percpu-defs.h>
+#include <linux/mem_encrypt.h>
 #include <linux/printk.h>
+#include <linux/set_memory.h>
+#include <linux/kernel.h>
 #include <linux/mm.h>
 
 #include <asm/trap_defs.h>
@@ -28,6 +31,9 @@ struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
  */
 struct ghcb __initdata *boot_ghcb;
 
+/* Runtime GHCBs */
+static DEFINE_PER_CPU_DECRYPTED(struct ghcb, ghcb_page) __aligned(PAGE_SIZE);
+
 /* Needed in early_forward_exception */
 extern void early_exception(struct pt_regs *regs, int trapnr);
 
@@ -133,6 +139,23 @@ static bool __init setup_ghcb(void)
 	return true;
 }
 
+void encrypted_state_init_ghcbs(void)
+{
+	int cpu;
+
+	if (!sev_es_active())
+		return;
+
+	/* Initialize per-cpu GHCB pages */
+	for_each_possible_cpu(cpu) {
+		struct ghcb *ghcb = &per_cpu(ghcb_page, cpu);
+
+		set_memory_decrypted((unsigned long)ghcb,
+				     sizeof(ghcb_page) >> PAGE_SHIFT);
+		memset(ghcb, 0, sizeof(*ghcb));
+	}
+}
+
 static void __init early_forward_exception(struct es_em_ctxt *ctxt)
 {
 	int trapnr = ctxt->fi.vector;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6ef00eb6fbb9..9c9a7fae36d3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -918,6 +918,9 @@ void __init trap_init(void)
 	/* Init cpu_entry_area before IST entries are set up */
 	setup_cpu_entry_areas();
 
+	/* Init GHCB memory pages when running as an SEV-ES guest */
+	encrypted_state_init_ghcbs();
+
 	idt_setup_traps();
 
 	/*
-- 
2.17.1
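
One detail in encrypted_state_init_ghcbs(): sizeof(ghcb_page) >> PAGE_SHIFT
is exact only because the GHCB is specified as one full 4KB page; for any
size that is not a page multiple the shift truncates to zero pages. A
stand-alone sketch of the arithmetic (PAGE_SHIFT assumed to be 12), showing
why a DIV_ROUND_UP() form would be the more defensive choice:

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

int main(void)
{
	unsigned long one_page = 4096, partial = 2048;

	/* Whole pages shift exactly; partial pages truncate to 0. */
	printf("4096 >> PAGE_SHIFT = %lu page(s)\n", one_page >> PAGE_SHIFT);
	printf("2048 >> PAGE_SHIFT = %lu page(s)\n", partial >> PAGE_SHIFT);
	printf("DIV_ROUND_UP(2048, PAGE_SIZE) = %lu page(s)\n",
	       (partial + PAGE_SIZE - 1) / PAGE_SIZE);
	return 0;
}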


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 36/62] x86/sev-es: Add Runtime #VC Exception Handler
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (34 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 37/62] x86/sev-es: Wire up existing #VC exit-code handlers Joerg Roedel
                   ` (27 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add the handler for #VC exceptions invoked at runtime.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_64.S    |  4 ++
 arch/x86/include/asm/traps.h |  7 ++++
 arch/x86/kernel/idt.c        |  4 +-
 arch/x86/kernel/sev-es.c     | 77 +++++++++++++++++++++++++++++++++++-
 4 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f2bb91e87877..729876d368c5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1210,6 +1210,10 @@ idtentry async_page_fault	do_async_page_fault	has_error_code=1	read_cr2=1
 idtentry machine_check		do_mce			has_error_code=0	paranoid=1
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+idtentry vmm_communication	do_vmm_communication	has_error_code=1
+#endif
+
 /*
  * Save all registers in pt_regs, and switch gs if needed.
  * Use slow, but surefire "are we in kernel?" check.
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 2aa786484bb1..1be25c065698 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -35,6 +35,9 @@ asmlinkage void alignment_check(void);
 #ifdef CONFIG_X86_MCE
 asmlinkage void machine_check(void);
 #endif /* CONFIG_X86_MCE */
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+asmlinkage void vmm_communication(void);
+#endif
 asmlinkage void simd_coprocessor_error(void);
 
 #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
@@ -93,6 +96,10 @@ dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code);
 dotraplinkage void do_machine_check(struct pt_regs *regs, long error_code);
 #endif
 dotraplinkage void do_simd_coprocessor_error(struct pt_regs *regs, long error_code);
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+dotraplinkage void do_vmm_communication(struct pt_regs *regs,
+					unsigned long exit_code);
+#endif
 #ifdef CONFIG_X86_32
 dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code);
 #endif
diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index 1bfee6981e9b..94f6a5705e1d 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -95,8 +95,10 @@ static const __initconst struct idt_data def_idts[] = {
 #ifdef CONFIG_X86_MCE
 	INTG(X86_TRAP_MC,		&machine_check),
 #endif
-
 	SYSG(X86_TRAP_OF,		overflow),
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	INTG(X86_TRAP_VC,		&vmm_communication),
+#endif
 #if defined(CONFIG_IA32_EMULATION)
 	SYSG(IA32_SYSCALL_VECTOR,	entry_INT80_compat),
 #elif defined(CONFIG_X86_32)
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 9a5530857db7..1fb7128ff386 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -19,7 +19,7 @@
 #include <asm/sev-es.h>
 #include <asm/fpu/internal.h>
 #include <asm/processor.h>
-#include <asm/trap_defs.h>
+#include <asm/traps.h>
 #include <asm/svm.h>
 
 /* For early boot hypervisor communication in SEV-ES enabled guests */
@@ -184,6 +184,81 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	return result;
 }
 
+static void forward_exception(struct es_em_ctxt *ctxt)
+{
+	long error_code = ctxt->fi.error_code;
+	int trapnr = ctxt->fi.vector;
+
+	ctxt->regs->orig_ax = ctxt->fi.error_code;
+
+	switch (trapnr) {
+	case X86_TRAP_GP:
+		do_general_protection(ctxt->regs, error_code);
+		break;
+	case X86_TRAP_UD:
+		do_invalid_op(ctxt->regs, 0);
+		break;
+	default:
+		BUG();
+	}
+}
+
+dotraplinkage void do_vmm_communication(struct pt_regs *regs, unsigned long exit_code)
+{
+	struct es_em_ctxt ctxt;
+	enum es_result result;
+	struct ghcb *ghcb;
+
+	/*
+	 * This is invoked through an interrupt gate, so IRQs are disabled. The
+	 * code below might walk page-tables for user or kernel addresses, so
+	 * keep the IRQs disabled to protect us against concurrent TLB flushes.
+	 */
+
+	ghcb = this_cpu_ptr(&ghcb_page);
+
+	ghcb_invalidate(ghcb);
+	result = init_em_ctxt(&ctxt, regs, exit_code);
+
+	if (result == ES_OK)
+		result = handle_vc_exception(&ctxt, ghcb, exit_code);
+
+	/* Done - now check the result */
+	switch (result) {
+	case ES_OK:
+		finish_insn(&ctxt);
+		break;
+	case ES_UNSUPPORTED:
+		pr_emerg("Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
+			 exit_code, regs->ip);
+		goto fail;
+	case ES_VMM_ERROR:
+		pr_emerg("PANIC: Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
+			 exit_code, regs->ip);
+		goto fail;
+	case ES_DECODE_FAILED:
+		pr_emerg("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
+			 exit_code, regs->ip);
+		goto fail;
+	case ES_EXCEPTION:
+		forward_exception(&ctxt);
+		break;
+	case ES_RETRY:
+		/* Nothing to do */
+		break;
+	default:
+		BUG();
+	}
+
+	return;
+
+fail:
+	show_regs(regs);
+
+	while (true)
+		halt();
+}
+
 int __init boot_vc_exception(struct pt_regs *regs)
 {
 	unsigned long exit_code = regs->orig_ax;
-- 
2.17.1
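
As a hedged illustration of what this handler ends up emulating (not part
of the patch): in an SEV-ES guest every CPUID executed by the program below
traps into do_vmm_communication() and is completed over the GHCB. Assumes
an x86-64 Linux guest and gcc.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t eax, ebx, ecx, edx;

	/* CPUID leaf 0: max leaf + vendor string. In an SEV-ES guest
	 * this instruction raises #VC and is emulated via the GHCB. */
	__asm__ volatile("cpuid"
			 : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
			 : "a"(0), "c"(0));

	printf("max leaf %u, vendor %.4s%.4s%.4s\n", eax,
	       (const char *)&ebx, (const char *)&edx, (const char *)&ecx);
	return 0;
}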


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 37/62] x86/sev-es: Wire up existing #VC exit-code handlers
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (35 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 36/62] x86/sev-es: Add Runtime #VC Exception Handler Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space Joerg Roedel
                   ` (26 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Re-use the handlers for CPUID-, IOIO- and MMIO-caused #VC exceptions
from the early boot handler in the runtime #VC handler.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es-shared.c | 9 +++------
 arch/x86/kernel/sev-es.c        | 9 +++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index 57c29c91fe87..14693eff9614 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -358,8 +358,7 @@ static enum es_result ioio_exitinfo(struct es_em_ctxt *ctxt, u64 *exitinfo)
 	return ES_OK;
 }
 
-static enum es_result __maybe_unused
-handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
 	u64 exit_info_1, exit_info_2;
@@ -451,8 +450,7 @@ handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ret;
 }
 
-static enum es_result __maybe_unused
-handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result handle_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
 	u32 cr4 = native_read_cr4();
@@ -658,8 +656,7 @@ static enum es_result handle_mmio_twobyte_ops(struct ghcb *ghcb,
 	return ret;
 }
 
-static enum es_result __maybe_unused
-handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct insn *insn = &ctxt->insn;
 	unsigned int bytes = 0;
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 1fb7128ff386..2a801919e7c0 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -174,6 +174,15 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	enum es_result result;
 
 	switch (exit_code) {
+	case SVM_EXIT_CPUID:
+		result = handle_cpuid(ghcb, ctxt);
+		break;
+	case SVM_EXIT_IOIO:
+		result = handle_ioio(ghcb, ctxt);
+		break;
+	case SVM_EXIT_NPF:
+		result = handle_mmio(ghcb, ctxt);
+		break;
 	default:
 		/*
 		 * Unexpected #VC exception
-- 
2.17.1
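
A hedged sketch of an IOIO trigger (assumes an x86-64 Linux guest and
glibc's <sys/io.h>; needs root for ioperm()): each port access below would
go through the now-wired-up handle_ioio() path in an SEV-ES guest.

#include <stdio.h>
#include <sys/io.h>	/* ioperm()/outb(), glibc, x86 only */

int main(void)
{
	/* Each OUTB raises #VC with exit-code SVM_EXIT_IOIO in an
	 * SEV-ES guest; port 0x80 is the traditional POST/delay port. */
	if (ioperm(0x80, 1, 1)) {
		perror("ioperm (run as root)");
		return 1;
	}
	outb(0x42, 0x80);	/* glibc argument order: value, then port */
	puts("outb executed");
	return 0;
}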


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (36 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 37/62] x86/sev-es: Wire up existing #VC exit-code handlers Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-12 21:42   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions " Joerg Roedel
                   ` (25 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

When a #VC exception is triggered by user-space, the instruction
decoder needs to read the instruction bytes from user addresses.
Enhance es_fetch_insn_byte() to safely fetch kernel and user
instruction bytes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 2a801919e7c0..f5bff4219f6f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -61,13 +61,29 @@ static enum es_result es_fetch_insn_byte(struct es_em_ctxt *ctxt,
 					 unsigned int offset,
 					 char *buffer)
 {
-	char *rip = (char *)ctxt->regs->ip;
-
-	/* More checks are needed when we boot to user-space */
-	if (!check_kernel(ctxt->regs))
-		return ES_UNSUPPORTED;
-
-	buffer[offset] = rip[offset];
+	if (user_mode(ctxt->regs)) {
+		unsigned long addr = ctxt->regs->ip + offset;
+		char __user *rip = (char __user *)addr;
+
+		if (unlikely(addr >= TASK_SIZE_MAX))
+			return ES_UNSUPPORTED;
+
+		if (copy_from_user(buffer + offset, rip, 1)) {
+			ctxt->fi.vector     = X86_TRAP_PF;
+			ctxt->fi.cr2        = addr;
+			ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER;
+			return ES_EXCEPTION;
+		}
+	} else {
+		char *rip = (char *)ctxt->regs->ip + offset;
+
+		if (probe_kernel_read(buffer + offset, rip, 1) != 0) {
+			ctxt->fi.vector     = X86_TRAP_PF;
+			ctxt->fi.cr2        = (unsigned long)rip;
+			ctxt->fi.error_code = X86_PF_INSTR;
+			return ES_EXCEPTION;
+		}
+	}
 
 	return ES_OK;
 }
-- 
2.17.1
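
The user-space analogue of what es_fetch_insn_byte() now does is simply
copying the bytes at a code address into a scratch buffer before decoding
them; a minimal sketch (casting a function pointer to a data pointer is
implementation-defined, but works on Linux/x86-64):

#include <stdio.h>
#include <string.h>

static int add(int a, int b)
{
	return a + b;
}

int main(void)
{
	unsigned char buf[16];

	/* Copy bytes from a code address into a buffer before decoding -
	 * the same pattern the #VC decoder uses via copy_from_user() and
	 * probe_kernel_read(). */
	memcpy(buf, (const void *)add, sizeof(buf));

	for (unsigned int i = 0; i < sizeof(buf); i++)
		printf("%02x ", buf[i]);
	putchar('\n');
	return 0;
}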


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions from user-space
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (37 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:47   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 40/62] x86/sev-es: Filter exceptions not supported " Joerg Roedel
                   ` (24 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Send SIGBUS to the user-space process that caused the #VC exception
instead of killing the machine. Also ratelimit the error messages so
that user-space can't flood the kernel log.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index f5bff4219f6f..d128a9397639 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -254,16 +254,16 @@ dotraplinkage void do_vmm_communication(struct pt_regs *regs, unsigned long exit
 		finish_insn(&ctxt);
 		break;
 	case ES_UNSUPPORTED:
-		pr_emerg("Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
-			 exit_code, regs->ip);
+		pr_err_ratelimited("Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n",
+				   exit_code, regs->ip);
 		goto fail;
 	case ES_VMM_ERROR:
-		pr_emerg("PANIC: Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
-			 exit_code, regs->ip);
+		pr_err_ratelimited("Failure in communication with VMM (exit-code 0x%02lx IP: 0x%lx)\n",
+				   exit_code, regs->ip);
 		goto fail;
 	case ES_DECODE_FAILED:
-		pr_emerg("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
-			 exit_code, regs->ip);
+		pr_err_ratelimited("PANIC: Failed to decode instruction (exit-code 0x%02lx IP: 0x%lx)\n",
+				   exit_code, regs->ip);
 		goto fail;
 	case ES_EXCEPTION:
 		forward_exception(&ctxt);
@@ -278,10 +278,24 @@ dotraplinkage void do_vmm_communication(struct pt_regs *regs, unsigned long exit
 	return;
 
 fail:
-	show_regs(regs);
+	if (user_mode(regs)) {
+		/*
+		 * Do not kill the machine if user-space triggered the
+		 * exception. Send SIGBUS instead and let user-space deal with
+		 * it.
+		 */
+		force_sig_fault(SIGBUS, BUS_OBJERR, (void __user *)0);
+	} else {
+		/* Show some debug info */
+		show_regs(regs);
 
-	while (true)
-		halt();
+		/* Ask hypervisor to terminate */
+		terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+		/* If that fails and we get here - just halt the machine */
+		while (true)
+			halt();
+	}
 }
 
 int __init boot_vc_exception(struct pt_regs *regs)
-- 
2.17.1
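
From the user-space side the new failure mode is an ordinary SIGBUS with
si_code BUS_OBJERR. A minimal sketch of observing it (hypothetical harness,
not part of the patch):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_sigbus(int sig, siginfo_t *si, void *uc)
{
	/* The runtime #VC handler sends SIGBUS with BUS_OBJERR */
	static const char msg[] = "caught SIGBUS\n";

	(void)sig; (void)uc;
	write(STDERR_FILENO, msg, sizeof(msg) - 1);
	_exit(si->si_code == BUS_OBJERR ? 1 : 2);
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);

	/* ... run code that might hit an unsupported #VC exit here ... */
	pause();
	return 0;
}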


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 40/62] x86/sev-es: Filter exceptions not supported from user-space
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (38 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions " Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 41/62] x86/sev-es: Handle MSR events Joerg Roedel
                   ` (23 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Currently only CPUID-caused #VC exceptions are supported from
user-space. Filter the others out early.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d128a9397639..84b5b8f7897a 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -209,6 +209,26 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	return result;
 }
 
+static enum es_result context_filter(struct pt_regs *regs, long exit_code)
+{
+	enum es_result r = ES_OK;
+
+	if (user_mode(regs)) {
+		switch (exit_code) {
+		/* List of #VC exit-codes we support in user-space */
+		case SVM_EXIT_EXCP_BASE ... SVM_EXIT_LAST_EXCP:
+		case SVM_EXIT_CPUID:
+			r = ES_OK;
+			break;
+		default:
+			r = ES_UNSUPPORTED;
+			break;
+		}
+	}
+
+	return r;
+}
+
 static void forward_exception(struct es_em_ctxt *ctxt)
 {
 	long error_code = ctxt->fi.error_code;
@@ -245,6 +265,10 @@ dotraplinkage void do_vmm_communication(struct pt_regs *regs, unsigned long exit
 	ghcb_invalidate(ghcb);
 	result = init_em_ctxt(&ctxt, regs, exit_code);
 
+	/* Check if the exception is supported in the context we came from. */
+	if (result == ES_OK)
+		result = context_filter(regs, exit_code);
+
 	if (result == ES_OK)
 		result = handle_vc_exception(&ctxt, ghcb, exit_code);
 
-- 
2.17.1
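
The filter leans on the GNU C case-range extension; a stand-alone sketch
of the same shape, with the exit-code values assumed from svm.h rather
than taken from this patch:

#include <stdio.h>

/* Values assumed from arch/x86/include/asm/svm.h */
#define SVM_EXIT_EXCP_BASE	0x040
#define SVM_EXIT_LAST_EXCP	0x05f
#define SVM_EXIT_CPUID		0x072

/* Same shape as context_filter(), using the GNU C case-range
 * extension the kernel relies on. */
static const char *filter(unsigned long exit_code)
{
	switch (exit_code) {
	case SVM_EXIT_EXCP_BASE ... SVM_EXIT_LAST_EXCP:
	case SVM_EXIT_CPUID:
		return "allowed from user-space";
	default:
		return "unsupported from user-space";
	}
}

int main(void)
{
	printf("0x072 (CPUID): %s\n", filter(0x072));
	printf("0x07b (IOIO):  %s\n", filter(0x07b));
	return 0;
}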


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 41/62] x86/sev-es: Handle MSR events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (39 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 40/62] x86/sev-es: Filter exceptions not supported " Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-13 15:45   ` Dave Hansen
  2020-02-11 13:52 ` [PATCH 42/62] x86/sev-es: Handle DR7 read/write events Joerg Roedel
                   ` (22 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by RDMSR/WRMSR
instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 84b5b8f7897a..b27d5b0a8ae1 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -134,6 +134,35 @@ static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
 /* Include code shared with pre-decompression boot stage */
 #include "sev-es-shared.c"
 
+static enum es_result handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	struct pt_regs *regs = ctxt->regs;
+	enum es_result ret;
+	bool write;
+	u64 exit_info_1;
+
+	write = (ctxt->insn.opcode.bytes[1] == 0x30);
+
+	ghcb_set_rcx(ghcb, regs->cx);
+	if (write) {
+		ghcb_set_rax(ghcb, regs->ax);
+		ghcb_set_rdx(ghcb, regs->dx);
+		exit_info_1 = 1;
+	} else {
+		exit_info_1 = 0;
+	}
+
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, exit_info_1, 0);
+	if (ret != ES_OK)
+		return ret;
+	if (!write) {
+		regs->ax = ghcb->save.rax;
+		regs->dx = ghcb->save.rdx;
+	}
+
+	return ret;
+}
+
 /*
  * This function runs on the first #VC exception after the kernel
  * switched to virtual addresses.
@@ -196,6 +225,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_IOIO:
 		result = handle_ioio(ghcb, ctxt);
 		break;
+	case SVM_EXIT_MSR:
+		result = handle_msr(ghcb, ctxt);
+		break;
 	case SVM_EXIT_NPF:
 		result = handle_mmio(ghcb, ctxt);
 		break;
-- 
2.17.1
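
The register convention the handler shuttles through the GHCB is the usual
RDMSR/WRMSR one: MSR index in ECX, 64-bit value split across EDX:EAX. A
small sketch of just the packing arithmetic:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t val = 0x1122334455667788ULL;

	/* WRMSR consumes the value in EDX:EAX, RDMSR returns it there;
	 * the MSR index itself travels in ECX. */
	uint32_t eax = (uint32_t)val;		/* low 32 bits */
	uint32_t edx = (uint32_t)(val >> 32);	/* high 32 bits */
	uint64_t back = ((uint64_t)edx << 32) | eax;

	printf("eax=%08x edx=%08x reassembled=%016llx\n",
	       eax, edx, (unsigned long long)back);
	return 0;
}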


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 42/62] x86/sev-es: Handle DR7 read/write events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (40 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 41/62] x86/sev-es: Handle MSR events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 43/62] x86/sev-es: Handle WBINVD Events Joerg Roedel
                   ` (21 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Add code to handle #VC exceptions on DR7 register reads and writes.
This is needed early because show_regs() reads DR7 to print it out.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Adapt to #VC handling framework
                   - Support early usage ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 69 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index b27d5b0a8ae1..fcd67ab04d2d 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -22,6 +22,8 @@
 #include <asm/traps.h>
 #include <asm/svm.h>
 
+#define DR7_RESET_VALUE        0x400
+
 /* For early boot hypervisor communication in SEV-ES enabled guests */
 struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
 
@@ -30,6 +32,9 @@ struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
  * cleared
  */
 struct ghcb __initdata *boot_ghcb;
+static DEFINE_PER_CPU(unsigned long, cached_dr7) = DR7_RESET_VALUE;
+/* Needed before per-cpu access is set up */
+static unsigned long early_dr7 = DR7_RESET_VALUE;
 
 /* Runtime GHCBs */
 static DEFINE_PER_CPU_DECRYPTED(struct ghcb, ghcb_page) __aligned(PAGE_SIZE);
@@ -212,13 +217,69 @@ static void __init early_forward_exception(struct es_em_ctxt *ctxt)
 	early_exception(ctxt->regs, trapnr);
 }
 
+static enum es_result handle_dr7_write(struct ghcb *ghcb,
+				       struct es_em_ctxt *ctxt,
+				       bool early)
+{
+	u8 rm = X86_MODRM_RM(ctxt->insn.modrm.value);
+	unsigned long *reg;
+	enum es_result ret;
+
+	if (ctxt->insn.rex_prefix.nbytes &&
+	    X86_REX_B(ctxt->insn.rex_prefix.value))
+		rm |= 0x8;
+
+	reg = register_from_idx(ctxt->regs, rm);
+
+	/* Using a value of 0 for ExitInfo1 means RAX holds the value */
+	ghcb_set_rax(ghcb, *reg);
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WRITE_DR7, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (early)
+		early_dr7 = *reg;
+	else
+		this_cpu_write(cached_dr7, *reg);
+
+	return ES_OK;
+}
+
+static enum es_result handle_dr7_read(struct ghcb *ghcb,
+				      struct es_em_ctxt *ctxt,
+				      bool early)
+{
+	u8 rm = X86_MODRM_RM(ctxt->insn.modrm.value);
+	unsigned long *reg;
+
+	if (ctxt->insn.rex_prefix.nbytes &&
+	    X86_REX_B(ctxt->insn.rex_prefix.value))
+		rm |= 0x8;
+
+	reg = register_from_idx(ctxt->regs, rm);
+
+	if (early)
+		*reg = early_dr7;
+	else
+		*reg = this_cpu_read(cached_dr7);
+
+	return ES_OK;
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
-		struct ghcb *ghcb,
-		unsigned long exit_code)
+					  struct ghcb *ghcb,
+					  unsigned long exit_code,
+					  bool early)
 {
 	enum es_result result;
 
 	switch (exit_code) {
+	case SVM_EXIT_READ_DR7:
+		result = handle_dr7_read(ghcb, ctxt, early);
+		break;
+	case SVM_EXIT_WRITE_DR7:
+		result = handle_dr7_write(ghcb, ctxt, early);
+		break;
 	case SVM_EXIT_CPUID:
 		result = handle_cpuid(ghcb, ctxt);
 		break;
@@ -302,7 +363,7 @@ dotraplinkage void do_vmm_communication(struct pt_regs *regs, unsigned long exit
 		result = context_filter(regs, exit_code);
 
 	if (result == ES_OK)
-		result = handle_vc_exception(&ctxt, ghcb, exit_code);
+		result = handle_vc_exception(&ctxt, ghcb, exit_code, false);
 
 	/* Done - now check the result */
 	switch (result) {
@@ -368,7 +429,7 @@ int __init boot_vc_exception(struct pt_regs *regs)
 	result = init_em_ctxt(&ctxt, regs, exit_code);
 
 	if (result == ES_OK)
-		result = handle_vc_exception(&ctxt, boot_ghcb, exit_code);
+		result = handle_vc_exception(&ctxt, boot_ghcb, exit_code, true);
 
 	/* Done - now check the result */
 	switch (result) {
-- 
2.17.1
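
The "rm |= 0x8" dance above decodes which GPR the MOV-to/from-DR7
instruction names: ModRM.rm holds bits 0-2 of the register number and
REX.B supplies bit 3 (needed to reach r8-r15). A stand-alone sketch with
two hand-assembled encodings (byte values assumed from the AMD APM):

#include <stdio.h>
#include <stdint.h>

#define X86_MODRM_RM(modrm)	((modrm) & 0x07)
#define X86_REX_B(rex)		((rex) & 0x01)

/* Mirror of the decode logic in the DR7 handlers above. */
static unsigned int reg_index(uint8_t modrm, uint8_t rex, int have_rex)
{
	unsigned int rm = X86_MODRM_RM(modrm);

	if (have_rex && X86_REX_B(rex))
		rm |= 0x8;
	return rm;
}

int main(void)
{
	/* 0f 21 f8 = mov %dr7,%rax ; 41 0f 21 f8 = mov %dr7,%r8 */
	printf("rax index: %u\n", reg_index(0xf8, 0x00, 0));
	printf("r8  index: %u\n", reg_index(0xf8, 0x41, 1));
	return 0;
}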


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 43/62] x86/sev-es: Handle WBINVD Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (41 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 42/62] x86/sev-es: Handle DR7 read/write events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 44/62] x86/sev-es: Handle RDTSC Events Joerg Roedel
                   ` (20 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by WBINVD instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling framework ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index fcd67ab04d2d..491537b770fd 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -266,6 +266,12 @@ static enum es_result handle_dr7_read(struct ghcb *ghcb,
 	return ES_OK;
 }
 
+static enum es_result handle_wbinvd(struct ghcb *ghcb,
+				    struct es_em_ctxt *ctxt)
+{
+	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0);
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -289,6 +295,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_MSR:
 		result = handle_msr(ghcb, ctxt);
 		break;
+	case SVM_EXIT_WBINVD:
+		result = handle_wbinvd(ghcb, ctxt);
+		break;
 	case SVM_EXIT_NPF:
 		result = handle_mmio(ghcb, ctxt);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 44/62] x86/sev-es: Handle RDTSC Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (42 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 43/62] x86/sev-es: Handle WBINVD Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 45/62] x86/sev-es: Handle RDPMC Events Joerg Roedel
                   ` (19 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by RDTSC instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 491537b770fd..061557515f75 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -272,6 +272,23 @@ static enum es_result handle_wbinvd(struct ghcb *ghcb,
 	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0);
 }
 
+static enum es_result handle_rdtsc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	enum es_result ret;
+
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDTSC, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!(ghcb_is_valid_rax(ghcb) && ghcb_is_valid_rdx(ghcb)))
+		return ES_VMM_ERROR;
+
+	ctxt->regs->ax = ghcb->save.rax;
+	ctxt->regs->dx = ghcb->save.rdx;
+
+	return ES_OK;
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -286,6 +303,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_WRITE_DR7:
 		result = handle_dr7_write(ghcb, ctxt, early);
 		break;
+	case SVM_EXIT_RDTSC:
+		result = handle_rdtsc(ghcb, ctxt);
+		break;
 	case SVM_EXIT_CPUID:
 		result = handle_cpuid(ghcb, ctxt);
 		break;
-- 
2.17.1
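
A hedged user-space sketch of the instruction this emulates; the EDX:EAX
split below is exactly what the handler copies back out of
ghcb->save.rax/rdx (assumes x86-64, gcc):

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	/* The counter comes back split across EDX:EAX - the same two
	 * registers the handler validates and copies from the GHCB. */
	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	uint64_t a = rdtsc();
	uint64_t b = rdtsc();

	printf("tsc delta: %llu cycles\n", (unsigned long long)(b - a));
	return 0;
}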


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 45/62] x86/sev-es: Handle RDPMC Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (43 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 44/62] x86/sev-es: Handle RDTSC Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 46/62] x86/sev-es: Handle INVD Events Joerg Roedel
                   ` (18 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by RDPMC instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 061557515f75..e96332516c2a 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -289,6 +289,25 @@ static enum es_result handle_rdtsc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ES_OK;
 }
 
+static enum es_result handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	enum es_result ret;
+
+	ghcb_set_rcx(ghcb, ctxt->regs->cx);
+
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDPMC, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!(ghcb_is_valid_rax(ghcb) && ghcb_is_valid_rdx(ghcb)))
+		return ES_VMM_ERROR;
+
+	ctxt->regs->ax = ghcb->save.rax;
+	ctxt->regs->dx = ghcb->save.rdx;
+
+	return ES_OK;
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -306,6 +325,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_RDTSC:
 		result = handle_rdtsc(ghcb, ctxt);
 		break;
+	case SVM_EXIT_RDPMC:
+		result = handle_rdpmc(ghcb, ctxt);
+		break;
 	case SVM_EXIT_CPUID:
 		result = handle_cpuid(ghcb, ctxt);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 46/62] x86/sev-es: Handle INVD Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (44 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 45/62] x86/sev-es: Handle RDPMC Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-12  0:12   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 47/62] x86/sev-es: Handle RDTSCP Events Joerg Roedel
                   ` (17 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by INVD instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index e96332516c2a..485f5a14c3b4 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -308,6 +308,11 @@ static enum es_result handle_rdpmc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ES_OK;
 }
 
+static enum es_result handle_invd(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_INVD, 0, 0);
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -328,6 +333,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_RDPMC:
 		result = handle_rdpmc(ghcb, ctxt);
 		break;
+	case SVM_EXIT_INVD:
+		result = handle_invd(ghcb, ctxt);
+		break;
 	case SVM_EXIT_CPUID:
 		result = handle_cpuid(ghcb, ctxt);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 47/62] x86/sev-es: Handle RDTSCP Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (45 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 46/62] x86/sev-es: Handle INVD Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 48/62] x86/sev-es: Handle MONITOR/MONITORX Events Joerg Roedel
                   ` (16 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Extend the RDTSC handler to also handle RDTSCP events.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 485f5a14c3b4..d5a14f277adb 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -272,19 +272,24 @@ static enum es_result handle_wbinvd(struct ghcb *ghcb,
 	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_WBINVD, 0, 0);
 }
 
-static enum es_result handle_rdtsc(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+static enum es_result handle_rdtsc(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+				   unsigned long exit_code)
 {
+	bool rdtscp = (exit_code == SVM_EXIT_RDTSCP);
 	enum es_result ret;
 
-	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_RDTSC, 0, 0);
+	ret = ghcb_hv_call(ghcb, ctxt, exit_code, 0, 0);
 	if (ret != ES_OK)
 		return ret;
 
-	if (!(ghcb_is_valid_rax(ghcb) && ghcb_is_valid_rdx(ghcb)))
+	if (!(ghcb_is_valid_rax(ghcb) && ghcb_is_valid_rdx(ghcb) &&
+	     (!rdtscp || ghcb_is_valid_rcx(ghcb))))
 		return ES_VMM_ERROR;
 
 	ctxt->regs->ax = ghcb->save.rax;
 	ctxt->regs->dx = ghcb->save.rdx;
+	if (rdtscp)
+		ctxt->regs->cx = ghcb->save.rcx;
 
 	return ES_OK;
 }
@@ -328,7 +333,8 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 		result = handle_dr7_write(ghcb, ctxt, early);
 		break;
 	case SVM_EXIT_RDTSC:
-		result = handle_rdtsc(ghcb, ctxt);
+	case SVM_EXIT_RDTSCP:
+		result = handle_rdtsc(ghcb, ctxt, exit_code);
 		break;
 	case SVM_EXIT_RDPMC:
 		result = handle_rdpmc(ghcb, ctxt);
-- 
2.17.1
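
RDTSCP differs from RDTSC only in additionally returning IA32_TSC_AUX in
ECX, which is why the handler now validates RCX for this exit-code. A
minimal user-space sketch (assumes x86-64, gcc):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint32_t lo, hi, aux;

	/* RDTSCP returns the TSC in EDX:EAX plus IA32_TSC_AUX in ECX. */
	__asm__ volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));

	printf("tsc=%llu aux=%u\n",
	       (unsigned long long)(((uint64_t)hi << 32) | lo), aux);
	return 0;
}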


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 48/62] x86/sev-es: Handle MONITOR/MONITORX Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (46 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 47/62] x86/sev-es: Handle RDTSCP Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 49/62] x86/sev-es: Handle MWAIT/MWAITX Events Joerg Roedel
                   ` (15 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by MONITOR and MONITORX
instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d5a14f277adb..865f510d11ba 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -318,6 +318,19 @@ static enum es_result handle_invd(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_INVD, 0, 0);
 }
 
+static enum es_result handle_monitor(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	phys_addr_t monitor_pa;
+
+	monitor_pa = es_slow_virt_to_phys(ghcb, ctxt->regs->ax);
+
+	ghcb_set_rax(ghcb, monitor_pa);
+	ghcb_set_rcx(ghcb, ctxt->regs->cx);
+	ghcb_set_rdx(ghcb, ctxt->regs->dx);
+
+	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MONITOR, 0, 0);
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -354,6 +369,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_WBINVD:
 		result = handle_wbinvd(ghcb, ctxt);
 		break;
+	case SVM_EXIT_MONITOR:
+		result = handle_monitor(ghcb, ctxt);
+		break;
 	case SVM_EXIT_NPF:
 		result = handle_mmio(ghcb, ctxt);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 49/62] x86/sev-es: Handle MWAIT/MWAITX Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (47 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 48/62] x86/sev-es: Handle MONITOR/MONITORX Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 50/62] x86/sev-es: Handle VMMCALL Events Joerg Roedel
                   ` (14 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by MWAIT and MWAITX
instructions.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 865f510d11ba..8f1e84da6fa6 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -333,6 +333,14 @@ static enum es_result handle_monitor(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MONITOR, 0, 0);
 }
 
+static enum es_result handle_mwait(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+	ghcb_set_rax(ghcb, ctxt->regs->ax);
+	ghcb_set_rcx(ghcb, ctxt->regs->cx);
+
+	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MWAIT, 0, 0);
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -372,6 +380,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_MONITOR:
 		result = handle_monitor(ghcb, ctxt);
 		break;
+	case SVM_EXIT_MWAIT:
+		result = handle_mwait(ghcb, ctxt);
+		break;
 	case SVM_EXIT_NPF:
 		result = handle_mmio(ghcb, ctxt);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 50/62] x86/sev-es: Handle VMMCALL Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (48 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 49/62] x86/sev-es: Handle MWAIT/MWAITX Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-12  0:14   ` Andy Lutomirski
  2020-02-11 13:52 ` [PATCH 51/62] x86/sev-es: Handle #AC Events Joerg Roedel
                   ` (13 subsequent siblings)
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement a handler for #VC exceptions caused by VMMCALL instructions.
This patch is only a starting point; VMMCALL emulation under SEV-ES
needs further hypervisor-specific changes to provide additional state.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: Adapt to #VC handling infrastructure ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 8f1e84da6fa6..6bd2cae7eb9c 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -341,6 +341,26 @@ static enum es_result handle_mwait(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 	return ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MWAIT, 0, 0);
 }
 
+static enum es_result handle_vmmcall(struct ghcb *ghcb,
+				     struct es_em_ctxt *ctxt)
+{
+	enum es_result ret;
+
+	ghcb_set_rax(ghcb, ctxt->regs->ax);
+	ghcb_set_cpl(ghcb, user_mode(ctxt->regs) ? 3 : 0);
+
+	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0);
+	if (ret != ES_OK)
+		return ret;
+
+	if (!ghcb_is_valid_rax(ghcb))
+		return ES_VMM_ERROR;
+
+	ctxt->regs->ax = ghcb->save.rax;
+
+	return ES_OK;
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -374,6 +394,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_MSR:
 		result = handle_msr(ghcb, ctxt);
 		break;
+	case SVM_EXIT_VMMCALL:
+		result = handle_vmmcall(ghcb, ctxt);
+		break;
 	case SVM_EXIT_WBINVD:
 		result = handle_wbinvd(ghcb, ctxt);
 		break;
-- 
2.17.1
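
For reference, a hedged sketch of the guest side of such a hypercall using
the KVM convention (hypercall nr in RAX, args in RBX/RCX/RDX/RSI, result
back in RAX). It only makes sense inside a KVM guest; on bare metal
VMMCALL raises #UD (SIGILL).

#include <stdio.h>

static inline long kvm_hypercall0(unsigned int nr)
{
	long ret;

	/* In an SEV-ES guest this traps as #VC and goes through
	 * handle_vmmcall() and the GHCB instead of a plain #VMEXIT. */
	__asm__ volatile("vmmcall" : "=a"(ret) : "a"(nr) : "memory");
	return ret;
}

int main(void)
{
	/* From CPL3 KVM denies the call, so inside a KVM guest expect a
	 * negative value in RAX rather than a crash. */
	printf("vmmcall returned %ld\n", kvm_hypercall0(0));
	return 0;
}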


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 51/62] x86/sev-es: Handle #AC Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (49 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 50/62] x86/sev-es: Handle VMMCALL Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 52/62] x86/sev-es: Handle #DB Events Joerg Roedel
                   ` (12 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Implement a handler for #VC exceptions caused by #AC exceptions. The #AC
exception is just forwarded to do_alignment_check() and not pushed down
to the hypervisor, as requested by the SEV-ES GHCB Standardization
Specification.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 6bd2cae7eb9c..1b873d00e38f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -375,6 +375,10 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_WRITE_DR7:
 		result = handle_dr7_write(ghcb, ctxt, early);
 		break;
+	case SVM_EXIT_EXCP_BASE + X86_TRAP_AC:
+		do_alignment_check(ctxt->regs, 0);
+		result = ES_RETRY;
+		break;
 	case SVM_EXIT_RDTSC:
 	case SVM_EXIT_RDTSCP:
 		result = handle_rdtsc(ghcb, ctxt, exit_code);
-- 
2.17.1
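
A user-space way to provoke the forwarded path (sketch; assumes CR0.AM is
set, which Linux does by default, and that the hypervisor intercepts #AC):
set EFLAGS.AC and perform a misaligned access. The resulting #AC arrives
as a #VC and is handed to do_alignment_check() as above, visible as SIGBUS.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	char buf[16] __attribute__((aligned(8)));

	/* Set EFLAGS.AC (bit 18); with CR0.AM set, a misaligned access
	 * at CPL3 raises #AC. */
	__asm__ volatile("pushfq\n\t"
			 "orl $0x40000, (%%rsp)\n\t"
			 "popfq" ::: "memory", "cc");

	*(volatile uint32_t *)(buf + 1) = 42;	/* expect SIGBUS here */

	puts("no alignment check delivered");
	return 0;
}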


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 52/62] x86/sev-es: Handle #DB Events
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (50 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 51/62] x86/sev-es: Handle #AC Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 53/62] x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES Joerg Roedel
                   ` (11 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Handle #VC exceptions caused by #DB exceptions in the guest. Do not
forward them to the hypervisor; handle them with do_debug() instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/sev-es.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 1b873d00e38f..700f75fc13e7 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -361,6 +361,15 @@ static enum es_result handle_vmmcall(struct ghcb *ghcb,
 	return ES_OK;
 }
 
+static enum es_result handle_db_exception(struct ghcb *ghcb,
+					  struct es_em_ctxt *ctxt)
+{
+	do_debug(ctxt->regs, 0);
+
+	/* Exception event, do not advance RIP */
+	return ES_RETRY;
+}
+
 static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 					  struct ghcb *ghcb,
 					  unsigned long exit_code,
@@ -375,6 +384,9 @@ static enum es_result handle_vc_exception(struct es_em_ctxt *ctxt,
 	case SVM_EXIT_WRITE_DR7:
 		result = handle_dr7_write(ghcb, ctxt, early);
 		break;
+	case SVM_EXIT_EXCP_BASE + X86_TRAP_DB:
+		result = handle_db_exception(ghcb, ctxt);
+		break;
 	case SVM_EXIT_EXCP_BASE + X86_TRAP_AC:
 		do_alignment_check(ctxt->regs, 0);
 		result = ES_RETRY;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 53/62] x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (51 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 52/62] x86/sev-es: Handle #DB Events Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 54/62] x86/kvm: Add KVM " Joerg Roedel
                   ` (10 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add two new paravirt callbacks to provide hypervisor-specific processor
state in the GHCB and to copy state from the hypervisor back to the
processor.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/x86_init.h | 16 +++++++++++++++-
 arch/x86/kernel/sev-es.c        | 12 ++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 96d9cd208610..c4790ec279cc 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -4,8 +4,10 @@
 
 #include <asm/bootparam.h>
 
+struct ghcb;
 struct mpc_bus;
 struct mpc_cpu;
+struct pt_regs;
 struct mpc_table;
 struct cpuinfo_x86;
 
@@ -238,10 +240,22 @@ struct x86_legacy_features {
 /**
  * struct x86_hyper_runtime - x86 hypervisor specific runtime callbacks
  *
- * @pin_vcpu:		pin current vcpu to specified physical cpu (run rarely)
+ * @pin_vcpu:			pin current vcpu to specified physical
+ *				cpu (run rarely)
+ * @sev_es_hcall_prepare:	Load additional hypervisor-specific
+ *				state into the GHCB when doing a VMMCALL under
+ *				SEV-ES. Called from the #VC exception handler.
+ * @sev_es_hcall_finish:	Copies state from the GHCB back into the
+ *				processor (or pt_regs). Also runs checks on the
+ *				state returned from the hypervisor after a
+ *				VMMCALL under SEV-ES.  Needs to return 'false'
+ *				if the checks fail.  Called from the #VC
+ *				exception handler.
  */
 struct x86_hyper_runtime {
 	void (*pin_vcpu)(int cpu);
+	void (*sev_es_hcall_prepare)(struct ghcb *ghcb, struct pt_regs *regs);
+	bool (*sev_es_hcall_finish)(struct ghcb *ghcb, struct pt_regs *regs);
 };
 
 /**
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 700f75fc13e7..6924bb1ad8b2 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -349,6 +349,9 @@ static enum es_result handle_vmmcall(struct ghcb *ghcb,
 	ghcb_set_rax(ghcb, ctxt->regs->ax);
 	ghcb_set_cpl(ghcb, user_mode(ctxt->regs) ? 3 : 0);
 
+	if (x86_platform.hyper.sev_es_hcall_prepare)
+		x86_platform.hyper.sev_es_hcall_prepare(ghcb, ctxt->regs);
+
 	ret = ghcb_hv_call(ghcb, ctxt, SVM_EXIT_VMMCALL, 0, 0);
 	if (ret != ES_OK)
 		return ret;
@@ -358,6 +361,15 @@ static enum es_result handle_vmmcall(struct ghcb *ghcb,
 
 	ctxt->regs->ax = ghcb->save.rax;
 
+	/*
+	 * Call sev_es_hcall_finish() after regs->ax is already set.
+	 * This allows the hypervisor handler to overwrite it again if
+	 * necessary.
+	 */
+	if (x86_platform.hyper.sev_es_hcall_finish &&
+	    !x86_platform.hyper.sev_es_hcall_finish(ghcb, ctxt->regs))
+		return ES_VMM_ERROR;
+
 	return ES_OK;
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 54/62] x86/kvm: Add KVM specific VMMCALL handling under SEV-ES
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (52 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 53/62] x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 55/62] x86/vmware: Add VMware specific handling for VMMCALL " Joerg Roedel
                   ` (9 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Implement the callbacks to copy the processor state required by KVM to
the GHCB.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Split out of a larger patch
                   - Adapt to different callback functions ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/kvm.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d817f255aed8..318eb906a0b5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,8 @@
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
 #include <asm/cpuidle_haltpoll.h>
+#include <asm/ptrace.h>
+#include <asm/svm.h>
 
 static int kvmapf = 1;
 
@@ -711,13 +713,34 @@ static void __init kvm_init_platform(void)
 	x86_platform.apic_post_init = kvm_apic_init;
 }
 
+#if defined(CONFIG_AMD_MEM_ENCRYPT)
+static void kvm_sev_es_hcall_prepare(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* RAX and CPL are already in the GHCB */
+	ghcb_set_rbx(ghcb, regs->bx);
+	ghcb_set_rcx(ghcb, regs->cx);
+	ghcb_set_rdx(ghcb, regs->dx);
+	ghcb_set_rsi(ghcb, regs->si);
+}
+
+static bool kvm_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	/* No checking of the return state needed */
+	return true;
+}
+#endif
+
 const __initconst struct hypervisor_x86 x86_hyper_kvm = {
-	.name			= "KVM",
-	.detect			= kvm_detect,
-	.type			= X86_HYPER_KVM,
-	.init.guest_late_init	= kvm_guest_init,
-	.init.x2apic_available	= kvm_para_available,
-	.init.init_platform	= kvm_init_platform,
+	.name				= "KVM",
+	.detect				= kvm_detect,
+	.type				= X86_HYPER_KVM,
+	.init.guest_late_init		= kvm_guest_init,
+	.init.x2apic_available		= kvm_para_available,
+	.init.init_platform		= kvm_init_platform,
+#if defined(CONFIG_AMD_MEM_ENCRYPT)
+	.runtime.sev_es_hcall_prepare	= kvm_sev_es_hcall_prepare,
+	.runtime.sev_es_hcall_finish	= kvm_sev_es_hcall_finish,
+#endif
 };
 
 static __init int activate_jump_labels(void)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 55/62] x86/vmware: Add VMware specific handling for VMMCALL under SEV-ES
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (53 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 54/62] x86/kvm: Add KVM " Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 56/62] x86/realmode: Add SEV-ES specific trampoline entry point Joerg Roedel
                   ` (8 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel, Doug Covelli

From: Doug Covelli <dcovelli@vmware.com>

This change adds VMware specific handling for #VC faults caused by
VMMCALL instructions.
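
For context on why exactly rbx/rcx/rdx/rsi/rdi/rbp are shuttled in
both directions: the VMware hypercall interface passes arguments and
returns results in these general-purpose registers. A rough,
illustrative call site (variable names made up, %rbp omitted from the
constraints for simplicity; VMWARE_HYPERVISOR_MAGIC is the documented
0x564D5868):

	unsigned long cmd = 0, in1 = 0, in2 = 0;
	unsigned long out_ax, out_bx, out_cx, out_dx, out_si, out_di;

	asm volatile("vmmcall"
		     : "=a" (out_ax), "=b" (out_bx), "=c" (out_cx),
		       "=d" (out_dx), "=S" (out_si), "=D" (out_di)
		     : "a" (VMWARE_HYPERVISOR_MAGIC), "b" (in1),
		       "c" (cmd), "d" (in2)
		     : "memory");

Since the hypervisor may return data in any of these registers, the
finish callback refuses to copy them back into pt_regs unless the
hypervisor marked them valid in the GHCB.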

Signed-off-by: Doug Covelli <dcovelli@vmware.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Adapt to different paravirt interface ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/cpu/vmware.c | 48 ++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 46d732696c1c..7edab8fcf8bf 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -31,6 +31,7 @@
 #include <asm/timer.h>
 #include <asm/apic.h>
 #include <asm/vmware.h>
+#include <asm/svm.h>
 
 #undef pr_fmt
 #define pr_fmt(fmt)	"vmware: " fmt
@@ -263,10 +264,47 @@ static bool __init vmware_legacy_x2apic_available(void)
 	       (eax & (1 << VMWARE_CMD_LEGACY_X2APIC)) != 0;
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+static void vmware_sev_es_hcall_prepare(struct ghcb *ghcb,
+					struct pt_regs *regs)
+{
+	/* Copy VMware-specific hypercall parameters to the GHCB */
+	ghcb_set_rip(ghcb, regs->ip);
+	ghcb_set_rbx(ghcb, regs->bx);
+	ghcb_set_rcx(ghcb, regs->cx);
+	ghcb_set_rdx(ghcb, regs->dx);
+	ghcb_set_rsi(ghcb, regs->si);
+	ghcb_set_rdi(ghcb, regs->di);
+	ghcb_set_rbp(ghcb, regs->bp);
+}
+
+static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, struct pt_regs *regs)
+{
+	if (!(ghcb_is_valid_rbx(ghcb) &&
+	      ghcb_is_valid_rcx(ghcb) &&
+	      ghcb_is_valid_rdx(ghcb) &&
+	      ghcb_is_valid_rsi(ghcb) &&
+	      ghcb_is_valid_rdi(ghcb) &&
+	      ghcb_is_valid_rbp(ghcb)))
+		return false;
+
+	regs->bx = ghcb->save.rbx;
+	regs->cx = ghcb->save.rcx;
+	regs->dx = ghcb->save.rdx;
+	regs->si = ghcb->save.rsi;
+	regs->di = ghcb->save.rdi;
+	regs->bp = ghcb->save.rbp;
+
+	return true;
+}
+#endif
+
 const __initconst struct hypervisor_x86 x86_hyper_vmware = {
-	.name			= "VMware",
-	.detect			= vmware_platform,
-	.type			= X86_HYPER_VMWARE,
-	.init.init_platform	= vmware_platform_setup,
-	.init.x2apic_available	= vmware_legacy_x2apic_available,
+	.name				= "VMware",
+	.detect				= vmware_platform,
+	.type				= X86_HYPER_VMWARE,
+	.init.init_platform		= vmware_platform_setup,
+	.init.x2apic_available		= vmware_legacy_x2apic_available,
+	.runtime.sev_es_hcall_prepare	= vmware_sev_es_hcall_prepare,
+	.runtime.sev_es_hcall_finish	= vmware_sev_es_hcall_finish,
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 56/62] x86/realmode: Add SEV-ES specific trampoline entry point
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (54 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 55/62] x86/vmware: Add VMware specific handling for VMMCALL " Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 57/62] x86/realmode: Setup AP jump table Joerg Roedel
                   ` (7 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The code at the trampoline entry point is executed in real mode. In
real mode, #VC exceptions can't be handled, so anything that might cause
such an exception must be avoided.

In the standard trampoline entry code these are the WBINVD instruction
and the call to verify_cpu(); neither is needed anyway when running as
an SEV-ES guest.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/realmode.h      |  3 +++
 arch/x86/realmode/rm/header.S        |  3 +++
 arch/x86/realmode/rm/trampoline_64.S | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index b35030eeec36..6590394af309 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -21,6 +21,9 @@ struct real_mode_header {
 	/* SMP trampoline */
 	u32	trampoline_start;
 	u32	trampoline_header;
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	u32	sev_es_trampoline_start;
+#endif
 #ifdef CONFIG_X86_64
 	u32	trampoline_pgd;
 #endif
diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
index af04512c02d9..8c1db5bf5d78 100644
--- a/arch/x86/realmode/rm/header.S
+++ b/arch/x86/realmode/rm/header.S
@@ -20,6 +20,9 @@ SYM_DATA_START(real_mode_header)
 	/* SMP trampoline */
 	.long	pa_trampoline_start
 	.long	pa_trampoline_header
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	.long	pa_sev_es_trampoline_start
+#endif
 #ifdef CONFIG_X86_64
 	.long	pa_trampoline_pgd;
 #endif
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index 251758ed7443..84c5d1b33d10 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -56,6 +56,7 @@ SYM_CODE_START(trampoline_start)
 	testl   %eax, %eax		# Check for return code
 	jnz	no_longmode
 
+.Lswitch_to_protected:
 	/*
 	 * GDT tables in non default location kernel can be beyond 16MB and
 	 * lgdt will not be able to load the address as in real mode default
@@ -80,6 +81,25 @@ no_longmode:
 	jmp no_longmode
 SYM_CODE_END(trampoline_start)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/* SEV-ES supports non-zero IP for entry points - no alignment needed */
+SYM_CODE_START(sev_es_trampoline_start)
+	cli			# We should be safe anyway
+
+	LJMPW_RM(1f)
+1:
+	mov	%cs, %ax	# Code and data in the same place
+	mov	%ax, %ds
+	mov	%ax, %es
+	mov	%ax, %ss
+
+	# Setup stack
+	movl	$rm_stack_end, %esp
+
+	jmp	.Lswitch_to_protected
+SYM_CODE_END(sev_es_trampoline_start)
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
 #include "../kernel/verify_cpu.S"
 
 	.section ".text32","ax"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 57/62] x86/realmode: Setup AP jump table
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (55 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 56/62] x86/realmode: Add SEV-ES specific trampoline entry point Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 58/62] x86/head/64: Don't call verify_cpu() on starting APs Joerg Roedel
                   ` (6 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Tom Lendacky <thomas.lendacky@amd.com>

Set up the AP jump table to point to the SEV-ES trampoline code so that
the APs can boot.
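
The jump table itself is just a real-mode far pointer: a 16-bit IP
followed by a 16-bit CS that a woken AP jumps through. A worked example
with made-up addresses: if the trampoline sits at physical address
0x9A000 and sev_es_trampoline_start lies 0x60 bytes into it, the code
below stores

	jump_table[0] = 0x0060;	/* startup_ip: offset into the segment */
	jump_table[1] = 0x9A00;	/* startup_cs: 0x9A000 >> 4            */

so the far jump lands on the SEV-ES trampoline entry point added in
the previous patch.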

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
[ jroedel@suse.de: - Adapted to different code base
                   - Moved AP table setup from SIPI sending path to
		     real-mode setup code ]
Co-developed-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/sev-es.h   | 11 ++++++
 arch/x86/include/uapi/asm/svm.h |  3 ++
 arch/x86/kernel/sev-es.c        | 63 +++++++++++++++++++++++++++++++++
 arch/x86/realmode/init.c        |  6 ++++
 4 files changed, 83 insertions(+)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index a2d0c77dabc3..a4d7574c5c6a 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -78,4 +78,15 @@ static inline u64 copy_lower_bits(u64 out, u64 in, unsigned int bits)
 extern void early_vc_handler(void);
 extern int boot_vc_exception(struct pt_regs *regs);
 
+struct real_mode_header;
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+int sev_es_setup_ap_jump_table(struct real_mode_header *rmh);
+#else /* CONFIG_AMD_MEM_ENCRYPT */
+static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
+{
+	return 0;
+}
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
 #endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 8f36ae021a7f..a19ce9681ec2 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -84,6 +84,9 @@
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ			0x80000001
 #define SVM_VMGEXIT_MMIO_WRITE			0x80000002
+#define SVM_VMGEXIT_AP_JUMP_TABLE		0x80000005
+#define		SVM_VMGEXIT_SET_AP_JUMP_TABLE			0
+#define		SVM_VMGEXIT_GET_AP_JUMP_TABLE			1
 #define SVM_VMGEXIT_UNSUPPORTED_EVENT		0x8000ffff
 
 #define SVM_EXIT_ERR           -1
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 6924bb1ad8b2..d8193d37ed2b 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 
 #include <asm/trap_defs.h>
+#include <asm/realmode.h>
 #include <asm/sev-es.h>
 #include <asm/fpu/internal.h>
 #include <asm/processor.h>
@@ -42,6 +43,8 @@ static DEFINE_PER_CPU_DECRYPTED(struct ghcb, ghcb_page) __aligned(PAGE_SIZE);
 /* Needed in early_forward_exception */
 extern void early_exception(struct pt_regs *regs, int trapnr);
 
+static inline u64 read_ghcb_msr(void);
+
 static inline u64 read_ghcb_msr(void)
 {
 	return native_read_msr(MSR_AMD64_SEV_ES_GHCB);
@@ -139,6 +142,66 @@ static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
 /* Include code shared with pre-decompression boot stage */
 #include "sev-es-shared.c"
 
+static u64 sev_es_get_jump_table_addr(void)
+{
+	unsigned long flags;
+	struct ghcb *ghcb;
+	u64 ret;
+
+	local_irq_save(flags);
+
+	ghcb = this_cpu_ptr(&ghcb_page);
+	ghcb_invalidate(ghcb);
+
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE);
+	ghcb_set_sw_exit_info_1(ghcb, SVM_VMGEXIT_GET_AP_JUMP_TABLE);
+	ghcb_set_sw_exit_info_2(ghcb, 0);
+
+	write_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	if (!ghcb_is_valid_sw_exit_info_1(ghcb) ||
+	    !ghcb_is_valid_sw_exit_info_2(ghcb))
+		ret = 0;
+	else
+		ret = ghcb->save.sw_exit_info_2;
+
+	local_irq_restore(flags);
+
+	return ret;
+}
+
+int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
+{
+	u16 startup_cs, startup_ip;
+	phys_addr_t jump_table_pa;
+	u64 jump_table_addr;
+	u16 *jump_table;
+
+	jump_table_addr = sev_es_get_jump_table_addr();
+
+	/* Check if AP Jump Table is non-zero and page-aligned */
+	if (!jump_table_addr || jump_table_addr & ~PAGE_MASK)
+		return 0;
+
+	jump_table_pa = jump_table_addr & PAGE_MASK;
+
+	startup_cs = (u16)(rmh->trampoline_start >> 4);
+	startup_ip = (u16)(rmh->sev_es_trampoline_start -
+			   rmh->trampoline_start);
+
+	jump_table = ioremap_encrypted(jump_table_pa, PAGE_SIZE);
+	if (!jump_table)
+		return -EIO;
+
+	jump_table[0] = startup_ip;
+	jump_table[1] = startup_cs;
+
+	iounmap(jump_table);
+
+	return 0;
+}
+
 static enum es_result handle_msr(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
 {
 	struct pt_regs *regs = ctxt->regs;
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 262f83cad355..1c5cbfd102d5 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -9,6 +9,7 @@
 #include <asm/realmode.h>
 #include <asm/tlbflush.h>
 #include <asm/crash.h>
+#include <asm/sev-es.h>
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -107,6 +108,11 @@ static void __init setup_real_mode(void)
 	if (sme_active())
 		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
 
+	if (sev_es_active()) {
+		if (sev_es_setup_ap_jump_table(real_mode_header))
+			panic("Failed to update SEV-ES AP Jump Table");
+	}
+
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
 	trampoline_pgd[511] = init_top_pgt[511].pgd;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 58/62] x86/head/64: Don't call verify_cpu() on starting APs
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (56 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 57/62] x86/realmode: Setup AP jump table Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 59/62] x86/head/64: Rename start_cpu0 Joerg Roedel
                   ` (5 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The APs are not ready to handle exceptions when verify_cpu() is called
in secondary_startup_64: verify_cpu() executes CPUID, which is
intercepted under SEV-ES and raises a #VC exception, and at that point
a starting AP has neither an IDT nor a GHCB set up to handle it.
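
A condensed sketch of the AP boot path after this change (only the
relevant lines, everything else elided):

	SYM_CODE_START(secondary_startup_64)
		/* Sanitize CPU configuration */
		call	verify_cpu	# executes CPUID -> #VC under SEV-ES

	SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
		...			# SEV-ES APs are entered here instead

The real-mode setup code points trampoline_header->start at the new
label when SEV-ES is active, so starting APs never reach verify_cpu().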

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/realmode.h | 1 +
 arch/x86/kernel/head_64.S       | 1 +
 arch/x86/realmode/init.c        | 6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 6590394af309..5c97807c38a4 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -69,6 +69,7 @@ extern unsigned char startup_32_smp[];
 extern unsigned char boot_gdt[];
 #else
 extern unsigned char secondary_startup_64[];
+extern unsigned char secondary_startup_64_no_verify[];
 #endif
 
 static inline size_t real_mode_size_needed(void)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 13ebf7d3af2c..9dd602bd6244 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -144,6 +144,7 @@ SYM_CODE_START(secondary_startup_64)
 	/* Sanitize CPU configuration */
 	call verify_cpu
 
+SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	/*
 	 * Retrieve the modifier (SME encryption mask if SME is active) to be
 	 * added to the initial pgdir entry that will be programmed into CR3.
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 1c5cbfd102d5..030c38268069 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -109,6 +109,12 @@ static void __init setup_real_mode(void)
 		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
 
 	if (sev_es_active()) {
+		/*
+		 * Skip the call to verify_cpu() in secondary_startup_64 as it
+		 * will cause #VC exceptions when the AP can't handle them yet.
+		 */
+		trampoline_header->start = (u64) secondary_startup_64_no_verify;
+
 		if (sev_es_setup_ap_jump_table(real_mode_header))
 			panic("Failed to update SEV-ES AP Jump Table");
 	}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 59/62] x86/head/64: Rename start_cpu0
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (57 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 58/62] x86/head/64: Don't call verify_cpu() on starting APs Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 60/62] x86/sev-es: Support CPU offline/online Joerg Roedel
                   ` (4 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

For SEV-ES this entry point will be used for restarting APs after they
have been offlined. Remove the '0' from the name to reflect that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/cpu.h | 2 +-
 arch/x86/kernel/head_32.S  | 4 ++--
 arch/x86/kernel/head_64.S  | 6 +++---
 arch/x86/kernel/smpboot.c  | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..00668daf8991 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -29,7 +29,7 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
-extern void start_cpu0(void);
+extern void start_cpu(void);
 #ifdef CONFIG_DEBUG_HOTPLUG_CPU0
 extern int _debug_hotplug_cpu(int cpu, int action);
 #endif
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 3923ab4630d7..1a280152bd10 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -180,12 +180,12 @@ SYM_CODE_END(startup_32)
  * up already except stack. We just set up stack here. Then call
  * start_secondary().
  */
-SYM_FUNC_START(start_cpu0)
+SYM_FUNC_START(start_cpu)
 	movl initial_stack, %ecx
 	movl %ecx, %esp
 	call *(initial_code)
 1:	jmp 1b
-SYM_FUNC_END(start_cpu0)
+SYM_FUNC_END(start_cpu)
 #endif
 
 /*
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9dd602bd6244..681f3aafd424 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -293,15 +293,15 @@ SYM_CODE_END(secondary_startup_64)
 
 #ifdef CONFIG_HOTPLUG_CPU
 /*
- * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * CPU entry point. It's called from play_dead(). Everything has been set
  * up already except stack. We just set up stack here. Then call
  * start_secondary() via .Ljump_to_C_code.
  */
-SYM_CODE_START(start_cpu0)
+SYM_CODE_START(start_cpu)
 	UNWIND_HINT_EMPTY
 	movq	initial_stack(%rip), %rsp
 	jmp	.Ljump_to_C_code
-SYM_CODE_END(start_cpu0)
+SYM_CODE_END(start_cpu)
 #endif
 
 	/* Both SMP bootup and ACPI suspend change these variables */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 69881b2d446c..19aa18f1e307 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1717,7 +1717,7 @@ static inline void mwait_play_dead(void)
 		 * If NMI wants to wake up CPU0, start CPU0.
 		 */
 		if (wakeup_cpu0())
-			start_cpu0();
+			start_cpu();
 	}
 }
 
@@ -1732,7 +1732,7 @@ void hlt_play_dead(void)
 		 * If NMI wants to wake up CPU0, start CPU0.
 		 */
 		if (wakeup_cpu0())
-			start_cpu0();
+			start_cpu();
 	}
 }
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 60/62] x86/sev-es: Support CPU offline/online
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (58 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 59/62] x86/head/64: Rename start_cpu0 Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 61/62] x86/cpufeature: Add SEV_ES_GUEST CPU Feature Joerg Roedel
                   ` (3 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Add a play_dead handler for use when running under SEV-ES. This is
needed because the hypervisor can't deliver a SIPI request to restart
the AP. Instead the kernel has to issue a VMGEXIT to halt the VCPU.
When the hypervisor would otherwise deliver a SIPI, it wakes up the
VCPU instead.
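
The resulting offline/online flow, sketched end-to-end (the hypervisor
side is inferred from the GHCB specification and is not part of this
patch):

	1. CPU offline: sev_es_play_dead() issues a VMGEXIT with
	   sw_exit_code = SVM_VMGEXIT_AP_HLT_LOOP.
	2. The hypervisor parks the VCPU in its HLT loop.
	3. To bring the CPU back online, the hypervisor resumes the VCPU
	   with a non-zero sw_exit_info_2 instead of sending INIT/SIPI.
	4. The guest sees the wakeup in sev_es_ap_hlt_loop(), breaks out
	   of the loop and jumps to start_cpu() to come back online.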

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kernel/sev-es.c        | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index a19ce9681ec2..20a05839dd9a 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -84,6 +84,7 @@
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ			0x80000001
 #define SVM_VMGEXIT_MMIO_WRITE			0x80000002
+#define SVM_VMGEXIT_AP_HLT_LOOP			0x80000004
 #define SVM_VMGEXIT_AP_JUMP_TABLE		0x80000005
 #define		SVM_VMGEXIT_SET_AP_JUMP_TABLE			0
 #define		SVM_VMGEXIT_GET_AP_JUMP_TABLE			1
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d8193d37ed2b..755708f72824 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -22,6 +22,8 @@
 #include <asm/processor.h>
 #include <asm/traps.h>
 #include <asm/svm.h>
+#include <asm/smp.h>
+#include <asm/cpu.h>
 
 #define DR7_RESET_VALUE        0x400
 
@@ -252,6 +254,48 @@ static bool __init setup_ghcb(void)
 	return true;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void sev_es_ap_hlt_loop(void)
+{
+	struct ghcb *ghcb;
+
+	ghcb = this_cpu_ptr(&ghcb_page);
+
+	while (true) {
+		ghcb_invalidate(ghcb);
+		ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_HLT_LOOP);
+		ghcb_set_sw_exit_info_1(ghcb, 0);
+		ghcb_set_sw_exit_info_2(ghcb, 0);
+
+		write_ghcb_msr(__pa(ghcb));
+		VMGEXIT();
+
+		/* Wakeup signal? */
+		if (ghcb_is_valid_sw_exit_info_2(ghcb) &&
+		    ghcb->save.sw_exit_info_2 != 0)
+			break;
+	}
+}
+
+void sev_es_play_dead(void)
+{
+	play_dead_common();
+
+	/* IRQs now disabled */
+
+	sev_es_ap_hlt_loop();
+
+	/*
+	 * If we get here, the VCPU was woken up again. Jump to CPU
+	 * startup code to get it back online.
+	 */
+
+	start_cpu();
+}
+#else  /* CONFIG_HOTPLUG_CPU */
+#define sev_es_play_dead	native_play_dead
+#endif /* CONFIG_HOTPLUG_CPU */
+
 void encrypted_state_init_ghcbs(void)
 {
 	int cpu;
@@ -267,6 +311,8 @@ void encrypted_state_init_ghcbs(void)
 				     sizeof(ghcb_page) >> PAGE_SHIFT);
 		memset(ghcb, 0, sizeof(*ghcb));
 	}
+
+	smp_ops.play_dead = sev_es_play_dead;
 }
 
 static void __init early_forward_exception(struct es_em_ctxt *ctxt)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 61/62] x86/cpufeature: Add SEV_ES_GUEST CPU Feature
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (59 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 60/62] x86/sev-es: Support CPU offline/online Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 13:52 ` [PATCH 62/62] x86/sev-es: Add NMI state tracking Joerg Roedel
                   ` (2 subsequent siblings)
  63 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

The feature bit indicates whether the kernel runs as an SEV-ES guest.
It can be used to apply alternatives at boot and gives user-space a
way to detect whether it runs under SEV-ES.
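
As a usage example, a later patch in this series uses the bit to patch
SEV-ES-only code in at boot time via alternatives:

	ALTERNATIVE "", "callq sev_es_nmi_complete", X86_FEATURE_SEV_ES_GUEST

User-space can detect the guest type through /proc/cpuinfo; since the
cpufeatures.h entry carries no "" comment, the flag should show up
there as "sev_es_guest".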

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/amd.c          | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 26e4ee209f7b..e864327812e2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -234,6 +234,7 @@
 #define X86_FEATURE_EPT_AD		( 8*32+17) /* Intel Extended Page Table access-dirty bit */
 #define X86_FEATURE_VMCALL		( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
+#define X86_FEATURE_SEV_ES_GUEST	( 8*32+20) /* SEV-ES Guest */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index aad2223862ef..a1eb39153771 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -484,7 +484,6 @@ static void early_init_amd_mc(struct cpuinfo_x86 *c)
 
 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
-
 #ifdef CONFIG_X86_64
 	if (c->x86 >= 0xf) {
 		unsigned long long tseg;
@@ -614,6 +613,11 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 		setup_clear_cpu_cap(X86_FEATURE_SEV);
 		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
 	}
+
+	if (!rdmsrl_safe(MSR_AMD64_SEV, &msr)) {
+		if (msr & MSR_AMD64_SEV_ES_ENABLED)
+			set_cpu_cap(c, X86_FEATURE_SEV_ES_GUEST);
+	}
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH 62/62] x86/sev-es: Add NMI state tracking
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (60 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 61/62] x86/cpufeature: Add SEV_ES_GUEST CPU Feature Joerg Roedel
@ 2020-02-11 13:52 ` Joerg Roedel
  2020-02-11 22:50   ` Andy Lutomirski
  2020-02-11 14:50 ` [RFC PATCH 00/62] Linux as SEV-ES Guest Support Peter Zijlstra
  2020-02-12  3:48 ` Andy Lutomirski
  63 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 13:52 UTC (permalink / raw)
  To: x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Keep NMI state in SEV-ES code so the kernel can re-enable NMIs for the
vCPU when it reaches IRET.
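
The tracking boils down to a per-CPU flag plus one VMGEXIT; a condensed
sketch of the life cycle, using the function and macro names introduced
below:

	NMI raised
	  do_nmi():      sev_es_nmi_enter()          /* sev_es_in_nmi = true */
	  ... NMI handler runs ...
	  before IRET:   SEV_ES_NMI_COMPLETE
	    sev_es_nmi_complete():
	      VMGEXIT with SVM_VMGEXIT_NMI_COMPLETE  /* re-open NMI window */
	      sev_es_in_nmi = false
	  iretq

Because the message is sent before the actual IRET, the NMI window
re-opens a few instructions early; this is fine since NMI nesting is
handled safely on x86-64 Linux.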

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_64.S       | 48 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/sev-es.h   | 27 +++++++++++++++++++
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kernel/nmi.c           |  8 ++++++
 arch/x86/kernel/sev-es.c        | 28 ++++++++++++++++++-
 5 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 729876d368c5..355470b36896 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
 #include <asm/export.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include <asm/sev-es.h>
 #include <linux/err.h>
 
 #include "calling.h"
@@ -629,6 +630,13 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 	ud2
 1:
 #endif
+
+	/*
+	 * This code path is used by the NMI handler, so check if NMIs
+	 * need to be re-enabled when running as an SEV-ES guest.
+	 */
+	SEV_ES_IRET_CHECK
+
 	POP_REGS pop_rdi=0
 
 	/*
@@ -1474,6 +1482,8 @@ SYM_CODE_START(nmi)
 	movq	$-1, %rsi
 	call	do_nmi
 
+	SEV_ES_NMI_COMPLETE
+
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, because we don't want to enable interrupts.
@@ -1599,6 +1609,7 @@ nested_nmi_out:
 	popq	%rdx
 
 	/* We are returning to kernel mode, so this cannot result in a fault. */
+	SEV_ES_NMI_COMPLETE
 	iretq
 
 first_nmi:
@@ -1687,6 +1698,12 @@ end_repeat_nmi:
 	movq	$-1, %rsi
 	call	do_nmi
 
+	/*
+	 * When running as an SEV-ES guest, tell the hypervisor via
+	 * sev_es_nmi_complete() that the NMI window can be re-opened.
+	 */
+	SEV_ES_NMI_COMPLETE
+
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
 
@@ -1715,6 +1732,9 @@ nmi_restore:
 	std
 	movq	$0, 5*8(%rsp)		/* clear "NMI executing" */
 
+nmi_return:
+	UNWIND_HINT_IRET_REGS
+
 	/*
 	 * iretq reads the "iret" frame and exits the NMI stack in a
 	 * single instruction.  We are returning to kernel mode, so this
@@ -1724,6 +1744,34 @@ nmi_restore:
 	iretq
 SYM_CODE_END(nmi)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+SYM_CODE_START(sev_es_iret_user)
+	UNWIND_HINT_IRET_REGS offset=8
+	/*
+	 * The kernel jumps here directly from
+	 * swapgs_restore_regs_and_return_to_usermode. %rsp already points
+	 * to the trampoline stack, but %cr3 still holds the kernel value.
+	 * User registers are live except %rdi. Switch to user CR3, restore
+	 * user %rdi and the user gs_base, and single-step over the IRET.
+	 */
+	SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
+	popq	%rdi
+	SWAPGS
+	/*
+	 * Enable single-stepping and execute IRET. When the IRET has
+	 * finished, the resulting #DB exception will cause a #VC
+	 * exception to be raised. The #VC exception handler will send
+	 * an NMI-complete message to the hypervisor to re-open the NMI
+	 * window.
+	 */
+sev_es_iret_kernel:
+	pushf
+	btsq $X86_EFLAGS_TF_BIT, (%rsp)
+	popf
+	iretq
+SYM_CODE_END(sev_es_iret_user)
+#endif
+
 #ifndef CONFIG_IA32_EMULATION
 /*
  * This handles SYSCALL from 32-bit code.  There is no way to program
diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index a4d7574c5c6a..22f45782149e 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -8,6 +8,8 @@
 #ifndef __ASM_ENCRYPTED_STATE_H
 #define __ASM_ENCRYPTED_STATE_H
 
+#ifndef __ASSEMBLY__
+
 #include <linux/types.h>
 #include <asm/insn.h>
 
@@ -82,11 +84,36 @@ struct real_mode_header;
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 int sev_es_setup_ap_jump_table(struct real_mode_header *rmh);
+void sev_es_nmi_enter(void);
 #else /* CONFIG_AMD_MEM_ENCRYPT */
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh)
 {
 	return 0;
 }
+static inline void sev_es_nmi_enter(void) { }
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#else /* !__ASSEMBLY__ */
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define SEV_ES_NMI_COMPLETE		\
+	ALTERNATIVE	"", "callq sev_es_nmi_complete", X86_FEATURE_SEV_ES_GUEST
+
+.macro	SEV_ES_IRET_CHECK
+	ALTERNATIVE	"jmp	.Lend_\@", "", X86_FEATURE_SEV_ES_GUEST
+	movq	PER_CPU_VAR(sev_es_in_nmi), %rdi
+	testq	%rdi, %rdi
+	jz	.Lend_\@
+	callq	sev_es_nmi_complete
+.Lend_\@:
+.endm
+
+#else  /* CONFIG_AMD_MEM_ENCRYPT */
+#define	SEV_ES_NMI_COMPLETE
+.macro	SEV_ES_IRET_CHECK
+.endm
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
+#endif /* __ASSEMBLY__ */
+
 #endif
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 20a05839dd9a..0f837339db66 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -84,6 +84,7 @@
 /* SEV-ES software-defined VMGEXIT events */
 #define SVM_VMGEXIT_MMIO_READ			0x80000001
 #define SVM_VMGEXIT_MMIO_WRITE			0x80000002
+#define SVM_VMGEXIT_NMI_COMPLETE		0x80000003
 #define SVM_VMGEXIT_AP_HLT_LOOP			0x80000004
 #define SVM_VMGEXIT_AP_JUMP_TABLE		0x80000005
 #define		SVM_VMGEXIT_SET_AP_JUMP_TABLE			0
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 54c21d6abd5a..7312a6d4d50f 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -37,6 +37,7 @@
 #include <asm/reboot.h>
 #include <asm/cache.h>
 #include <asm/nospec-branch.h>
+#include <asm/sev-es.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -510,6 +511,13 @@ NOKPROBE_SYMBOL(is_debug_stack);
 dotraplinkage notrace void
 do_nmi(struct pt_regs *regs, long error_code)
 {
+	/*
+	 * For SEV-ES the kernel needs to track whether NMIs are blocked until
+	 * IRET is reached, even when the CPU is offline.
+	 */
+	if (sev_es_active())
+		sev_es_nmi_enter();
+
 	if (IS_ENABLED(CONFIG_SMP) && cpu_is_offline(smp_processor_id()))
 		return;
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 755708f72824..c90d250c767e 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -36,6 +36,7 @@ struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
  */
 struct ghcb __initdata *boot_ghcb;
 static DEFINE_PER_CPU(unsigned long, cached_dr7) = DR7_RESET_VALUE;
+static DEFINE_PER_CPU(bool, sev_es_in_nmi) = false;
 /* Needed before per-cpu access is set up */
 static unsigned long early_dr7 = DR7_RESET_VALUE;
 
@@ -144,6 +145,28 @@ static phys_addr_t es_slow_virt_to_phys(struct ghcb *ghcb, long vaddr)
 /* Include code shared with pre-decompression boot stage */
 #include "sev-es-shared.c"
 
+void sev_es_nmi_enter(void)
+{
+	this_cpu_write(sev_es_in_nmi, true);
+}
+
+void sev_es_nmi_complete(void)
+{
+	struct ghcb *ghcb;
+
+	ghcb = this_cpu_ptr(&ghcb_page);
+
+	ghcb_invalidate(ghcb);
+	ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
+	ghcb_set_sw_exit_info_1(ghcb, 0);
+	ghcb_set_sw_exit_info_2(ghcb, 0);
+
+	write_ghcb_msr(__pa(ghcb));
+	VMGEXIT();
+
+	this_cpu_write(sev_es_in_nmi, false);
+}
+
 static u64 sev_es_get_jump_table_addr(void)
 {
 	unsigned long flags;
@@ -485,7 +508,10 @@ static enum es_result handle_vmmcall(struct ghcb *ghcb,
 static enum es_result handle_db_exception(struct ghcb *ghcb,
 					  struct es_em_ctxt *ctxt)
 {
-	do_debug(ctxt->regs, 0);
+	if (this_cpu_read(sev_es_in_nmi))
+		sev_es_nmi_complete();
+	else
+		do_debug(ctxt->regs, 0);
 
 	/* Exception event, do not advance RIP */
 	return ES_RETRY;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (61 preceding siblings ...)
  2020-02-11 13:52 ` [PATCH 62/62] x86/sev-es: Add NMI state tracking Joerg Roedel
@ 2020-02-11 14:50 ` Peter Zijlstra
  2020-02-11 15:43   ` Joerg Roedel
  2020-02-12  3:48 ` Andy Lutomirski
  63 siblings, 1 reply; 109+ messages in thread
From: Peter Zijlstra @ 2020-02-11 14:50 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Thomas Hellstrom,
	Jiri Slaby, Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	linux-kernel, kvm, virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 02:51:54PM +0100, Joerg Roedel wrote:
> NMI Special Handling
> --------------------
> 
> The last thing that needs special handling with SEV-ES are NMIs.
> Hypervisors usually start to intercept IRET instructions when an NMI got
> injected to find out when the NMI window is re-opened. But handling IRET
> intercepts requires the hypervisor to access guest register state and is
> not possible with SEV-ES. The specification under [1] solves this
> problem with an NMI_COMPLETE message sent by the guest to the
> hypervisor, upon which the hypervisor re-opens the NMI window for the
> guest.
> 
> This patch-set sends the NMI_COMPLETE message before the actual IRET,
> while the kernel is still on a valid stack and kernel cr3. This opens
> the NMI-window a few instructions early, but this is fine as under
> x86-64 Linux NMI-nesting is safe. The alternative would be to
> single-step over the IRET, but that requires more intrusive changes to
> the entry code because it does not handle entries from kernel-mode while
> on the entry stack.
> 
> Besides the special handling above the patch-set contains the handlers
> for the #VC exception and all the exit-codes specified in [1].

Oh gawd; so instead of improving the whole NMI situation, AMD went and
made it worse still ?!?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-11 14:50 ` [RFC PATCH 00/62] Linux as SEV-ES Guest Support Peter Zijlstra
@ 2020-02-11 15:43   ` Joerg Roedel
  2020-02-11 22:12     ` Andy Lutomirski
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-11 15:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Thomas Hellstrom,
	Jiri Slaby, Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook,
	linux-kernel, kvm, virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
 
> Oh gawd; so instead of improving the whole NMI situation, AMD went and
> made it worse still ?!?

Well, depends on how you want to see it. Under SEV-ES an IRET will not
re-open the NMI window, but the guest has to tell the hypervisor
explicitly when it is ready to receive new NMIs via the NMI_COMPLETE
message.  NMIs stay blocked even when an exception happens in the
handler, so this could also be seen as a (slight) improvement.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-11 15:43   ` Joerg Roedel
@ 2020-02-11 22:12     ` Andy Lutomirski
  2020-02-12 13:54       ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Peter Zijlstra, X86 ML, H. Peter Anvin, Andy Lutomirski,
	Dave Hansen, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 7:43 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
>
> > Oh gawd; so instead of improving the whole NMI situation, AMD went and
> > made it worse still ?!?
>
> Well, depends on how you want to see it. Under SEV-ES an IRET will not
> re-open the NMI window, but the guest has to tell the hypervisor
> explicitly when it is ready to receive new NMIs via the NMI_COMPLETE
> message.  NMIs stay blocked even when an exception happens in the
> handler, so this could also be seen as a (slight) improvement.
>

I don't get it.  VT-x has a VMCS bit "Interruptibility
state"."Blocking by NMI" that tracks the NMI masking state.  Would it
have killed AMD to solve the problem the same way to retain
architectural behavior inside a SEV-ES VM?

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage
  2020-02-11 13:52 ` [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage Joerg Roedel
@ 2020-02-11 22:13   ` Andy Lutomirski
  0 siblings, 0 replies; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:13 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> The x86-64 ABI defines a red-zone on the stack:
>
>   The 128-byte area beyond the location pointed to by %rsp is
>   considered to be reserved and shall not be modified by signal or
>   interrupt handlers. 10 Therefore, functions may use this area for
>   temporary data that is not needed across function calls. In
>   particular, leaf functions may use this area for their entire stack
>   frame, rather than adjusting the stack pointer in the prologue and
>   epilogue. This area is known as the red zone.
>
> This is not compatible with exception handling, so disable it for the
> pre-decompression boot code.

Acked-by: Andy Lutomirski <luto@kernel.org>

I admit that I thought we already supported exceptions this early.  At
least I seem to remember writing this code.  Maybe it never got
upstreamed?

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure
  2020-02-11 13:52 ` [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure Joerg Roedel
@ 2020-02-11 22:18   ` Andy Lutomirski
  2020-02-12 11:19     ` Joerg Roedel
  2020-02-14 19:40   ` Andi Kleen
  1 sibling, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:18 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Add code needed to setup an IDT in the early pre-decompression
> boot-code. The IDT is loaded first in startup_64, which is after
> EfiExitBootServices() has been called, and later reloaded when the
> kernel image has been relocated to the end of the decompression area.
>
> This allows us to set up different IDT handlers before and after the
> relocation.
>

> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> new file mode 100644
> index 000000000000..46ecea671b90
> --- /dev/null
> +++ b/arch/x86/boot/compressed/idt_64.c
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#include <asm/trap_defs.h>
> +#include <asm/segment.h>
> +#include "misc.h"
> +
> +static void set_idt_entry(int vector, void (*handler)(void))
> +{
> +       unsigned long address = (unsigned long)handler;
> +       gate_desc entry;
> +
> +       memset(&entry, 0, sizeof(entry));
> +
> +       entry.offset_low    = (u16)(address & 0xffff);
> +       entry.segment       = __KERNEL_CS;
> +       entry.bits.type     = GATE_TRAP;

^^^

I realize we're not running a real kernel here, but GATE_TRAP is
madness.  Please use GATE_INTERRUPT.

> +       entry.bits.p        = 1;
> +       entry.offset_middle = (u16)((address >> 16) & 0xffff);
> +       entry.offset_high   = (u32)(address >> 32);
> +
> +       memcpy(&boot_idt[vector], &entry, sizeof(entry));
> +}
> +
> +/* Have this here so we don't need to include <asm/desc.h> */
> +static void load_boot_idt(const struct desc_ptr *dtr)
> +{
> +       asm volatile("lidt %0"::"m" (*dtr));
> +}
> +
> +/* Setup IDT before kernel jumping to  .Lrelocated */
> +void load_stage1_idt(void)
> +{
> +       boot_idt_desc.address = (unsigned long)boot_idt;
> +
> +       load_boot_idt(&boot_idt_desc);
> +}
> +
> +/* Setup IDT after kernel jumping to  .Lrelocated */
> +void load_stage2_idt(void)
> +{
> +       boot_idt_desc.address = (unsigned long)boot_idt;
> +
> +       load_boot_idt(&boot_idt_desc);
> +}
> diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
> new file mode 100644
> index 000000000000..0b2b6cf747d2
> --- /dev/null
> +++ b/arch/x86/boot/compressed/idt_handlers_64.S
> @@ -0,0 +1,71 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Early IDT handler entry points
> + *
> + * Copyright (C) 2019 SUSE
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + */
> +
> +#include <asm/segment.h>
> +
> +.macro EXCEPTION_HANDLER name function error_code=0
> +SYM_FUNC_START(\name)
> +
> +       /* Build pt_regs */
> +       .if \error_code == 0
> +       pushq   $0
> +       .endif

cld

> +
> +       pushq   %rdi
> +       pushq   %rsi
> +       pushq   %rdx
> +       pushq   %rcx
> +       pushq   %rax
> +       pushq   %r8
> +       pushq   %r9
> +       pushq   %r10
> +       pushq   %r11
> +       pushq   %rbx
> +       pushq   %rbp
> +       pushq   %r12
> +       pushq   %r13
> +       pushq   %r14
> +       pushq   %r15
> +
> +       /* Call handler with pt_regs */
> +       movq    %rsp, %rdi
> +       call    \function
> +
> +       /* Restore regs */
> +       popq    %r15
> +       popq    %r14
> +       popq    %r13
> +       popq    %r12
> +       popq    %rbp
> +       popq    %rbx
> +       popq    %r11
> +       popq    %r10
> +       popq    %r9
> +       popq    %r8
> +       popq    %rax
> +       popq    %rcx
> +       popq    %rdx
> +       popq    %rsi
> +       popq    %rdi

if error_code?

> +
> +       /* Remove error code and return */
> +       addq    $8, %rsp
> +
> +       /*
> +        * Make sure we return to __KERNEL_CS - the CS selector on
> +        * the IRET frame might still be from an old BIOS GDT
> +        */
> +       movq    $__KERNEL_CS, 8(%rsp)
> +

If this actually happens, you have a major bug.  Please sanitize all
the segment registers after installing the GDT rather than hacking
around it here.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler
  2020-02-11 13:52 ` [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler Joerg Roedel
@ 2020-02-11 22:23   ` Andy Lutomirski
  2020-02-12 11:38     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:23 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Add the first handler for #VC exceptions. At stage 1 there is no GHCB
> yet because we might still be on the EFI page table and thus can't map
> memory unencrypted.
>
> The stage 1 handler is limited to the MSR based protocol to talk to
> the hypervisor and can only support CPUID exit-codes, but that is
> enough to get to stage 2.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/boot/compressed/Makefile          |  1 +
>  arch/x86/boot/compressed/idt_64.c          |  4 ++
>  arch/x86/boot/compressed/idt_handlers_64.S |  4 ++
>  arch/x86/boot/compressed/misc.h            |  1 +
>  arch/x86/boot/compressed/sev-es.c          | 42 ++++++++++++++
>  arch/x86/include/asm/msr-index.h           |  1 +
>  arch/x86/include/asm/sev-es.h              | 45 +++++++++++++++
>  arch/x86/include/asm/trap_defs.h           |  1 +
>  arch/x86/kernel/sev-es-shared.c            | 66 ++++++++++++++++++++++
>  9 files changed, 165 insertions(+)
>  create mode 100644 arch/x86/boot/compressed/sev-es.c
>  create mode 100644 arch/x86/include/asm/sev-es.h
>  create mode 100644 arch/x86/kernel/sev-es-shared.c
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index e6b3e0fc48de..583678c78e1b 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -84,6 +84,7 @@ ifdef CONFIG_X86_64
>         vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
>         vmlinux-objs-y += $(obj)/mem_encrypt.o
>         vmlinux-objs-y += $(obj)/pgtable_64.o
> +       vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
>  endif
>
>  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> index 84ba57d9d436..bdd20dfd1fd0 100644
> --- a/arch/x86/boot/compressed/idt_64.c
> +++ b/arch/x86/boot/compressed/idt_64.c
> @@ -31,6 +31,10 @@ void load_stage1_idt(void)
>  {
>         boot_idt_desc.address = (unsigned long)boot_idt;
>
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +       set_idt_entry(X86_TRAP_VC, boot_stage1_vc_handler);
> +#endif
> +
>         load_boot_idt(&boot_idt_desc);
>  }
>
> diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
> index f7f1ea66dcbf..330eb4e5c8b3 100644
> --- a/arch/x86/boot/compressed/idt_handlers_64.S
> +++ b/arch/x86/boot/compressed/idt_handlers_64.S
> @@ -71,3 +71,7 @@ SYM_FUNC_END(\name)
>         .code64
>
>  EXCEPTION_HANDLER      boot_pf_handler do_boot_page_fault error_code=1
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +EXCEPTION_HANDLER      boot_stage1_vc_handler no_ghcb_vc_handler error_code=1
> +#endif
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 4e5bc688f467..0e3508c5c15c 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -141,5 +141,6 @@ extern struct desc_ptr boot_idt_desc;
>
>  /* IDT Entry Points */
>  void boot_pf_handler(void);
> +void boot_stage1_vc_handler(void);
>
>  #endif /* BOOT_COMPRESSED_MISC_H */
> diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
> new file mode 100644
> index 000000000000..8d13121a8cf2
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sev-es.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD Encrypted Register State Support
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + */
> +
> +#include <linux/kernel.h>
> +
> +#include <asm/sev-es.h>
> +#include <asm/msr-index.h>
> +#include <asm/ptrace.h>
> +#include <asm/svm.h>
> +
> +#include "misc.h"
> +
> +static inline u64 read_ghcb_msr(void)
> +{
> +       unsigned long low, high;
> +
> +       asm volatile("rdmsr\n" : "=a" (low), "=d" (high) :
> +                       "c" (MSR_AMD64_SEV_ES_GHCB));
> +
> +       return ((high << 32) | low);
> +}
> +
> +static inline void write_ghcb_msr(u64 val)
> +{
> +       u32 low, high;
> +
> +       low  = val & 0xffffffffUL;
> +       high = val >> 32;
> +
> +       asm volatile("wrmsr\n" : : "c" (MSR_AMD64_SEV_ES_GHCB),
> +                       "a"(low), "d" (high) : "memory");
> +}
> +
> +#undef __init
> +#define __init
> +
> +/* Include code for early handlers */
> +#include "../../kernel/sev-es-shared.c"
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ebe1685e92dd..b6139b70db54 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -432,6 +432,7 @@
>  #define MSR_AMD64_IBSBRTARGET          0xc001103b
>  #define MSR_AMD64_IBSOPDATA4           0xc001103d
>  #define MSR_AMD64_IBS_REG_COUNT_MAX    8 /* includes MSR_AMD64_IBSBRTARGET */
> +#define MSR_AMD64_SEV_ES_GHCB          0xc0010130
>  #define MSR_AMD64_SEV                  0xc0010131
>  #define MSR_AMD64_SEV_ENABLED_BIT      0
>  #define MSR_AMD64_SEV_ENABLED          BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
> diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
> new file mode 100644
> index 000000000000..f524b40aef07
> --- /dev/null
> +++ b/arch/x86/include/asm/sev-es.h
> @@ -0,0 +1,45 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * AMD Encrypted Register State Support
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + */
> +
> +#ifndef __ASM_ENCRYPTED_STATE_H
> +#define __ASM_ENCRYPTED_STATE_H
> +
> +#include <linux/types.h>
> +
> +#define GHCB_SEV_CPUID_REQ     0x004UL
> +#define                GHCB_CPUID_REQ_EAX      0
> +#define                GHCB_CPUID_REQ_EBX      1
> +#define                GHCB_CPUID_REQ_ECX      2
> +#define                GHCB_CPUID_REQ_EDX      3
> +#define                GHCB_CPUID_REQ(fn, reg) (GHCB_SEV_CPUID_REQ | \
> +                                       (((unsigned long)reg & 3) << 30) | \
> +                                       (((unsigned long)fn) << 32))
> +
> +#define GHCB_SEV_CPUID_RESP    0x005UL
> +#define GHCB_SEV_TERMINATE     0x100UL
> +
> +#define        GHCB_SEV_GHCB_RESP_CODE(v)      ((v) & 0xfff)
> +#define        VMGEXIT()                       { asm volatile("rep; vmmcall\n\r"); }
> +
> +static inline u64 lower_bits(u64 val, unsigned int bits)
> +{
> +       u64 mask = (1ULL << bits) - 1;
> +
> +       return (val & mask);
> +}
> +
> +static inline u64 copy_lower_bits(u64 out, u64 in, unsigned int bits)
> +{
> +       u64 mask = (1ULL << bits) - 1;
> +
> +       out &= ~mask;
> +       out |= lower_bits(in, bits);
> +
> +       return out;
> +}
> +
> +#endif
> diff --git a/arch/x86/include/asm/trap_defs.h b/arch/x86/include/asm/trap_defs.h
> index 488f82ac36da..af45d65f0458 100644
> --- a/arch/x86/include/asm/trap_defs.h
> +++ b/arch/x86/include/asm/trap_defs.h
> @@ -24,6 +24,7 @@ enum {
>         X86_TRAP_AC,            /* 17, Alignment Check */
>         X86_TRAP_MC,            /* 18, Machine Check */
>         X86_TRAP_XF,            /* 19, SIMD Floating-Point Exception */
> +       X86_TRAP_VC = 29,       /* 29, VMM Communication Exception */
>         X86_TRAP_IRET = 32,     /* 32, IRET Exception */
>  };
>
> diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
> new file mode 100644
> index 000000000000..7edf2dfac71f
> --- /dev/null
> +++ b/arch/x86/kernel/sev-es-shared.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * AMD Encrypted Register State Support
> + *
> + * Author: Joerg Roedel <jroedel@suse.de>
> + *
> + * This file is not compiled stand-alone. It contains code shared
> + * between the pre-decompression boot code and the running Linux kernel
> + * and is included directly into both code-bases.
> + */
> +
> +/*
> + * Boot VC Handler - This is the first VC handler during boot, there is no GHCB
> + * page yet, so it only supports the MSR based communication with the
> + * hypervisor and only the CPUID exit-code.
> + */
> +void __init no_ghcb_vc_handler(struct pt_regs *regs)

Isn't there a second parameter: unsigned long error_code?

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler
  2020-02-11 13:52 ` [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler Joerg Roedel
@ 2020-02-11 22:25   ` Andy Lutomirski
  2020-02-12 11:44     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:25 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Install an exception handler for #VC exception that uses a GHCB. Also
> add the infrastructure for handling different exit-codes by decoding
> the instruction that caused the exception and error handling.
>

> diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
> index 8d13121a8cf2..02fb6f57128b 100644
> --- a/arch/x86/boot/compressed/sev-es.c
> +++ b/arch/x86/boot/compressed/sev-es.c
> @@ -8,12 +8,16 @@
>  #include <linux/kernel.h>
>
>  #include <asm/sev-es.h>
> +#include <asm/trap_defs.h>
>  #include <asm/msr-index.h>
>  #include <asm/ptrace.h>
>  #include <asm/svm.h>
>
>  #include "misc.h"
>
> +struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
> +struct ghcb *boot_ghcb;
> +
>  static inline u64 read_ghcb_msr(void)
>  {
>         unsigned long low, high;
> @@ -35,8 +39,95 @@ static inline void write_ghcb_msr(u64 val)
>                         "a"(low), "d" (high) : "memory");
>  }
>
> +static enum es_result es_fetch_insn_byte(struct es_em_ctxt *ctxt,
> +                                        unsigned int offset,
> +                                        char *buffer)
> +{
> +       char *rip = (char *)ctxt->regs->ip;
> +
> +       buffer[offset] = rip[offset];
> +
> +       return ES_OK;
> +}
> +
> +static enum es_result es_write_mem(struct es_em_ctxt *ctxt,
> +                                  void *dst, char *buf, size_t size)
> +{
> +       memcpy(dst, buf, size);
> +
> +       return ES_OK;
> +}
> +
> +static enum es_result es_read_mem(struct es_em_ctxt *ctxt,
> +                                 void *src, char *buf, size_t size)
> +{
> +       memcpy(buf, src, size);
> +
> +       return ES_OK;
> +}


What are all these abstractions for?

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions
  2020-02-11 13:52 ` [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions Joerg Roedel
@ 2020-02-11 22:28   ` Andy Lutomirski
  2020-02-12 11:49     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Add support for decoding and handling #VC exceptions for IOIO events.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> [ jroedel@suse.de: Adapted code to #VC handling framework ]
> Co-developed-by: Joerg Roedel <jroedel@suse.de>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

It would be nice if this could reuse the existing in-kernel
instruction decoder.  Is there some reason it can't?

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 25/62] x86/head/64: Install boot GDT
  2020-02-11 13:52 ` [PATCH 25/62] x86/head/64: Install boot GDT Joerg Roedel
@ 2020-02-11 22:29   ` Andy Lutomirski
  2020-02-12 12:20     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:29 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Handling exceptions during boot requires a working GDT. The kernel GDT
> is not yet ready for use, so install a temporary boot GDT.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/head_64.S | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 4bbc770af632..5a3cde971cb7 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -72,6 +72,20 @@ SYM_CODE_START_NOALIGN(startup_64)
>         /* Set up the stack for verify_cpu(), similar to initial_stack below */
>         leaq    (__end_init_task - SIZEOF_PTREGS)(%rip), %rsp
>
> +       /* Setup boot GDT descriptor and load boot GDT */
> +       leaq    boot_gdt(%rip), %rax
> +       movq    %rax, boot_gdt_base(%rip)
> +       lgdt    boot_gdt_descr(%rip)
> +
> +       /* GDT loaded - switch to __KERNEL_CS so IRET works reliably */
> +       pushq   $__KERNEL_CS
> +       leaq    .Lon_kernel_cs(%rip), %rax
> +       pushq   %rax
> +       lretq
> +
> +.Lon_kernel_cs:
> +       UNWIND_HINT_EMPTY

I would suggest fixing at least SS as well.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-11 13:52 ` [PATCH 23/62] x86/idt: Move IDT to data segment Joerg Roedel
@ 2020-02-11 22:41   ` Andy Lutomirski
  2020-02-12 11:55     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> With SEV-ES, exception handling is needed very early, even before the
> kernel has cleared the bss segment. In order to prevent clearing the
> currently used IDT, move the IDT to the data segment.

Ugh.  At the very least this needs a comment in the code.

I had a patch to fix the kernel ELF loader to clear BSS, which would
fix this problem once and for all, but it didn't work due to the messy
way that the decompressor handles memory.  I never got around to
fixing this, sadly.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 30/62] x86/head/64: Move early exception dispatch to C code
  2020-02-11 13:52 ` [PATCH 30/62] x86/head/64: Move early exception dispatch to C code Joerg Roedel
@ 2020-02-11 22:44   ` Andy Lutomirski
  2020-02-12 12:39     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:44 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Move the assembly coded dispatch between page-faults and all other
> exceptions to C code to make it easier to maintain and extend.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/head64.c  | 20 ++++++++++++++++++++
>  arch/x86/kernel/head_64.S | 11 +----------
>  2 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 7cdfb7113811..d83c62ebaa85 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -36,6 +36,8 @@
>  #include <asm/microcode.h>
>  #include <asm/kasan.h>
>  #include <asm/fixmap.h>
> +#include <asm/extable.h>
> +#include <asm/trap_defs.h>
>
>  /*
>   * Manage page tables very early on.
> @@ -377,6 +379,24 @@ int __init early_make_pgtable(unsigned long address)
>         return __early_make_pgtable(address, pmd);
>  }
>
> +void __init early_exception(struct pt_regs *regs, int trapnr)
> +{
> +       unsigned long cr2;
> +       int r;

How about int (or bool) handled;  Or just if (!early_make_pgtable)
return;  This would also be nicer if you inverted the return value so
that true means "I handled it".

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler
  2020-02-11 13:52 ` [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler Joerg Roedel
@ 2020-02-11 22:46   ` Andy Lutomirski
  2020-02-12 15:16     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:46 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> The runtime handler needs a GHCB per CPU. Set them up and map them
> unencrypted.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/mem_encrypt.h |  2 ++
>  arch/x86/kernel/sev-es.c           | 25 ++++++++++++++++++++++++-
>  arch/x86/kernel/traps.c            |  3 +++
>  3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 6f61bb93366a..d48e7be9bb49 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -48,6 +48,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
>  void __init mem_encrypt_init(void);
>  void __init mem_encrypt_free_decrypted_mem(void);
>
> +void __init encrypted_state_init_ghcbs(void);
>  bool sme_active(void);
>  bool sev_active(void);
>  bool sev_es_active(void);
> @@ -71,6 +72,7 @@ static inline void __init sme_early_init(void) { }
>  static inline void __init sme_encrypt_kernel(struct boot_params *bp) { }
>  static inline void __init sme_enable(struct boot_params *bp) { }
>
> +static inline void encrypted_state_init_ghcbs(void) { }
>  static inline bool sme_active(void) { return false; }
>  static inline bool sev_active(void) { return false; }
>  static inline bool sev_es_active(void) { return false; }
> diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
> index 0e0b28477627..9a5530857db7 100644
> --- a/arch/x86/kernel/sev-es.c
> +++ b/arch/x86/kernel/sev-es.c
> @@ -8,8 +8,11 @@
>   */
>
>  #include <linux/sched/debug.h> /* For show_regs() */
> -#include <linux/kernel.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/mem_encrypt.h>
>  #include <linux/printk.h>
> +#include <linux/set_memory.h>
> +#include <linux/kernel.h>
>  #include <linux/mm.h>
>
>  #include <asm/trap_defs.h>
> @@ -28,6 +31,9 @@ struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>   */
>  struct ghcb __initdata *boot_ghcb;
>
> +/* Runtime GHCBs */
> +static DEFINE_PER_CPU_DECRYPTED(struct ghcb, ghcb_page) __aligned(PAGE_SIZE);

Hmm.  This is a largeish amount of memory on large non-SEV-ES systems.
Maybe store a pointer instead?  It would be even better if it could be
DEFINE_PER_CPU like this but be discarded if we don't need it, but I
don't think we have the infrastructure for that.

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions from user-space
  2020-02-11 13:52 ` [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions " Joerg Roedel
@ 2020-02-11 22:47   ` Andy Lutomirski
  2020-02-12 13:16     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:47 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Send SIGBUS to the user-space process that caused the #VC exception
> instead of killing the machine. Also ratelimit the error messages so
> that user-space can't flood the kernel log.

What would cause this?  CPUID?  Something else?

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 62/62] x86/sev-es: Add NMI state tracking
  2020-02-11 13:52 ` [PATCH 62/62] x86/sev-es: Add NMI state tracking Joerg Roedel
@ 2020-02-11 22:50   ` Andy Lutomirski
  2020-02-12 13:56     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-11 22:50 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Keep NMI state in SEV-ES code so the kernel can re-enable NMIs for the
> vCPU when it reaches IRET.

This patch is overcomplicated IMO.  Just do the magic incantation in C
from do_nmi or from here:

        /*
         * For ease of testing, unmask NMIs right away.  Disabled by
         * default because IRET is very expensive.

If you do the latter, you'll need to handle the case where the NMI
came from user mode.

The ideal solution is do_nmi, I think.

if (static_cpu_has(X86_BUG_AMD_FORGOT_ABOUT_NMI))
  sev_es_unmask_nmi();

Feel free to use X86_FEATURE_SEV_ES instead :)

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 46/62] x86/sev-es: Handle INVD Events
  2020-02-11 13:52 ` [PATCH 46/62] x86/sev-es: Handle INVD Events Joerg Roedel
@ 2020-02-12  0:12   ` Andy Lutomirski
  2020-02-12 15:36     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12  0:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel



> On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
> 
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Implement a handler for #VC exceptions caused by INVD instructions.

Uh, what?  Surely the #VC code can have a catch-all OOPS path for things like this. Linux should never ever do INVD.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 50/62] x86/sev-es: Handle VMMCALL Events
  2020-02-11 13:52 ` [PATCH 50/62] x86/sev-es: Handle VMMCALL Events Joerg Roedel
@ 2020-02-12  0:14   ` Andy Lutomirski
  2020-02-12 13:22     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12  0:14 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel



> On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
> 
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Implement a handler for #VC exceptions caused by VMMCALL instructions.
> This patch is only a starting point, VMMCALL emulation under SEV-ES
> needs further hypervisor-specific changes to provide additional state.
> 

How about we just don’t do VMMCALL if we’re a SEV-ES guest?  Otherwise we add thousands of cycles of extra latency for no good reason.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
                   ` (62 preceding siblings ...)
  2020-02-11 14:50 ` [RFC PATCH 00/62] Linux as SEV-ES Guest Support Peter Zijlstra
@ 2020-02-12  3:48 ` Andy Lutomirski
  2020-02-12 13:59   ` Joerg Roedel
  63 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12  3:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel



> On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:

> 
> 
>    * Putting some NMI-load on the guest will make it crash usually
>      within a minute

Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to ask for help. Some time between when you start filling it out and when you do VMGEXIT, you get NMI. If the NMI does
its own GHCB access [0], it will clobber the outer #VC’s state, resulting in a failure when VMGEXIT happens. There’s a related failure mode if the NMI is after the VMGEXIT but before the result is read.

I suspect you can fix this by saving the GHCB at the beginning of do_nmi and restoring it at the end. This has the major caveat that it will not work if do_nmi comes from user mode and schedules, but I don’t believe this can happen.

[0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that this happens.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure
  2020-02-11 22:18   ` Andy Lutomirski
@ 2020-02-12 11:19     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 11:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

Hi Andy,

thanks a lot for your valuable reviews!

On Tue, Feb 11, 2020 at 02:18:52PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> > +       entry.offset_low    = (u16)(address & 0xffff);
> > +       entry.segment       = __KERNEL_CS;
> > +       entry.bits.type     = GATE_TRAP;
> 
> ^^^
> 
> I realize we're not running a real kernel here, but GATE_TRAP is
> madness.  Please use GATE_INTERRUPT.

Changed that.

> > +       /* Build pt_regs */
> > +       .if \error_code == 0
> > +       pushq   $0
> > +       .endif
> 
> cld

Added.

> > +       popq    %rdi
> 
> if error_code?

The code above pushes a $0 for exceptions without an error code, so it
needs to be removed unconditionally.

> > +
> > +       /* Remove error code and return */
> > +       addq    $8, %rsp
> > +
> > +       /*
> > +        * Make sure we return to __KERNEL_CS - the CS selector on
> > +        * the IRET frame might still be from an old BIOS GDT
> > +        */
> > +       movq    $__KERNEL_CS, 8(%rsp)
> > +
> 
> If this actually happens, you have a major bug.  Please sanitize all
> the segment registers after installing the GDT rather than hacking
> around it here.

Okay, will change that. I thought I could save some instructions in the
head_64.S code, but you are right that it's better to set up a defined
environment first.


Thanks,

	Joerg


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler
  2020-02-11 22:23   ` Andy Lutomirski
@ 2020-02-12 11:38     ` Joerg Roedel
  2020-02-12 16:22       ` Andy Lutomirski
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 11:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:23:22PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> > +void __init no_ghcb_vc_handler(struct pt_regs *regs)
> 
> Isn't there a second parameter: unsigned long error_code?

No, the function gets the error-code from regs->orig_ax. This particular
function only needs to check for error_code == SVM_EXIT_CPUID, as that
is the only one supported when there is no GHCB.
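
To illustrate, the handler boils down to something like this (only a
rough sketch; sev_es_terminate() and vc_handle_cpuid_msr_protocol() are
hypothetical names, not the actual patch code):

	void __init no_ghcb_vc_handler(struct pt_regs *regs)
	{
		unsigned long error_code = regs->orig_ax;

		/* Without a GHCB only CPUID can be handled, via the MSR protocol */
		if (error_code != SVM_EXIT_CPUID)
			sev_es_terminate();		/* hypothetical */

		vc_handle_cpuid_msr_protocol(regs);	/* hypothetical */
		regs->ip += 2;	/* CPUID is a two-byte opcode (0x0f 0xa2) */
	}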

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler
  2020-02-11 22:25   ` Andy Lutomirski
@ 2020-02-12 11:44     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 11:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:25:49PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> >
> > From: Joerg Roedel <jroedel@suse.de>
> >
> > Install an exception handler for #VC exception that uses a GHCB. Also
> > add the infrastructure for handling different exit-codes by decoding
> > the instruction that caused the exception and error handling.
> >
> 
> > diff --git a/arch/x86/boot/compressed/sev-es.c b/arch/x86/boot/compressed/sev-es.c
> > index 8d13121a8cf2..02fb6f57128b 100644
> > --- a/arch/x86/boot/compressed/sev-es.c
> > +++ b/arch/x86/boot/compressed/sev-es.c
> > @@ -8,12 +8,16 @@
> >  #include <linux/kernel.h>
> >
> >  #include <asm/sev-es.h>
> > +#include <asm/trap_defs.h>
> >  #include <asm/msr-index.h>
> >  #include <asm/ptrace.h>
> >  #include <asm/svm.h>
> >
> >  #include "misc.h"
> >
> > +struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
> > +struct ghcb *boot_ghcb;
> > +
> >  static inline u64 read_ghcb_msr(void)
> >  {
> >         unsigned long low, high;
> > @@ -35,8 +39,95 @@ static inline void write_ghcb_msr(u64 val)
> >                         "a"(low), "d" (high) : "memory");
> >  }
> >
> > +static enum es_result es_fetch_insn_byte(struct es_em_ctxt *ctxt,
> > +                                        unsigned int offset,
> > +                                        char *buffer)
> > +{
> > +       char *rip = (char *)ctxt->regs->ip;
> > +
> > +       buffer[offset] = rip[offset];
> > +
> > +       return ES_OK;
> > +}
> > +
> > +static enum es_result es_write_mem(struct es_em_ctxt *ctxt,
> > +                                  void *dst, char *buf, size_t size)
> > +{
> > +       memcpy(dst, buf, size);
> > +
> > +       return ES_OK;
> > +}
> > +
> > +static enum es_result es_read_mem(struct es_em_ctxt *ctxt,
> > +                                 void *src, char *buf, size_t size)
> > +{
> > +       memcpy(buf, src, size);
> > +
> > +       return ES_OK;
> > +}
> 
> 
> What are all these abstractions for?

They are needed for the code in arch/x86/kernel/sev-es-shared.c. This
file is used in the pre-decompression boot code and in the running
kernel's SEV-ES support.

The running kernel needs these abstractions because it will get #VC
exceptions from user-space and MMIO exits touching user-space addresses.
These functions will implement the necessary security checks.
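
In the running kernel the same hook can then route accesses through the
proper user/kernel primitives, roughly like this (a sketch; ES_EXCEPTION
as the error value is an assumption here):

	static enum es_result es_write_mem(struct es_em_ctxt *ctxt,
					   void *dst, char *buf, size_t size)
	{
		/* User-space destinations must go through copy_to_user() */
		if (user_mode(ctxt->regs)) {
			if (copy_to_user((void __user *)dst, buf, size))
				return ES_EXCEPTION;	/* assumed error value */
			return ES_OK;
		}

		memcpy(dst, buf, size);

		return ES_OK;
	}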

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions
  2020-02-11 22:28   ` Andy Lutomirski
@ 2020-02-12 11:49     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 11:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:28:06PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> It would be nice if this could reuse the existing in-kernel
> instruction decoder.  Is there some reason it can't?

It does, see patch 5, which makes the inat-tables generator script
suitable for pre-decompression boot code. Actually every
instruction-caused #VC exception will decode the instruction to get its
length.
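
For reference, the length decode is just the standard in-kernel decoder
sequence (sketch):

	struct insn insn;

	insn_init(&insn, buffer, MAX_INSN_SIZE, 1 /* x86-64 */);
	insn_get_length(&insn);

	/* skip the emulated instruction when the #VC handler returns */
	ctxt->regs->ip += insn.length;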

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-11 22:41   ` Andy Lutomirski
@ 2020-02-12 11:55     ` Joerg Roedel
  2020-02-12 16:23       ` Andy Lutomirski
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 11:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:41:25PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> >
> > From: Joerg Roedel <jroedel@suse.de>
> >
> > With SEV-ES, exception handling is needed very early, even before the
> > kernel has cleared the bss segment. In order to prevent clearing the
> > currently used IDT, move the IDT to the data segment.
> 
> Ugh.  At the very least this needs a comment in the code.

Yes, right, added a comment for that.

> I had a patch to fix the kernel ELF loader to clear BSS, which would
> fix this problem once and for all, but it didn't work due to the messy
> way that the decompressor handles memory.  I never got around to
> fixing this, sadly.

Aren't there other ways of booting (Xen-PV?) which don't use the kernel
ELF loader?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 25/62] x86/head/64: Install boot GDT
  2020-02-11 22:29   ` Andy Lutomirski
@ 2020-02-12 12:20     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 12:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:29:24PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> > +       /* GDT loaded - switch to __KERNEL_CS so IRET works reliably */
> > +       pushq   $__KERNEL_CS
> > +       leaq    .Lon_kernel_cs(%rip), %rax
> > +       pushq   %rax
> > +       lretq
> > +
> > +.Lon_kernel_cs:
> > +       UNWIND_HINT_EMPTY
> 
> I would suggest fixing at least SS as well.

You are right, that is cleaner. Initialized DS, ES, and SS to
__KERNEL_DS here too.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 30/62] x86/head/64: Move early exception dispatch to C code
  2020-02-11 22:44   ` Andy Lutomirski
@ 2020-02-12 12:39     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 12:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:44:45PM -0800, Andy Lutomirski wrote:
> How about int (or bool) handled;  Or just if (!early_make_pgtable)
> return;  This would also be nicer if you inverted the return value so
> that true means "I handled it".

Okay, makes sense. Changed the return value of early_make_pgtable() to bool and
this function to:

	void __init early_exception(struct pt_regs *regs, int trapnr)
	{
		if (trapnr == X86_TRAP_PF &&
		    early_make_pgtable(native_read_cr2()))
			return;

		early_fixup_exception(regs, trapnr);
	}

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions from user-space
  2020-02-11 22:47   ` Andy Lutomirski
@ 2020-02-12 13:16     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 13:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:47:05PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> >
> > From: Joerg Roedel <jroedel@suse.de>
> >
> > Send SIGBUS to the user-space process that caused the #VC exception
> > instead of killing the machine. Also ratelimit the error messages so
> > that user-space can't flood the kernel log.
> 
> What would cause this?  CPUID?  Something else?

Yes, CPUID, RDTSC(P) and, most importantly, user-space mapping some IO
space and accessing it, causing MMIO #VC exceptions.

Especially the MMIO case has so many implications that it will not be
supported at the moment. Imagine for example MMIO accesses by 32-bit
user-space with non-standard, non-zero based code and data segments. Or
user-space changing the instruction bytes between when the #VC exception
is raised and when the handler parses the instruction. Lots of checks
are needed to make this work securely, and the complexity of this is not
worth it at this time.
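
For the cases that stay supported, the hardening itself is little more
than this (a sketch of the idea; the exit_code variable is assumed):

	if (user_mode(regs)) {
		pr_err_ratelimited("Unsupported #VC exit-code %#lx from user-space\n",
				   exit_code);
		force_sig(SIGBUS);
		return;
	}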


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 50/62] x86/sev-es: Handle VMMCALL Events
  2020-02-12  0:14   ` Andy Lutomirski
@ 2020-02-12 13:22     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 13:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 04:14:53PM -0800, Andy Lutomirski wrote:
> 
> How about we just don’t do VMMCALL if we’re a SEV-ES guest?  Otherwise
> we add thousands of cycles of extra latency for no good reason.

True, but I left that as a future optimization for now, given how large
the patch-set already is. The idea is to add an abstraction around
VMMCALL for the support code of the various hypervisors and just do a
VMGEXIT in that wrapper when in an SEV-ES guest. But again, that is a
separate patch-set.
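
The wrapper would look roughly like this (a sketch; sev_es_hcall() is a
hypothetical VMGEXIT-based helper):

	static inline long hv_call1(unsigned int nr, unsigned long p1)
	{
		if (static_cpu_has(X86_FEATURE_SEV_ES))
			return sev_es_hcall(nr, p1);	/* VMGEXIT, no #VC round-trip */

		return kvm_hypercall1(nr, p1);		/* plain VMMCALL */
	}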

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-11 22:12     ` Andy Lutomirski
@ 2020-02-12 13:54       ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 13:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, X86 ML, H. Peter Anvin, Dave Hansen,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:12:04PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 7:43 AM Joerg Roedel <joro@8bytes.org> wrote:
> >
> > On Tue, Feb 11, 2020 at 03:50:08PM +0100, Peter Zijlstra wrote:
> >
> > > Oh gawd; so instead of improving the whole NMI situation, AMD went and
> > > made it worse still ?!?
> >
> > Well, depends on how you want to see it. Under SEV-ES an IRET will not
> > re-open the NMI window, but the guest has to tell the hypervisor
> > explicitly when it is ready to receive new NMIs via the NMI_COMPLETE
> > message.  NMIs stay blocked even when an exception happens in the
> > handler, so this could also be seen as a (slight) improvement.
> >
> 
> I don't get it.  VT-x has a VMCS bit "Interruptibility
> state"."Blocking by NMI" that tracks the NMI masking state.  Would it
> have killed AMD to solve the problem they same way to retain
> architectural behavior inside a SEV-ES VM?

No, but it wouldn't solve the problem. Inside an NMI handler there could
be #VC exceptions, which do an IRET on their own. Hardware NMI state
tracking would re-enable NMIs when the #VC exception returns to the NMI
handler, which not every OS is comfortable with.

Yes, there are many ways to hack around this. The GHCB spec mentions the
single-stepping-over-IRET idea, which I also prototyped in a previous
version of this patch-set. I gave up on it when I discovered that NMIs
that happen when executing in kernel-mode but on entry stack will cause
the #VC handler to call into C code while on entry stack, because
neither paranoid_entry nor error_entry handle the
from-kernel-with-entry-strack case. This could of course also be fixed,
but further complicates things already complicated enough by the PTI
changes and nested-NMI support.

My patch for using the NMI_COMPLETE message is certainly not perfect and
needs changes, but having the message specified in the protocol gives
the guest the best flexibility in deciding when it is ready to receive
new NMIs, imho.
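
For context, sending NMI_COMPLETE is just another VMGEXIT, roughly like
this (a sketch built on the GHCB accessors from patch 2; the exit-code
name follows the GHCB spec):

	void sev_es_nmi_complete(void)
	{
		struct ghcb *ghcb = this_cpu_ptr(&ghcb_page);

		ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE);
		ghcb_set_sw_exit_info_1(ghcb, 0);
		ghcb_set_sw_exit_info_2(ghcb, 0);

		/* tell the hypervisor the guest is ready for the next NMI */
		VMGEXIT();
	}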

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 62/62] x86/sev-es: Add NMI state tracking
  2020-02-11 22:50   ` Andy Lutomirski
@ 2020-02-12 13:56     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 13:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:50:29PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> This patch is overcomplicated IMO.  Just do the magic incantation in C
> from do_nmi or from here:
> 
>         /*
>          * For ease of testing, unmask NMIs right away.  Disabled by
>          * default because IRET is very expensive.
> 
> If you do the latter, you'll need to handle the case where the NMI
> came from user mode.
> 
> The ideal solution is do_nmi, I think.
> 
> if (static_cpu_has(X86_BUG_AMD_FORGOT_ABOUT_NMI))
>   sev_es_unmask_nmi();
> 
> Feel free to use X86_FEATURE_SEV_ES instead :)

Yeah, I also had that implemented once, but then changed it because I
thought that nested NMIs do not necessarily call into do_nmi(), which
would cause NMIs to stay blocked forever. I need to read through the NMI
entry code again to check if that can really happen.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [RFC PATCH 00/62] Linux as SEV-ES Guest Support
  2020-02-12  3:48 ` Andy Lutomirski
@ 2020-02-12 13:59   ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 13:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 07:48:12PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
> 
> > 
> > 
> >    * Putting some NMI-load on the guest will make it crash usually
> >      within a minute
> 
> Suppose you do CPUID or some MMIO and get #VC. You fill in the GHCB to
> ask for help. Some time between when you start filling it out and when
> you do VMGEXIT, you get NMI. If the NMI does its own GHCB access [0],
> it will clobber the outer #VC’a state, resulting in a failure when
> VMGEXIT happens. There’s a related failure mode if the NMI is after
> the VMGEXIT but before the result is read.
> 
> I suspect you can fix this by saving the GHCB at the beginning of
> do_nmi and restoring it at the end. This has the major caveat that it
> will not work if do_nmi comes from user mode and schedules, but I
> don’t believe this can happen.
> 
> [0] Due to the NMI_COMPLETE catastrophe, there is a 100% chance that
> this happens.

Very true, thank you! You probably saved me a few hours of debugging
this further :)
I will implement better handling for nested #VC exceptions, which
hopefully solves the NMI crashes.
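
A first sketch of your save/restore idea (ghcb_page is the per-cpu page
from patch 35, backup_ghcb is new):

	static DEFINE_PER_CPU(struct ghcb, backup_ghcb);

	void do_nmi(struct pt_regs *regs, long error_code)
	{
		/* save the state of a potentially interrupted #VC handler */
		*this_cpu_ptr(&backup_ghcb) = *this_cpu_ptr(&ghcb_page);

		/* ... existing NMI handling, which may use the GHCB itself ... */

		/* restore it before returning to the interrupted #VC handler */
		*this_cpu_ptr(&ghcb_page) = *this_cpu_ptr(&backup_ghcb);
	}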

Thanks again,

       Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler
  2020-02-11 22:46   ` Andy Lutomirski
@ 2020-02-12 15:16     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 15:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, H. Peter Anvin, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:46:11PM -0800, Andy Lutomirski wrote:
> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
> > +/* Runtime GHCBs */
> > +static DEFINE_PER_CPU_DECRYPTED(struct ghcb, ghcb_page) __aligned(PAGE_SIZE);
> 
> Hmm.  This is a largeish amount of memory on large non-SEV-ES systems.
> Maybe store a pointer instead?  It would be even better if it could be
> DEFINE_PER_CPU like this but be discarded if we don't need it, but I
> don't think we have the infrastructure for that.

Yeah, discarding is not easily possible right now, but I changed it to
only store a pointer and to allocate the pages only when running as an
SEV-ES guest.
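
The pointer-based variant then looks roughly like this (a sketch with
error handling omitted; runtime_ghcb is a made-up name):

	static DEFINE_PER_CPU(struct ghcb *, runtime_ghcb);

	void __init encrypted_state_init_ghcbs(void)
	{
		int cpu;

		if (!sev_es_active())
			return;

		for_each_possible_cpu(cpu) {
			struct ghcb *page = (struct ghcb *)get_zeroed_page(GFP_KERNEL);

			/* the hypervisor must be able to read the GHCB */
			set_memory_decrypted((unsigned long)page, 1);
			per_cpu(runtime_ghcb, cpu) = page;
		}
	}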

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 46/62] x86/sev-es: Handle INVD Events
  2020-02-12  0:12   ` Andy Lutomirski
@ 2020-02-12 15:36     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-12 15:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 04:12:19PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Feb 11, 2020, at 5:53 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > 
> > From: Tom Lendacky <thomas.lendacky@amd.com>
> > 
> > Implement a handler for #VC exceptions caused by INVD instructions.
> 
> Uh, what?  Surely the #VC code can have a catch-all OOPS path for things like this. Linux should never ever do INVD.

Right, it's hard to come up with a valid use-case for INVD in the Linux
kernel. I changed the patch to mark INVD as unsupported and print an
error message.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler
  2020-02-12 11:38     ` Joerg Roedel
@ 2020-02-12 16:22       ` Andy Lutomirski
  0 siblings, 0 replies; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12 16:22 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel



> On Feb 12, 2020, at 3:38 AM, Joerg Roedel <joro@8bytes.org> wrote:
> 
> On Tue, Feb 11, 2020 at 02:23:22PM -0800, Andy Lutomirski wrote:
>>> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>>> +void __init no_ghcb_vc_handler(struct pt_regs *regs)
>> 
>> Isn't there a second parameter: unsigned long error_code?
> 
> No, the function gets the error-code from regs->orig_ax. This particular
> function only needs to check for error_code == SVM_EXIT_CPUID, as that
> is the only one supported when there is no GHCB.
> 

Hmm. It might be nice to use the same signature for early handlers as for normal ones.

> Regards,
> 
>    Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-12 11:55     ` Joerg Roedel
@ 2020-02-12 16:23       ` Andy Lutomirski
  2020-02-12 16:28         ` Jürgen Groß
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12 16:23 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel



> On Feb 12, 2020, at 3:55 AM, Joerg Roedel <joro@8bytes.org> wrote:
> 
> On Tue, Feb 11, 2020 at 02:41:25PM -0800, Andy Lutomirski wrote:
>>> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>>> 
>>> From: Joerg Roedel <jroedel@suse.de>
>>> 
>>> With SEV-ES, exception handling is needed very early, even before the
>>> kernel has cleared the bss segment. In order to prevent clearing the
>>> currently used IDT, move the IDT to the data segment.
>> 
>> Ugh.  At the very least this needs a comment in the code.
> 
> Yes, right, added a comment for that.
> 
>> I had a patch to fix the kernel ELF loader to clear BSS, which would
>> fix this problem once and for all, but it didn't work due to the messy
>> way that the decompressor handles memory.  I never got around to
>> fixing this, sadly.
> 
> Aren't there other ways of booting (Xen-PV?) which don't use the kernel
> ELF loader?

Dunno. I would hope that any sane loader would clear BSS before executing anything. This isn’t currently the case, though. Oh well.

> 
> Regards,
> 
>    Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-12 16:23       ` Andy Lutomirski
@ 2020-02-12 16:28         ` Jürgen Groß
  2020-02-19 10:42           ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Jürgen Groß @ 2020-02-12 16:28 UTC (permalink / raw)
  To: Andy Lutomirski, Joerg Roedel
  Cc: Andy Lutomirski, X86 ML, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Kees Cook, LKML, kvm list, Linux Virtualization,
	Joerg Roedel

On 12.02.20 17:23, Andy Lutomirski wrote:
> 
> 
>> On Feb 12, 2020, at 3:55 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>
>> On Tue, Feb 11, 2020 at 02:41:25PM -0800, Andy Lutomirski wrote:
>>>> On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>>>>
>>>> From: Joerg Roedel <jroedel@suse.de>
>>>>
>>>> With SEV-ES, exception handling is needed very early, even before the
>>>> kernel has cleared the bss segment. In order to prevent clearing the
>>>> currently used IDT, move the IDT to the data segment.
>>>
>>> Ugh.  At the very least this needs a comment in the code.
>>
>> Yes, right, added a comment for that.
>>
>>> I had a patch to fix the kernel ELF loader to clear BSS, which would
>>> fix this problem once and for all, but it didn't work due to the messy
>>> way that the decompressor handles memory.  I never got around to
>>> fixing this, sadly.
>>
>> Aren't there other ways of booting (Xen-PV?) which don't use the kernel
>> ELF loader?
> 
> Dunno. I would hope that any sane loader would clear BSS before executing anything. This isn’t currently the case, though. Oh well.

Xen-PV is clearing BSS as the very first action.


Juergen

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space
  2020-02-11 13:52 ` [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space Joerg Roedel
@ 2020-02-12 21:42   ` Andy Lutomirski
  2020-03-13  9:12     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Andy Lutomirski @ 2020-02-12 21:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: X86 ML, H. Peter Anvin, Andy Lutomirski, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On Tue, Feb 11, 2020 at 5:53 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> When a #VC exception is triggered by user-space the instruction
> decoder needs to read the instruction bytes from user addresses.
> Enhance es_fetch_insn_byte() to safely fetch kernel and user
> instruction bytes.

I realize that this is a somewhat arbitrary point in the series to
complain about this, but: the kernel already has infrastructure to
decode and fix up an instruction-based exception.  See
fixup_umip_exception().  Please refactor code so that you can share
the same infrastructure rather than creating an entirely new thing.

FWIW, the fixup_umip_exception() code seems to have much more robust
segment handling than yours :)

--Andy

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature
  2020-02-11 13:51 ` [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature Joerg Roedel
@ 2020-02-13  6:51   ` Borislav Petkov
  0 siblings, 0 replies; 109+ messages in thread
From: Borislav Petkov @ 2020-02-13  6:51 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Tue, Feb 11, 2020 at 02:51:57PM +0100, Joerg Roedel wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> Add CPU feature detection for Secure Encrypted Virtualization with
> Encrypted State. This feature enhances SEV by also encrypting the
> guest register state, making it inaccessible to the hypervisor.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/cpufeatures.h | 1 +
>  arch/x86/kernel/cpu/amd.c          | 4 +++-
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index f3327cb56edf..26e4ee209f7b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -285,6 +285,7 @@
>  #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
>  #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
> +#define X86_FEATURE_SEV_ES		(11*32+ 6) /* AMD Secure Encrypted Virtualization - Encrypted State */

Let's put this in word 8 which is for virt flags. X86_FEATURE_SEV could
go there too but that should be a separate patch anyway, if at all.

>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index ac83a0fef628..aad2223862ef 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -580,7 +580,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  	 *	      If BIOS has not enabled SME then don't advertise the
>  	 *	      SME feature (set in scattered.c).
>  	 *   For SEV: If BIOS has not enabled SEV then don't advertise the
> -	 *            SEV feature (set in scattered.c).
> +	 *            SEV and SEV_ES feature (set in scattered.c).
>  	 *
>  	 *   In all cases, since support for SME and SEV requires long mode,
>  	 *   don't advertise the feature under CONFIG_X86_32.
> @@ -611,6 +611,8 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  		setup_clear_cpu_cap(X86_FEATURE_SME);
>  clear_sev:
>  		setup_clear_cpu_cap(X86_FEATURE_SEV);
> +		setup_clear_cpu_cap(X86_FEATURE_SEV);

X86_FEATURE_SEV twice? Because once didn't stick?

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 41/62] x86/sev-es: Handle MSR events
  2020-02-11 13:52 ` [PATCH 41/62] x86/sev-es: Handle MSR events Joerg Roedel
@ 2020-02-13 15:45   ` Dave Hansen
  2020-02-14  7:23     ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Dave Hansen @ 2020-02-13 15:45 UTC (permalink / raw)
  To: Joerg Roedel, x86
  Cc: hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On 2/11/20 5:52 AM, Joerg Roedel wrote:
> Implement a handler for #VC exceptions caused by RDMSR/WRMSR
> instructions.

As a general comment on all of these event handlers: Why do we bother
having the hypercalls in the interrupt handler as opposed to just
calling them directly?  What you have is:

	wrmsr()
	-> #VC exception
	   hcall()

But we could make our rd/wrmsr() wrappers just do:

	if (running_on_sev_es())
		hcall(HCALL_MSR_WHATEVER...)
	else
		wrmsr()

and then we don't have any of the nastiness of exception handling.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 41/62] x86/sev-es: Handle MSR events
  2020-02-13 15:45   ` Dave Hansen
@ 2020-02-14  7:23     ` Joerg Roedel
  2020-02-14 16:59       ` Dave Hansen
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-14  7:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Thu, Feb 13, 2020 at 07:45:00AM -0800, Dave Hansen wrote:
> On 2/11/20 5:52 AM, Joerg Roedel wrote:
> > Implement a handler for #VC exceptions caused by RDMSR/WRMSR
> > instructions.
> 
> As a general comment on all of these event handlers: Why do we bother
> having the hypercalls in the interrupt handler as opposed to just
> calling them directly.  What you have is:
> 
> 	wrmsr()
> 	-> #VC exception
> 	   hcall()
> 
> But we could make our rd/wrmsr() wrappers just do:
> 
> 	if (running_on_sev_es())
> 		hcall(HCALL_MSR_WHATEVER...)
> 	else
> 		wrmsr()
> 
> and then we don't have any of the nastiness of exception handling.

Yes, investigating this is on the list for future optimizations (besides
caching CPUID results). My idea is to use alternatives patching for
this. But the exception handling is needed anyway because #VC
exceptions happen very early already; basically the first thing after
setting up a stack is calling verify_cpu(), which uses CPUID.
The other reason is that things like MMIO and IOIO instructions can't be
easily patched by alternatives. Those would work with the runtime
checking you showed above, though.
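
For the patchable cases the runtime check you suggested is simple enough
(a sketch; sev_es_wrmsr() is hypothetical):

	static inline void sev_aware_wrmsr(u32 msr, u32 low, u32 high)
	{
		if (static_cpu_has(X86_FEATURE_SEV_ES))
			sev_es_wrmsr(msr, low, high);	/* GHCB protocol, no #VC */
		else
			asm volatile("wrmsr"
				     : : "c" (msr), "a" (low), "d" (high)
				     : "memory");
	}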

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 41/62] x86/sev-es: Handle MSR events
  2020-02-14  7:23     ` Joerg Roedel
@ 2020-02-14 16:59       ` Dave Hansen
  2020-02-15 12:45         ` Joerg Roedel
  0 siblings, 1 reply; 109+ messages in thread
From: Dave Hansen @ 2020-02-14 16:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On 2/13/20 11:23 PM, Joerg Roedel wrote:
> Yes, investigating this is on the list for future optimizations (besides
> caching CPUID results). My idea is to use alternatives patching for
> this. But the exception handling is needed anyway because #VC
> exceptions happen very early already, basically the first thing after
> setting up a stack is calling verify_cpu(), which uses CPUID.

Ahh, bummer.  How does a guest know that it's running under SEV-ES?
What's the enumeration mechanism if CPUID doesn't "work"?

> The other reason is that things like MMIO and IOIO instructions can't be
> easily patched by alternatives. Those would work with the runtime
> checking you showed above, though.

Is there a reason we can't make a rule that you *must* do MMIO through
an accessor function so we *can* patch them?  I know random drivers
might break the rule, but are SEV-ES guests going to be running random
drivers?  I would think that they mostly if not all want to use virtio.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure
  2020-02-11 13:52 ` [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure Joerg Roedel
  2020-02-11 22:18   ` Andy Lutomirski
@ 2020-02-14 19:40   ` Andi Kleen
  2020-02-15 12:32     ` Joerg Roedel
  1 sibling, 1 reply; 109+ messages in thread
From: Andi Kleen @ 2020-02-14 19:40 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

Joerg Roedel <joro@8bytes.org> writes:
> +	addq    $8, %rsp
> +
> +	/*
> +	 * Make sure we return to __KERNEL_CS - the CS selector on
> +	 * the IRET frame might still be from an old BIOS GDT
> +	 */
> +	movq	$__KERNEL_CS, 8(%rsp)

This doesn't make sense. Either it's running on the correct CS
before the exception or not. Likely there's some other problem
here that you patched over with this hack.

-Andi

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure
  2020-02-14 19:40   ` Andi Kleen
@ 2020-02-15 12:32     ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-15 12:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Fri, Feb 14, 2020 at 11:40:36AM -0800, Andi Kleen wrote:
> Joerg Roedel <joro@8bytes.org> writes:
> > +	addq    $8, %rsp
> > +
> > +	/*
> > +	 * Make sure we return to __KERNEL_CS - the CS selector on
> > +	 * the IRET frame might still be from an old BIOS GDT
> > +	 */
> > +	movq	$__KERNEL_CS, 8(%rsp)
> 
> This doesn't make sense. Either it's running on the correct CS
> before the exception or not. Likely there's some other problem
> here that you patched over with this hack.

It is actually a well-known situation and not some other problem. The
boot-code loaded a new GDT and IDT, but did not reload CS with a far
jump/ret/call. The CS value loaded is undefined and comes from the UEFI
BIOS. When an exception is raised, this old CS value is stored in the
IRET frame, and when IRET is executed the processor loads an undefined
CS value, which causes a triple fault with the current IDT setup.

The hack in this patch just fixes the IRET frame up so that it will
return to the correct CS. The reason for this hack was actually to save
some instructions in the boot-path, because the space is limited there
between the defined offsets of the various entry points.

I removed this hack meanwhile and added a separate function which
reloads CS, DS, SS and ES and which is called from the boot-path, so
that there is no problem with the offsets.
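
The helper is essentially this (a sketch; the symbol name is made up):

	SYM_CODE_START_LOCAL(startup_load_segments)
		/* reload CS via a far return */
		pushq	$__KERNEL_CS
		leaq	.Lon_new_cs(%rip), %rax
		pushq	%rax
		lretq
	.Lon_new_cs:
		/* sanitize the data segment registers */
		movl	$__KERNEL_DS, %eax
		movl	%eax, %ds
		movl	%eax, %es
		movl	%eax, %ss
		ret
	SYM_CODE_END(startup_load_segments)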

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 41/62] x86/sev-es: Handle MSR events
  2020-02-14 16:59       ` Dave Hansen
@ 2020-02-15 12:45         ` Joerg Roedel
  0 siblings, 0 replies; 109+ messages in thread
From: Joerg Roedel @ 2020-02-15 12:45 UTC (permalink / raw)
  To: Dave Hansen
  Cc: x86, hpa, Andy Lutomirski, Dave Hansen, Peter Zijlstra,
	Thomas Hellstrom, Jiri Slaby, Dan Williams, Tom Lendacky,
	Juergen Gross, Kees Cook, linux-kernel, kvm, virtualization,
	Joerg Roedel

On Fri, Feb 14, 2020 at 08:59:39AM -0800, Dave Hansen wrote:
> On 2/13/20 11:23 PM, Joerg Roedel wrote:
> > Yes, investigating this is on the list for future optimizations (besides
> > caching CPUID results). My idea is to use alternatives patching for
> > this. But the exception handling is needed anyway because #VC
> > exceptions happen very early already, basically the first thing after
> > setting up a stack is calling verify_cpu(), which uses CPUID.
> 
> Ahh, bummer.  How does a guest know that it's running under SEV-ES?
> What's the enumeration mechanism if CPUID doesn't "work"?

There are two ways a guest can find out:

	1) Read the SEV_STATUS_MSR and check for the SEV-ES bit
	2) If a #VC exception is raised it also knows it runs as an
	   SEV-ES guest

This patch-set implements both ways at the appropriate stages of the
boot process. Very early it just installs a #VC handler without checking
whether it is running under SEV-ES and handles the exceptions when they
are raised.

Later in the boot process it also reads the SEV_STATUS_MSR and sets a
cpu_feature flag to do alternative patching based on its value.
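
The MSR path is a simple bit test (a sketch; the ES bit name is the one
this patch-set adds to msr-index.h):

	static bool sev_es_check_status(void)
	{
		u64 status;

		rdmsrl(MSR_AMD64_SEV, status);

		return !!(status & MSR_AMD64_SEV_ES_ENABLED);
	}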

> > The other reason is that things like MMIO and IOIO instructions can't be
> > easily patched by alternatives. Those would work with the runtime
> > checking you showed above, though.
> 
> Is there a reason we can't make a rule that you *must* do MMIO through
> an accessor function so we *can* patch them?  I know random drivers
> might break the rule, but are SEV-ES guests going to be running random
> drivers?  I would think that they mostly if not all want to use
> virtio.

Yeah, there are already defined accessor functions for MMIO, like
read/write[bwlq] and memcpy_toio/memcpy_fromio. It is probably worth
testing what performance overhead is involved in overloading these to
call directly into the paravirt path when SEV-ES is enabled. With
alternatives patching it would still add a couple of NOPs for the
non-SEV-ES case.
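
Overloading one of them could look like this (a sketch;
sev_es_mmio_read32() is a hypothetical paravirt helper):

	static inline u32 sev_aware_readl(const volatile void __iomem *addr)
	{
		if (static_cpu_has(X86_FEATURE_SEV_ES))
			return sev_es_mmio_read32(addr);	/* direct VMGEXIT */

		return readl(addr);
	}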

But all that does not remove the need for the #VC exception handler, as
#VC exceptions can also be triggered by user-space, and the instruction
emulation for MMIO will be needed to allow MMIO in user-space (the
patch-set currently does not allow that, but it could be needed in the
future).

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-12 16:28         ` Jürgen Groß
@ 2020-02-19 10:42           ` Joerg Roedel
  2020-02-19 10:47             ` Jürgen Groß
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-02-19 10:42 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, H. Peter Anvin,
	Dave Hansen, Peter Zijlstra, Thomas Hellstrom, Jiri Slaby,
	Dan Williams, Tom Lendacky, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

Hi Jürgen,

On Wed, Feb 12, 2020 at 05:28:21PM +0100, Jürgen Groß wrote:
> Xen-PV is clearing BSS as the very first action.

In the kernel image? Or in the ELF loader before jumping to the kernel
image?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 23/62] x86/idt: Move IDT to data segment
  2020-02-19 10:42           ` Joerg Roedel
@ 2020-02-19 10:47             ` Jürgen Groß
  0 siblings, 0 replies; 109+ messages in thread
From: Jürgen Groß @ 2020-02-19 10:47 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Andy Lutomirski, X86 ML, H. Peter Anvin,
	Dave Hansen, Peter Zijlstra, Thomas Hellstrom, Jiri Slaby,
	Dan Williams, Tom Lendacky, Kees Cook, LKML, kvm list,
	Linux Virtualization, Joerg Roedel

On 19.02.20 11:42, Joerg Roedel wrote:
> Hi Jürgen,
> 
> On Wed, Feb 12, 2020 at 05:28:21PM +0100, Jürgen Groß wrote:
>> Xen-PV is clearing BSS as the very first action.
> 
> In the kernel image? Or in the ELF loader before jumping to the kernel
> image?

In the kernel image.

See arch/x86/xen/xen-head.S - startup_xen is the entry point of the
kernel.


Juergen

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space
  2020-02-12 21:42   ` Andy Lutomirski
@ 2020-03-13  9:12     ` Joerg Roedel
  2020-03-17 21:34       ` Andy Lutomirski
  0 siblings, 1 reply; 109+ messages in thread
From: Joerg Roedel @ 2020-03-13  9:12 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, X86 ML, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Thomas Hellstrom, Jiri Slaby, Dan Williams,
	Tom Lendacky, Juergen Gross, Kees Cook, LKML, kvm list,
	Linux Virtualization

On Wed, Feb 12, 2020 at 01:42:48PM -0800, Andy Lutomirski wrote:
> I realize that this is a somewhat arbitrary point in the series to
> complain about this, but: the kernel already has infrastructure to
> decode and fix up an instruction-based exception.  See
> fixup_umip_exception().  Please refactor code so that you can share
> the same infrastructure rather than creating an entirely new thing.

Okay, but 'infrastructure' is a bold word for the call path down
fixup_umip_exception(). It uses the in-kernel instruction decoder, which
I already use in my patch-set. But I agree that some code in this
patch-set duplicates what is already present in the instruction decoder,
and that fixup_umip_exception() has more robust instruction decoding.

I will factor the instruction decoding part out, make it usable for the
#VC handler too, and remove the code that is already present in the
instruction decoder.
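
Roughly along these lines (a sketch of the plan above, not final code;
vc_decode_insn() is a placeholder name, while insn_init(),
insn_get_length() and insn_complete() are the existing decoder
interfaces from arch/x86/lib/insn.c):

	#include <linux/errno.h>
	#include <asm/insn.h>

	/*
	 * Shared decode helper: 'buf' holds instruction bytes already
	 * copied from the faulting context (kernel or user-space).
	 */
	static int vc_decode_insn(struct insn *insn,
				  unsigned char *buf, int buf_len)
	{
		insn_init(insn, buf, buf_len, 1);  /* 1 == 64-bit mode */
		insn_get_length(insn);             /* decode the insn  */

		return insn_complete(insn) ? 0 : -EINVAL;
	}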

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space
  2020-03-13  9:12     ` Joerg Roedel
@ 2020-03-17 21:34       ` Andy Lutomirski
  0 siblings, 0 replies; 109+ messages in thread
From: Andy Lutomirski @ 2020-03-17 21:34 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, X86 ML, H. Peter Anvin,
	Dave Hansen, Peter Zijlstra, Thomas Hellstrom, Jiri Slaby,
	Dan Williams, Tom Lendacky, Juergen Gross, Kees Cook, LKML,
	kvm list, Linux Virtualization

On Fri, Mar 13, 2020 at 2:12 AM Joerg Roedel <jroedel@suse.de> wrote:
>
> On Wed, Feb 12, 2020 at 01:42:48PM -0800, Andy Lutomirski wrote:
> > I realize that this is a somewhat arbitrary point in the series to
> > complain about this, but: the kernel already has infrastructure to
> > decode and fix up an instruction-based exception.  See
> > fixup_umip_exception().  Please refactor code so that you can share
> > the same infrastructure rather than creating an entirely new thing.
>
> Okay, but 'infrastructure' is a bold word for the call path down
> fixup_umip_exception().

I won't argue with that.

> It uses the in-kernel instruction decoder, which
> I already use in my patch-set. But I agree that some code in this
> patch-set duplicates what is already present in the instruction decoder,
> and that fixup_umip_exception() has more robust instruction decoding.
>
> I will factor the instruction decoding part out, make it usable for the
> #VC handler too, and remove the code that is already present in the
> instruction decoder.

Thanks!

>
> Regards,
>
>         Joerg
>

^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2020-03-17 21:34 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-11 13:51 [RFC PATCH 00/62] Linux as SEV-ES Guest Support Joerg Roedel
2020-02-11 13:51 ` [PATCH 01/62] KVM: SVM: Add GHCB definitions Joerg Roedel
2020-02-11 13:51 ` [PATCH 02/62] KVM: SVM: Add GHCB Accessor functions Joerg Roedel
2020-02-11 13:51 ` [PATCH 03/62] x86/cpufeatures: Add SEV-ES CPU feature Joerg Roedel
2020-02-13  6:51   ` Borislav Petkov
2020-02-11 13:51 ` [PATCH 04/62] x86/traps: Move some definitions to <asm/trap_defs.h> Joerg Roedel
2020-02-11 13:51 ` [PATCH 05/62] x86/insn-decoder: Make inat-tables.c suitable for pre-decompression code Joerg Roedel
2020-02-11 13:52 ` [PATCH 06/62] x86/boot/compressed: Fix debug_puthex() parameter type Joerg Roedel
2020-02-11 13:52 ` [PATCH 07/62] x86/boot/compressed/64: Disable red-zone usage Joerg Roedel
2020-02-11 22:13   ` Andy Lutomirski
2020-02-11 13:52 ` [PATCH 08/62] x86/boot/compressed/64: Add IDT Infrastructure Joerg Roedel
2020-02-11 22:18   ` Andy Lutomirski
2020-02-12 11:19     ` Joerg Roedel
2020-02-14 19:40   ` Andi Kleen
2020-02-15 12:32     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 09/62] x86/boot/compressed/64: Rename kaslr_64.c to ident_map_64.c Joerg Roedel
2020-02-11 13:52 ` [PATCH 10/62] x86/boot/compressed/64: Add page-fault handler Joerg Roedel
2020-02-11 13:52 ` [PATCH 11/62] x86/boot/compressed/64: Always switch to own page-table Joerg Roedel
2020-02-11 13:52 ` [PATCH 12/62] x86/boot/compressed/64: Don't pre-map memory in KASLR code Joerg Roedel
2020-02-11 13:52 ` [PATCH 13/62] x86/boot/compressed/64: Change add_identity_map() to take start and end Joerg Roedel
2020-02-11 13:52 ` [PATCH 14/62] x86/boot/compressed/64: Add stage1 #VC handler Joerg Roedel
2020-02-11 22:23   ` Andy Lutomirski
2020-02-12 11:38     ` Joerg Roedel
2020-02-12 16:22       ` Andy Lutomirski
2020-02-11 13:52 ` [PATCH 15/62] x86/boot/compressed/64: Call set_sev_encryption_mask earlier Joerg Roedel
2020-02-11 13:52 ` [PATCH 16/62] x86/boot/compressed/64: Check return value of kernel_ident_mapping_init() Joerg Roedel
2020-02-11 13:52 ` [PATCH 17/62] x86/boot/compressed/64: Add function to map a page unencrypted Joerg Roedel
2020-02-11 13:52 ` [PATCH 18/62] x86/boot/compressed/64: Setup GHCB Based VC Exception handler Joerg Roedel
2020-02-11 22:25   ` Andy Lutomirski
2020-02-12 11:44     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 19/62] x86/sev-es: Add support for handling IOIO exceptions Joerg Roedel
2020-02-11 22:28   ` Andy Lutomirski
2020-02-12 11:49     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 20/62] x86/fpu: Move xgetbv()/xsetbv() into separate header Joerg Roedel
2020-02-11 13:52 ` [PATCH 21/62] x86/sev-es: Add CPUID handling to #VC handler Joerg Roedel
2020-02-11 13:52 ` [PATCH 22/62] x86/sev-es: Add handler for MMIO events Joerg Roedel
2020-02-11 13:52 ` [PATCH 23/62] x86/idt: Move IDT to data segment Joerg Roedel
2020-02-11 22:41   ` Andy Lutomirski
2020-02-12 11:55     ` Joerg Roedel
2020-02-12 16:23       ` Andy Lutomirski
2020-02-12 16:28         ` Jürgen Groß
2020-02-19 10:42           ` Joerg Roedel
2020-02-19 10:47             ` Jürgen Groß
2020-02-11 13:52 ` [PATCH 24/62] x86/idt: Split idt_data setup out of set_intr_gate() Joerg Roedel
2020-02-11 13:52 ` [PATCH 25/62] x86/head/64: Install boot GDT Joerg Roedel
2020-02-11 22:29   ` Andy Lutomirski
2020-02-12 12:20     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 26/62] x86/head/64: Reload GDT after switch to virtual addresses Joerg Roedel
2020-02-11 13:52 ` [PATCH 27/62] x86/head/64: Load segment registers earlier Joerg Roedel
2020-02-11 13:52 ` [PATCH 28/62] x86/head/64: Switch to initial stack earlier Joerg Roedel
2020-02-11 13:52 ` [PATCH 29/62] x86/head/64: Load IDT earlier Joerg Roedel
2020-02-11 13:52 ` [PATCH 30/62] x86/head/64: Move early exception dispatch to C code Joerg Roedel
2020-02-11 22:44   ` Andy Lutomirski
2020-02-12 12:39     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 31/62] x86/sev-es: Add SEV-ES Feature Detection Joerg Roedel
2020-02-11 13:52 ` [PATCH 32/62] x86/sev-es: Compile early handler code into kernel image Joerg Roedel
2020-02-11 13:52 ` [PATCH 33/62] x86/sev-es: Setup early #VC handler Joerg Roedel
2020-02-11 13:52 ` [PATCH 34/62] x86/sev-es: Setup GHCB based boot " Joerg Roedel
2020-02-11 13:52 ` [PATCH 35/62] x86/sev-es: Setup per-cpu GHCBs for the runtime handler Joerg Roedel
2020-02-11 22:46   ` Andy Lutomirski
2020-02-12 15:16     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 36/62] x86/sev-es: Add Runtime #VC Exception Handler Joerg Roedel
2020-02-11 13:52 ` [PATCH 37/62] x86/sev-es: Wire up existing #VC exit-code handlers Joerg Roedel
2020-02-11 13:52 ` [PATCH 38/62] x86/sev-es: Handle instruction fetches from user-space Joerg Roedel
2020-02-12 21:42   ` Andy Lutomirski
2020-03-13  9:12     ` Joerg Roedel
2020-03-17 21:34       ` Andy Lutomirski
2020-02-11 13:52 ` [PATCH 39/62] x86/sev-es: Harden runtime #VC handler for exceptions " Joerg Roedel
2020-02-11 22:47   ` Andy Lutomirski
2020-02-12 13:16     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 40/62] x86/sev-es: Filter exceptions not supported " Joerg Roedel
2020-02-11 13:52 ` [PATCH 41/62] x86/sev-es: Handle MSR events Joerg Roedel
2020-02-13 15:45   ` Dave Hansen
2020-02-14  7:23     ` Joerg Roedel
2020-02-14 16:59       ` Dave Hansen
2020-02-15 12:45         ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 42/62] x86/sev-es: Handle DR7 read/write events Joerg Roedel
2020-02-11 13:52 ` [PATCH 43/62] x86/sev-es: Handle WBINVD Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 44/62] x86/sev-es: Handle RDTSC Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 45/62] x86/sev-es: Handle RDPMC Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 46/62] x86/sev-es: Handle INVD Events Joerg Roedel
2020-02-12  0:12   ` Andy Lutomirski
2020-02-12 15:36     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 47/62] x86/sev-es: Handle RDTSCP Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 48/62] x86/sev-es: Handle MONITOR/MONITORX Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 49/62] x86/sev-es: Handle MWAIT/MWAITX Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 50/62] x86/sev-es: Handle VMMCALL Events Joerg Roedel
2020-02-12  0:14   ` Andy Lutomirski
2020-02-12 13:22     ` Joerg Roedel
2020-02-11 13:52 ` [PATCH 51/62] x86/sev-es: Handle #AC Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 52/62] x86/sev-es: Handle #DB Events Joerg Roedel
2020-02-11 13:52 ` [PATCH 53/62] x86/paravirt: Allow hypervisor specific VMMCALL handling under SEV-ES Joerg Roedel
2020-02-11 13:52 ` [PATCH 54/62] x86/kvm: Add KVM " Joerg Roedel
2020-02-11 13:52 ` [PATCH 55/62] x86/vmware: Add VMware specific handling for VMMCALL " Joerg Roedel
2020-02-11 13:52 ` [PATCH 56/62] x86/realmode: Add SEV-ES specific trampoline entry point Joerg Roedel
2020-02-11 13:52 ` [PATCH 57/62] x86/realmode: Setup AP jump table Joerg Roedel
2020-02-11 13:52 ` [PATCH 58/62] x86/head/64: Don't call verify_cpu() on starting APs Joerg Roedel
2020-02-11 13:52 ` [PATCH 59/62] x86/head/64: Rename start_cpu0 Joerg Roedel
2020-02-11 13:52 ` [PATCH 60/62] x86/sev-es: Support CPU offline/online Joerg Roedel
2020-02-11 13:52 ` [PATCH 61/62] x86/cpufeature: Add SEV_ES_GUEST CPU Feature Joerg Roedel
2020-02-11 13:52 ` [PATCH 62/62] x86/sev-es: Add NMI state tracking Joerg Roedel
2020-02-11 22:50   ` Andy Lutomirski
2020-02-12 13:56     ` Joerg Roedel
2020-02-11 14:50 ` [RFC PATCH 00/62] Linux as SEV-ES Guest Support Peter Zijlstra
2020-02-11 15:43   ` Joerg Roedel
2020-02-11 22:12     ` Andy Lutomirski
2020-02-12 13:54       ` Joerg Roedel
2020-02-12  3:48 ` Andy Lutomirski
2020-02-12 13:59   ` Joerg Roedel
