linux-kernel.vger.kernel.org archive mirror
* [PATCH v5 00/32] x86: Secure Memory Encryption (AMD)
@ 2017-04-18 21:16 Tom Lendacky
  2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
                   ` (31 more replies)
  0 siblings, 32 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:16 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

This patch series provides support for AMD's new Secure Memory Encryption (SME)
feature.

SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can be found in the links below.

The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.
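
As a rough illustration (not part of this series), the CPUID portion of that
detection can be done from user space as shown below; reading the SYSCFG MSR
additionally requires ring 0 access (e.g. via the msr driver), so only the
CPUID leaf is shown:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID 0x8000001f reports SME support and parameters */
	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("SME supported:           %s\n", (eax & 1) ? "yes" : "no");
	printf("encryption bit position: %u\n", ebx & 0x3f);
	printf("phys addr reduction:     %u bits\n", (ebx >> 6) & 0x3f);

	return 0;
}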

The approach that this patch series takes is to encrypt everything possible,
starting early in boot when the kernel itself is encrypted in place. Using the
page table macros, the encryption mask can be incorporated into all page table
entries and page allocations. By updating the protection map, userspace
allocations are also marked encrypted. Certain data (EFI tables, the initrd,
etc.) was placed in memory before SME was enabled and must be accounted for
and accessed accordingly.
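
Conceptually, the mask reported by CPUID is simply OR'd into page table
entries and into physical addresses loaded into CR3. A minimal sketch of that
idea (illustrative only; the helper names here are assumptions, not
necessarily the exact definitions used in the patches):

extern unsigned long sme_me_mask;	/* zero when SME is not active */

/* Apply or remove the encryption bit on a raw PTE/physical-address value */
#define __sme_set(x)	((unsigned long)(x) | sme_me_mask)
#define __sme_clr(x)	((unsigned long)(x) & ~sme_me_mask)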

This patch series is a precursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.

The following links provide additional detail:

AMD Memory Encryption whitepaper:
   http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
   http://support.amd.com/TechDocs/24593.pdf
   SME is section 7.10
   SEV is section 15.34

This patch series is based on the master branch of tip.
  Commit d36d99770e40 ("Merge branch 'timers/core'")

---

Still to do:
- IOMMU enablement support
- Investigate using memremap() instead of ioremap_cache() for kdump

Changes since v4:
- Re-worked mapping of setup data to not use a fixed list. Rather, check
  dynamically whether the requested early_memremap()/memremap() call
  needs to be mapped decrypted.
- Moved the SME CPU feature into scattered features
- Moved some declarations into header files
- Cleared the encryption mask from the __PHYSICAL_MASK so that users
  of macros such as pmd_pfn_mask() don't have to worry/know about the
  encryption mask
- Updated some return types and values related to EFI and e820 functions
  so that an error could be returned
- During cpu shutdown, removed cache disabling and added a check for kexec
  in progress to use wbinvd followed immediately by halt in order to avoid
  any memory corruption
- Updated how persistent memory is identified
- Added a function to find command line arguments and their values
- Added sysfs support
- General code cleanup based on feedback
- General cleanup of patch subjects and descriptions


Changes since v3:
- Broke out some of the patches into smaller individual patches
- Updated Documentation
- Added a message to indicate why the IOMMU was disabled
- Updated CPU feature support for SME by taking into account whether
  BIOS has enabled SME
- Eliminated redundant functions
- Added some warning messages for DMA usage of bounce buffers when SME
  is active
- Added support for persistent memory
- Added support to determine when setup data is being mapped and ensure
  that it is mapped un-encrypted
- Added CONFIG support to set the default action of whether to activate
  SME if it is supported/enabled
- Added support for (re)booting with kexec

Changes since v2:
- Updated Documentation
- Make the encryption mask available outside of arch/x86 through a
  standard include file
- Conversion of assembler routines to C where possible (not everything
  could be converted, e.g. the routine that does the actual encryption
  needs to be copied into a safe location and it is difficult to
  determine the actual length of the function in order to copy it)
- Fix SME feature use of scattered CPUID feature
- Creation of SME specific functions for things like encrypting
  the setup data, ramdisk, etc.
- New take on early_memremap / memremap encryption support
- Additional support for accessing video buffers (fbdev/gpu) as
  un-encrypted
- Disable IOMMU for now - need to investigate further in relation to
  how it needs to be programmed relative to accessing physical memory

Changes since v1:
- Added Documentation.
- Removed AMD vendor check for setting the PAT write protect mode
- Updated naming of the trampoline flag for SME as well as moving the
  SME check to before paging is enabled.
- Changed early_memremap() to identify the data being mapped as either
  boot data or kernel data, the idea being that boot data will have been
  placed in memory as un-encrypted data and needs to be accessed as such.
- Updated debugfs support for the bootparams to access the data properly.
- Do not set the SYSCFG[MEME] bit, only check it.  The setting of the
  MemEncryptionModeEn bit results in a reduction of physical address size
  of the processor.  It is possible that BIOS could have configured resources
  into a range that will now not be addressable.  To prevent this,
  rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
  encryption support in the kernel.

Tom Lendacky (32):
      x86: Documentation for AMD Secure Memory Encryption (SME)
      x86/mm/pat: Set write-protect cache mode for full PAT support
      x86, mpparse, x86/acpi, x86/PCI, SFI: Use memremap for RAM mappings
      x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
      x86/CPU/AMD: Handle SME reduction in physical address size
      x86/mm: Add Secure Memory Encryption (SME) support
      x86/mm: Add support to enable SME in early boot processing
      x86/mm: Simplify p[g4um]d_page() macros
      x86/mm: Provide general kernel support for memory encryption
      x86/mm: Extend early_memremap() support with additional attrs
      x86/mm: Add support for early encrypt/decrypt of memory
      x86/mm: Insure that boot memory areas are mapped properly
      x86/boot/e820: Add support to determine the E820 type of an address
      efi: Add an EFI table address match function
      efi: Update efi_mem_type() to return an error rather than 0
      x86/efi: Update EFI pagetable creation to work with SME
      x86/mm: Add support to access boot related data in the clear
      x86, mpparse: Use memremap to map the mpf and mpc data
      x86/mm: Add support to access persistent memory in the clear
      x86/mm: Add support for changing the memory encryption attribute
      x86, realmode: Decrypt trampoline area if memory encryption is active
      x86, swiotlb: DMA support for memory encryption
      swiotlb: Add warnings for use of bounce buffers with SME
      iommu/amd: Disable AMD IOMMU if memory encryption is active
      x86, realmode: Check for memory encryption on the APs
      x86, drm, fbdev: Do not specify encrypted memory for video mappings
      kvm: x86: svm: Enable Secure Memory Encryption within KVM
      x86/mm, kexec: Allow kexec to be used with SME
      x86/mm: Add support to encrypt the kernel in-place
      x86/boot: Add early cmdline parsing for options with arguments
      x86: Add sysfs support for Secure Memory Encryption
      x86/mm: Add support to make use of Secure Memory Encryption


 Documentation/admin-guide/kernel-parameters.txt |   11 
 Documentation/x86/amd-memory-encryption.txt     |   60 ++
 arch/ia64/kernel/efi.c                          |    4 
 arch/x86/Kconfig                                |   26 +
 arch/x86/boot/compressed/pagetable.c            |    7 
 arch/x86/include/asm/cacheflush.h               |    3 
 arch/x86/include/asm/cmdline.h                  |    2 
 arch/x86/include/asm/cpufeatures.h              |    1 
 arch/x86/include/asm/dma-mapping.h              |    5 
 arch/x86/include/asm/e820/api.h                 |    2 
 arch/x86/include/asm/fixmap.h                   |   20 +
 arch/x86/include/asm/init.h                     |    1 
 arch/x86/include/asm/io.h                       |    4 
 arch/x86/include/asm/irqflags.h                 |    5 
 arch/x86/include/asm/kexec.h                    |    8 
 arch/x86/include/asm/kvm_host.h                 |    2 
 arch/x86/include/asm/mem_encrypt.h              |  115 ++++
 arch/x86/include/asm/msr-index.h                |    2 
 arch/x86/include/asm/page.h                     |    4 
 arch/x86/include/asm/page_types.h               |    2 
 arch/x86/include/asm/pgtable.h                  |   28 +
 arch/x86/include/asm/pgtable_types.h            |   54 +-
 arch/x86/include/asm/processor.h                |    3 
 arch/x86/include/asm/realmode.h                 |   12 
 arch/x86/include/asm/vga.h                      |   13 
 arch/x86/kernel/acpi/boot.c                     |    6 
 arch/x86/kernel/cpu/amd.c                       |   23 +
 arch/x86/kernel/cpu/scattered.c                 |    1 
 arch/x86/kernel/e820.c                          |   26 +
 arch/x86/kernel/espfix_64.c                     |    2 
 arch/x86/kernel/head64.c                        |   42 +-
 arch/x86/kernel/head_64.S                       |   80 +++
 arch/x86/kernel/kdebugfs.c                      |   34 -
 arch/x86/kernel/ksysfs.c                        |   28 +
 arch/x86/kernel/machine_kexec_64.c              |   35 +
 arch/x86/kernel/mpparse.c                       |  106 +++-
 arch/x86/kernel/pci-dma.c                       |   11 
 arch/x86/kernel/pci-nommu.c                     |    2 
 arch/x86/kernel/pci-swiotlb.c                   |    8 
 arch/x86/kernel/process.c                       |   26 +
 arch/x86/kernel/setup.c                         |   10 
 arch/x86/kvm/mmu.c                              |   12 
 arch/x86/kvm/mmu.h                              |    2 
 arch/x86/kvm/svm.c                              |   35 +
 arch/x86/kvm/vmx.c                              |    3 
 arch/x86/kvm/x86.c                              |    3 
 arch/x86/lib/cmdline.c                          |  105 ++++
 arch/x86/mm/Makefile                            |    3 
 arch/x86/mm/ident_map.c                         |   11 
 arch/x86/mm/ioremap.c                           |  255 +++++++++
 arch/x86/mm/kasan_init_64.c                     |    4 
 arch/x86/mm/mem_encrypt.c                       |  626 +++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_boot.S                  |  151 ++++++
 arch/x86/mm/pageattr.c                          |   67 ++
 arch/x86/mm/pat.c                               |    6 
 arch/x86/pci/common.c                           |    4 
 arch/x86/platform/efi/efi.c                     |    6 
 arch/x86/platform/efi/efi_64.c                  |   15 -
 arch/x86/realmode/init.c                        |   16 +
 arch/x86/realmode/rm/trampoline_64.S            |   24 +
 drivers/firmware/efi/efi.c                      |   33 +
 drivers/gpu/drm/drm_gem.c                       |    2 
 drivers/gpu/drm/drm_vm.c                        |    4 
 drivers/gpu/drm/ttm/ttm_bo_vm.c                 |    7 
 drivers/gpu/drm/udl/udl_fb.c                    |    4 
 drivers/iommu/amd_iommu_init.c                  |    7 
 drivers/sfi/sfi_core.c                          |   22 -
 drivers/video/fbdev/core/fbmem.c                |   12 
 include/asm-generic/early_ioremap.h             |    2 
 include/asm-generic/pgtable.h                   |    8 
 include/linux/dma-mapping.h                     |   11 
 include/linux/efi.h                             |    9 
 include/linux/io.h                              |    2 
 include/linux/kexec.h                           |   14 +
 include/linux/mem_encrypt.h                     |   53 ++
 include/linux/swiotlb.h                         |    1 
 init/main.c                                     |   13 
 kernel/kexec_core.c                             |    7 
 kernel/memremap.c                               |   20 +
 lib/swiotlb.c                                   |   59 ++
 mm/early_ioremap.c                              |   28 +
 81 files changed, 2293 insertions(+), 207 deletions(-)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 arch/x86/mm/mem_encrypt_boot.S
 create mode 100644 include/linux/mem_encrypt.h

-- 
Tom Lendacky

* [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
@ 2017-04-18 21:16 ` Tom Lendacky
  2017-04-19  9:02   ` Borislav Petkov
  2017-04-19  9:52   ` David Howells
  2017-04-18 21:16 ` [PATCH v5 02/32] x86/mm/pat: Set write-protect cache mode for full PAT support Tom Lendacky
                   ` (30 subsequent siblings)
  31 siblings, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:16 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Create a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature and add documentation for the mem_encrypt=
kernel parameter.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   11 ++++
 Documentation/x86/amd-memory-encryption.txt     |   60 +++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3dd6d5d..84c5787 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2165,6 +2165,17 @@
 			memory contents and reserves bad memory
 			regions that are detected.
 
+	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
+			Valid arguments: on, off
+			Default (depends on kernel configuration option):
+			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
+			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
+			mem_encrypt=on:		Activate SME
+			mem_encrypt=off:	Do not activate SME
+
+			Refer to Documentation/x86/amd-memory-encryption.txt
+			for details on when memory encryption can be activated.
+
 	mem_sleep_default=	[SUSPEND] Default system suspend mode:
 			s2idle  - Suspend-To-Idle
 			shallow - Power-On Suspend or equivalent (if supported)
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..0b72ff2
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,60 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables.  A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM.  SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below on how to determine its position).  The encryption bit can be specified
+in the cr3 register, allowing the PGD table to be encrypted. Each successive
+level of page tables can also be encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+	0x8000001f[eax]:
+		Bit[0] indicates support for SME
+	0x8000001f[ebx]:
+		Bits[5:0]  pagetable bit number used to activate memory
+			   encryption
+		Bits[11:6] reduction in physical address space, in bits, when
+			   memory encryption is enabled (this only affects
+			   system physical addresses, not guest physical
+			   addresses)
+
+If support for SME is present, MSR 0xc0010010 (MSR_K8_SYSCFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+	0xc0010010:
+		Bit[23]   0 = memory encryption features are disabled
+			  1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system.  If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+The state of SME in the Linux kernel can be described as follows:
+	- Supported:
+	  The CPU supports SME (determined through CPUID instruction).
+
+	- Enabled:
+	  Supported and bit 23 of MSR_K8_SYSCFG is set.
+
+	- Active:
+	  Supported, Enabled and the Linux kernel is actively applying
+	  the encryption bit to page table entries (the SME mask in the
+	  kernel is non-zero).
+
+SME can also be enabled and activated in the BIOS. If SME is enabled and
+activated in the BIOS, then all memory accesses will be encrypted and it will
+not be necessary to activate the Linux memory encryption support.  If the BIOS
+merely enables SME (sets bit 23 of the MSR_K8_SYSCFG), then Linux can activate
+memory encryption by default (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y) or
+by supplying mem_encrypt=on on the kernel command line.  However, if BIOS does
+not enable SME, then Linux will not be able to activate memory encryption, even
+if configured to do so by default or the mem_encrypt=on command line parameter
+is specified.

* [PATCH v5 02/32] x86/mm/pat: Set write-protect cache mode for full PAT support
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
  2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
@ 2017-04-18 21:16 ` Tom Lendacky
  2017-04-18 21:16 ` [PATCH v5 03/32] x86, mpparse, x86/acpi, x86/PCI, SFI: Use memremap for RAM mappings Tom Lendacky
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:16 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/mm/pat.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 9b78685..6753d9c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -295,7 +295,7 @@ static void init_cache_modes(void)
  * pat_init - Initialize PAT MSR and PAT table
  *
  * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC and WT.
+ * to enable additional cache attributes, WC, WT and WP.
  *
  * This function must be called on all CPUs using the specific sequence of
  * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
@@ -356,7 +356,7 @@ void pat_init(void)
 		 *      010    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
 		 *      011    3    UC : _PAGE_CACHE_MODE_UC
 		 *      100    4    WB : Reserved
-		 *      101    5    WC : Reserved
+		 *      101    5    WP : _PAGE_CACHE_MODE_WP
 		 *      110    6    UC-: Reserved
 		 *      111    7    WT : _PAGE_CACHE_MODE_WT
 		 *
@@ -364,7 +364,7 @@ void pat_init(void)
 		 * corresponding types in the presence of PAT errata.
 		 */
 		pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
-		      PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+		      PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
 	}
 
 	if (!boot_cpu_done) {

* [PATCH v5 03/32] x86, mpparse, x86/acpi, x86/PCI, SFI: Use memremap for RAM mappings
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
  2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
  2017-04-18 21:16 ` [PATCH v5 02/32] x86/mm/pat: Set write-protect cache mode for full PAT support Tom Lendacky
@ 2017-04-18 21:16 ` Tom Lendacky
  2017-04-18 21:17 ` [PATCH v5 04/32] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature Tom Lendacky
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:16 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

The ioremap() function is intended for mapping MMIO. For RAM, the
memremap() function can be used. Convert calls from ioremap() to
memremap() when re-mapping RAM.

This will be used later by SME to control how the encryption mask is
applied to memory mappings, with certain memory locations being mapped
decrypted vs encrypted.
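
For reference, a minimal sketch of the memremap() pattern the calls are
converted to (the function name and arguments are hypothetical, error
handling is abbreviated):

static int read_setup_data(u64 pa, size_t size)
{
	void *p;

	p = memremap(pa, size, MEMREMAP_WB);
	if (!p)
		return -ENOMEM;

	/* ... access the RAM-backed data through p ... */

	memunmap(p);
	return 0;
}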

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/acpi/boot.c |    6 +++---
 arch/x86/kernel/kdebugfs.c  |   34 +++++++++++-----------------------
 arch/x86/kernel/ksysfs.c    |   28 ++++++++++++++--------------
 arch/x86/kernel/mpparse.c   |   10 +++++-----
 arch/x86/pci/common.c       |    4 ++--
 drivers/sfi/sfi_core.c      |   22 +++++++++++-----------
 6 files changed, 46 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 6bb6806..850160a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -115,7 +115,7 @@
 #define	ACPI_INVALID_GSI		INT_MIN
 
 /*
- * This is just a simple wrapper around early_ioremap(),
+ * This is just a simple wrapper around early_memremap(),
  * with sanity checks for phys == 0 and size == 0.
  */
 char *__init __acpi_map_table(unsigned long phys, unsigned long size)
@@ -124,7 +124,7 @@ char *__init __acpi_map_table(unsigned long phys, unsigned long size)
 	if (!phys || !size)
 		return NULL;
 
-	return early_ioremap(phys, size);
+	return early_memremap(phys, size);
 }
 
 void __init __acpi_unmap_table(char *map, unsigned long size)
@@ -132,7 +132,7 @@ void __init __acpi_unmap_table(char *map, unsigned long size)
 	if (!map || !size)
 		return;
 
-	early_iounmap(map, size);
+	early_memunmap(map, size);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index 38b6458..fd6f8fb 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -33,7 +33,6 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
 	struct setup_data_node *node = file->private_data;
 	unsigned long remain;
 	loff_t pos = *ppos;
-	struct page *pg;
 	void *p;
 	u64 pa;
 
@@ -47,18 +46,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
 		count = node->len - pos;
 
 	pa = node->paddr + sizeof(struct setup_data) + pos;
-	pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
-	if (PageHighMem(pg)) {
-		p = ioremap_cache(pa, count);
-		if (!p)
-			return -ENXIO;
-	} else
-		p = __va(pa);
+	p = memremap(pa, count, MEMREMAP_WB);
+	if (!p)
+		return -ENOMEM;
 
 	remain = copy_to_user(user_buf, p, count);
 
-	if (PageHighMem(pg))
-		iounmap(p);
+	memunmap(p);
 
 	if (remain)
 		return -EFAULT;
@@ -109,7 +103,6 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 	struct setup_data *data;
 	int error;
 	struct dentry *d;
-	struct page *pg;
 	u64 pa_data;
 	int no = 0;
 
@@ -126,16 +119,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 			goto err_dir;
 		}
 
-		pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
-		if (PageHighMem(pg)) {
-			data = ioremap_cache(pa_data, sizeof(*data));
-			if (!data) {
-				kfree(node);
-				error = -ENXIO;
-				goto err_dir;
-			}
-		} else
-			data = __va(pa_data);
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
+		if (!data) {
+			kfree(node);
+			error = -ENOMEM;
+			goto err_dir;
+		}
 
 		node->paddr = pa_data;
 		node->type = data->type;
@@ -143,8 +132,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 		error = create_setup_data_node(d, no, node);
 		pa_data = data->next;
 
-		if (PageHighMem(pg))
-			iounmap(data);
+		memunmap(data);
 		if (error)
 			goto err_dir;
 		no++;
diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
index 4afc67f..ee51db9 100644
--- a/arch/x86/kernel/ksysfs.c
+++ b/arch/x86/kernel/ksysfs.c
@@ -16,8 +16,8 @@
 #include <linux/stat.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/io.h>
 
-#include <asm/io.h>
 #include <asm/setup.h>
 
 static ssize_t version_show(struct kobject *kobj,
@@ -79,12 +79,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
 			*paddr = pa_data;
 			return 0;
 		}
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data)
 			return -ENOMEM;
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -97,17 +97,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
 	u64 pa_data = boot_params.hdr.setup_data;
 
 	while (pa_data) {
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data)
 			return -ENOMEM;
 		if (nr == i) {
 			*size = data->len;
-			iounmap(data);
+			memunmap(data);
 			return 0;
 		}
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -127,12 +127,12 @@ static ssize_t type_show(struct kobject *kobj,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
 	if (!data)
 		return -ENOMEM;
 
 	ret = sprintf(buf, "0x%x\n", data->type);
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -154,7 +154,7 @@ static ssize_t setup_data_data_read(struct file *fp,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
 	if (!data)
 		return -ENOMEM;
 
@@ -170,15 +170,15 @@ static ssize_t setup_data_data_read(struct file *fp,
 		goto out;
 
 	ret = count;
-	p = ioremap_cache(paddr + sizeof(*data), data->len);
+	p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
 	if (!p) {
 		ret = -ENOMEM;
 		goto out;
 	}
 	memcpy(buf, p + off, count);
-	iounmap(p);
+	memunmap(p);
 out:
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -250,13 +250,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
 	*nr = 0;
 	while (pa_data) {
 		*nr += 1;
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data) {
 			ret = -ENOMEM;
 			goto out;
 		}
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 	}
 
 out:
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 0d904d7..fd37f39 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -436,9 +436,9 @@ static unsigned long __init get_mpc_size(unsigned long physptr)
 	struct mpc_table *mpc;
 	unsigned long size;
 
-	mpc = early_ioremap(physptr, PAGE_SIZE);
+	mpc = early_memremap(physptr, PAGE_SIZE);
 	size = mpc->length;
-	early_iounmap(mpc, PAGE_SIZE);
+	early_memunmap(mpc, PAGE_SIZE);
 	apic_printk(APIC_VERBOSE, "  mpc: %lx-%lx\n", physptr, physptr + size);
 
 	return size;
@@ -450,7 +450,7 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
 	unsigned long size;
 
 	size = get_mpc_size(mpf->physptr);
-	mpc = early_ioremap(mpf->physptr, size);
+	mpc = early_memremap(mpf->physptr, size);
 	/*
 	 * Read the physical hardware table.  Anything here will
 	 * override the defaults.
@@ -461,10 +461,10 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
 #endif
 		pr_err("BIOS bug, MP table errors detected!...\n");
 		pr_cont("... disabling SMP support. (tell your hw vendor)\n");
-		early_iounmap(mpc, size);
+		early_memunmap(mpc, size);
 		return -1;
 	}
-	early_iounmap(mpc, size);
+	early_memunmap(mpc, size);
 
 	if (early)
 		return -1;
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 190e718..08cf71c 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -691,7 +691,7 @@ int pcibios_add_device(struct pci_dev *dev)
 
 	pa_data = boot_params.hdr.setup_data;
 	while (pa_data) {
-		data = ioremap(pa_data, sizeof(*rom));
+		data = memremap(pa_data, sizeof(*rom), MEMREMAP_WB);
 		if (!data)
 			return -ENOMEM;
 
@@ -710,7 +710,7 @@ int pcibios_add_device(struct pci_dev *dev)
 			}
 		}
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 	}
 	set_dma_domain_ops(dev);
 	set_dev_domain_options(dev);
diff --git a/drivers/sfi/sfi_core.c b/drivers/sfi/sfi_core.c
index 296db7a..d5ce534 100644
--- a/drivers/sfi/sfi_core.c
+++ b/drivers/sfi/sfi_core.c
@@ -86,13 +86,13 @@
 /*
  * FW creates and saves the SFI tables in memory. When these tables get
  * used, they may need to be mapped to virtual address space, and the mapping
- * can happen before or after the ioremap() is ready, so a flag is needed
+ * can happen before or after the memremap() is ready, so a flag is needed
  * to indicating this
  */
-static u32 sfi_use_ioremap __read_mostly;
+static u32 sfi_use_memremap __read_mostly;
 
 /*
- * sfi_un/map_memory calls early_ioremap/iounmap which is a __init function
+ * sfi_un/map_memory calls early_memremap/memunmap which is a __init function
  * and introduces section mismatch. So use __ref to make it calm.
  */
 static void __iomem * __ref sfi_map_memory(u64 phys, u32 size)
@@ -100,10 +100,10 @@ static void __iomem * __ref sfi_map_memory(u64 phys, u32 size)
 	if (!phys || !size)
 		return NULL;
 
-	if (sfi_use_ioremap)
-		return ioremap_cache(phys, size);
+	if (sfi_use_memremap)
+		return memremap(phys, size, MEMREMAP_WB);
 	else
-		return early_ioremap(phys, size);
+		return early_memremap(phys, size);
 }
 
 static void __ref sfi_unmap_memory(void __iomem *virt, u32 size)
@@ -111,10 +111,10 @@ static void __ref sfi_unmap_memory(void __iomem *virt, u32 size)
 	if (!virt || !size)
 		return;
 
-	if (sfi_use_ioremap)
-		iounmap(virt);
+	if (sfi_use_memremap)
+		memunmap(virt);
 	else
-		early_iounmap(virt, size);
+		early_memunmap(virt, size);
 }
 
 static void sfi_print_table_header(unsigned long long pa,
@@ -507,8 +507,8 @@ void __init sfi_init_late(void)
 	length = syst_va->header.len;
 	sfi_unmap_memory(syst_va, sizeof(struct sfi_table_simple));
 
-	/* Use ioremap now after it is ready */
-	sfi_use_ioremap = 1;
+	/* Use memremap now after it is ready */
+	sfi_use_memremap = 1;
 	syst_va = sfi_map_memory(syst_pa, length);
 
 	sfi_acpi_init();

* [PATCH v5 04/32] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (2 preceding siblings ...)
  2017-04-18 21:16 ` [PATCH v5 03/32] x86, mpparse, x86/acpi, x86/PCI, SFI: Use memremap for RAM mappings Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-18 21:17 ` [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size Tom Lendacky
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature.  SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG).  Only show the SME feature as available if reported by
CPUID and enabled by BIOS.
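
With the feature bit maintained this way, later code only needs to test a
single flag; a hedged example of what a consumer might look like (not part of
this patch):

	if (boot_cpu_has(X86_FEATURE_SME))
		pr_info("AMD SME supported and enabled by BIOS\n");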

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |    1 +
 arch/x86/include/asm/msr-index.h   |    2 ++
 arch/x86/kernel/cpu/amd.c          |   15 +++++++++++++++
 arch/x86/kernel/cpu/scattered.c    |    1 +
 4 files changed, 19 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2701e5f..2b692df 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -196,6 +196,7 @@
 
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_SME		( 7*32+10) /* AMD Secure Memory Encryption */
 
 #define X86_FEATURE_INTEL_PPIN	( 7*32+14) /* Intel Processor Inventory Number */
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 673f9ac..8ff4aaa 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -350,6 +350,8 @@
 #define MSR_K8_TOP_MEM1			0xc001001a
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_K8_SYSCFG			0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT	23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c36140d..5fc5232 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -611,6 +611,21 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 	 */
 	if (cpu_has_amd_erratum(c, amd_erratum_400))
 		set_cpu_bug(c, X86_BUG_AMD_E400);
+
+	/*
+	 * BIOS support is required for SME. If BIOS has not enabled SME
+	 * then don't advertise the feature (set in scattered.c)
+	 */
+	if (c->extended_cpuid_level >= 0x8000001f) {
+		if (cpu_has(c, X86_FEATURE_SME)) {
+			u64 msr;
+
+			/* Check if SME is enabled */
+			rdmsrl(MSR_K8_SYSCFG, msr);
+			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+				clear_cpu_cap(c, X86_FEATURE_SME);
+		}
+	}
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 23c2350..05459ad 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ struct cpuid_bit {
 	{ X86_FEATURE_HW_PSTATE,	CPUID_EDX,  7, 0x80000007, 0 },
 	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
+	{ X86_FEATURE_SME,		CPUID_EAX,  0, 0x8000001f, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 

* [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (3 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 04/32] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-20 16:59   ` Borislav Petkov
  2017-04-18 21:17 ` [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support Tom Lendacky
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

When Secure Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.
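
As a worked example (hypothetical EBX value), the reduction is reported in
bits 11:6 of CPUID 0x8000001f EBX and is subtracted from x86_phys_bits:

	unsigned int ebx = 0x0000016f;			/* example value    */
	unsigned int reduction = (ebx >> 6) & 0x3f;	/* = 5 bits         */
	unsigned int phys_bits = 48 - reduction;	/* 48 -> 43 bits    */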

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/cpu/amd.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 5fc5232..35eeeb1 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -613,8 +613,10 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 		set_cpu_bug(c, X86_BUG_AMD_E400);
 
 	/*
-	 * BIOS support is required for SME. If BIOS has not enabled SME
-	 * then don't advertise the feature (set in scattered.c)
+	 * BIOS support is required for SME. If BIOS has enabled SME then
+	 * adjust x86_phys_bits by the SME physical address space reduction
+	 * value. If BIOS has not enabled SME then don't advertise the
+	 * feature (set in scattered.c).
 	 */
 	if (c->extended_cpuid_level >= 0x8000001f) {
 		if (cpu_has(c, X86_FEATURE_SME)) {
@@ -622,8 +624,14 @@ static void early_init_amd(struct cpuinfo_x86 *c)
 
 			/* Check if SME is enabled */
 			rdmsrl(MSR_K8_SYSCFG, msr);
-			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+			if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT) {
+				unsigned int ebx;
+
+				ebx = cpuid_ebx(0x8000001f);
+				c->x86_phys_bits -= (ebx >> 6) & 0x3f;
+			} else {
 				clear_cpu_cap(c, X86_FEATURE_SME);
+			}
 		}
 	}
 }

* [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (4 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-27 15:46   ` Borislav Petkov
  2017-04-18 21:17 ` [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing Tom Lendacky
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.
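
A minimal sketch of how other code can consume the interface introduced here
once it is in place (illustrative only; the function name is hypothetical):

#include <linux/printk.h>
#include <linux/mem_encrypt.h>

static void report_sme_status(void)
{
	if (sme_active())
		pr_info("SME active, encryption mask 0x%lx\n", sme_me_mask);
}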

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/Kconfig                   |   22 +++++++++++++++++++
 arch/x86/include/asm/mem_encrypt.h |   42 ++++++++++++++++++++++++++++++++++++
 arch/x86/mm/Makefile               |    1 +
 arch/x86/mm/mem_encrypt.c          |   21 ++++++++++++++++++
 include/linux/mem_encrypt.h        |   37 ++++++++++++++++++++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 include/linux/mem_encrypt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4e153e9..cf0cbe8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1407,6 +1407,28 @@ config X86_DIRECT_GBPAGES
 	  supports them), so don't confuse the user by printing
 	  that we have them enabled.
 
+config AMD_MEM_ENCRYPT
+	bool "AMD Secure Memory Encryption (SME) support"
+	depends on X86_64 && CPU_SUP_AMD
+	---help---
+	  Say yes to enable support for the encryption of system memory.
+	  This requires an AMD processor that supports Secure Memory
+	  Encryption (SME).
+
+config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
+	bool "Activate AMD Secure Memory Encryption (SME) by default"
+	default y
+	depends on AMD_MEM_ENCRYPT
+	---help---
+	  Say yes to have system memory encrypted by default if running on
+	  an AMD processor that supports Secure Memory Encryption (SME).
+
+	  If set to Y, then the encryption of system memory can be
+	  deactivated with the mem_encrypt=off command line option.
+
+	  If set to N, then the encryption of system memory can be
+	  activated with the mem_encrypt=on command line option.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 0000000..d5c4a2b
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,42 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+static inline bool sme_active(void)
+{
+	return !!sme_me_mask;
+}
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+#ifndef sme_me_mask
+#define sme_me_mask	0UL
+
+static inline bool sme_active(void)
+{
+	return false;
+}
+#endif
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif	/* __ASSEMBLY__ */
+
+#endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 0fbdcb6..a94a7b6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 0000000..b99d469
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+/*
+ * Since SME related variables are set early in the boot process they must
+ * reside in the .data section so as not to be zeroed out when the .bss
+ * section is later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 0000000..14a7b9f
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,37 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+#include <asm/mem_encrypt.h>
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+#ifndef sme_me_mask
+#define sme_me_mask	0UL
+
+static inline bool sme_active(void)
+{
+	return false;
+}
+#endif
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif	/* __ASSEMBLY__ */
+
+#endif	/* __MEM_ENCRYPT_H__ */

* [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (5 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-21 14:55   ` Borislav Petkov
  2017-04-18 21:17 ` [PATCH v5 08/32] x86/mm: Simplify p[g4um]d_page() macros Tom Lendacky
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support to the early boot code to use Secure Memory Encryption (SME).
Since the kernel has been loaded into memory in a decrypted state, support
is added to encrypt the kernel in place and update the early pagetables
with the memory encryption mask so that new pagetable entries will use
memory encryption.

The routines to set the encryption mask and perform the encryption are
stub routines for now with functionality to be added in a later patch.

Because these routines need to be available to head_64.S, mem_encrypt.c is
always built, and #ifdefs in mem_encrypt.c provide either the functionality
or stub routines depending on CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/head_64.S |   61 ++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/Makefile      |    4 +--
 arch/x86/mm/mem_encrypt.c |   26 +++++++++++++++++++
 3 files changed, 86 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ac9d327..3115e21 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -91,6 +91,23 @@ startup_64:
 	jnz	bad_address
 
 	/*
+	 * Enable Secure Memory Encryption (SME), if supported and enabled.
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 * Save the returned mask in %r12 for later use.
+	 */
+	push	%rsi
+	call	sme_enable
+	pop	%rsi
+	movq	%rax, %r12
+
+	/*
+	 * Add the memory encryption mask to %rbp to include it in the page
+	 * table fixups.
+	 */
+	addq	%r12, %rbp
+
+	/*
 	 * Fixup the physical addresses in the page table
 	 */
 	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
@@ -113,6 +130,7 @@ startup_64:
 	shrq	$PGDIR_SHIFT, %rax
 
 	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+	addq	%r12, %rdx
 	movq	%rdx, 0(%rbx,%rax,8)
 	movq	%rdx, 8(%rbx,%rax,8)
 
@@ -129,6 +147,7 @@ startup_64:
 	movq	%rdi, %rax
 	shrq	$PMD_SHIFT, %rdi
 	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	addq	%r12, %rax
 	leaq	(_end - 1)(%rip), %rcx
 	shrq	$PMD_SHIFT, %rcx
 	subq	%rdi, %rcx
@@ -142,6 +161,12 @@ startup_64:
 	decl	%ecx
 	jnz	1b
 
+	/*
+	 * Determine if any fixups are required. This includes fixups
+	 * based on where the kernel was loaded and whether SME is
+	 * active. If %rbp is zero, then we can skip both the fixups
+	 * and the call to encrypt the kernel.
+	 */
 	test %rbp, %rbp
 	jz .Lskip_fixup
 
@@ -162,11 +187,30 @@ startup_64:
 	cmp	%r8, %rdi
 	jne	1b
 
-	/* Fixup phys_base */
+	/*
+	 * Fixup phys_base - remove the memory encryption mask from %rbp
+	 * to obtain the true physical address.
+	 */
+	subq	%r12, %rbp
 	addq	%rbp, phys_base(%rip)
 
+	/*
+	 * Encrypt the kernel if SME is active.
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 */
+	push	%rsi
+	call	sme_encrypt_kernel
+	pop	%rsi
+
 .Lskip_fixup:
+	/*
+	 * The encryption mask is in %r12. We ADD this to %rax to be sure
+	 * that the encryption mask is part of the value that will be
+	 * stored in %cr3.
+	 */
 	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	addq	%r12, %rax
 	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
@@ -186,7 +230,20 @@ ENTRY(secondary_startup_64)
 	/* Sanitize CPU configuration */
 	call verify_cpu
 
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+	/*
+	 * Get the SME encryption mask.
+	 *  The encryption mask will be returned in %rax so we do an ADD
+	 *  below to be sure that the encryption mask is part of the
+	 *  value that will be stored in %cr3.
+	 *
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 */
+	push	%rsi
+	call	sme_get_me_mask
+	pop	%rsi
+
+	addq	$(init_level4_pgt - __START_KERNEL_map), %rax
 1:
 
 	/* Enable PAE mode and PGE */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index a94a7b6..9e13841 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -2,7 +2,7 @@
 KCOV_INSTRUMENT_tlb.o	:= n
 
 obj-y	:=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
-	    pat.o pgtable.o physaddr.o setup_nx.o tlb.o
+	    pat.o pgtable.o physaddr.o setup_nx.o tlb.o mem_encrypt.o
 
 # Make sure __phys_addr has no stackprotector
 nostackp := $(call cc-option, -fno-stack-protector)
@@ -38,5 +38,3 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
-
-obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b99d469..cc00d8b 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -11,6 +11,9 @@
  */
 
 #include <linux/linkage.h>
+#include <linux/init.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -19,3 +22,26 @@
  */
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_enable(void)
+{
+	return sme_me_mask;
+}
+
+unsigned long sme_get_me_mask(void)
+{
+	return sme_me_mask;
+}
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+void __init sme_encrypt_kernel(void)	{ }
+unsigned long __init sme_enable(void)	{ return 0; }
+
+unsigned long sme_get_me_mask(void)	{ return 0; }
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */

* [PATCH v5 08/32] x86/mm: Simplify p[g4um]d_page() macros
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (6 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-18 21:17 ` [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption Tom Lendacky
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Create a pgd_pfn() and p4d_pfn() macro similar to the p[um]d_pfn() macros
and then use the p[g4um]d_pfn() macros in the p[g4um]d_page() macros
instead of duplicating the code.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/pgtable.h |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 942482a..42b7193 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -195,6 +195,11 @@ static inline unsigned long p4d_pfn(p4d_t p4d)
 	return (p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT;
 }
 
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+	return (pgd_val(pgd) & PTE_PFN_MASK) >> PAGE_SHIFT;
+}
+
 static inline int p4d_large(p4d_t p4d)
 {
 	/* No 512 GiB pages yet */
@@ -704,8 +709,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)		\
-	pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -773,8 +777,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)		\
-	pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)	pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -824,8 +827,7 @@ static inline unsigned long p4d_page_vaddr(p4d_t p4d)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define p4d_page(p4d)		\
-	pfn_to_page((p4d_val(p4d) & p4d_pfn_mask(p4d)) >> PAGE_SHIFT)
+#define p4d_page(p4d)	pfn_to_page(p4d_pfn(p4d))
 
 /* Find an entry in the third-level page table.. */
 static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address)
@@ -859,7 +861,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address)

* [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (7 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 08/32] x86/mm: Simplify p[g4um]d_page() macros Tom Lendacky
@ 2017-04-18 21:17 ` Tom Lendacky
  2017-04-21 21:52   ` Dave Hansen
  2017-04-27 16:12   ` Borislav Petkov
  2017-04-18 21:18 ` [PATCH v5 10/32] x86/mm: Extend early_memremap() support with additional attrs Tom Lendacky
                   ` (22 subsequent siblings)
  31 siblings, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:17 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Changes to the existing page table macros will allow the SME support to
be enabled in a simple fashion with minimal changes to files that use these
macros.  Since the memory encryption mask will now be part of the regular
pagetable macros, two new macros (_PAGE_TABLE_NOENC and _KERNPG_TABLE_NOENC)
are introduced to allow early pagetable creation/initialization without the
encryption mask before SME becomes active.  Two new pgprot macros,
pgprot_encrypted() and pgprot_decrypted(), are defined to set or clear the
page encryption mask.

The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
not support encryption for MMIO areas, so this define leaves the encryption
mask out of the page attributes.

Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
creating a physical address with the encryption mask set.  These are used
when working with the cr3 register so that the PGD can be encrypted. The
current __va() macro is updated so that the virtual address is generated
based on the physical address without the encryption mask, thus allowing
the same virtual address to be generated regardless of whether encryption
is enabled for that physical location or not.
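
As a rough, illustrative sketch (not part of the patch), the intended
behavior once sme_me_mask is set looks like:

	/* e.g. in load_cr3(), as changed below */
	write_cr3(__sme_pa(pgdir));		/* __pa(pgdir) | sme_me_mask */

	void *va1 = __va(__pa(pgdir));		/* mask clear in phys addr */
	void *va2 = __va(__sme_pa(pgdir));	/* mask set in phys addr   */
	/* va1 == va2: __va() clears sme_me_mask before adding PAGE_OFFSET */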

Also, an early initialization function is added for SME.  If SME is active,
this function:
 - Updates the early_pmd_flags so that early page faults create mappings
   with the encryption mask.
 - Updates the __supported_pte_mask to include the encryption mask.
 - Updates the protection_map entries to include the encryption mask so
   that user-space allocations will automatically have the encryption mask
   applied.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/boot/compressed/pagetable.c |    7 +++++
 arch/x86/include/asm/fixmap.h        |    7 +++++
 arch/x86/include/asm/mem_encrypt.h   |   25 +++++++++++++++++++
 arch/x86/include/asm/page.h          |    4 ++-
 arch/x86/include/asm/page_types.h    |    2 +-
 arch/x86/include/asm/pgtable.h       |    9 +++++++
 arch/x86/include/asm/pgtable_types.h |   45 ++++++++++++++++++++++------------
 arch/x86/include/asm/processor.h     |    3 ++
 arch/x86/kernel/espfix_64.c          |    2 +-
 arch/x86/kernel/head64.c             |   12 ++++++++-
 arch/x86/kernel/head_64.S            |   18 +++++++-------
 arch/x86/mm/kasan_init_64.c          |    4 ++-
 arch/x86/mm/mem_encrypt.c            |   18 ++++++++++++++
 arch/x86/mm/pageattr.c               |    3 ++
 include/asm-generic/pgtable.h        |    8 ++++++
 15 files changed, 134 insertions(+), 33 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 56589d0..411c443 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
 #define __pa(x)  ((unsigned long)(x))
 #define __va(x)  ((void *)((unsigned long)(x)))
 
+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155c..d9ff226 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -157,6 +157,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
 }
 #endif
 
+/*
+ * FIXMAP_PAGE_NOCACHE is used for MMIO. Memory encryption is not
+ * supported for MMIO addresses, so make sure that the memory encryption
+ * mask is not part of the page attributes.
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
 #include <asm-generic/fixmap.h>
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index d5c4a2b..9fdbc53 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,6 +15,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/init.h>
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
@@ -24,6 +26,8 @@ static inline bool sme_active(void)
 	return !!sme_me_mask;
 }
 
+void __init sme_early_init(void);
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #ifndef sme_me_mask
@@ -35,8 +39,29 @@ static inline bool sme_active(void)
 }
 #endif
 
+static inline void __init sme_early_init(void)
+{
+}
+
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
+/*
+ * The __sme_pa() and __sme_pa_nodebug() macros are meant for use when
+ * writing to or comparing values from the cr3 register.  Having the
+ * encryption mask set in cr3 enables the PGD entry to be encrypted and
+ * avoid special case handling of PGD allocations.
+ */
+#define __sme_pa(x)		(__pa(x) | sme_me_mask)
+#define __sme_pa_nodebug(x)	(__pa_nodebug(x) | sme_me_mask)
+
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x)		((unsigned long)(x) | sme_me_mask)
+#define __sme_clr(x)		((unsigned long)(x) & ~sme_me_mask)
+
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index cf8f619..b54dba7 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -15,6 +15,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/mem_encrypt.h>
+
 struct page;
 
 #include <linux/range.h>
@@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
 #ifndef __va
-#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#define __va(x)			((void *)(__sme_clr(x) + PAGE_OFFSET))
 #endif
 
 #define __boot_va(x)		__va(x)
diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index 7bd0099..fead0a5 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -15,7 +15,7 @@
 #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
 
-#define __PHYSICAL_MASK		((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
+#define __PHYSICAL_MASK		((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
 #define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
 
 /* Cast *PAGE_MASK to a signed type so that it is sign-extended if
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 42b7193..1f9a2c4 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -3,6 +3,7 @@
 
 #include <asm/page.h>
 #include <asm/pgtable_types.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * Macro to mark a page protection value as UC-
@@ -13,6 +14,12 @@
 		     cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS)))	\
 	 : (prot))
 
+/*
+ * Macros to add or remove encryption attribute
+ */
+#define pgprot_encrypted(prot)	__pgprot(__sme_set(pgprot_val(prot)))
+#define pgprot_decrypted(prot)	__pgprot(__sme_clr(pgprot_val(prot)))
+
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
 
@@ -38,6 +45,8 @@
 
 extern struct mm_struct *pgd_page_get_mm(struct page *page);
 
+extern pmdval_t early_pmd_flags;
+
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 #else  /* !CONFIG_PARAVIRT */
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index bf9638e..d3ae99c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -2,7 +2,9 @@
 #define _ASM_X86_PGTABLE_DEFS_H
 
 #include <linux/const.h>
+
 #include <asm/page_types.h>
+#include <asm/mem_encrypt.h>
 
 #define FIRST_USER_ADDRESS	0UL
 
@@ -121,10 +123,10 @@
 
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
-#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
-			 _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
-			 _PAGE_DIRTY)
+#define _PAGE_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
+				 _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _KERNPG_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW |		\
+				 _PAGE_ACCESSED | _PAGE_DIRTY)
 
 /*
  * Set of bits not changed in pte_modify.  The pte's
@@ -191,18 +193,29 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_IO		(__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE)
 
-#define PAGE_KERNEL			__pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO			__pgprot(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC		__pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX			__pgprot(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_NOCACHE		__pgprot(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_LARGE		__pgprot(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
-
-#define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#ifndef __ASSEMBLY__
+
+#define _PAGE_ENC	(_AT(pteval_t, sme_me_mask))
+
+#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
+			 _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC)
+#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
+			 _PAGE_DIRTY | _PAGE_ENC)
+
+#define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VSYSCALL	__pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR	__pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO		__pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE	__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+
+#endif	/* __ASSEMBLY__ */
 
 /*         xwr */
 #define __P000	PAGE_NONE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3cada99..61e055d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@
 #include <asm/nops.h>
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
+#include <asm/mem_encrypt.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -233,7 +234,7 @@ static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 
 static inline void load_cr3(pgd_t *pgdir)
 {
-	write_cr3(__pa(pgdir));
+	write_cr3(__sme_pa(pgdir));
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 8e598a1..0955ec7 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -195,7 +195,7 @@ void init_espfix_ap(int cpu)
 
 	pte_p = pte_offset_kernel(&pmd, addr);
 	stack_page = page_address(alloc_pages_node(node, GFP_KERNEL, 0));
-	pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
+	pte = __pte(__pa(stack_page) | ((__PAGE_KERNEL_RO | _PAGE_ENC) & ptemask));
 	for (n = 0; n < ESPFIX_PTE_CLONES; n++)
 		set_pte(&pte_p[n*PTE_STRIDE], pte);
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 43b7002..9056cf9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -29,6 +29,7 @@
 #include <asm/bootparam_utils.h>
 #include <asm/microcode.h>
 #include <asm/kasan.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * Manage page tables very early on.
@@ -43,7 +44,7 @@ static void __init reset_early_page_tables(void)
 {
 	memset(early_level4_pgt, 0, sizeof(pgd_t)*(PTRS_PER_PGD-1));
 	next_early_pgt = 0;
-	write_cr3(__pa_nodebug(early_level4_pgt));
+	write_cr3(__sme_pa_nodebug(early_level4_pgt));
 }
 
 /* Create a new PMD entry */
@@ -55,7 +56,7 @@ int __init early_make_pgtable(unsigned long address)
 	pmdval_t pmd, *pmd_p;
 
 	/* Invalid address or early pgt is done ?  */
-	if (physaddr >= MAXMEM || read_cr3() != __pa_nodebug(early_level4_pgt))
+	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
 		return -1;
 
 again:
@@ -158,6 +159,13 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 
 	clear_page(init_level4_pgt);
 
+	/*
+	 * SME support may update early_pmd_flags to include the memory
+	 * encryption mask, so it needs to be called before anything
+	 * that may generate a page fault.
+	 */
+	sme_early_init();
+
 	kasan_early_init();
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3115e21..abfe5ee 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -129,7 +129,7 @@ startup_64:
 	movq	%rdi, %rax
 	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+	leaq	(PAGE_SIZE + _KERNPG_TABLE_NOENC)(%rbx), %rdx
 	addq	%r12, %rdx
 	movq	%rdx, 0(%rbx,%rax,8)
 	movq	%rdx, 8(%rbx,%rax,8)
@@ -476,7 +476,7 @@ GLOBAL(name)
 	__INITDATA
 NEXT_PAGE(early_level4_pgt)
 	.fill	511,8,0
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(early_dynamic_pgts)
 	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -488,15 +488,15 @@ NEXT_PAGE(init_level4_pgt)
 	.fill	512,8,0
 #else
 NEXT_PAGE(init_level4_pgt)
-	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.org    init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.org    init_level4_pgt + L4_START_KERNEL*8, 0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.fill	511, 8, 0
 NEXT_PAGE(level2_ident_pgt)
 	/* Since I easily can, map the first 1G.
@@ -508,8 +508,8 @@ NEXT_PAGE(level2_ident_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
 	/*
@@ -527,7 +527,7 @@ NEXT_PAGE(level2_kernel_pgt)
 
 NEXT_PAGE(level2_fixmap_pgt)
 	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
 	.fill	5,8,0
 
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0c7d812..6f1837a 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -92,7 +92,7 @@ static int kasan_die_handler(struct notifier_block *self,
 void __init kasan_early_init(void)
 {
 	int i;
-	pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;
+	pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL | _PAGE_ENC;
 	pmdval_t pmd_val = __pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE;
 	pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;
 	p4dval_t p4d_val = __pa_nodebug(kasan_zero_pud) | _KERNPG_TABLE;
@@ -158,7 +158,7 @@ void __init kasan_init(void)
 	 */
 	memset(kasan_zero_page, 0, PAGE_SIZE);
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
+		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
 		set_pte(&kasan_zero_pte[i], pte);
 	}
 	/* Flush TLBs again to be sure that write protection applied. */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index cc00d8b..8ca93e5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,6 +15,8 @@
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
+#include <linux/mm.h>
+
 /*
  * Since SME related variables are set early in the boot process they must
  * reside in the .data section so as not to be zeroed out when the .bss
@@ -23,6 +25,22 @@
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+void __init sme_early_init(void)
+{
+	unsigned int i;
+
+	if (!sme_me_mask)
+		return;
+
+	early_pmd_flags = __sme_set(early_pmd_flags);
+
+	__supported_pte_mask = __sme_set(__supported_pte_mask);
+
+	/* Update the protection map with memory encryption mask */
+	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
+		protection_map[i] = pgprot_encrypted(protection_map[i]);
+}
+
 void __init sme_encrypt_kernel(void)
 {
 }
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 56b22fa..669fa48 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2013,6 +2013,9 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 	if (!(page_flags & _PAGE_RW))
 		cpa.mask_clr = __pgprot(_PAGE_RW);
 
+	if (!(page_flags & _PAGE_ENC))
+		cpa.mask_clr = pgprot_encrypted(cpa.mask_clr);
+
 	cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);
 
 	retval = __change_page_attr_set_clr(&cpa, 0);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 7dfa767..882cb5d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -424,6 +424,14 @@ static inline int pud_same(pud_t pud_a, pud_t pud_b)
 #define pgprot_device pgprot_noncached
 #endif
 
+#ifndef pgprot_encrypted
+#define pgprot_encrypted(prot)	(prot)
+#endif
+
+#ifndef pgprot_decrypted
+#define pgprot_decrypted(prot)	(prot)
+#endif
+
 #ifndef pgprot_modify
 #define pgprot_modify pgprot_modify
 static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 10/32] x86/mm: Extend early_memremap() support with additional attrs
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (8 preceding siblings ...)
  2017-04-18 21:17 ` [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption Tom Lendacky
@ 2017-04-18 21:18 ` Tom Lendacky
  2017-04-18 21:18 ` [PATCH v5 11/32] x86/mm: Add support for early encrypt/decrypt of memory Tom Lendacky
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:18 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support to early_memremap() for specifying encrypted and decrypted
mappings, with and without write-protection. The use of write-protection
is necessary when encrypting data "in place". The write-protect attribute
is considered cacheable for loads, but not stores. This implies that the
hardware will never give the core a dirty line with this memtype.
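
To illustrate (a hedged sketch only; the helpers are added below and the
actual in-place encryption code arrives in a later patch of this series),
the expected pattern pairs a write-protected mapping of the current
contents with a normal mapping in the desired state:

	/* Encrypt one page that is currently decrypted (illustrative) */
	void *src = early_memremap_decrypted_wp(paddr, PAGE_SIZE);
	void *dst = early_memremap_encrypted(paddr, PAGE_SIZE);

	/* bounce through a scratch buffer, src and dst alias the same page */
	memcpy(buf, src, PAGE_SIZE);
	memcpy(dst, buf, PAGE_SIZE);

	early_memunmap(dst, PAGE_SIZE);
	early_memunmap(src, PAGE_SIZE);

where buf is assumed to be a page-sized intermediate buffer.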

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/Kconfig                     |    4 +++
 arch/x86/include/asm/fixmap.h        |   13 ++++++++++
 arch/x86/include/asm/pgtable_types.h |    8 ++++++
 arch/x86/mm/ioremap.c                |   44 ++++++++++++++++++++++++++++++++++
 include/asm-generic/early_ioremap.h  |    2 ++
 mm/early_ioremap.c                   |   10 ++++++++
 6 files changed, 81 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cf0cbe8..6bc52d3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1429,6 +1429,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	  If set to N, then the encryption of system memory can be
 	  activated with the mem_encrypt=on command line option.
 
+config ARCH_USE_MEMREMAP_PROT
+	def_bool y
+	depends on AMD_MEM_ENCRYPT
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d9ff226..dcd9fb5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -164,6 +164,19 @@ static inline void __set_fixmap(enum fixed_addresses idx,
  */
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
 
+/*
+ * Early memremap routines used for in-place encryption. The mappings created
+ * by these routines are intended to be used as temporary mappings.
+ */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+				      unsigned long size);
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+					 unsigned long size);
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+				      unsigned long size);
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+					 unsigned long size);
+
 #include <asm-generic/fixmap.h>
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index d3ae99c..ce8cb1c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {
 
 #define _PAGE_CACHE_MASK	(_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 #define _PAGE_NOCACHE		(cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP		(cachemode2protval(_PAGE_CACHE_MODE_WP))
 
 #define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP		(__PAGE_KERNEL | _PAGE_CACHE_WP)
 
 #define __PAGE_KERNEL_IO		(__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
 			 _PAGE_DIRTY | _PAGE_ENC)
 
+#define __PAGE_KERNEL_ENC	(__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP	(__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_NOENC	(__PAGE_KERNEL)
+#define __PAGE_KERNEL_NOENC_WP	(__PAGE_KERNEL_WP)
+
 #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e4f7b25..9bfcb1f 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -419,6 +419,50 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+/* Remap memory with encryption */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+				      unsigned long size)
+{
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/*
+ * Remap memory with encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+					 unsigned long size)
+{
+	/* Be sure the write-protect PAT entry is set for write-protect */
+	if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+		return NULL;
+
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC_WP);
+}
+
+/* Remap memory without encryption */
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+				      unsigned long size)
+{
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC);
+}
+
+/*
+ * Remap memory without encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+					 unsigned long size)
+{
+	/* Be sure the write-protect PAT entry is set for write-protect */
+	if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+		return NULL;
+
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC_WP);
+}
+#endif	/* CONFIG_ARCH_USE_MEMREMAP_PROT */
+
 static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
diff --git a/include/asm-generic/early_ioremap.h b/include/asm-generic/early_ioremap.h
index 734ad4d..2edef8d 100644
--- a/include/asm-generic/early_ioremap.h
+++ b/include/asm-generic/early_ioremap.h
@@ -13,6 +13,8 @@ extern void *early_memremap(resource_size_t phys_addr,
 			    unsigned long size);
 extern void *early_memremap_ro(resource_size_t phys_addr,
 			       unsigned long size);
+extern void *early_memremap_prot(resource_size_t phys_addr,
+				 unsigned long size, unsigned long prot_val);
 extern void early_iounmap(void __iomem *addr, unsigned long size);
 extern void early_memunmap(void *addr, unsigned long size);
 
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index 6d5717b..d7d30da 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -226,6 +226,16 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
 }
 #endif
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+void __init *
+early_memremap_prot(resource_size_t phys_addr, unsigned long size,
+		    unsigned long prot_val)
+{
+	return (__force void *)__early_ioremap(phys_addr, size,
+					       __pgprot(prot_val));
+}
+#endif
+
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 
 void __init copy_from_early_mem(void *dest, phys_addr_t src, unsigned long size)

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 11/32] x86/mm: Add support for early encrypt/decrypt of memory
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (9 preceding siblings ...)
  2017-04-18 21:18 ` [PATCH v5 10/32] x86/mm: Extend early_memremap() support with additional attrs Tom Lendacky
@ 2017-04-18 21:18 ` Tom Lendacky
  2017-04-18 21:18 ` [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly Tom Lendacky
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:18 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support for encrypting or decrypting data in place during the early
stages of booting the kernel. This does not change the memory encryption
attribute - it is used to ensure that data present in either an encrypted
or decrypted memory area is in the proper state (for example, the initrd
will have been loaded by the boot loader unencrypted, but the memory that
it resides in is marked as encrypted).
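
A minimal usage sketch (the actual call site for the initrd case is added
by a later patch in this series):

	/*
	 * The initrd was loaded decrypted by the boot loader, but the
	 * memory it occupies will be mapped encrypted, so fix up the
	 * contents in place before the kernel accesses it:
	 */
	if (sme_active())
		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);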

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   15 +++++++
 arch/x86/mm/mem_encrypt.c          |   76 ++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9fdbc53..4021203 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
 	return !!sme_me_mask;
 }
 
+void __init sme_early_encrypt(resource_size_t paddr,
+			      unsigned long size);
+void __init sme_early_decrypt(resource_size_t paddr,
+			      unsigned long size);
+
 void __init sme_early_init(void);
 
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
@@ -39,6 +44,16 @@ static inline bool sme_active(void)
 }
 #endif
 
+static inline void __init sme_early_encrypt(resource_size_t paddr,
+					    unsigned long size)
+{
+}
+
+static inline void __init sme_early_decrypt(resource_size_t paddr,
+					    unsigned long size)
+{
+}
+
 static inline void __init sme_early_init(void)
 {
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 8ca93e5..18c0887 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -17,6 +17,9 @@
 
 #include <linux/mm.h>
 
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
 /*
  * Since SME related variables are set early in the boot process they must
  * reside in the .data section so as not to be zeroed out when the .bss
@@ -25,6 +28,79 @@
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as either encrypted or decrypted but the contents
+ * are currently not in the desired state.
+ *
+ * This routine follows the steps outlined in the AMD64 Architecture
+ * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
+ */
+static void __init __sme_early_enc_dec(resource_size_t paddr,
+				       unsigned long size, bool enc)
+{
+	void *src, *dst;
+	size_t len;
+
+	if (!sme_me_mask)
+		return;
+
+	local_flush_tlb();
+	wbinvd();
+
+	/*
+	 * There are limited number of early mapping slots, so map (at most)
+	 * one page at time.
+	 */
+	while (size) {
+		len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+		/*
+		 * Create mappings for the current and desired format of
+		 * the memory. Use a write-protected mapping for the source.
+		 */
+		src = enc ? early_memremap_decrypted_wp(paddr, len) :
+			    early_memremap_encrypted_wp(paddr, len);
+
+		dst = enc ? early_memremap_encrypted(paddr, len) :
+			    early_memremap_decrypted(paddr, len);
+
+		/*
+		 * If a mapping can't be obtained to perform the operation,
+		 * then eventual access of that area in the desired mode
+		 * will cause a crash.
+		 */
+		BUG_ON(!src || !dst);
+
+		/*
+		 * Use a temporary buffer, of cache-line multiple size, to
+		 * avoid data corruption as documented in the APM.
+		 */
+		memcpy(sme_early_buffer, src, len);
+		memcpy(dst, sme_early_buffer, len);
+
+		early_memunmap(dst, len);
+		early_memunmap(src, len);
+
+		paddr += len;
+		size -= len;
+	}
+}
+
+void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
+{
+	__sme_early_enc_dec(paddr, size, true);
+}
+
+void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
+{
+	__sme_early_enc_dec(paddr, size, false);
+}
+
 void __init sme_early_init(void)
 {
 	unsigned int i;

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (10 preceding siblings ...)
  2017-04-18 21:18 ` [PATCH v5 11/32] x86/mm: Add support for early encrypt/decrypt of memory Tom Lendacky
@ 2017-04-18 21:18 ` Tom Lendacky
  2017-05-04 10:16   ` Borislav Petkov
  2017-04-18 21:18 ` [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address Tom Lendacky
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:18 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process.  The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.

For the initrd, encrypt this data in place. Since the initrd area will
later be mapped as encrypted, the data will then be accessed properly.
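
As a short sketch of the ordering this relies on (the real change to
copy_bootdata() is in the diff below):

	sme_map_bootdata(real_mode_data);	/* decrypted mappings for boot_params/cmdline */
	memcpy(&boot_params, real_mode_data, sizeof(boot_params));
	/* ... copy the command line via __va(cmd_line_ptr) ... */
	sme_unmap_bootdata(real_mode_data);	/* decrypted mappings no longer needed */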

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++++
 arch/x86/include/asm/pgtable.h     |    3 +
 arch/x86/kernel/head64.c           |   30 ++++++++++++--
 arch/x86/kernel/setup.c            |   10 +++++
 arch/x86/mm/mem_encrypt.c          |   77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 4021203..130d7fe 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,9 @@ void __init sme_early_encrypt(resource_size_t paddr,
 void __init sme_early_decrypt(resource_size_t paddr,
 			      unsigned long size);
 
+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_unmap_bootdata(char *real_mode_data);
+
 void __init sme_early_init(void);
 
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
@@ -54,6 +57,14 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
 {
 }
 
+static inline void __init sme_map_bootdata(char *real_mode_data)
+{
+}
+
+static inline void __init sme_unmap_bootdata(char *real_mode_data)
+{
+}
+
 static inline void __init sme_early_init(void)
 {
 }
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 1f9a2c4..1611bb5 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -23,6 +23,9 @@
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
 
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
+
 void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
 void ptdump_walk_pgd_level_checkwx(void);
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 9056cf9..e789e14 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -34,7 +34,6 @@
 /*
  * Manage page tables very early on.
  */
-extern pgd_t early_level4_pgt[PTRS_PER_PGD];
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
 static unsigned int __initdata next_early_pgt = 2;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
@@ -48,12 +47,12 @@ static void __init reset_early_page_tables(void)
 }
 
 /* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
 	unsigned long physaddr = address - __PAGE_OFFSET;
 	pgdval_t pgd, *pgd_p;
 	pudval_t pud, *pud_p;
-	pmdval_t pmd, *pmd_p;
+	pmdval_t *pmd_p;
 
 	/* Invalid address or early pgt is done ?  */
 	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
@@ -95,12 +94,21 @@ int __init early_make_pgtable(unsigned long address)
 		memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
 		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
 	}
-	pmd = (physaddr & PMD_MASK) + early_pmd_flags;
 	pmd_p[pmd_index(address)] = pmd;
 
 	return 0;
 }
 
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	pmdval_t pmd;
+
+	pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+	return __early_make_pgtable(address, pmd);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
@@ -123,6 +131,12 @@ static void __init copy_bootdata(char *real_mode_data)
 	char * command_line;
 	unsigned long cmd_line_ptr;
 
+	/*
+	 * If SME is active, this will create decrypted mappings of the
+	 * boot data in advance of the copy operations.
+	 */
+	sme_map_bootdata(real_mode_data);
+
 	memcpy(&boot_params, real_mode_data, sizeof boot_params);
 	sanitize_boot_params(&boot_params);
 	cmd_line_ptr = get_cmd_line_ptr();
@@ -130,6 +144,14 @@ static void __init copy_bootdata(char *real_mode_data)
 		command_line = __va(cmd_line_ptr);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 	}
+
+	/*
+	 * The old boot data is no longer needed and won't be reserved,
+	 * freeing up that memory for use by the system. If SME is active,
+	 * we need to remove the mappings that were created so that the
+	 * memory doesn't remain mapped as decrypted.
+	 */
+	sme_unmap_bootdata(real_mode_data);
 }
 
 asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 603a166..a95800b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -115,6 +115,7 @@
 #include <asm/microcode.h>
 #include <asm/mmu_context.h>
 #include <asm/kaslr.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -374,6 +375,15 @@ static void __init reserve_initrd(void)
 	    !ramdisk_image || !ramdisk_size)
 		return;		/* No initrd provided by bootloader */
 
+	/*
+	 * If SME is active, this memory will be marked encrypted by the
+	 * kernel when it is accessed (including relocation). However, the
+	 * ramdisk image was loaded decrypted by the bootloader, so make
+	 * sure that it is encrypted before accessing it.
+	 */
+	if (sme_active())
+		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+
 	initrd_start = 0;
 
 	mapped_size = memblock_mem_size(max_pfn_mapped);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 18c0887..2321f05 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,8 @@
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -101,6 +103,81 @@ void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
 	__sme_early_enc_dec(paddr, size, false);
 }
 
+static void __init sme_early_pgtable_flush(void)
+{
+	write_cr3(__sme_pa_nodebug(early_level4_pgt));
+}
+
+static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
+					     bool map)
+{
+	unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+	pmdval_t pmd_flags, pmd;
+
+	/* Use early_pmd_flags but remove the encryption mask */
+	pmd_flags = __sme_clr(early_pmd_flags);
+
+	do {
+		pmd = map ? (paddr & PMD_MASK) + pmd_flags : 0;
+		__early_make_pgtable((unsigned long)vaddr, pmd);
+
+		vaddr += PMD_SIZE;
+		paddr += PMD_SIZE;
+		size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
+	} while (size);
+}
+
+static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
+{
+	struct boot_params *boot_data;
+	unsigned long cmdline_paddr;
+
+	__sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
+	boot_data = (struct boot_params *)real_mode_data;
+
+	/*
+	 * Determine the command line address only after having established
+	 * the decrypted mapping.
+	 */
+	cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+			((u64)boot_data->ext_cmd_line_ptr << 32);
+
+	if (cmdline_paddr)
+		__sme_early_map_unmap_mem(__va(cmdline_paddr),
+					  COMMAND_LINE_SIZE, map);
+}
+
+void __init sme_unmap_bootdata(char *real_mode_data)
+{
+	/* If SME is not active, the bootdata is in the correct state */
+	if (!sme_active())
+		return;
+
+	/*
+	 * The bootdata and command line aren't needed anymore so clear
+	 * any mapping of them.
+	 */
+	__sme_map_unmap_bootdata(real_mode_data, false);
+
+	sme_early_pgtable_flush();
+}
+
+void __init sme_map_bootdata(char *real_mode_data)
+{
+	/* If SME is not active, the bootdata is in the correct state */
+	if (!sme_active())
+		return;
+
+	/*
+	 * The bootdata and command line will not be encrypted, so they
+	 * need to be mapped as decrypted memory so they can be copied
+	 * properly.
+	 */
+	__sme_map_unmap_bootdata(real_mode_data, true);
+
+	sme_early_pgtable_flush();
+}
+
 void __init sme_early_init(void)
 {
 	unsigned int i;

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (11 preceding siblings ...)
  2017-04-18 21:18 ` [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly Tom Lendacky
@ 2017-04-18 21:18 ` Tom Lendacky
  2017-05-05 17:11   ` Borislav Petkov
  2017-04-18 21:18 ` [PATCH v5 14/32] efi: Add an EFI table address match function Tom Lendacky
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:18 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add a function that will return the E820 type associated with an address
range.
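
A hedged usage sketch; a caller along these lines appears later in the
series when deciding whether a range lies outside the kernel usable area
and should therefore be mapped decrypted:

	switch (e820__get_entry_type(paddr, paddr + size - 1)) {
	case E820_TYPE_RESERVED:
	case E820_TYPE_ACPI:
	case E820_TYPE_NVS:
	case E820_TYPE_UNUSABLE:
		return true;	/* not kernel usable RAM, map decrypted */
	default:
		break;
	}

	return false;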

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/e820/api.h |    2 ++
 arch/x86/kernel/e820.c          |   26 +++++++++++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 8e0f8b8..3641f5f 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -38,6 +38,8 @@
 extern void e820__reallocate_tables(void);
 extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
+extern int  e820__get_entry_type(u64 start, u64 end);
+
 /*
  * Returns true iff the specified range [start,end) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index d78a586..8d68666 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -84,7 +84,8 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
  * Note: this function only works correctly once the E820 table is sorted and
  * not-overlapping (at least for the range specified), which is the case normally.
  */
-bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+static struct e820_entry *__e820__mapped_all(u64 start, u64 end,
+					     enum e820_type type)
 {
 	int i;
 
@@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
 		 * coverage of the desired range exists:
 		 */
 		if (start >= end)
-			return 1;
+			return entry;
 	}
-	return 0;
+
+	return NULL;
+}
+
+/*
+ * This function checks if the entire range <start,end> is mapped with type.
+ */
+bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+{
+	return __e820__mapped_all(start, end, type) ? 1 : 0;
+}
+
+/*
+ * This function returns the type associated with the range <start,end>.
+ */
+int e820__get_entry_type(u64 start, u64 end)
+{
+	struct e820_entry *entry = __e820__mapped_all(start, end, 0);
+
+	return entry ? entry->type : -EINVAL;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 14/32] efi: Add an EFI table address match function
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (12 preceding siblings ...)
  2017-04-18 21:18 ` [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address Tom Lendacky
@ 2017-04-18 21:18 ` Tom Lendacky
  2017-05-15 18:09   ` Borislav Petkov
  2017-04-18 21:19 ` [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0 Tom Lendacky
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:18 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add a function that will determine if a supplied physical address matches
the address of an EFI table.
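
Usage sketch (a caller along these lines appears later in the series when
classifying a physical address as EFI data that must be mapped decrypted):

	/* Does phys_addr point at one of the cached EFI tables? */
	if (efi_table_address_match(phys_addr))
		return true;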

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/firmware/efi/efi.c |   33 +++++++++++++++++++++++++++++++++
 include/linux/efi.h        |    7 +++++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b372aad..8f606a3 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
 };
 EXPORT_SYMBOL(efi);
 
+static unsigned long *efi_tables[] = {
+	&efi.mps,
+	&efi.acpi,
+	&efi.acpi20,
+	&efi.smbios,
+	&efi.smbios3,
+	&efi.sal_systab,
+	&efi.boot_info,
+	&efi.hcdp,
+	&efi.uga,
+	&efi.uv_systab,
+	&efi.fw_vendor,
+	&efi.runtime,
+	&efi.config_table,
+	&efi.esrt,
+	&efi.properties_table,
+	&efi.mem_attr_table,
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -854,6 +873,20 @@ int efi_status_to_err(efi_status_t status)
 	return err;
 }
 
+bool efi_table_address_match(unsigned long phys_addr)
+{
+	unsigned int i;
+
+	if (phys_addr == EFI_INVALID_TABLE_ADDR)
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+		if (*(efi_tables[i]) == phys_addr)
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_KEXEC
 static int update_efi_random_seed(struct notifier_block *nb,
 				  unsigned long code, void *unused)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index ec36f42..cd768a1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1079,6 +1079,8 @@ static inline bool efi_enabled(int feature)
 	return test_bit(feature, &efi.flags) != 0;
 }
 extern void efi_reboot(enum reboot_mode reboot_mode, const char *__unused);
+
+extern bool efi_table_address_match(unsigned long phys_addr);
 #else
 static inline bool efi_enabled(int feature)
 {
@@ -1092,6 +1094,11 @@ static inline bool efi_enabled(int feature)
 {
 	return false;
 }
+
+static inline bool efi_table_address_match(unsigned long phys_addr)
+{
+	return false;
+}
 #endif
 
 extern int efi_status_to_err(efi_status_t status);

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (13 preceding siblings ...)
  2017-04-18 21:18 ` [PATCH v5 14/32] efi: Add an EFI table address match function Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-05-07 17:18   ` Borislav Petkov
  2017-04-18 21:19 ` [PATCH v5 16/32] x86/efi: Update EFI pagetable creation to work with SME Tom Lendacky
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

The efi_mem_type() function currently returns 0, which maps to
EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
the supplied physical address. Returning EFI_RESERVED_TYPE implies that
a memmap entry exists, when it doesn't.  Instead of returning 0, change
the function to return a negative error value when no memmap entry is
found.
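
Callers can then distinguish "no memmap entry" from a valid type, e.g.
(illustrative only):

	int type = efi_mem_type(phys_addr);

	if (type < 0)
		return false;	/* no memmap entry for this address */

	return type == EFI_BOOT_SERVICES_DATA ||
	       type == EFI_RUNTIME_SERVICES_DATA;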

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/ia64/kernel/efi.c      |    4 ++--
 arch/x86/platform/efi/efi.c |    6 +++---
 include/linux/efi.h         |    2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index 1212956..8141600 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -757,14 +757,14 @@ static void __init handle_palo(unsigned long phys_addr)
 	return 0;
 }
 
-u32
+int
 efi_mem_type (unsigned long phys_addr)
 {
 	efi_memory_desc_t *md = efi_memory_descriptor(phys_addr);
 
 	if (md)
 		return md->type;
-	return 0;
+	return -EINVAL;
 }
 
 u64
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index a15cf81..f9b0b7a 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1032,12 +1032,12 @@ void __init efi_enter_virtual_mode(void)
 /*
  * Convenience functions to obtain memory types and attributes
  */
-u32 efi_mem_type(unsigned long phys_addr)
+int efi_mem_type(unsigned long phys_addr)
 {
 	efi_memory_desc_t *md;
 
 	if (!efi_enabled(EFI_MEMMAP))
-		return 0;
+		return -ENOTSUPP;
 
 	for_each_efi_memory_desc(md) {
 		if ((md->phys_addr <= phys_addr) &&
@@ -1045,7 +1045,7 @@ u32 efi_mem_type(unsigned long phys_addr)
 				  (md->num_pages << EFI_PAGE_SHIFT))))
 			return md->type;
 	}
-	return 0;
+	return -EINVAL;
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index cd768a1..a27bb3f 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -973,7 +973,7 @@ static inline void efi_esrt_init(void) { }
 extern int efi_config_parse_tables(void *config_tables, int count, int sz,
 				   efi_config_table_type_t *arch_tables);
 extern u64 efi_get_iobase (void);
-extern u32 efi_mem_type (unsigned long phys_addr);
+extern int efi_mem_type (unsigned long phys_addr);
 extern u64 efi_mem_attributes (unsigned long phys_addr);
 extern u64 efi_mem_attribute (unsigned long phys_addr, unsigned long size);
 extern int __init efi_uart_console_only (void);

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 16/32] x86/efi: Update EFI pagetable creation to work with SME
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (14 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0 Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-04-18 21:19 ` [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear Tom Lendacky
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

When SME is active, pagetable entries created for EFI need to have the
encryption mask set as necessary.

When the new pagetable pages are allocated they are mapped encrypted. So,
update the efi_pgt value that will be used in cr3 to include the encryption
mask so that the PGD table can be read successfully. The new EFI memory map,
as well as the kernel text, are also added to the EFI pagetable as encrypted
mappings. All other EFI mappings are mapped decrypted (tables, etc.).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/platform/efi/efi_64.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c488625..685881a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -264,7 +264,7 @@ void efi_sync_low_kernel_mappings(void)
 
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
-	unsigned long pfn, text;
+	unsigned long pfn, text, pf;
 	struct page *page;
 	unsigned npages;
 	pgd_t *pgd;
@@ -272,7 +272,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	if (efi_enabled(EFI_OLD_MEMMAP))
 		return 0;
 
-	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+	/*
+	 * Since the PGD is encrypted, set the encryption mask so that when
+	 * this value is loaded into cr3 the PGD will be decrypted during
+	 * the pagetable walk.
+	 */
+	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
 	pgd = efi_pgd;
 
 	/*
@@ -282,7 +287,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	 * phys_efi_set_virtual_address_map().
 	 */
 	pfn = pa_memmap >> PAGE_SHIFT;
-	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+	pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
+	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
 		pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
 		return 1;
 	}
@@ -325,7 +331,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	text = __pa(_text);
 	pfn = text >> PAGE_SHIFT;
 
-	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+	pf = _PAGE_RW | _PAGE_ENC;
+	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
 		pr_err("Failed to map kernel text 1:1\n");
 		return 1;
 	}

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (15 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 16/32] x86/efi: Update EFI pagetable creation to work with SME Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-05-15 18:35   ` Borislav Petkov
  2017-04-18 21:19 ` [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data Tom Lendacky
                   ` (14 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Boot data (such as EFI related data) is not encrypted when the system is
booted because UEFI/BIOS does not run with SME active. In order to access
this data properly it needs to be mapped decrypted.

The early_memremap() support is updated to provide an arch specific
routine to modify the pagetable protection attributes before they are
applied to the new mapping. This is used to remove the encryption mask
for boot related data.

The memremap() support is updated to provide an arch specific routine
to determine if RAM remapping is allowed.  RAM remapping will cause an
encrypted mapping to be generated. By preventing RAM remapping,
ioremap_cache() will be used instead, which will provide a decrypted
mapping of the boot related data.
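
As a rough sketch (not from this patch; map_boot_blob() is a hypothetical
helper), a caller that knows it is touching boot data can also ask for the
decrypted view explicitly via the new MEMREMAP_DEC flag, while the arch hook
below makes the same decision automatically for setup_data, EFI data and
reserved areas:

#include <linux/io.h>

static void *map_boot_blob(resource_size_t pa, size_t len)
{
	/*
	 * MEMREMAP_DEC suppresses the RAM remap, so the area is reached
	 * through a decrypted ioremap_cache() style mapping instead of
	 * the (encrypted) kernel linear mapping.
	 */
	return memremap(pa, len, MEMREMAP_WB | MEMREMAP_DEC);
}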

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/io.h |    4 +
 arch/x86/mm/ioremap.c     |  182 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/io.h        |    2 
 kernel/memremap.c         |   20 ++++-
 mm/early_ioremap.c        |   18 ++++
 5 files changed, 219 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7afb0e2..75f2858 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -381,4 +381,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size,
+				       unsigned long flags);
+#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 9bfcb1f..bce0604 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/mmiotrace.h>
+#include <linux/efi.h>
 
 #include <asm/cacheflush.h>
 #include <asm/e820/api.h>
@@ -21,6 +22,7 @@
 #include <asm/tlbflush.h>
 #include <asm/pgalloc.h>
 #include <asm/pat.h>
+#include <asm/setup.h>
 
 #include "physaddr.h"
 
@@ -419,6 +421,186 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+/*
+ * Examine the physical address to determine if it is an area of memory
+ * that should be mapped decrypted.  If the memory is not part of the
+ * kernel usable area it was accessed and created decrypted, so these
+ * areas should be mapped decrypted.
+ */
+static bool memremap_should_map_decrypted(resource_size_t phys_addr,
+					  unsigned long size)
+{
+	/* Check if the address is outside kernel usable area */
+	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
+	case E820_TYPE_RESERVED:
+	case E820_TYPE_ACPI:
+	case E820_TYPE_NVS:
+	case E820_TYPE_UNUSABLE:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+/*
+ * Examine the physical address to determine if it is EFI data. Check
+ * it against the boot params structure and EFI tables and memory types.
+ */
+static bool memremap_is_efi_data(resource_size_t phys_addr,
+				 unsigned long size)
+{
+	u64 paddr;
+
+	/* Check if the address is part of EFI boot/runtime data */
+	if (efi_enabled(EFI_BOOT)) {
+		paddr = boot_params.efi_info.efi_memmap_hi;
+		paddr <<= 32;
+		paddr |= boot_params.efi_info.efi_memmap;
+		if (phys_addr == paddr)
+			return true;
+
+		paddr = boot_params.efi_info.efi_systab_hi;
+		paddr <<= 32;
+		paddr |= boot_params.efi_info.efi_systab;
+		if (phys_addr == paddr)
+			return true;
+
+		if (efi_table_address_match(phys_addr))
+			return true;
+
+		switch (efi_mem_type(phys_addr)) {
+		case EFI_BOOT_SERVICES_DATA:
+		case EFI_RUNTIME_SERVICES_DATA:
+			return true;
+		default:
+			break;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain.
+ */
+static bool memremap_is_setup_data(resource_size_t phys_addr,
+				   unsigned long size)
+{
+	struct setup_data *data;
+	u64 paddr, paddr_next;
+
+	paddr = boot_params.hdr.setup_data;
+	while (paddr) {
+		bool is_setup_data = false;
+
+		if (phys_addr == paddr)
+			return true;
+
+		data = memremap(paddr, sizeof(*data),
+				MEMREMAP_WB | MEMREMAP_DEC);
+
+		paddr_next = data->next;
+
+		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
+			is_setup_data = true;
+
+		memunmap(data);
+
+		if (is_setup_data)
+			return true;
+
+		paddr = paddr_next;
+	}
+
+	return false;
+}
+
+/*
+ * Examine the physical address to determine if it is boot data by checking
+ * it against the boot params setup_data chain (early boot version).
+ */
+static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
+						unsigned long size)
+{
+	struct setup_data *data;
+	u64 paddr, paddr_next;
+
+	paddr = boot_params.hdr.setup_data;
+	while (paddr) {
+		bool is_setup_data = false;
+
+		if (phys_addr == paddr)
+			return true;
+
+		data = early_memremap_decrypted(paddr, sizeof(*data));
+
+		paddr_next = data->next;
+
+		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
+			is_setup_data = true;
+
+		early_memunmap(data, sizeof(*data));
+
+		if (is_setup_data)
+			return true;
+
+		paddr = paddr_next;
+	}
+
+	return false;
+}
+
+/*
+ * Architecture function to determine if RAM remap is allowed. By default, a
+ * RAM remap will map the data as encrypted. Determine if a RAM remap should
+ * not be done so that the data will be mapped decrypted.
+ */
+bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size,
+				unsigned long flags)
+{
+	if (!sme_active())
+		return true;
+
+	if (flags & MEMREMAP_ENC)
+		return true;
+
+	if (flags & MEMREMAP_DEC)
+		return false;
+
+	if (memremap_is_setup_data(phys_addr, size) ||
+	    memremap_is_efi_data(phys_addr, size) ||
+	    memremap_should_map_decrypted(phys_addr, size))
+		return false;
+
+	return true;
+}
+
+/*
+ * Architecture override of __weak function to adjust the protection attributes
+ * used when remapping memory. By default, early_memremap() will map the data
+ * as encrypted. Determine if an encrypted mapping should not be done and set
+ * the appropriate protection attributes.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+					     unsigned long size,
+					     pgprot_t prot)
+{
+	if (!sme_active())
+		return prot;
+
+	if (early_memremap_is_setup_data(phys_addr, size) ||
+	    memremap_is_efi_data(phys_addr, size) ||
+	    memremap_should_map_decrypted(phys_addr, size))
+		prot = pgprot_decrypted(prot);
+	else
+		prot = pgprot_encrypted(prot);
+
+	return prot;
+}
+
 #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
diff --git a/include/linux/io.h b/include/linux/io.h
index 82ef36e..deaeb1d 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -136,6 +136,8 @@ enum {
 	MEMREMAP_WB = 1 << 0,
 	MEMREMAP_WT = 1 << 1,
 	MEMREMAP_WC = 1 << 2,
+	MEMREMAP_ENC = 1 << 3,
+	MEMREMAP_DEC = 1 << 4,
 };
 
 void *memremap(resource_size_t offset, size_t size, unsigned long flags);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 07e85e5..2361bf7 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,13 +34,24 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
 }
 #endif
 
-static void *try_ram_remap(resource_size_t offset, size_t size)
+#ifndef arch_memremap_do_ram_remap
+static bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size,
+				       unsigned long flags)
+{
+	return true;
+}
+#endif
+
+static void *try_ram_remap(resource_size_t offset, size_t size,
+			   unsigned long flags)
 {
 	unsigned long pfn = PHYS_PFN(offset);
 
 	/* In the simple case just return the existing linear address */
-	if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+	if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+	    arch_memremap_do_ram_remap(offset, size, flags))
 		return __va(offset);
+
 	return NULL; /* fallback to arch_memremap_wb */
 }
 
@@ -48,7 +59,8 @@ static void *try_ram_remap(resource_size_t offset, size_t size)
  * memremap() - remap an iomem_resource as cacheable memory
  * @offset: iomem resource start address
  * @size: size of remap
- * @flags: any of MEMREMAP_WB, MEMREMAP_WT and MEMREMAP_WC
+ * @flags: any of MEMREMAP_WB, MEMREMAP_WT, MEMREMAP_WC,
+ *		  MEMREMAP_ENC, MEMREMAP_DEC
  *
  * memremap() is "ioremap" for cases where it is known that the resource
  * being mapped does not have i/o side effects and the __iomem
@@ -95,7 +107,7 @@ void *memremap(resource_size_t offset, size_t size, unsigned long flags)
 		 * the requested range is potentially in System RAM.
 		 */
 		if (is_ram == REGION_INTERSECTS)
-			addr = try_ram_remap(offset, size);
+			addr = try_ram_remap(offset, size, flags);
 		if (!addr)
 			addr = arch_memremap_wb(offset, size);
 	}
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d7d30da..b1dd4a9 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ static int __init early_ioremap_debug_setup(char *str)
 
 static int after_paging_init __initdata;
 
+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+						    unsigned long size,
+						    pgprot_t prot)
+{
+	return prot;
+}
+
 void __init __weak early_ioremap_shutdown(void)
 {
 }
@@ -215,14 +222,19 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
 void __init *
 early_memremap(resource_size_t phys_addr, unsigned long size)
 {
-	return (__force void *)__early_ioremap(phys_addr, size,
-					       FIXMAP_PAGE_NORMAL);
+	pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+						     FIXMAP_PAGE_NORMAL);
+
+	return (__force void *)__early_ioremap(phys_addr, size, prot);
 }
 #ifdef FIXMAP_PAGE_RO
 void __init *
 early_memremap_ro(resource_size_t phys_addr, unsigned long size)
 {
-	return (__force void *)__early_ioremap(phys_addr, size, FIXMAP_PAGE_RO);
+	pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+						     FIXMAP_PAGE_RO);
+
+	return (__force void *)__early_ioremap(phys_addr, size, prot);
 }
 #endif
 


* [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (16 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-05-16  8:36   ` Borislav Petkov
  2017-04-18 21:19 ` [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear Tom Lendacky
                   ` (13 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

The SMP MP-table is built by the firmware (BIOS/UEFI) and placed in memory in a decrypted
state. These tables are accessed using a mix of early_memremap(),
early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
to use early_memremap()/early_memunmap(). This allows for proper setting
of the encryption mask so that the data can be successfully accessed when
SME is active.
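
A condensed sketch of the resulting access pattern (variable names are
illustrative; mpf_base holds the physical address found by the scan): the
physical address is kept, and each access is bracketed by a map/unmap pair
so the mapping carries the correct encryption attribute:

	struct mpf_intel *mpf;

	mpf = early_memremap(mpf_base, sizeof(*mpf));
	if (mpf) {
		pr_info("Intel MultiProcessor Specification v1.%d\n",
			mpf->specification);
		early_memunmap(mpf, sizeof(*mpf));
	}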

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/mpparse.c |  102 +++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index fd37f39..afbda41d 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -429,7 +429,21 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
 	}
 }
 
-static struct mpf_intel *mpf_found;
+static unsigned long mpf_base;
+
+static void __init unmap_mpf(struct mpf_intel *mpf)
+{
+	early_memunmap(mpf, sizeof(*mpf));
+}
+
+static struct mpf_intel * __init map_mpf(unsigned long paddr)
+{
+	struct mpf_intel *mpf;
+
+	mpf = early_memremap(paddr, sizeof(*mpf));
+
+	return mpf;
+}
 
 static unsigned long __init get_mpc_size(unsigned long physptr)
 {
@@ -444,13 +458,21 @@ static unsigned long __init get_mpc_size(unsigned long physptr)
 	return size;
 }
 
+static void __init unmap_mpc(struct mpc_table *mpc)
+{
+	early_memunmap(mpc, mpc->length);
+}
+
+static struct mpc_table * __init map_mpc(unsigned long paddr)
+{
+	return early_memremap(paddr, get_mpc_size(paddr));
+}
+
 static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
 {
 	struct mpc_table *mpc;
-	unsigned long size;
 
-	size = get_mpc_size(mpf->physptr);
-	mpc = early_memremap(mpf->physptr, size);
+	mpc = map_mpc(mpf->physptr);
 	/*
 	 * Read the physical hardware table.  Anything here will
 	 * override the defaults.
@@ -461,10 +483,10 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
 #endif
 		pr_err("BIOS bug, MP table errors detected!...\n");
 		pr_cont("... disabling SMP support. (tell your hw vendor)\n");
-		early_memunmap(mpc, size);
+		unmap_mpc(mpc);
 		return -1;
 	}
-	early_memunmap(mpc, size);
+	unmap_mpc(mpc);
 
 	if (early)
 		return -1;
@@ -497,12 +519,12 @@ static int __init check_physptr(struct mpf_intel *mpf, unsigned int early)
  */
 void __init default_get_smp_config(unsigned int early)
 {
-	struct mpf_intel *mpf = mpf_found;
+	struct mpf_intel *mpf;
 
 	if (!smp_found_config)
 		return;
 
-	if (!mpf)
+	if (!mpf_base)
 		return;
 
 	if (acpi_lapic && early)
@@ -515,6 +537,8 @@ void __init default_get_smp_config(unsigned int early)
 	if (acpi_lapic && acpi_ioapic)
 		return;
 
+	mpf = map_mpf(mpf_base);
+
 	pr_info("Intel MultiProcessor Specification v1.%d\n",
 		mpf->specification);
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
@@ -542,8 +566,10 @@ void __init default_get_smp_config(unsigned int early)
 		construct_default_ISA_mptable(mpf->feature1);
 
 	} else if (mpf->physptr) {
-		if (check_physptr(mpf, early))
+		if (check_physptr(mpf, early)) {
+			unmap_mpf(mpf);
 			return;
+		}
 	} else
 		BUG();
 
@@ -552,6 +578,8 @@ void __init default_get_smp_config(unsigned int early)
 	/*
 	 * Only use the first configuration found.
 	 */
+
+	unmap_mpf(mpf);
 }
 
 static void __init smp_reserve_memory(struct mpf_intel *mpf)
@@ -561,15 +589,16 @@ static void __init smp_reserve_memory(struct mpf_intel *mpf)
 
 static int __init smp_scan_config(unsigned long base, unsigned long length)
 {
-	unsigned int *bp = phys_to_virt(base);
+	unsigned int *bp;
 	struct mpf_intel *mpf;
-	unsigned long mem;
+	int ret = 0;
 
 	apic_printk(APIC_VERBOSE, "Scan for SMP in [mem %#010lx-%#010lx]\n",
 		    base, base + length - 1);
 	BUILD_BUG_ON(sizeof(*mpf) != 16);
 
 	while (length > 0) {
+		bp = early_memremap(base, length);
 		mpf = (struct mpf_intel *)bp;
 		if ((*bp == SMP_MAGIC_IDENT) &&
 		    (mpf->length == 1) &&
@@ -579,24 +608,26 @@ static int __init smp_scan_config(unsigned long base, unsigned long length)
 #ifdef CONFIG_X86_LOCAL_APIC
 			smp_found_config = 1;
 #endif
-			mpf_found = mpf;
+			mpf_base = base;
 
-			pr_info("found SMP MP-table at [mem %#010llx-%#010llx] mapped at [%p]\n",
-				(unsigned long long) virt_to_phys(mpf),
-				(unsigned long long) virt_to_phys(mpf) +
-				sizeof(*mpf) - 1, mpf);
+			pr_info("found SMP MP-table at [mem %#010lx-%#010lx] mapped at [%p]\n",
+				base, base + sizeof(*mpf) - 1, mpf);
 
-			mem = virt_to_phys(mpf);
-			memblock_reserve(mem, sizeof(*mpf));
+			memblock_reserve(base, sizeof(*mpf));
 			if (mpf->physptr)
 				smp_reserve_memory(mpf);
 
-			return 1;
+			ret = 1;
 		}
-		bp += 4;
+		early_memunmap(bp, length);
+
+		if (ret)
+			break;
+
+		base += 16;
 		length -= 16;
 	}
-	return 0;
+	return ret;
 }
 
 void __init default_find_smp_config(void)
@@ -842,25 +873,26 @@ static int __init update_mp_table(void)
 	if (!enable_update_mptable)
 		return 0;
 
-	mpf = mpf_found;
-	if (!mpf)
+	if (!mpf_base)
 		return 0;
 
+	mpf = map_mpf(mpf_base);
+
 	/*
 	 * Now see if we need to go further.
 	 */
 	if (mpf->feature1 != 0)
-		return 0;
+		goto do_unmap_mpf;
 
 	if (!mpf->physptr)
-		return 0;
+		goto do_unmap_mpf;
 
-	mpc = phys_to_virt(mpf->physptr);
+	mpc = map_mpc(mpf->physptr);
 
 	if (!smp_check_mpc(mpc, oem, str))
-		return 0;
+		goto do_unmap_mpc;
 
-	pr_info("mpf: %llx\n", (u64)virt_to_phys(mpf));
+	pr_info("mpf: %llx\n", (u64)mpf_base);
 	pr_info("physptr: %x\n", mpf->physptr);
 
 	if (mpc_new_phys && mpc->length > mpc_new_length) {
@@ -878,21 +910,23 @@ static int __init update_mp_table(void)
 		new = mpf_checksum((unsigned char *)mpc, mpc->length);
 		if (old == new) {
 			pr_info("mpc is readonly, please try alloc_mptable instead\n");
-			return 0;
+			goto do_unmap_mpc;
 		}
 		pr_info("use in-position replacing\n");
 	} else {
 		mpf->physptr = mpc_new_phys;
-		mpc_new = phys_to_virt(mpc_new_phys);
+		mpc_new = map_mpc(mpc_new_phys);
 		memcpy(mpc_new, mpc, mpc->length);
+		unmap_mpc(mpc);
 		mpc = mpc_new;
 		/* check if we can modify that */
 		if (mpc_new_phys - mpf->physptr) {
 			struct mpf_intel *mpf_new;
 			/* steal 16 bytes from [0, 1k) */
 			pr_info("mpf new: %x\n", 0x400 - 16);
-			mpf_new = phys_to_virt(0x400 - 16);
+			mpf_new = map_mpf(0x400 - 16);
 			memcpy(mpf_new, mpf, 16);
+			unmap_mpf(mpf);
 			mpf = mpf_new;
 			mpf->physptr = mpc_new_phys;
 		}
@@ -909,6 +943,12 @@ static int __init update_mp_table(void)
 	 */
 	replace_intsrc_all(mpc, mpc_new_phys, mpc_new_length);
 
+do_unmap_mpc:
+	unmap_mpc(mpc);
+
+do_unmap_mpf:
+	unmap_mpf(mpf);
+
 	return 0;
 }
 


* [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (17 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-05-16 14:04   ` Borislav Petkov
  2017-04-18 21:19 ` [PATCH v5 20/32] x86/mm: Add support for changing the memory encryption attribute Tom Lendacky
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots which will result in corrupted
persistent memory.  Persistent memory is handed out by block devices
through memory remapping functions, so be sure not to map this memory as
encrypted.
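
The heart of the new check is a resource-table probe; condensed from the
hunk below (phys_addr and size are the arguments of
memremap_should_map_decrypted()):

	if (region_intersects(phys_addr, size, IORESOURCE_MEM,
			      IORES_DESC_PERSISTENT_MEMORY) != REGION_DISJOINT)
		return true;	/* persistent memory: map it decrypted */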

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/mm/ioremap.c |   31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index bce0604..55317ba 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -425,17 +425,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
  * kernel usable area it was accessed and created decrypted, so these
- * areas should be mapped decrypted.
+ * areas should be mapped decrypted. And since the encryption key can
+ * change across reboots, persistent memory should also be mapped
+ * decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
 					  unsigned long size)
 {
+	int is_pmem;
+
+	/*
+	 * Check if the address is part of a persistent memory region.
+	 * This check covers areas added by E820, EFI and ACPI.
+	 */
+	is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
+				    IORES_DESC_PERSISTENT_MEMORY);
+	if (is_pmem != REGION_DISJOINT)
+		return true;
+
+	/*
+	 * Check if the non-volatile attribute is set for an EFI
+	 * reserved area.
+	 */
+	if (efi_enabled(EFI_BOOT)) {
+		switch (efi_mem_type(phys_addr)) {
+		case EFI_RESERVED_TYPE:
+			if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
+				return true;
+			break;
+		default:
+			break;
+		}
+	}
+
 	/* Check if the address is outside kernel usable area */
 	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
 	case E820_TYPE_RESERVED:
 	case E820_TYPE_ACPI:
 	case E820_TYPE_NVS:
 	case E820_TYPE_UNUSABLE:
+	case E820_TYPE_PRAM:
 		return true;
 	default:
 		break;


* [PATCH v5 20/32] x86/mm: Add support for changing the memory encryption attribute
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (18 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-04-18 21:19 ` [PATCH v5 21/32] x86, realmode: Decrypt trampoline area if memory encryption is active Tom Lendacky
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support for changing the memory encryption attribute for one or more
memory pages. This will be useful when the AP trampoline area needs to be
decrypted, or when the SWIOTLB area needs to be decrypted to support
devices that cannot address the range implied by the encryption mask.
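
A minimal usage sketch of the new interface (buf and size are placeholders;
buf must be a page-aligned address in the kernel direct map):

	unsigned long vaddr = (unsigned long)buf;
	int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	int ret;

	ret = set_memory_decrypted(vaddr, npages);
	if (ret)
		return ret;

	/* ... use the buffer for data that must be in the clear ... */

	set_memory_encrypted(vaddr, npages);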

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cacheflush.h |    3 ++
 arch/x86/mm/pageattr.c            |   62 +++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index e7e1942..e064f70 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -12,6 +12,7 @@
  * Executability : eXeutable, NoteXecutable
  * Read/Write    : ReadOnly, ReadWrite
  * Presence      : NotPresent
+ * Encryption    : Encrypted, Decrypted
  *
  * Within a category, the attributes are mutually exclusive.
  *
@@ -47,6 +48,8 @@
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 669fa48..0a850b1 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1768,6 +1768,68 @@ int set_memory_4k(unsigned long addr, int numpages)
 					__pgprot(0), 1, 0, NULL);
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+	struct cpa_data cpa;
+	unsigned long start;
+	int ret;
+
+	/* Nothing to do if the SME is not active */
+	if (!sme_active())
+		return 0;
+
+	/* Should not be working on unaligned addresses */
+	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
+		addr &= PAGE_MASK;
+
+	start = addr;
+
+	memset(&cpa, 0, sizeof(cpa));
+	cpa.vaddr = &addr;
+	cpa.numpages = numpages;
+	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
+	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+	cpa.pgd = init_mm.pgd;
+
+	/* Must avoid aliasing mappings in the highmem code */
+	kmap_flush_unused();
+	vm_unmap_aliases();
+
+	/*
+	 * Before changing the encryption attribute, we need to flush caches.
+	 */
+	if (static_cpu_has(X86_FEATURE_CLFLUSH))
+		cpa_flush_range(start, numpages, 1);
+	else
+		cpa_flush_all(1);
+
+	ret = __change_page_attr_set_clr(&cpa, 1);
+
+	/*
+	 * After changing the encryption attribute, we need to flush TLBs
+	 * again in case any speculative TLB caching occurred (but no need
+	 * to flush caches again).  We could just use cpa_flush_all(), but
+	 * in case TLB flushing gets optimized in the cpa_flush_range()
+	 * path use the same logic as above.
+	 */
+	if (static_cpu_has(X86_FEATURE_CLFLUSH))
+		cpa_flush_range(start, numpages, 0);
+	else
+		cpa_flush_all(0);
+
+	return ret;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_enc_dec(addr, numpages, true);
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_enc_dec(addr, numpages, false);
+}
+
 int set_pages_uc(struct page *page, int numpages)
 {
 	unsigned long addr = (unsigned long)page_address(page);


* [PATCH v5 21/32] x86, realmode: Decrypt trampoline area if memory encryption is active
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (19 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 20/32] x86/mm: Add support for changing the memory encryption attribute Tom Lendacky
@ 2017-04-18 21:19 ` Tom Lendacky
  2017-04-18 21:20 ` [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption Tom Lendacky
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:19 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/realmode/init.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 5db706f1..21d7506 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,8 @@
 #include <asm/pgtable.h>
 #include <asm/realmode.h>
 #include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>
+#include <asm/cacheflush.h>
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -130,6 +132,16 @@ static void __init set_real_mode_permissions(void)
 	unsigned long text_start =
 		(unsigned long) __va(real_mode_header->text_start);
 
+	/*
+	 * If SME is active, the trampoline area will need to be in
+	 * decrypted memory in order to bring up other processors
+	 * successfully.
+	 */
+	if (sme_active()) {
+		sme_early_decrypt(__pa(base), size);
+		set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+	}
+
 	set_memory_nx((unsigned long) base, size >> PAGE_SHIFT);
 	set_memory_ro((unsigned long) base, ro_size >> PAGE_SHIFT);
 	set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);


* [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (20 preceding siblings ...)
  2017-04-18 21:19 ` [PATCH v5 21/32] x86, realmode: Decrypt trampoline area if memory encryption is active Tom Lendacky
@ 2017-04-18 21:20 ` Tom Lendacky
  2017-05-16 14:27   ` Borislav Petkov
  2017-04-18 21:20 ` [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:20 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.
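
A worked example of the addressing problem (the C-bit position is assumed
here to be bit 47): once phys_to_dma() applies the mask, even a low physical
address exceeds the reach of a device with a narrower DMA mask, so it has to
go through the SWIOTLB bounce buffers:

	u64 sme_me_mask = 1ULL << 47;			/* assumed C-bit position */
	dma_addr_t dev_addr = 0x1000 | sme_me_mask;	/* what phys_to_dma() now returns */

	/* 0x800000001000 exceeds DMA_BIT_MASK(44), so this device must bounce */
	bool needs_bounce = dev_addr > DMA_BIT_MASK(44);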

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/dma-mapping.h |    5 ++-
 arch/x86/include/asm/mem_encrypt.h |    5 +++
 arch/x86/kernel/pci-dma.c          |   11 +++++--
 arch/x86/kernel/pci-nommu.c        |    2 +
 arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
 arch/x86/mm/mem_encrypt.c          |   22 ++++++++++++++
 include/linux/mem_encrypt.h        |   10 ++++++
 include/linux/swiotlb.h            |    1 +
 init/main.c                        |   13 ++++++++
 lib/swiotlb.c                      |   56 +++++++++++++++++++++++++++++++-----
 10 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 08a0838..d75430a 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -62,12 +63,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return paddr;
+	return __sme_set(paddr);
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	return daddr;
+	return __sme_clr(daddr);
 }
 #endif /* CONFIG_X86_DMA_REMAP */
 
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 130d7fe..0637b4b 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -36,6 +36,11 @@ void __init sme_early_decrypt(resource_size_t paddr,
 
 void __init sme_early_init(void);
 
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #ifndef sme_me_mask
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 3a216ec..72d96d4 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -93,9 +93,12 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 	if (gfpflags_allow_blocking(flag)) {
 		page = dma_alloc_from_contiguous(dev, count, get_order(size),
 						 flag);
-		if (page && page_to_phys(page) + size > dma_mask) {
-			dma_release_from_contiguous(dev, page, count);
-			page = NULL;
+		if (page) {
+			addr = phys_to_dma(dev, page_to_phys(page));
+			if (addr + size > dma_mask) {
+				dma_release_from_contiguous(dev, page, count);
+				page = NULL;
+			}
 		}
 	}
 	/* fallback */
@@ -104,7 +107,7 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 	if (!page)
 		return NULL;
 
-	addr = page_to_phys(page);
+	addr = phys_to_dma(dev, page_to_phys(page));
 	if (addr + size > dma_mask) {
 		__free_pages(page, get_order(size));
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a88952e..98b576a 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
 				 enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	dma_addr_t bus = page_to_phys(page) + offset;
+	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
 	WARN_ON(size == 0);
 	if (!check_addr("map_single", dev, bus, size))
 		return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 1e23577..a75fee7 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
 #include <asm/dma.h>
 #include <asm/xen/swiotlb-xen.h>
 #include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
 int swiotlb __read_mostly;
 
 void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,11 +66,13 @@ void x86_swiotlb_free_coherent(struct device *dev, size_t size,
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
  * This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	if (swiotlb_force == SWIOTLB_FORCE)
+	if ((swiotlb_force == SWIOTLB_FORCE) || sme_active())
 		swiotlb = 1;
 
 	return swiotlb;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 2321f05..30b07a3 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,11 +16,14 @@
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 #include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
 #include <asm/setup.h>
 #include <asm/bootparam.h>
+#include <asm/cacheflush.h>
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -194,6 +197,25 @@ void __init sme_early_init(void)
 		protection_map[i] = pgprot_encrypted(protection_map[i]);
 }
 
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+	if (!sme_me_mask)
+		return;
+
+	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
+	swiotlb_update_mem_attributes();
+}
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+	WARN(PAGE_ALIGN(size) != size,
+	     "size is not page-aligned (%#lx)\n", size);
+
+	/* Make the SWIOTLB buffer area decrypted */
+	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+}
+
 void __init sme_encrypt_kernel(void)
 {
 }
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 14a7b9f..3c384d1 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -32,6 +32,16 @@ static inline bool sme_active(void)
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
+#ifndef __sme_set
+/*
+ * The __sme_set() and __sme_clr() macros are useful for adding or removing
+ * the encryption mask from a value (e.g. when dealing with pagetable
+ * entries).
+ */
+#define __sme_set(x)		((unsigned long)(x) | sme_me_mask)
+#define __sme_clr(x)		((unsigned long)(x) & ~sme_me_mask)
+#endif
+
 #endif	/* __ASSEMBLY__ */
 
 #endif	/* __MEM_ENCRYPT_H__ */
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4ee479f..15e7160 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -35,6 +35,7 @@ enum swiotlb_force {
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_update_mem_attributes(void);
 
 /*
  * Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index b0c11cb..e5b4fb7 100644
--- a/init/main.c
+++ b/init/main.c
@@ -467,6 +467,10 @@ void __init __weak thread_stack_cache_init(void)
 }
 #endif
 
+void __init __weak mem_encrypt_init(void)
+{
+}
+
 /*
  * Set up kernel memory allocators
  */
@@ -614,6 +618,15 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	locking_selftest();
 
+	/*
+	 * This needs to be called before any devices perform DMA
+	 * operations that might use the SWIOTLB bounce buffers.
+	 * This call will mark the bounce buffers as decrypted so
+	 * that their usage will not cause "plain-text" data to be
+	 * decrypted when accessed.
+	 */
+	mem_encrypt_init();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index a8d74a7..74d6557 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
 	return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+				      phys_addr_t address)
+{
+	return __sme_clr(phys_to_dma(hwdev, address));
+}
+
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
 				      volatile void *address)
@@ -183,6 +195,31 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
+/*
+ * Early SWIOTLB allocation may be too early to allow an architecture to
+ * perform the desired operations.  This function allows the architecture to
+ * call SWIOTLB when the operations are possible.  It needs to be called
+ * before the SWIOTLB memory is used.
+ */
+void __init swiotlb_update_mem_attributes(void)
+{
+	void *vaddr;
+	unsigned long bytes;
+
+	if (no_iotlb_memory || late_alloc)
+		return;
+
+	vaddr = phys_to_virt(io_tlb_start);
+	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+	swiotlb_set_mem_attributes(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+
+	vaddr = phys_to_virt(io_tlb_overflow_buffer);
+	bytes = PAGE_ALIGN(io_tlb_overflow);
+	swiotlb_set_mem_attributes(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+}
+
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
@@ -320,6 +357,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	io_tlb_start = virt_to_phys(tlb);
 	io_tlb_end = io_tlb_start + bytes;
 
+	swiotlb_set_mem_attributes(tlb, bytes);
 	memset(tlb, 0, bytes);
 
 	/*
@@ -330,6 +368,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	if (!v_overflow_buffer)
 		goto cleanup2;
 
+	swiotlb_set_mem_attributes(v_overflow_buffer, io_tlb_overflow);
+	memset(v_overflow_buffer, 0, io_tlb_overflow);
 	io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
 
 	/*
@@ -581,7 +621,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 		return SWIOTLB_MAP_ERROR;
 	}
 
-	start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+	start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
 	return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size,
 				      dir, attrs);
 }
@@ -702,7 +742,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 			goto err_warn;
 
 		ret = phys_to_virt(paddr);
-		dev_addr = phys_to_dma(hwdev, paddr);
+		dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
 
 		/* Confirm address can be DMA'd by device */
 		if (dev_addr + size - 1 > dma_mask) {
@@ -812,10 +852,10 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	map = map_single(dev, phys, size, dir, attrs);
 	if (map == SWIOTLB_MAP_ERROR) {
 		swiotlb_full(dev, size, dir, 1);
-		return phys_to_dma(dev, io_tlb_overflow_buffer);
+		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 	}
 
-	dev_addr = phys_to_dma(dev, map);
+	dev_addr = swiotlb_phys_to_dma(dev, map);
 
 	/* Ensure that the address returned is DMA'ble */
 	if (dma_capable(dev, dev_addr, size))
@@ -824,7 +864,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	attrs |= DMA_ATTR_SKIP_CPU_SYNC;
 	swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
 
-	return phys_to_dma(dev, io_tlb_overflow_buffer);
+	return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 }
 EXPORT_SYMBOL_GPL(swiotlb_map_page);
 
@@ -958,7 +998,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 				sg_dma_len(sgl) = 0;
 				return 0;
 			}
-			sg->dma_address = phys_to_dma(hwdev, map);
+			sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
 		} else
 			sg->dma_address = dev_addr;
 		sg_dma_len(sg) = sg->length;
@@ -1026,7 +1066,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 int
 swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
 {
-	return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+	return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
 }
 EXPORT_SYMBOL(swiotlb_dma_mapping_error);
 
@@ -1039,6 +1079,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 int
 swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-	return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+	return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
 EXPORT_SYMBOL(swiotlb_dma_supported);


* [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (21 preceding siblings ...)
  2017-04-18 21:20 ` [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption Tom Lendacky
@ 2017-04-18 21:20 ` Tom Lendacky
  2017-05-16 14:52   ` Borislav Petkov
  2017-04-18 21:20 ` [PATCH v5 24/32] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:20 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add warnings to let the user know when bounce buffers are being used for
DMA when SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications allow the user to determine whether any
action needs to be taken.
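
For reference, sme_dma_mask() evaluates to ((sme_me_mask << 1) - 1); with the
C-bit assumed at bit 47 that is a full 48-bit mask, so any device mask
narrower than 48 bits trips the new warnings (dev_mask is a placeholder):

	u64 sme_me_mask = 1ULL << 47;		/* assumed C-bit position */
	u64 min_mask = (sme_me_mask << 1) - 1;	/* sme_dma_mask() == 0x0000ffffffffffff */

	bool will_bounce = dev_mask < min_mask;	/* true -> warning is emitted */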

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
 include/linux/dma-mapping.h        |   11 +++++++++++
 include/linux/mem_encrypt.h        |    6 ++++++
 lib/swiotlb.c                      |    3 +++
 4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 0637b4b..b406df2 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
 	return !!sme_me_mask;
 }
 
+static inline u64 sme_dma_mask(void)
+{
+	return ((u64)sme_me_mask << 1) - 1;
+}
+
 void __init sme_early_encrypt(resource_size_t paddr,
 			      unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
@@ -50,6 +55,12 @@ static inline bool sme_active(void)
 {
 	return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+	return 0ULL;
+}
+
 #endif
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0977317..f825870 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include <linux/scatterlist.h>
 #include <linux/kmemcheck.h>
 #include <linux/bug.h>
+#include <linux/mem_encrypt.h>
 
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -577,6 +578,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 
 	if (!dev->dma_mask || !dma_supported(dev, mask))
 		return -EIO;
+
+	if (sme_active() && (mask < sme_dma_mask()))
+		dev_warn_ratelimited(dev,
+				     "SME is active, device will require DMA bounce buffers\n");
+
 	*dev->dma_mask = mask;
 	return 0;
 }
@@ -596,6 +602,11 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
 	if (!dma_supported(dev, mask))
 		return -EIO;
+
+	if (sme_active() && (mask < sme_dma_mask()))
+		dev_warn_ratelimited(dev,
+				     "SME is active, device will require DMA bounce buffers\n");
+
 	dev->coherent_dma_mask = mask;
 	return 0;
 }
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 3c384d1..000c430 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -28,6 +28,12 @@ static inline bool sme_active(void)
 {
 	return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+	return 0ULL;
+}
+
 #endif
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 74d6557..af3a268 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -509,6 +509,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	if (no_iotlb_memory)
 		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
 
+	WARN_ONCE(sme_active(),
+		  "SME is active and the system is using DMA bounce buffers\n");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;


* [PATCH v5 24/32] iommu/amd: Disable AMD IOMMU if memory encryption is active
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (22 preceding siblings ...)
  2017-04-18 21:20 ` [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
@ 2017-04-18 21:20 ` Tom Lendacky
  2017-04-18 21:20 ` [PATCH v5 25/32] x86, realmode: Check for memory encryption on the APs Tom Lendacky
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:20 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

For now, disable the AMD IOMMU if memory encryption is active. A future
patch will re-enable the IOMMU with full memory encryption support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/iommu/amd_iommu_init.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5a11328..c72d13b 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,6 +29,7 @@
 #include <linux/export.h>
 #include <linux/iommu.h>
 #include <linux/kmemleak.h>
+#include <linux/mem_encrypt.h>
 #include <asm/pci-direct.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
@@ -2552,6 +2553,12 @@ int __init amd_iommu_detect(void)
 	if (amd_iommu_disabled)
 		return -ENODEV;
 
+	/* For now, disable the IOMMU if SME is active */
+	if (sme_active()) {
+		pr_notice("AMD-Vi: SME is active, disabling the IOMMU\n");
+		return -ENODEV;
+	}
+
 	ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
 	if (ret)
 		return ret;


* [PATCH v5 25/32] x86, realmode: Check for memory encryption on the APs
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (23 preceding siblings ...)
  2017-04-18 21:20 ` [PATCH v5 24/32] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
@ 2017-04-18 21:20 ` Tom Lendacky
  2017-04-18 21:20 ` [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings Tom Lendacky
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:20 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP, then set the memory encryption bit (bit
23) of MSR_K8_SYSCFG to enable memory encryption on that AP and allow the
AP to continue starting up.
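
The check itself is done in the 32-bit trampoline assembly below; an
equivalent C rendition (illustrative only, it cannot run at that point in AP
startup) looks roughly like:

	if (trampoline_header->flags & TH_FLAGS_SME_ACTIVE) {
		u64 syscfg;

		rdmsrl(MSR_K8_SYSCFG, syscfg);
		if (!(syscfg & BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT))) {
			syscfg |= BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT);
			wrmsrl(MSR_K8_SYSCFG, syscfg);
		}
	}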

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/realmode.h      |   12 ++++++++++++
 arch/x86/realmode/init.c             |    4 ++++
 arch/x86/realmode/rm/trampoline_64.S |   24 ++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..90d9152 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
 #ifndef _ARCH_X86_REALMODE_H
 #define _ARCH_X86_REALMODE_H
 
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * in the CONFIG_X86_64 variant.
+ */
+#define TH_FLAGS_SME_ACTIVE_BIT		0
+#define TH_FLAGS_SME_ACTIVE		BIT(TH_FLAGS_SME_ACTIVE_BIT)
+
+#ifndef __ASSEMBLY__
+
 #include <linux/types.h>
 #include <asm/io.h>
 
@@ -38,6 +47,7 @@ struct trampoline_header {
 	u64 start;
 	u64 efer;
 	u32 cr4;
+	u32 flags;
 #endif
 };
 
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
 void set_real_mode_mem(phys_addr_t mem, size_t size);
 void reserve_real_mode(void);
 
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 21d7506..5010089 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -102,6 +102,10 @@ static void __init setup_real_mode(void)
 	trampoline_cr4_features = &trampoline_header->cr4;
 	*trampoline_cr4_features = mmu_cr4_features;
 
+	trampoline_header->flags = 0;
+	if (sme_active())
+		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
+
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..614fd70 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include <asm/msr.h>
 #include <asm/segment.h>
 #include <asm/processor-flags.h>
+#include <asm/realmode.h>
 #include "realmode.h"
 
 	.text
@@ -92,6 +93,28 @@ ENTRY(startup_32)
 	movl	%edx, %fs
 	movl	%edx, %gs
 
+	/*
+	 * Check for memory encryption support. This is a safety net in
+	 * case BIOS hasn't done the necessary step of setting the bit in
+	 * the MSR for this AP. If SME is active and we've gotten this far
+	 * then it is safe for us to set the MSR bit and continue. If we
+	 * don't we'll eventually crash trying to execute encrypted
+	 * instructions.
+	 */
+	bt	$TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
+	jnc	.Ldone
+	movl	$MSR_K8_SYSCFG, %ecx
+	rdmsr
+	bts	$MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+	jc	.Ldone
+
+	/*
+	 * Memory encryption is enabled but the SME enable bit for this
+	 * CPU has not been set.  It is safe to set it, so do so.
+	 */
+	wrmsr
+.Ldone:
+
 	movl	pa_tr_cr4, %eax
 	movl	%eax, %cr4		# Enable PAE mode
 
@@ -147,6 +170,7 @@ GLOBAL(trampoline_header)
 	tr_start:		.space	8
 	GLOBAL(tr_efer)		.space	8
 	GLOBAL(tr_cr4)		.space	4
+	GLOBAL(tr_flags)	.space	4
 END(trampoline_header)
 
 #include "trampoline_common.S"


* [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (24 preceding siblings ...)
  2017-04-18 21:20 ` [PATCH v5 25/32] x86, realmode: Check for memory encryption on the APs Tom Lendacky
@ 2017-04-18 21:20 ` Tom Lendacky
  2017-05-16 17:35   ` Borislav Petkov
  2017-04-18 21:21 ` [PATCH v5 27/32] kvm: x86: svm: Enable Secure Memory Encryption within KVM Tom Lendacky
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:20 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.
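
The recurring pattern in the hunks below is to strip the encryption bit from
the mapping protections right before establishing the user mapping; a
hypothetical fbdev mmap handler (example_fb_mmap() is not from the patch)
would do the same:

	static int example_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
	{
		/* framebuffer memory must be mapped in the clear */
		vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

		return vm_iomap_memory(vma, info->fix.smem_start, info->fix.smem_len);
	}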

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/vga.h       |   13 +++++++++++++
 arch/x86/mm/pageattr.c           |    2 ++
 drivers/gpu/drm/drm_gem.c        |    2 ++
 drivers/gpu/drm/drm_vm.c         |    4 ++++
 drivers/gpu/drm/ttm/ttm_bo_vm.c  |    7 +++++--
 drivers/gpu/drm/udl/udl_fb.c     |    4 ++++
 drivers/video/fbdev/core/fbmem.c |   12 ++++++++++++
 7 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..5c7567a 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,25 @@
 #ifndef _ASM_X86_VGA_H
 #define _ASM_X86_VGA_H
 
+#include <asm/cacheflush.h>
+
 /*
  *	On the PC, we can just recalculate addresses and then
  *	access the videoram directly without any black magic.
+ *	To support memory encryption however, we need to access
+ *	the videoram as decrypted memory.
  */
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VGA_MAP_MEM(x, s)					\
+({								\
+	unsigned long start = (unsigned long)phys_to_virt(x);	\
+	set_memory_decrypted(start, (s) >> PAGE_SHIFT);		\
+	start;							\
+})
+#else
 #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#endif
 
 #define vga_readb(x) (*(x))
 #define vga_writeb(x, y) (*(y) = (x))
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 0a850b1..5f14f20 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1824,11 +1824,13 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_enc_dec(addr, numpages, true);
 }
+EXPORT_SYMBOL_GPL(set_memory_encrypted);
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
 	return __set_memory_enc_dec(addr, numpages, false);
 }
+EXPORT_SYMBOL_GPL(set_memory_decrypted);
 
 int set_pages_uc(struct page *page, int numpages)
 {
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index bc93de3..96af539 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
 #include <linux/pagemap.h>
 #include <linux/shmem_fs.h>
 #include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
 #include <drm/drmP.h>
 #include <drm/drm_vma_manager.h>
 #include <drm/drm_gem.h>
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
 	vma->vm_ops = dev->driver->gem_vm_ops;
 	vma->vm_private_data = obj;
 	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
 	/* Take a ref for this mapping of the object, so that the fault
 	 * handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 1170b32..ed4bcbf 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
 #include <linux/efi.h>
 #include <linux/slab.h>
 #endif
+#include <linux/mem_encrypt.h>
 #include <asm/pgtable.h>
 #include "drm_internal.h"
 #include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
 	pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
+	/* We don't want graphics memory to be mapped encrypted */
+	tmp = pgprot_decrypted(tmp);
+
 #if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
 	if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
 		tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 35ffb37..7958279 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
 #include <linux/rbtree.h>
 #include <linux/module.h>
 #include <linux/uaccess.h>
+#include <linux/mem_encrypt.h>
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
@@ -230,9 +231,11 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
 	 * first page.
 	 */
 	for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
-		if (bo->mem.bus.is_iomem)
+		if (bo->mem.bus.is_iomem) {
+			/* Iomem should not be marked encrypted */
+			cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
 			pfn = ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT) + page_offset;
-		else {
+		} else {
 			page = ttm->pages[page_offset];
 			if (unlikely(!page && i == 0)) {
 				retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 8e8d60e..51ee424 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/fb.h>
 #include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
@@ -169,6 +170,9 @@ static int udl_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
 	pr_notice("mmap() framebuffer addr:%lu size:%lu\n",
 		  pos, size);
 
+	/* We don't want the framebuffer to be mapped encrypted */
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
 	while (size > 0) {
 		page = vmalloc_to_pfn((void *)pos);
 		if (remap_pfn_range(vma, start, page, PAGE_SIZE, PAGE_SHARED))
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 069fe79..b5e7c33 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -32,6 +32,7 @@
 #include <linux/device.h>
 #include <linux/efi.h>
 #include <linux/fb.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/fb.h>
 
@@ -1405,6 +1406,12 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
 	mutex_lock(&info->mm_lock);
 	if (fb->fb_mmap) {
 		int res;
+
+		/*
+		 * The framebuffer needs to be accessed decrypted; be sure
+		 * SME protection is removed ahead of the call
+		 */
+		vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 		res = fb->fb_mmap(info, vma);
 		mutex_unlock(&info->mm_lock);
 		return res;
@@ -1430,6 +1437,11 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
 	mutex_unlock(&info->mm_lock);
 
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	/*
+	 * The framebuffer needs to be accessed decrypted; be sure
+	 * SME protection is removed
+	 */
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 	fb_pgprotect(file, vma, start);
 
 	return vm_iomap_memory(vma, start, len);

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 27/32] kvm: x86: svm: Enable Secure Memory Encryption within KVM
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (25 preceding siblings ...)
  2017-04-18 21:20 ` [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings Tom Lendacky
@ 2017-04-18 21:21 ` Tom Lendacky
  2017-04-18 21:21 ` [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME Tom Lendacky
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:21 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Update the KVM support to work with SME. The VMCB has a number of fields
where physical addresses are used and these addresses must contain the
memory encryption mask in order to properly access the encrypted memory.
Also, use the memory encryption mask when creating and using the nested
page tables.
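
For illustration only (the real changes are in the diff below), every
physical address programmed into the VMCB gets the encryption mask applied
via __sme_set(), and address masks used to extract physical addresses get
it cleared via __sme_clr(), e.g.:

	svm->vmcb->control.nested_cr3 = __sme_set(root);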

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/mmu.c              |   12 ++++++++----
 arch/x86/kvm/mmu.h              |    2 +-
 arch/x86/kvm/svm.c              |   35 ++++++++++++++++++-----------------
 arch/x86/kvm/vmx.c              |    3 ++-
 arch/x86/kvm/x86.c              |    3 ++-
 6 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 74ef58c..a25576b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1066,7 +1066,7 @@ struct kvm_arch_async_pf {
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-		u64 acc_track_mask);
+		u64 acc_track_mask, u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac78105..f362083 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -107,7 +107,7 @@ enum {
 	(((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 1))
 
 
-#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define PT64_BASE_ADDR_MASK __sme_clr((((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
 #define PT64_DIR_BASE_ADDR_MASK \
 	(PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + PT64_LEVEL_BITS)) - 1))
 #define PT64_LVL_ADDR_MASK(level) \
@@ -125,7 +125,7 @@ enum {
 					    * PT32_LEVEL_BITS))) - 1))
 
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
-			| shadow_x_mask | shadow_nx_mask)
+			| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
 
 #define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
@@ -184,6 +184,7 @@ struct kvm_shadow_walk_iterator {
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
 static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;
 
 /*
  * The mask/value to distinguish a PTE that has been marked not-present for
@@ -317,7 +318,7 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
-		u64 acc_track_mask)
+		u64 acc_track_mask, u64 me_mask)
 {
 	if (acc_track_mask != 0)
 		acc_track_mask |= SPTE_SPECIAL_MASK;
@@ -330,6 +331,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 	shadow_present_mask = p_mask;
 	shadow_acc_track_mask = acc_track_mask;
 	WARN_ON(shadow_accessed_mask != 0 && shadow_acc_track_mask != 0);
+	shadow_me_mask = me_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -2383,7 +2385,8 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
 	BUILD_BUG_ON(VMX_EPT_WRITABLE_MASK != PT_WRITABLE_MASK);
 
 	spte = __pa(sp->spt) | shadow_present_mask | PT_WRITABLE_MASK |
-	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
+	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask |
+	       shadow_me_mask;
 
 	mmu_spte_set(sptep, spte);
 
@@ -2685,6 +2688,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		pte_access &= ~ACC_WRITE_MASK;
 
 	spte |= (u64)pfn << PAGE_SHIFT;
+	spte |= shadow_me_mask;
 
 	if (pte_access & ACC_WRITE_MASK) {
 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ddc56e9..c31b36e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
-	return ((1ULL << (e - s + 1)) - 1) << s;
+	return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5f48f62..183e458 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1138,9 +1138,9 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
 {
 	struct vmcb *vmcb = svm->vmcb;
 	struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
-	phys_addr_t bpa = page_to_phys(svm->avic_backing_page);
-	phys_addr_t lpa = page_to_phys(vm_data->avic_logical_id_table_page);
-	phys_addr_t ppa = page_to_phys(vm_data->avic_physical_id_table_page);
+	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
+	phys_addr_t lpa = __sme_set(page_to_phys(vm_data->avic_logical_id_table_page));
+	phys_addr_t ppa = __sme_set(page_to_phys(vm_data->avic_physical_id_table_page));
 
 	vmcb->control.avic_backing_page = bpa & AVIC_HPA_MASK;
 	vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
@@ -1200,8 +1200,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_intercept(svm, INTERCEPT_MWAIT);
 	set_intercept(svm, INTERCEPT_XSETBV);
 
-	control->iopm_base_pa = iopm_base;
-	control->msrpm_base_pa = __pa(svm->msrpm);
+	control->iopm_base_pa = __sme_set(iopm_base);
+	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
 	control->int_ctl = V_INTR_MASKING_MASK;
 
 	init_seg(&save->es);
@@ -1334,9 +1334,9 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	new_entry = READ_ONCE(*entry);
-	new_entry = (page_to_phys(svm->avic_backing_page) &
-		     AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
-		     AVIC_PHYSICAL_ID_ENTRY_VALID_MASK;
+	new_entry = __sme_set((page_to_phys(svm->avic_backing_page) &
+			      AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK) |
+			      AVIC_PHYSICAL_ID_ENTRY_VALID_MASK);
 	WRITE_ONCE(*entry, new_entry);
 
 	svm->avic_physical_id_cache = entry;
@@ -1604,7 +1604,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 
 	svm->vmcb = page_address(page);
 	clear_page(svm->vmcb);
-	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
+	svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
@@ -1632,7 +1632,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
+	__free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
 	__free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
 	__free_page(virt_to_page(svm->nested.hsave));
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
@@ -2301,7 +2301,7 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
 	u64 pdpte;
 	int ret;
 
-	ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte,
+	ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(__sme_clr(cr3)), &pdpte,
 				       offset_in_page(cr3) + index * 8, 8);
 	if (ret)
 		return 0;
@@ -2313,7 +2313,7 @@ static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->control.nested_cr3 = root;
+	svm->vmcb->control.nested_cr3 = __sme_set(root);
 	mark_dirty(svm->vmcb, VMCB_NPT);
 	svm_flush_tlb(vcpu);
 }
@@ -2801,7 +2801,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 		svm->nested.msrpm[p] = svm->msrpm[p] | value;
 	}
 
-	svm->vmcb->control.msrpm_base_pa = __pa(svm->nested.msrpm);
+	svm->vmcb->control.msrpm_base_pa = __sme_set(__pa(svm->nested.msrpm));
 
 	return true;
 }
@@ -4433,7 +4433,7 @@ static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 	pr_debug("SVM: %s: use GA mode for irq %u\n", __func__,
 		 irq.vector);
 	*svm = to_svm(vcpu);
-	vcpu_info->pi_desc_addr = page_to_phys((*svm)->avic_backing_page);
+	vcpu_info->pi_desc_addr = __sme_set(page_to_phys((*svm)->avic_backing_page));
 	vcpu_info->vector = irq.vector;
 
 	return 0;
@@ -4484,7 +4484,8 @@ static int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
 			struct amd_iommu_pi_data pi;
 
 			/* Try to enable guest_mode in IRTE */
-			pi.base = page_to_phys(svm->avic_backing_page) & AVIC_HPA_MASK;
+			pi.base = __sme_set(page_to_phys(svm->avic_backing_page) &
+					    AVIC_HPA_MASK);
 			pi.ga_tag = AVIC_GATAG(kvm->arch.avic_vm_id,
 						     svm->vcpu.vcpu_id);
 			pi.is_guest_mode = true;
@@ -4909,7 +4910,7 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->save.cr3 = root;
+	svm->vmcb->save.cr3 = __sme_set(root);
 	mark_dirty(svm->vmcb, VMCB_CR);
 	svm_flush_tlb(vcpu);
 }
@@ -4918,7 +4919,7 @@ static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	svm->vmcb->control.nested_cr3 = root;
+	svm->vmcb->control.nested_cr3 = __sme_set(root);
 	mark_dirty(svm->vmcb, VMCB_NPT);
 
 	/* Also sync guest cr3 here in case we live migrate */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a471e5..16b8391 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6480,7 +6480,8 @@ void vmx_enable_tdp(void)
 		enable_ept_ad_bits ? VMX_EPT_DIRTY_BIT : 0ull,
 		0ull, VMX_EPT_EXECUTABLE_MASK,
 		cpu_has_vmx_ept_execute_only() ? 0ull : VMX_EPT_READABLE_MASK,
-		enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK);
+		enable_ept_ad_bits ? 0ull : VMX_EPT_RWX_MASK,
+		0ull);
 
 	ept_set_mmio_spte_mask();
 	kvm_enable_tdp();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ccbd45e..2e65335 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,7 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
+#include <asm/mem_encrypt.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -6090,7 +6091,7 @@ int kvm_arch_init(void *opaque)
 
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
 			PT_DIRTY_MASK, PT64_NX_MASK, 0,
-			PT_PRESENT_MASK, 0);
+			PT_PRESENT_MASK, 0, sme_me_mask);
 	kvm_timer_init();
 
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (26 preceding siblings ...)
  2017-04-18 21:21 ` [PATCH v5 27/32] kvm: x86: svm: Enable Secure Memory Encryption within KVM Tom Lendacky
@ 2017-04-18 21:21 ` Tom Lendacky
  2017-05-17 19:17   ` Borislav Petkov
  2017-05-26  4:17   ` Xunlei Pang
  2017-04-18 21:21 ` [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place Tom Lendacky
                   ` (3 subsequent siblings)
  31 siblings, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:21 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed in order to be able to reboot into the kernel in the same manner
as it was originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
flush the caches and then halt. This is needed when booting from a state
where SME was not active into a state where SME is active (or vice versa).
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when the caches are flushed, depending
on which cache line is written back last.
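
A rough sketch of the allocation side (illustrative only; the actual hooks
are added in the diff below): pages destined for the kexec image are
switched to a decrypted mapping while the current kernel is still running
with SME, and switched back before they are returned to the allocator:

	/* after allocating kexec pages */
	ret = set_memory_decrypted((unsigned long)vaddr, pages);

	/* ... use the pages for the kexec image ... */

	/* before freeing the pages back to the kernel */
	set_memory_encrypted((unsigned long)vaddr, pages);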

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/init.h          |    1 +
 arch/x86/include/asm/irqflags.h      |    5 +++++
 arch/x86/include/asm/kexec.h         |    8 ++++++++
 arch/x86/include/asm/pgtable_types.h |    1 +
 arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
 arch/x86/mm/ident_map.c              |   11 +++++++----
 include/linux/kexec.h                |   14 ++++++++++++++
 kernel/kexec_core.c                  |    7 +++++++
 9 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 737da62..b2ec511 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -6,6 +6,7 @@ struct x86_mapping_info {
 	void *context;			 /* context for alloc_pgt_page */
 	unsigned long pmd_flag;		 /* page flag for PMD entry */
 	unsigned long offset;		 /* ident mapping offset */
+	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
 };
 
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index ac7692d..38b5920 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
 	asm volatile("hlt": : :"memory");
 }
 
+static inline __cpuidle void native_wbinvd_halt(void)
+{
+	asm volatile("wbinvd; hlt" : : : "memory");
+}
+
 #endif
 
 #ifdef CONFIG_PARAVIRT
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 70ef205..e8183ac 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -207,6 +207,14 @@ struct kexec_entry64_regs {
 	uint64_t r15;
 	uint64_t rip;
 };
+
+extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+				       gfp_t gfp);
+#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
+
+extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
+#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
+
 #endif
 
 typedef void crash_vmclear_fn(void);
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index ce8cb1c..0f326f4 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -213,6 +213,7 @@ enum page_cache_mode {
 #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
 #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
 #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 085c3b3..11c0ca9 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
-	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
 	return 0;
 err:
 	free_transition_pgtable(image);
@@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 		.alloc_pgt_page	= alloc_pgt_page,
 		.context	= image,
 		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.kernpg_flag	= _KERNPG_TABLE_NOENC,
 	};
 	unsigned long mstart, mend;
 	pgd_t *level4p;
@@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
 {
 	kexec_mark_crashkres(false);
 }
+
+int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
+{
+	int ret;
+
+	if (sme_active()) {
+		/*
+		 * If SME is active we need to be sure that kexec pages are
+		 * not encrypted because when we boot to the new kernel the
+		 * pages won't be accessed encrypted (initially).
+		 */
+		ret = set_memory_decrypted((unsigned long)vaddr, pages);
+		if (ret)
+			return ret;
+
+		if (gfp & __GFP_ZERO)
+			memset(vaddr, 0, pages * PAGE_SIZE);
+	}
+
+	return 0;
+}
+
+void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+	if (sme_active()) {
+		/*
+		 * If SME is active we need to reset the pages back to being
+		 * an encrypted mapping before freeing them.
+		 */
+		set_memory_encrypted((unsigned long)vaddr, pages);
+	}
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0bb8842..f4e5de6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -24,6 +24,7 @@
 #include <linux/cpuidle.h>
 #include <trace/events/power.h>
 #include <linux/hw_breakpoint.h>
+#include <linux/kexec.h>
 #include <asm/cpu.h>
 #include <asm/apic.h>
 #include <asm/syscalls.h>
@@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
 	return ret;
 }
 #endif
+
 void stop_this_cpu(void *dummy)
 {
+	bool do_wbinvd_halt = false;
+
+	if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
+		/*
+		 * If we are performing a kexec and the processor supports
+		 * SME then we need to clear out cache information before
+		 * halting. With kexec, going from SME inactive to SME active
+		 * requires clearing cache entries so that addresses without
+		 * the encryption bit set don't corrupt the same physical
+		 * address that has the encryption bit set when caches are
+		 * flushed. Perform a wbinvd followed by a halt to achieve
+		 * this.
+		 */
+		do_wbinvd_halt = true;
+	}
+
 	local_irq_disable();
 	/*
 	 * Remove this CPU:
@@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
-	for (;;)
-		halt();
+	for (;;) {
+		if (do_wbinvd_halt)
+			native_wbinvd_halt();
+		else
+			halt();
+	}
 }
 
 /*
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 04210a2..2c9fd3e 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
 static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 			  unsigned long addr, unsigned long end)
 {
+	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
 	unsigned long next;
 
 	for (; addr < end; addr = next) {
@@ -39,7 +40,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (!pmd)
 			return -ENOMEM;
 		ident_pmd_init(info, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+		set_pud(pud, __pud(__pa(pmd) | kernpg_flag));
 	}
 
 	return 0;
@@ -48,6 +49,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 			  unsigned long addr, unsigned long end)
 {
+	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
 	unsigned long next;
 
 	for (; addr < end; addr = next) {
@@ -67,7 +69,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 		if (!pud)
 			return -ENOMEM;
 		ident_pud_init(info, pud, addr, next);
-		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
+		set_p4d(p4d, __p4d(__pa(pud) | kernpg_flag));
 	}
 
 	return 0;
@@ -76,6 +78,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 			      unsigned long pstart, unsigned long pend)
 {
+	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
 	unsigned long addr = pstart + info->offset;
 	unsigned long end = pend + info->offset;
 	unsigned long next;
@@ -104,14 +107,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		if (result)
 			return result;
 		if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
-			set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
+			set_pgd(pgd, __pgd(__pa(p4d) | kernpg_flag));
 		} else {
 			/*
 			 * With p4d folded, pgd is equal to p4d.
 			 * The pgd entry has to point to the pud page table in this case.
 			 */
 			pud_t *pud = pud_offset(p4d, 0);
-			set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+			set_pgd(pgd, __pgd(__pa(pud) | kernpg_flag));
 		}
 	}
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d419d0e..1c76e3b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -383,6 +383,20 @@ static inline void *boot_phys_to_virt(unsigned long entry)
 	return phys_to_virt(boot_phys_to_phys(entry));
 }
 
+#ifndef arch_kexec_post_alloc_pages
+static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
+					      gfp_t gfp)
+{
+	return 0;
+}
+#endif
+
+#ifndef arch_kexec_pre_free_pages
+static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
+{
+}
+#endif
+
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index bfe62d5..bb5e7e3 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -38,6 +38,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/compiler.h>
 #include <linux/hugetlb.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -315,6 +316,9 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		count = 1 << order;
 		for (i = 0; i < count; i++)
 			SetPageReserved(pages + i);
+
+		arch_kexec_post_alloc_pages(page_address(pages), count,
+					    gfp_mask);
 	}
 
 	return pages;
@@ -326,6 +330,9 @@ static void kimage_free_pages(struct page *page)
 
 	order = page_private(page);
 	count = 1 << order;
+
+	arch_kexec_pre_free_pages(page_address(page), count);
+
 	for (i = 0; i < count; i++)
 		ClearPageReserved(page + i);
 	__free_pages(page, order);

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (27 preceding siblings ...)
  2017-04-18 21:21 ` [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME Tom Lendacky
@ 2017-04-18 21:21 ` Tom Lendacky
  2017-05-18 12:46   ` Borislav Petkov
  2017-04-18 21:22 ` [PATCH v5 30/32] x86/boot: Add early cmdline parsing for options with arguments Tom Lendacky
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:21 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add the support to encrypt the kernel in-place. This is done by creating
new page mappings for the kernel - a decrypted write-protected mapping
and an encrypted mapping. The kernel is encrypted by copying it through
a temporary buffer.
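
Conceptually, the assembly routine added below does the following (a rough
C-like sketch, not actual patch code). Both virtual addresses map the same
physical kernel pages, one mapping encrypted and one decrypted and
write-protected, and the length is a multiple of PMD_PAGE_SIZE:

	while (kernel_len) {
		/* read through the decrypted, write-protected mapping */
		memcpy(buffer, (void *)decrypted_kernel_vaddr, PMD_PAGE_SIZE);
		/* write back through the encrypted mapping */
		memcpy((void *)encrypted_kernel_vaddr, buffer, PMD_PAGE_SIZE);

		decrypted_kernel_vaddr += PMD_PAGE_SIZE;
		encrypted_kernel_vaddr += PMD_PAGE_SIZE;
		kernel_len -= PMD_PAGE_SIZE;
	}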

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |    6 +
 arch/x86/mm/Makefile               |    2 
 arch/x86/mm/mem_encrypt.c          |  262 ++++++++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt_boot.S     |  151 +++++++++++++++++++++
 4 files changed, 421 insertions(+)
 create mode 100644 arch/x86/mm/mem_encrypt_boot.S

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index b406df2..8f6f9b4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
 	return ((u64)sme_me_mask << 1) - 1;
 }
 
+void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
+			 unsigned long decrypted_kernel_vaddr,
+			 unsigned long kernel_len,
+			 unsigned long encryption_wa,
+			 unsigned long encryption_pgd);
+
 void __init sme_early_encrypt(resource_size_t paddr,
 			      unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 9e13841..0633142 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
+
+obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 30b07a3..0ff41a4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -24,6 +24,7 @@
 #include <asm/setup.h>
 #include <asm/bootparam.h>
 #include <asm/cacheflush.h>
+#include <asm/sections.h>
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
 	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
 }
 
+void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
+			  unsigned long end)
+{
+	unsigned long addr = start;
+	pgdval_t *pgd_p;
+
+	while (addr < end) {
+		unsigned long pgd_end;
+
+		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (pgd_end > end)
+			pgd_end = end;
+
+		pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
+		*pgd_p = 0;
+
+		addr = pgd_end;
+	}
+}
+
+#define PGD_FLAGS	_KERNPG_TABLE_NOENC
+#define PUD_FLAGS	_KERNPG_TABLE_NOENC
+#define PMD_FLAGS	(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
+
+static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
+				     unsigned long vaddr, pmdval_t pmd_val)
+{
+	pgdval_t pgd, *pgd_p;
+	pudval_t pud, *pud_p;
+	pmdval_t pmd, *pmd_p;
+
+	pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
+	pgd = *pgd_p;
+	if (pgd) {
+		pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
+	} else {
+		pud_p = pgtable_area;
+		memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
+		pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
+
+		*pgd_p = (pgdval_t)pud_p + PGD_FLAGS;
+	}
+
+	pud_p += pud_index(vaddr);
+	pud = *pud_p;
+	if (pud) {
+		if (pud & _PAGE_PSE)
+			goto out;
+
+		pmd_p = (pmdval_t *)(pud & ~PTE_FLAGS_MASK);
+	} else {
+		pmd_p = pgtable_area;
+		memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
+		pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
+
+		*pud_p = (pudval_t)pmd_p + PUD_FLAGS;
+	}
+
+	pmd_p += pmd_index(vaddr);
+	pmd = *pmd_p;
+	if (!pmd || !(pmd & _PAGE_PSE))
+		*pmd_p = pmd_val;
+
+out:
+	return pgtable_area;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long len)
+{
+	unsigned long pud_tables, pmd_tables;
+	unsigned long total = 0;
+
+	/*
+	 * Perform a relatively simplistic calculation of the pagetable
+	 * entries that are needed. The mappings will be covered by 2MB
+	 * PMD entries so we can conservatively calculate the required
+	 * number of PUD and PMD structures needed to perform the mappings.
+	 * Incrementing the count for each covers the case where the
+	 * addresses cross entries.
+	 */
+	pud_tables = ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE;
+	pud_tables++;
+	pmd_tables = ALIGN(len, PUD_SIZE) / PUD_SIZE;
+	pmd_tables++;
+
+	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
+	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
+
+	/*
+	 * Now calculate the added pagetable structures needed to populate
+	 * the new pagetables.
+	 */
+	pud_tables = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
+	pmd_tables = ALIGN(total, PUD_SIZE) / PUD_SIZE;
+
+	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
+	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
+
+	return total;
+}
+
 void __init sme_encrypt_kernel(void)
 {
+	pgd_t *pgd;
+	void *pgtable_area;
+	unsigned long kernel_start, kernel_end, kernel_len;
+	unsigned long workarea_start, workarea_end, workarea_len;
+	unsigned long execute_start, execute_end, execute_len;
+	unsigned long pgtable_area_len;
+	unsigned long decrypted_base;
+	unsigned long paddr, pmd_flags;
+
+	if (!sme_active())
+		return;
+
+	/*
+	 * Prepare for encrypting the kernel by building new pagetables with
+	 * the necessary attributes needed to encrypt the kernel in place.
+	 *
+	 *   One range of virtual addresses will map the memory occupied
+	 *   by the kernel as encrypted.
+	 *
+	 *   Another range of virtual addresses will map the memory occupied
+	 *   by the kernel as decrypted and write-protected.
+	 *
+	 *     The use of write-protect attribute will prevent any of the
+	 *     memory from being cached.
+	 */
+
+	/* Physical addresses give us the identity mapped virtual addresses */
+	kernel_start = __pa_symbol(_text);
+	kernel_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+	kernel_len = kernel_end - kernel_start;
+
+	/* Set the encryption workarea to be immediately after the kernel */
+	workarea_start = kernel_end;
+
+	/*
+	 * Calculate required number of workarea bytes needed:
+	 *   executable encryption area size:
+	 *     stack page (PAGE_SIZE)
+	 *     encryption routine page (PAGE_SIZE)
+	 *     intermediate copy buffer (PMD_PAGE_SIZE)
+	 *   pagetable structures for the encryption of the kernel
+	 *   pagetable structures for workarea (in case not currently mapped)
+	 */
+	execute_start = workarea_start;
+	execute_end = execute_start + (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+	execute_len = execute_end - execute_start;
+
+	/*
+	 * One PGD for both encrypted and decrypted mappings and a set of
+	 * PUDs and PMDs for each of the encrypted and decrypted mappings.
+	 */
+	pgtable_area_len = sizeof(pgd_t) * PTRS_PER_PGD;
+	pgtable_area_len += sme_pgtable_calc(execute_end - kernel_start) * 2;
+
+	/* PUDs and PMDs needed in the current pagetables for the workarea */
+	pgtable_area_len += sme_pgtable_calc(execute_len + pgtable_area_len);
+
+	/*
+	 * The total workarea includes the executable encryption area and
+	 * the pagetable area.
+	 */
+	workarea_len = execute_len + pgtable_area_len;
+	workarea_end = workarea_start + workarea_len;
+
+	/*
+	 * Set the address to the start of where newly created pagetable
+	 * structures (PGDs, PUDs and PMDs) will be allocated. New pagetable
+	 * structures are created when the workarea is added to the current
+	 * pagetables and when the new encrypted and decrypted kernel
+	 * mappings are populated.
+	 */
+	pgtable_area = (void *)execute_end;
+
+	/*
+	 * Make sure the current pagetable structure has entries for
+	 * addressing the workarea.
+	 */
+	pgd = (pgd_t *)native_read_cr3();
+	paddr = workarea_start;
+	while (paddr < workarea_end) {
+		pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+						paddr,
+						paddr + PMD_FLAGS);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+	native_write_cr3((unsigned long)pgd);
+
+	/*
+	 * A new pagetable structure is being built to allow for the kernel
+	 * to be encrypted. It starts with an empty PGD that will then be
+	 * populated with new PUDs and PMDs as the encrypted and decrypted
+	 * kernel mappings are created.
+	 */
+	pgd = pgtable_area;
+	memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+	pgtable_area += sizeof(*pgd) * PTRS_PER_PGD;
+
+	/* Add encrypted kernel (identity) mappings */
+	pmd_flags = PMD_FLAGS | _PAGE_ENC;
+	paddr = kernel_start;
+	while (paddr < kernel_end) {
+		pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+						paddr,
+						paddr + pmd_flags);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/*
+	 * A different PGD index/entry must be used to get different
+	 * pagetable entries for the decrypted mapping. Choose the next
+	 * PGD index and convert it to a virtual address to be used as
+	 * the base of the mapping.
+	 */
+	decrypted_base = (pgd_index(workarea_end) + 1) & (PTRS_PER_PGD - 1);
+	decrypted_base <<= PGDIR_SHIFT;
+
+	/* Add decrypted, write-protected kernel (non-identity) mappings */
+	pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+	paddr = kernel_start;
+	while (paddr < kernel_end) {
+		pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+						paddr + decrypted_base,
+						paddr + pmd_flags);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/* Add decrypted workarea mappings to both kernel mappings */
+	paddr = workarea_start;
+	while (paddr < workarea_end) {
+		pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+						paddr,
+						paddr + PMD_FLAGS);
+
+		pgtable_area = sme_populate_pgd(pgd, pgtable_area,
+						paddr + decrypted_base,
+						paddr + PMD_FLAGS);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/* Perform the encryption */
+	sme_encrypt_execute(kernel_start, kernel_start + decrypted_base,
+			    kernel_len, workarea_start, (unsigned long)pgd);
+
+	/*
+	 * At this point we are running encrypted.  Remove the mappings for
+	 * the decrypted areas - all that is needed for this is to remove
+	 * the PGD entry/entries.
+	 */
+	sme_clear_pgd(pgd, kernel_start + decrypted_base,
+		      kernel_end + decrypted_base);
+
+	sme_clear_pgd(pgd, workarea_start + decrypted_base,
+		      workarea_end + decrypted_base);
+
+	/* Flush the TLB - no globals so cr3 is enough */
+	native_write_cr3(native_read_cr3());
 }
 
 unsigned long __init sme_enable(void)
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
new file mode 100644
index 0000000..fb58f9f
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -0,0 +1,151 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+	.text
+	.code64
+ENTRY(sme_encrypt_execute)
+
+	/*
+	 * Entry parameters:
+	 *   RDI - virtual address for the encrypted kernel mapping
+	 *   RSI - virtual address for the decrypted kernel mapping
+	 *   RDX - length of kernel
+	 *   RCX - virtual address of the encryption workarea, including:
+	 *     - stack page (PAGE_SIZE)
+	 *     - encryption routine page (PAGE_SIZE)
+	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
+	 *    R8 - physical address of the pagetables to use for encryption
+	 */
+
+	push	%rbp
+	push	%r12
+
+	/* Set up a one page stack in the non-encrypted memory area */
+	movq	%rsp, %rbp		/* Save current stack pointer */
+	movq	%rcx, %rax		/* Workarea stack page */
+	movq	%rax, %rsp		/* Set new stack pointer */
+	addq	$PAGE_SIZE, %rsp	/* Stack grows from the bottom */
+	addq	$PAGE_SIZE, %rax	/* Workarea encryption routine */
+
+	movq	%rdi, %r10		/* Encrypted kernel */
+	movq	%rsi, %r11		/* Decrypted kernel */
+	movq	%rdx, %r12		/* Kernel length */
+
+	/* Copy encryption routine into the workarea */
+	movq	%rax, %rdi		/* Workarea encryption routine */
+	leaq	.Lenc_start(%rip), %rsi	/* Encryption routine */
+	movq	$(.Lenc_stop - .Lenc_start), %rcx	/* Encryption routine length */
+	rep	movsb
+
+	/* Setup registers for call */
+	movq	%r10, %rdi		/* Encrypted kernel */
+	movq	%r11, %rsi		/* Decrypted kernel */
+	movq	%r8, %rdx		/* Pagetables used for encryption */
+	movq	%r12, %rcx		/* Kernel length */
+	movq	%rax, %r8		/* Workarea encryption routine */
+	addq	$PAGE_SIZE, %r8		/* Workarea intermediate copy buffer */
+
+	call	*%rax			/* Call the encryption routine */
+
+	movq	%rbp, %rsp		/* Restore original stack pointer */
+
+	pop	%r12
+	pop	%rbp
+
+	ret
+ENDPROC(sme_encrypt_execute)
+
+.Lenc_start:
+ENTRY(sme_enc_routine)
+/*
+ * Routine used to encrypt the kernel.
+ *   This routine must be run outside of the kernel proper since
+ *   the kernel will be encrypted during the process. So this
+ *   routine is defined here and then copied to an area outside
+ *   of the kernel where it will remain and run decrypted
+ *   during execution.
+ *
+ *   On entry the registers must be:
+ *     RDI - virtual address for the encrypted kernel mapping
+ *     RSI - virtual address for the decrypted kernel mapping
+ *     RDX - address of the pagetables to use for encryption
+ *     RCX - length of kernel
+ *      R8 - intermediate copy buffer
+ *
+ *     RAX - points to this routine
+ *
+ * The kernel will be encrypted by copying from the non-encrypted
+ * kernel space to an intermediate buffer and then copying from the
+ * intermediate buffer back to the encrypted kernel space. The physical
+ * addresses of the two kernel space mappings are the same which
+ * results in the kernel being encrypted "in place".
+ */
+	/* Enable the new page tables */
+	mov	%rdx, %cr3
+
+	/* Flush any global TLBs */
+	mov	%cr4, %rdx
+	andq	$~X86_CR4_PGE, %rdx
+	mov	%rdx, %cr4
+	orq	$X86_CR4_PGE, %rdx
+	mov	%rdx, %cr4
+
+	/* Set the PAT register PA5 entry to write-protect */
+	push	%rcx
+	movl	$MSR_IA32_CR_PAT, %ecx
+	rdmsr
+	push	%rdx			/* Save original PAT value */
+	andl	$0xffff00ff, %edx	/* Clear PA5 */
+	orl	$0x00000500, %edx	/* Set PA5 to WP */
+	wrmsr
+	pop	%rdx			/* RDX contains original PAT value */
+	pop	%rcx
+
+	movq	%rcx, %r9		/* Save kernel length */
+	movq	%rdi, %r10		/* Save encrypted kernel address */
+	movq	%rsi, %r11		/* Save decrypted kernel address */
+
+	wbinvd				/* Invalidate any cache entries */
+
+	/* Copy/encrypt 2MB at a time */
+1:
+	movq	%r11, %rsi		/* Source - decrypted kernel */
+	movq	%r8, %rdi		/* Dest   - intermediate copy buffer */
+	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
+	rep	movsb
+
+	movq	%r8, %rsi		/* Source - intermediate copy buffer */
+	movq	%r10, %rdi		/* Dest   - encrypted kernel */
+	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
+	rep	movsb
+
+	addq	$PMD_PAGE_SIZE, %r11
+	addq	$PMD_PAGE_SIZE, %r10
+	subq	$PMD_PAGE_SIZE, %r9	/* Kernel length decrement */
+	jnz	1b			/* Kernel length not zero? */
+
+	/* Restore PAT register */
+	push	%rdx			/* Save original PAT value */
+	movl	$MSR_IA32_CR_PAT, %ecx
+	rdmsr
+	pop	%rdx			/* Restore original PAT value */
+	wrmsr
+
+	ret
+ENDPROC(sme_enc_routine)
+.Lenc_stop:

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 30/32] x86/boot: Add early cmdline parsing for options with arguments
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (28 preceding siblings ...)
  2017-04-18 21:21 ` [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place Tom Lendacky
@ 2017-04-18 21:22 ` Tom Lendacky
  2017-04-18 21:22 ` [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption Tom Lendacky
  2017-04-18 21:22 ` [PATCH v5 32/32] x86/mm: Add support to make use of " Tom Lendacky
  31 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:22 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add a cmdline_find_option() function to look for cmdline options that
take arguments. The argument is copied into a supplied buffer and its
length (regardless of whether it fits in the buffer) is returned, with
-1 indicating that the option was not found.
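
As a usage illustration (hypothetical snippet, not part of this patch),
a caller looking for "mem_encrypt=<arg>" on the command line might do
something like:

	char buffer[16];
	int len;

	len = cmdline_find_option(boot_command_line, "mem_encrypt",
				  buffer, sizeof(buffer));
	if (len >= 0 && !strncmp(buffer, "off", sizeof(buffer)))
		pr_info("mem_encrypt=off specified on the command line\n");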

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cmdline.h |    2 +
 arch/x86/lib/cmdline.c         |  105 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 107 insertions(+)

diff --git a/arch/x86/include/asm/cmdline.h b/arch/x86/include/asm/cmdline.h
index e01f7f7..84ae170 100644
--- a/arch/x86/include/asm/cmdline.h
+++ b/arch/x86/include/asm/cmdline.h
@@ -2,5 +2,7 @@
 #define _ASM_X86_CMDLINE_H
 
 int cmdline_find_option_bool(const char *cmdline_ptr, const char *option);
+int cmdline_find_option(const char *cmdline_ptr, const char *option,
+			char *buffer, int bufsize);
 
 #endif /* _ASM_X86_CMDLINE_H */
diff --git a/arch/x86/lib/cmdline.c b/arch/x86/lib/cmdline.c
index 5cc78bf..3261abb 100644
--- a/arch/x86/lib/cmdline.c
+++ b/arch/x86/lib/cmdline.c
@@ -104,7 +104,112 @@ static inline int myisspace(u8 c)
 	return 0;	/* Buffer overrun */
 }
 
+/*
+ * Find a non-boolean option (i.e. option=argument). In accordance with
+ * standard Linux practice, if this option is repeated, this returns the
+ * last instance on the command line.
+ *
+ * @cmdline: the cmdline string
+ * @max_cmdline_size: the maximum size of cmdline
+ * @option: option string to look for
+ * @buffer: memory buffer to return the option argument
+ * @bufsize: size of the supplied memory buffer
+ *
+ * Returns the length of the argument (regardless of whether it was
+ * truncated to fit in the buffer), or -1 if the option was not found.
+ */
+static int
+__cmdline_find_option(const char *cmdline, int max_cmdline_size,
+		      const char *option, char *buffer, int bufsize)
+{
+	char c;
+	int pos = 0, len = -1;
+	const char *opptr = NULL;
+	char *bufptr = buffer;
+	enum {
+		st_wordstart = 0,	/* Start of word/after whitespace */
+		st_wordcmp,	/* Comparing this word */
+		st_wordskip,	/* Miscompare, skip */
+		st_bufcpy,	/* Copying this to buffer */
+	} state = st_wordstart;
+
+	if (!cmdline)
+		return -1;      /* No command line */
+
+	/*
+	 * This 'pos' check ensures we do not overrun
+	 * a non-NULL-terminated 'cmdline'
+	 */
+	while (pos++ < max_cmdline_size) {
+		c = *(char *)cmdline++;
+		if (!c)
+			break;
+
+		switch (state) {
+		case st_wordstart:
+			if (myisspace(c))
+				break;
+
+			state = st_wordcmp;
+			opptr = option;
+			/* fall through */
+
+		case st_wordcmp:
+			if ((c == '=') && !*opptr) {
+				/*
+				 * We matched all the way to the end of the
+				 * option we were looking for, prepare to
+				 * copy the argument.
+				 */
+				len = 0;
+				bufptr = buffer;
+				state = st_bufcpy;
+				break;
+			} else if (c == *opptr++) {
+				/*
+				 * We are currently matching, so continue
+				 * to the next character on the cmdline.
+				 */
+				break;
+			}
+			state = st_wordskip;
+			/* fall through */
+
+		case st_wordskip:
+			if (myisspace(c))
+				state = st_wordstart;
+			break;
+
+		case st_bufcpy:
+			if (myisspace(c)) {
+				state = st_wordstart;
+			} else {
+				/*
+				 * Increment len, but don't overrun the
+				 * supplied buffer and leave room for the
+				 * NULL terminator.
+				 */
+				if (++len < bufsize)
+					*bufptr++ = c;
+			}
+			break;
+		}
+	}
+
+	if (bufsize)
+		*bufptr = '\0';
+
+	return len;
+}
+
 int cmdline_find_option_bool(const char *cmdline, const char *option)
 {
 	return __cmdline_find_option_bool(cmdline, COMMAND_LINE_SIZE, option);
 }
+
+int cmdline_find_option(const char *cmdline, const char *option, char *buffer,
+			int bufsize)
+{
+	return __cmdline_find_option(cmdline, COMMAND_LINE_SIZE, option,
+				     buffer, bufsize);
+}

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (29 preceding siblings ...)
  2017-04-18 21:22 ` [PATCH v5 30/32] x86/boot: Add early cmdline parsing for options with arguments Tom Lendacky
@ 2017-04-18 21:22 ` Tom Lendacky
  2017-04-21 21:55   ` Dave Hansen
  2017-05-18 17:01   ` Borislav Petkov
  2017-04-18 21:22 ` [PATCH v5 32/32] x86/mm: Add support to make use of " Tom Lendacky
  31 siblings, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:22 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add sysfs support for SME so that user-space utilities (kdump, etc.) can
determine if SME is active.

A new directory will be created:
  /sys/kernel/mm/sme/

And two entries within the new directory:
  /sys/kernel/mm/sme/active
  /sys/kernel/mm/sme/encryption_mask
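
For example (illustrative values, assuming SME is active with the
encryption bit at position 47), the entries would read:

	/sys/kernel/mm/sme/active:           1
	/sys/kernel/mm/sme/encryption_mask:  0x0000800000000000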

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/mm/mem_encrypt.c |   49 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 0ff41a4..7dc4e98 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -18,6 +18,8 @@
 #include <linux/mm.h>
 #include <linux/dma-mapping.h>
 #include <linux/swiotlb.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
@@ -25,6 +27,7 @@
 #include <asm/bootparam.h>
 #include <asm/cacheflush.h>
 #include <asm/sections.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -38,6 +41,52 @@
 static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
 /*
+ * Sysfs support for SME.
+ *   Create an sme directory under /sys/kernel/mm
+ *   Create two sme entries under /sys/kernel/mm/sme:
+ *     active - returns 0 if not active, 1 if active
+ *     encryption_mask - returns the encryption mask in use
+ */
+static ssize_t active_show(struct kobject *kobj, struct kobj_attribute *attr,
+			   char *buf)
+{
+	return sprintf(buf, "%u\n", sme_active());
+}
+static struct kobj_attribute active_attr = __ATTR_RO(active);
+
+static ssize_t encryption_mask_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "0x%016lx\n", sme_me_mask);
+}
+static struct kobj_attribute encryption_mask_attr = __ATTR_RO(encryption_mask);
+
+static struct attribute *sme_attrs[] = {
+	&active_attr.attr,
+	&encryption_mask_attr.attr,
+	NULL
+};
+
+static struct attribute_group sme_attr_group = {
+	.attrs = sme_attrs,
+	.name = "sme",
+};
+
+static int __init sme_sysfs_init(void)
+{
+	int ret;
+
+	ret = sysfs_create_group(mm_kobj, &sme_attr_group);
+	if (ret) {
+		pr_err("SME sysfs initialization failed\n");
+		return ret;
+	}
+
+	return 0;
+}
+subsys_initcall(sme_sysfs_init);
+
+/*
  * This routine does not change the underlying encryption setting of the
  * page(s) that map this memory. It assumes that eventually the memory is
  * meant to be accessed as either encrypted or decrypted but the contents

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (30 preceding siblings ...)
  2017-04-18 21:22 ` [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption Tom Lendacky
@ 2017-04-18 21:22 ` Tom Lendacky
  2017-04-21 18:56   ` Tom Lendacky
  2017-05-19 11:27   ` Borislav Petkov
  31 siblings, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-18 21:22 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

Add support to check whether SME has been enabled and whether memory
encryption should be activated (the command line option is checked
against the configured default state).  If memory encryption is to be
activated, the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
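As a worked example (hypothetical values, not taken from the patch): if
CPUID Fn8000_001F[EBX] bits 5:0 report an encryption bit position of 47,
sme_enable() below computes

	me_mask = 1UL << (ebx & 0x3f);	/* 1UL << 47 == 0x0000800000000000 */

and that value becomes sme_me_mask when memory encryption is activated.
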
 arch/x86/kernel/head_64.S |    1 +
 arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index abfe5ee..77d7495 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -97,6 +97,7 @@ startup_64:
 	 * Save the returned mask in %r12 for later use.
 	 */
 	push	%rsi
+	movq	%rsi, %rdi
 	call	sme_enable
 	pop	%rsi
 	movq	%rax, %r12
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 7dc4e98..b517cbc 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,6 +28,13 @@
 #include <asm/cacheflush.h>
 #include <asm/sections.h>
 #include <asm/mem_encrypt.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>
+
+static char sme_cmdline_arg[] __initdata = "mem_encrypt";
+static char sme_cmdline_on[]  __initdata = "on";
+static char sme_cmdline_off[] __initdata = "off";
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -255,6 +262,8 @@ void __init mem_encrypt_init(void)
 
 	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
 	swiotlb_update_mem_attributes();
+
+	pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
 void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
@@ -531,8 +540,74 @@ void __init sme_encrypt_kernel(void)
 	native_write_cr3(native_read_cr3());
 }
 
-unsigned long __init sme_enable(void)
+unsigned long __init sme_enable(struct boot_params *bp)
 {
+	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
+	unsigned int eax, ebx, ecx, edx;
+	unsigned long me_mask;
+	bool active_by_default;
+	char buffer[16];
+	u64 msr;
+
+	/* Check for the SME support leaf */
+	eax = 0x80000000;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if (eax < 0x8000001f)
+		goto out;
+
+	/*
+	 * Check for the SME feature:
+	 *   CPUID Fn8000_001F[EAX] - Bit 0
+	 *     Secure Memory Encryption support
+	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
+	 *     Pagetable bit position used to indicate encryption
+	 */
+	eax = 0x8000001f;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if (!(eax & 1))
+		goto out;
+	me_mask = 1UL << (ebx & 0x3f);
+
+	/* Check if SME is enabled */
+	msr = __rdmsr(MSR_K8_SYSCFG);
+	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+		goto out;
+
+	/*
+	 * Fixups have not been applied to phys_base yet, so we must obtain
+	 * the address to the SME command line option data in the following
+	 * the address of the SME command line option data in the following
+	 */
+	asm ("lea sme_cmdline_arg(%%rip), %0"
+	     : "=r" (cmdline_arg)
+	     : "p" (sme_cmdline_arg));
+	asm ("lea sme_cmdline_on(%%rip), %0"
+	     : "=r" (cmdline_on)
+	     : "p" (sme_cmdline_on));
+	asm ("lea sme_cmdline_off(%%rip), %0"
+	     : "=r" (cmdline_off)
+	     : "p" (sme_cmdline_off));
+
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
+		active_by_default = true;
+	else
+		active_by_default = false;
+
+	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
+				     ((u64)bp->ext_cmd_line_ptr << 32));
+
+	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
+
+	if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
+		sme_me_mask = me_mask;
+	else if (strncmp(buffer, cmdline_off, sizeof(buffer)) == 0)
+		sme_me_mask = 0;
+	else
+		sme_me_mask = active_by_default ? me_mask : 0;
+
+out:
 	return sme_me_mask;
 }
 
@@ -543,9 +618,9 @@ unsigned long sme_get_me_mask(void)
 
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
-void __init sme_encrypt_kernel(void)	{ }
-unsigned long __init sme_enable(void)	{ return 0; }
+void __init sme_encrypt_kernel(void)			{ }
+unsigned long __init sme_enable(struct boot_params *bp)	{ return 0; }
 
-unsigned long sme_get_me_mask(void)	{ return 0; }
+unsigned long sme_get_me_mask(void)			{ return 0; }
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
@ 2017-04-19  9:02   ` Borislav Petkov
  2017-04-19 14:23     ` Tom Lendacky
  2017-04-19  9:52   ` David Howells
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-04-19  9:02 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

Always have a verb in the Subject to form a "do this" or "do that"
sentence to better explain what the patch does:

"Subject: [PATCH v5 01/32] x86: Add documentation for AMD Secure Memory Encryption (SME)"

On Tue, Apr 18, 2017 at 04:16:25PM -0500, Tom Lendacky wrote:
> Create a Documentation entry to describe the AMD Secure Memory
> Encryption (SME) feature and add documentation for the mem_encrypt=
> kernel parameter.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   11 ++++
>  Documentation/x86/amd-memory-encryption.txt     |   60 +++++++++++++++++++++++
>  2 files changed, 71 insertions(+)
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 3dd6d5d..84c5787 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2165,6 +2165,17 @@
>  			memory contents and reserves bad memory
>  			regions that are detected.
>  
> +	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
> +			Valid arguments: on, off
> +			Default (depends on kernel configuration option):
> +			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
> +			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
> +			mem_encrypt=on:		Activate SME
> +			mem_encrypt=off:	Do not activate SME
> +
> +			Refer to Documentation/x86/amd-memory-encryption.txt
> +			for details on when memory encryption can be activated.
> +
>  	mem_sleep_default=	[SUSPEND] Default system suspend mode:
>  			s2idle  - Suspend-To-Idle
>  			shallow - Power-On Suspend or equivalent (if supported)
> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
> new file mode 100644
> index 0000000..0b72ff2
> --- /dev/null
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -0,0 +1,60 @@
> +Secure Memory Encryption (SME) is a feature found on AMD processors.
> +
> +SME provides the ability to mark individual pages of memory as encrypted using
> +the standard x86 page tables.  A page that is marked encrypted will be
> +automatically decrypted when read from DRAM and encrypted when written to
> +DRAM.  SME can therefore be used to protect the contents of DRAM from physical
> +attacks on the system.
> +
> +A page is encrypted when a page table entry has the encryption bit set (see
> +below on how to determine its position).  The encryption bit can be specified
> +in the cr3 register, allowing the PGD table to be encrypted. Each successive

I missed that the last time: do you mean here, "The encryption bit can
be specified in the %cr3 register allowing for the page table hierarchy
itself to be encrypted."?

> +level of page tables can also be encrypted.

Right, judging by the next sentence, it looks like it.

The rest looks and reads really nice to me, so feel free to add:

Reviewed-by: Borislav Petkov <bp@suse.de>

after addressing those minor nitpicks on your next submission.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
  2017-04-19  9:02   ` Borislav Petkov
@ 2017-04-19  9:52   ` David Howells
  1 sibling, 0 replies; 126+ messages in thread
From: David Howells @ 2017-04-19  9:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: dhowells, Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc,
	x86, kexec, linux-kernel, kasan-dev, linux-mm, iommu,
	Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

Borislav Petkov <bp@alien8.de> wrote:

> "Subject: [PATCH v5 01/32] x86: Add documentation for AMD Secure Memory Encryption (SME)"

Or:

	x86: Document AMD Secure Memory Encryption (SME) support

David


* Re: [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-04-19  9:02   ` Borislav Petkov
@ 2017-04-19 14:23     ` Tom Lendacky
  2017-04-19 15:38       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-19 14:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 4/19/2017 4:02 AM, Borislav Petkov wrote:
> Always have a verb in the Subject to form a "do this" or "do that"
> sentence to better explain what the patch does:
>
> "Subject: [PATCH v5 01/32] x86: Add documentation for AMD Secure Memory Encryption (SME)"

Will do.

Btw, I tried to update all the subjects and descriptions to be
more descriptive but I'm sure there is still room for improvement
so keep the comments on them coming.

>
> On Tue, Apr 18, 2017 at 04:16:25PM -0500, Tom Lendacky wrote:
>> Create a Documentation entry to describe the AMD Secure Memory
>> Encryption (SME) feature and add documentation for the mem_encrypt=
>> kernel parameter.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt |   11 ++++
>>  Documentation/x86/amd-memory-encryption.txt     |   60 +++++++++++++++++++++++
>>  2 files changed, 71 insertions(+)
>>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 3dd6d5d..84c5787 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2165,6 +2165,17 @@
>>  			memory contents and reserves bad memory
>>  			regions that are detected.
>>
>> +	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
>> +			Valid arguments: on, off
>> +			Default (depends on kernel configuration option):
>> +			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
>> +			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
>> +			mem_encrypt=on:		Activate SME
>> +			mem_encrypt=off:	Do not activate SME
>> +
>> +			Refer to Documentation/x86/amd-memory-encryption.txt
>> +			for details on when memory encryption can be activated.
>> +
>>  	mem_sleep_default=	[SUSPEND] Default system suspend mode:
>>  			s2idle  - Suspend-To-Idle
>>  			shallow - Power-On Suspend or equivalent (if supported)
>> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
>> new file mode 100644
>> index 0000000..0b72ff2
>> --- /dev/null
>> +++ b/Documentation/x86/amd-memory-encryption.txt
>> @@ -0,0 +1,60 @@
>> +Secure Memory Encryption (SME) is a feature found on AMD processors.
>> +
>> +SME provides the ability to mark individual pages of memory as encrypted using
>> +the standard x86 page tables.  A page that is marked encrypted will be
>> +automatically decrypted when read from DRAM and encrypted when written to
>> +DRAM.  SME can therefore be used to protect the contents of DRAM from physical
>> +attacks on the system.
>> +
>> +A page is encrypted when a page table entry has the encryption bit set (see
>> +below on how to determine its position).  The encryption bit can be specified
>> +in the cr3 register, allowing the PGD table to be encrypted. Each successive
>
> I missed that the last time: do you mean here, "The encryption bit can
> be specified in the %cr3 register allowing for the page table hierarchy
> itself to be encrypted."?
>
>> +level of page tables can also be encrypted.
>
> Right, judging by the next sentence, it looks like it.

Correct. I like the hierarchy term so I'll add that to the text.

Note, just because the bit is set in %cr3 doesn't mean the full
hierarchy is encrypted. Each level in the hierarchy needs to have the
encryption bit set. So, theoretically, you could have the encryption
bit set in %cr3 so that the PGD is encrypted, but not set the encryption
bit in the PGD entry for a PUD and so the PUD pointed to by that entry
would not be encrypted.
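
As a rough illustration (a sketch only, not code from the patches; it uses
the sme_me_mask and _KERNPG_TABLE_NOENC definitions introduced by this
series):

/*
 * The PGD page itself is accessed encrypted because the encryption bit is
 * part of the %cr3 value, while the PUD page it points to is accessed
 * decrypted because its PGD entry omits the encryption mask.
 */
static void __init mixed_hierarchy_example(pgd_t *pgd_page, pgd_t *pgd_entry,
					   pud_t *pud)
{
	/* %cr3 = PGD physical address plus the encryption bit */
	native_write_cr3(__pa(pgd_page) | sme_me_mask);

	/* PGD entry without the encryption bit: the PUD stays decrypted */
	set_pgd(pgd_entry, __pgd(__pa(pud) | _KERNPG_TABLE_NOENC));
}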

Thanks,
Tom

>
> The rest looks and reads really nice to me, so feel free to add:
>
> Reviewed-by: Borislav Petkov <bp@suse.de>
>
> after addressing those minor nitpicks on your next submission.
>
> Thanks.
>


* Re: [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-04-19 14:23     ` Tom Lendacky
@ 2017-04-19 15:38       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-04-19 15:38 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Wed, Apr 19, 2017 at 09:23:47AM -0500, Tom Lendacky wrote:
> Btw, I tried to update all the subjects and descriptions to be
> more descriptive but I'm sure there is still room for improvement
> so keep the comments on them coming.

No worries there :)

> Note, just because the bit is set in %cr3 doesn't mean the full
> hierarchy is encrypted. Each level in the hierarchy needs to have the
> encryption bit set. So, theoretically, you could have the encryption
> bit set in %cr3 so that the PGD is encrypted, but not set the encryption
> bit in the PGD entry for a PUD and so the PUD pointed to by that entry
> would not be encrypted.

Ha, that is a nice detail I didn't realize. You could add it to the text.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size
  2017-04-18 21:17 ` [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size Tom Lendacky
@ 2017-04-20 16:59   ` Borislav Petkov
  2017-04-20 17:29     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-04-20 16:59 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:17:11PM -0500, Tom Lendacky wrote:
> When System Memory Encryption (SME) is enabled, the physical address
> space is reduced. Adjust the x86_phys_bits value to reflect this
> reduction.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/cpu/amd.c |   14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)

...

> @@ -622,8 +624,14 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>  
>  			/* Check if SME is enabled */
>  			rdmsrl(MSR_K8_SYSCFG, msr);
> -			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +			if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT) {
> +				unsigned int ebx;
> +
> +				ebx = cpuid_ebx(0x8000001f);
> +				c->x86_phys_bits -= (ebx >> 6) & 0x3f;
> +			} else {
>  				clear_cpu_cap(c, X86_FEATURE_SME);
> +			}

Lemme do some simplifying to save an indent level, get rid of local var
ebx and kill some { }-brackets for a bit better readability:

        if (c->extended_cpuid_level >= 0x8000001f) {
                u64 msr;

                if (!cpu_has(c, X86_FEATURE_SME))
                        return;

                /* Check if SME is enabled */
                rdmsrl(MSR_K8_SYSCFG, msr);
                if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
                        c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
                else
                        clear_cpu_cap(c, X86_FEATURE_SME);
        }

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size
  2017-04-20 16:59   ` Borislav Petkov
@ 2017-04-20 17:29     ` Tom Lendacky
  2017-04-20 18:52       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-20 17:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 4/20/2017 11:59 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:17:11PM -0500, Tom Lendacky wrote:
>> When System Memory Encryption (SME) is enabled, the physical address
>> space is reduced. Adjust the x86_phys_bits value to reflect this
>> reduction.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/cpu/amd.c |   14 +++++++++++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> ...
>
>> @@ -622,8 +624,14 @@ static void early_init_amd(struct cpuinfo_x86 *c)
>>
>>  			/* Check if SME is enabled */
>>  			rdmsrl(MSR_K8_SYSCFG, msr);
>> -			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>> +			if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT) {
>> +				unsigned int ebx;
>> +
>> +				ebx = cpuid_ebx(0x8000001f);
>> +				c->x86_phys_bits -= (ebx >> 6) & 0x3f;
>> +			} else {
>>  				clear_cpu_cap(c, X86_FEATURE_SME);
>> +			}
>
> Lemme do some simplifying to save an indent level, get rid of local var
> ebx and kill some { }-brackets for a bit better readability:
>
>         if (c->extended_cpuid_level >= 0x8000001f) {
>                 u64 msr;
>
>                 if (!cpu_has(c, X86_FEATURE_SME))
>                         return;
>
>                 /* Check if SME is enabled */
>                 rdmsrl(MSR_K8_SYSCFG, msr);
>                 if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
>                         c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
>                 else
>                         clear_cpu_cap(c, X86_FEATURE_SME);
>         }
>

Hmmm... and actually if cpu_has(X86_FEATURE_SME) is true then it's a
given that extended_cpuid_level >= 0x8000001f.  So this can be
simplified to just:

	if (cpu_has(c, X86_FEATURE_SME)) {
		... the rest of your suggestion (minus cpu_has()) ...
	}
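
Or, spelled out in full (a sketch combining the two suggestions above, not
the final patch):

	if (cpu_has(c, X86_FEATURE_SME)) {
		u64 msr;

		/* Check if SME is enabled */
		rdmsrl(MSR_K8_SYSCFG, msr);
		if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
			c->x86_phys_bits -= (cpuid_ebx(0x8000001f) >> 6) & 0x3f;
		else
			clear_cpu_cap(c, X86_FEATURE_SME);
	}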

Thanks,
Tom


* Re: [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size
  2017-04-20 17:29     ` Tom Lendacky
@ 2017-04-20 18:52       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-04-20 18:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Thu, Apr 20, 2017 at 12:29:20PM -0500, Tom Lendacky wrote:
> Hmmm... and actually if cpu_has(X86_FEATURE_SME) is true then it's a
> given that extended_cpuid_level >= 0x8000001f.  So this can be
> simplified to just:
> 
> 	if (cpu_has(c, X86_FEATURE_SME)) {
> 		... the rest of your suggestion (minus cpu_has()) ...

Cool, even better! :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing
  2017-04-18 21:17 ` [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing Tom Lendacky
@ 2017-04-21 14:55   ` Borislav Petkov
  2017-04-21 21:40     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-04-21 14:55 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:17:35PM -0500, Tom Lendacky wrote:
> Add support to the early boot code to use Secure Memory Encryption (SME).
> Since the kernel has been loaded into memory in a decrypted state, support
> is added to encrypt the kernel in place and update the early pagetables

s/support is added to //

> with the memory encryption mask so that new pagetable entries will use
> memory encryption.
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-04-18 21:22 ` [PATCH v5 32/32] x86/mm: Add support to make use of " Tom Lendacky
@ 2017-04-21 18:56   ` Tom Lendacky
  2017-05-19 11:30     ` Borislav Petkov
  2017-05-19 11:27   ` Borislav Petkov
  1 sibling, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-21 18:56 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 4/18/2017 4:22 PM, Tom Lendacky wrote:
> Add support to check if SME has been enabled and if memory encryption
> should be activated (checking of command line option based on the
> configuration of the default state).  If memory encryption is to be
> activated, then the encryption mask is set and the kernel is encrypted
> "in place."
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/head_64.S |    1 +
>  arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 80 insertions(+), 4 deletions(-)
>

...

>
> -unsigned long __init sme_enable(void)
> +unsigned long __init sme_enable(struct boot_params *bp)
>  {
> +	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> +	unsigned int eax, ebx, ecx, edx;
> +	unsigned long me_mask;
> +	bool active_by_default;
> +	char buffer[16];

So it turns out that when KASLR is enabled (CONFIG_RANDOMIZE_BASE=y)
the stack-protector support causes issues with this function because
it is called so early. I can get past it by adding:

CFLAGS_mem_encrypt.o := $(nostackp)

in the arch/x86/mm/Makefile, but that obviously eliminates the support
for the whole file.  Would it be better to split out the sme_enable()
and other boot routines into a separate file or just apply the
$(nostackp) to the whole file?

Thanks,
Tom

> +	u64 msr;
> +
> +	/* Check for the SME support leaf */
> +	eax = 0x80000000;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (eax < 0x8000001f)
> +		goto out;
> +
> +	/*
> +	 * Check for the SME feature:
> +	 *   CPUID Fn8000_001F[EAX] - Bit 0
> +	 *     Secure Memory Encryption support
> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +	 *     Pagetable bit position used to indicate encryption
> +	 */
> +	eax = 0x8000001f;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (!(eax & 1))
> +		goto out;
> +	me_mask = 1UL << (ebx & 0x3f);
> +
> +	/* Check if SME is enabled */
> +	msr = __rdmsr(MSR_K8_SYSCFG);
> +	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +		goto out;
> +
> +	/*
> +	 * Fixups have not been applied to phys_base yet, so we must obtain
> +	 * the address to the SME command line option data in the following
> +	 * way.
> +	 */
> +	asm ("lea sme_cmdline_arg(%%rip), %0"
> +	     : "=r" (cmdline_arg)
> +	     : "p" (sme_cmdline_arg));
> +	asm ("lea sme_cmdline_on(%%rip), %0"
> +	     : "=r" (cmdline_on)
> +	     : "p" (sme_cmdline_on));
> +	asm ("lea sme_cmdline_off(%%rip), %0"
> +	     : "=r" (cmdline_off)
> +	     : "p" (sme_cmdline_off));
> +
> +	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> +		active_by_default = true;
> +	else
> +		active_by_default = false;
> +
> +	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
> +				     ((u64)bp->ext_cmd_line_ptr << 32));
> +
> +	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
> +
> +	if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
> +		sme_me_mask = me_mask;
> +	else if (strncmp(buffer, cmdline_off, sizeof(buffer)) == 0)
> +		sme_me_mask = 0;
> +	else
> +		sme_me_mask = active_by_default ? me_mask : 0;
> +
> +out:
>  	return sme_me_mask;
>  }
>
> @@ -543,9 +618,9 @@ unsigned long sme_get_me_mask(void)
>
>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>
> -void __init sme_encrypt_kernel(void)	{ }
> -unsigned long __init sme_enable(void)	{ return 0; }
> +void __init sme_encrypt_kernel(void)			{ }
> +unsigned long __init sme_enable(struct boot_params *bp)	{ return 0; }
>
> -unsigned long sme_get_me_mask(void)	{ return 0; }
> +unsigned long sme_get_me_mask(void)			{ return 0; }
>
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>


* Re: [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing
  2017-04-21 14:55   ` Borislav Petkov
@ 2017-04-21 21:40     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-21 21:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 4/21/2017 9:55 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:17:35PM -0500, Tom Lendacky wrote:
>> Add support to the early boot code to use Secure Memory Encryption (SME).
>> Since the kernel has been loaded into memory in a decrypted state, support
>> is added to encrypt the kernel in place and update the early pagetables
>
> s/support is added to //

Done.

Thanks,
Tom

>
>> with the memory encryption mask so that new pagetable entries will use
>> memory encryption.
>>
>


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-18 21:17 ` [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption Tom Lendacky
@ 2017-04-21 21:52   ` Dave Hansen
  2017-04-24 15:53     ` Tom Lendacky
  2017-04-27 16:12   ` Borislav Petkov
  1 sibling, 1 reply; 126+ messages in thread
From: Dave Hansen @ 2017-04-21 21:52 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 04/18/2017 02:17 PM, Tom Lendacky wrote:
> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>  
>  #ifndef __va
> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
> +#define __va(x)			((void *)(__sme_clr(x) + PAGE_OFFSET))
>  #endif

It seems wrong to be modifying __va().  It currently takes a physical
address, and this modifies it to take a physical address plus the SME bits.

How does that end up ever happening?  If we are pulling physical
addresses out of the page tables, we use p??_phys().  I'd expect *those*
to be masking off the SME bits.

Is it these cases?

	pgd_t *base = __va(read_cr3());

For those, it seems like we really want to create two modes of reading
cr3.  One that truly reads CR3 and another that reads the pgd's physical
address out of CR3.  Then you only do the SME masking on the one
fetching a physical address, and the SME bits never leak into __va().
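
Something like the following, for instance (a sketch only; read_cr3_pa() is
a made-up name here, and __sme_clr() is the helper from this series):

/* Raw CR3 value -- may include the SME encryption bit. */
static inline unsigned long read_cr3_raw(void)
{
	return native_read_cr3();
}

/* Physical address of the PGD only, with the SME bit masked off. */
static inline unsigned long read_cr3_pa(void)
{
	return __sme_clr(native_read_cr3());
}

/* Callers that want the page table itself then do: */
static inline pgd_t *current_pgd(void)
{
	return __va(read_cr3_pa());
}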


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-18 21:22 ` [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption Tom Lendacky
@ 2017-04-21 21:55   ` Dave Hansen
  2017-04-27  7:25     ` Dave Young
  2017-05-18 17:01   ` Borislav Petkov
  1 sibling, 1 reply; 126+ messages in thread
From: Dave Hansen @ 2017-04-21 21:55 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> determine if SME is active.
> 
> A new directory will be created:
>   /sys/kernel/mm/sme/
> 
> And two entries within the new directory:
>   /sys/kernel/mm/sme/active
>   /sys/kernel/mm/sme/encryption_mask

Why do they care, and what will they be doing with this information?


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-21 21:52   ` Dave Hansen
@ 2017-04-24 15:53     ` Tom Lendacky
  2017-04-24 15:57       ` Dave Hansen
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-04-24 15:53 UTC (permalink / raw)
  To: Dave Hansen, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 4/21/2017 4:52 PM, Dave Hansen wrote:
> On 04/18/2017 02:17 PM, Tom Lendacky wrote:
>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>
>>  #ifndef __va
>> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
>> +#define __va(x)			((void *)(__sme_clr(x) + PAGE_OFFSET))
>>  #endif
>
> It seems wrong to be modifying __va().  It currently takes a physical
> address, and this modifies it to take a physical address plus the SME bits.

This actually modifies it to be sure the encryption bit is not part of
the physical address.

>
> How does that end up ever happening?  If we are pulling physical
> addresses out of the page tables, we use p??_phys().  I'd expect *those*
> to be masking off the SME bits.
>
> Is it these cases?
>
> 	pgd_t *base = __va(read_cr3());
>
> For those, it seems like we really want to create two modes of reading
> cr3.  One that truly reads CR3 and another that reads the pgd's physical
> address out of CR3.  Then you only do the SME masking on the one
> fetching a physical address, and the SME bits never leak into __va().

I'll investigate this and see if I can remove the mod to __va().

Thanks,
Tom

>


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-24 15:53     ` Tom Lendacky
@ 2017-04-24 15:57       ` Dave Hansen
  2017-04-24 16:10         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Dave Hansen @ 2017-04-24 15:57 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 04/24/2017 08:53 AM, Tom Lendacky wrote:
> On 4/21/2017 4:52 PM, Dave Hansen wrote:
>> On 04/18/2017 02:17 PM, Tom Lendacky wrote:
>>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void
>>> *from, unsigned long vaddr,
>>>      __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>>
>>>  #ifndef __va
>>> -#define __va(x)            ((void *)((unsigned long)(x)+PAGE_OFFSET))
>>> +#define __va(x)            ((void *)(__sme_clr(x) + PAGE_OFFSET))
>>>  #endif
>>
>> It seems wrong to be modifying __va().  It currently takes a physical
>> address, and this modifies it to take a physical address plus the SME
>> bits.
> 
> This actually modifies it to be sure the encryption bit is not part of
> the physical address.

If SME bits make it this far, we have a bug elsewhere.  Right?  Probably
best not to paper over it.


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-24 15:57       ` Dave Hansen
@ 2017-04-24 16:10         ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-04-24 16:10 UTC (permalink / raw)
  To: Dave Hansen, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, H. Peter Anvin,
	Andrey Ryabinin, Alexander Potapenko, Dave Young,
	Thomas Gleixner, Dmitry Vyukov

On 4/24/2017 10:57 AM, Dave Hansen wrote:
> On 04/24/2017 08:53 AM, Tom Lendacky wrote:
>> On 4/21/2017 4:52 PM, Dave Hansen wrote:
>>> On 04/18/2017 02:17 PM, Tom Lendacky wrote:
>>>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void
>>>> *from, unsigned long vaddr,
>>>>      __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>>>
>>>>  #ifndef __va
>>>> -#define __va(x)            ((void *)((unsigned long)(x)+PAGE_OFFSET))
>>>> +#define __va(x)            ((void *)(__sme_clr(x) + PAGE_OFFSET))
>>>>  #endif
>>>
>>> It seems wrong to be modifying __va().  It currently takes a physical
>>> address, and this modifies it to take a physical address plus the SME
>>> bits.
>>
>> This actually modifies it to be sure the encryption bit is not part of
>> the physical address.
>
> If SME bits make it this far, we have a bug elsewhere.  Right?  Probably
> best not to paper over it.

That all depends on the approach.  Currently that's not the case for
the one situation that you mentioned with cr3.  But if we do take the
approach that we should never feed physical addresses to __va() with
the encryption bit set then, yes, it would imply a bug elsewhere - which
is probably a good approach.

I'll work on that. I could even add a debug config option that would
issue a warning should __va() encounter the encryption bit if SME is
enabled or active.
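
For example, something along these lines (a rough sketch; the config symbol
name is made up and the exact placement would need more thought):

#ifndef __va
#ifdef CONFIG_DEBUG_SME_VA			/* made-up config symbol */
#define __va(x)	({							\
	unsigned long __va_x = (unsigned long)(x);			\
	WARN_ONCE(sme_active() && (__va_x & sme_me_mask),		\
		  "__va() called with SME encryption bit set\n");	\
	(void *)(__sme_clr(__va_x) + PAGE_OFFSET);			\
})
#else
#define __va(x)		((void *)(__sme_clr(x) + PAGE_OFFSET))
#endif
#endif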

Thanks,
Tom

>


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-21 21:55   ` Dave Hansen
@ 2017-04-27  7:25     ` Dave Young
  2017-04-27 15:52       ` Dave Hansen
  2017-05-04 14:13       ` Tom Lendacky
  0 siblings, 2 replies; 126+ messages in thread
From: Dave Young @ 2017-04-27  7:25 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Thomas Gleixner,
	Rik van Riel, Brijesh Singh, Toshimitsu Kani, Arnd Bergmann,
	Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 04/21/17 at 02:55pm, Dave Hansen wrote:
> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> > Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> > determine if SME is active.
> > 
> > A new directory will be created:
> >   /sys/kernel/mm/sme/
> > 
> > And two entries within the new directory:
> >   /sys/kernel/mm/sme/active
> >   /sys/kernel/mm/sme/encryption_mask
> 
> Why do they care, and what will they be doing with this information?

Since kdump will copy the old memory, it needs this to know whether the old
memory was encrypted or not. With this sysfs file we can know the previous
SME status and pass it to the kdump kernel, e.g. as a kernel parameter.

Tom, have you got chance to try if it works or not?

Thanks
Dave


* Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-04-18 21:17 ` [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support Tom Lendacky
@ 2017-04-27 15:46   ` Borislav Petkov
  2017-05-04 14:24     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-04-27 15:46 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:17:27PM -0500, Tom Lendacky wrote:
> Add support for Secure Memory Encryption (SME). This initial support
> provides a Kconfig entry to build the SME support into the kernel and
> defines the memory encryption mask that will be used in subsequent
> patches to mark pages as encrypted.

...

> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> new file mode 100644
> index 0000000..d5c4a2b
> --- /dev/null
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -0,0 +1,42 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +

These ifdeffery closing #endif markers look strange:

> +#ifndef __X86_MEM_ENCRYPT_H__
> +#define __X86_MEM_ENCRYPT_H__
> +
> +#ifndef __ASSEMBLY__
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +
> +extern unsigned long sme_me_mask;
> +
> +static inline bool sme_active(void)
> +{
> +	return !!sme_me_mask;
> +}
> +
> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
> +
> +#ifndef sme_me_mask
> +#define sme_me_mask	0UL
> +
> +static inline bool sme_active(void)
> +{
> +	return false;
> +}
> +#endif

this endif is the sme_me_mask closing one and it has sme_active() in it.
Shouldn't it be:

#ifndef sme_me_mask
#define sme_me_mask  0UL
#endif

and have sme_active below it, in the !CONFIG_AMD_MEM_ENCRYPT branch?

The same thing is in include/linux/mem_encrypt.h

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-27  7:25     ` Dave Young
@ 2017-04-27 15:52       ` Dave Hansen
  2017-04-28  5:32         ` Dave Young
  2017-05-04 14:17         ` Tom Lendacky
  2017-05-04 14:13       ` Tom Lendacky
  1 sibling, 2 replies; 126+ messages in thread
From: Dave Hansen @ 2017-04-27 15:52 UTC (permalink / raw)
  To: Dave Young
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Thomas Gleixner,
	Rik van Riel, Brijesh Singh, Toshimitsu Kani, Arnd Bergmann,
	Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 04/27/2017 12:25 AM, Dave Young wrote:
> On 04/21/17 at 02:55pm, Dave Hansen wrote:
>> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>>>
>>> A new directory will be created:
>>>   /sys/kernel/mm/sme/
>>>
>>> And two entries within the new directory:
>>>   /sys/kernel/mm/sme/active
>>>   /sys/kernel/mm/sme/encryption_mask
>>
>> Why do they care, and what will they be doing with this information?
> 
> Since kdump will copy the old memory, it needs this to know whether the old
> memory was encrypted or not. With this sysfs file we can know the previous
> SME status and pass it to the kdump kernel, e.g. as a kernel parameter.
> 
> Tom, have you got chance to try if it works or not?

What will the kdump kernel do with it though?  We kexec() into that
kernel so the SME keys will all be the same, right?  So, will the kdump
kernel be just setting the encryption bit in the PTE so it can copy the
old plaintext out?

Why do we need both 'active' and 'encryption_mask'?  How could it be
that the hardware-enumerated 'encryption_mask' changes across a kexec()?


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-18 21:17 ` [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption Tom Lendacky
  2017-04-21 21:52   ` Dave Hansen
@ 2017-04-27 16:12   ` Borislav Petkov
  2017-05-04 14:34     ` Tom Lendacky
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-04-27 16:12 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:17:54PM -0500, Tom Lendacky wrote:
> Changes to the existing page table macros will allow the SME support to
> be enabled in a simple fashion with minimal changes to files that use these
> macros.  Since the memory encryption mask will now be part of the regular
> pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
> _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
> without the encryption mask before SME becomes active.  Two new pgprot()
> macros are defined to allow setting or clearing the page encryption mask.

...

> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>  
>  #ifndef __va
> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
> +#define __va(x)			((void *)(__sme_clr(x) + PAGE_OFFSET))
>  #endif
>  
>  #define __boot_va(x)		__va(x)
> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
> index 7bd0099..fead0a5 100644
> --- a/arch/x86/include/asm/page_types.h
> +++ b/arch/x86/include/asm/page_types.h
> @@ -15,7 +15,7 @@
>  #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
>  #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
>  
> -#define __PHYSICAL_MASK		((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
> +#define __PHYSICAL_MASK		((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))

That looks strange: poking an SME mask hole into a mask...?

>  #define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
>  
>  /* Cast *PAGE_MASK to a signed type so that it is sign-extended if

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-27 15:52       ` Dave Hansen
@ 2017-04-28  5:32         ` Dave Young
  2017-05-04 14:17         ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Dave Young @ 2017-04-28  5:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Thomas Gleixner,
	Rik van Riel, Brijesh Singh, Toshimitsu Kani, Arnd Bergmann,
	Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 04/27/17 at 08:52am, Dave Hansen wrote:
> On 04/27/2017 12:25 AM, Dave Young wrote:
> > On 04/21/17 at 02:55pm, Dave Hansen wrote:
> >> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
> >>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> >>> determine if SME is active.
> >>>
> >>> A new directory will be created:
> >>>   /sys/kernel/mm/sme/
> >>>
> >>> And two entries within the new directory:
> >>>   /sys/kernel/mm/sme/active
> >>>   /sys/kernel/mm/sme/encryption_mask
> >>
> >> Why do they care, and what will they be doing with this information?
> > 
> > Since kdump will copy the old memory, it needs this to know whether the old
> > memory was encrypted or not. With this sysfs file we can know the previous
> > SME status and pass it to the kdump kernel, e.g. as a kernel parameter.
> > 
> > Tom, have you got chance to try if it works or not?
> 
> What will the kdump kernel do with it though?  We kexec() into that
> kernel so the SME keys will all be the same, right?  So, will the kdump
> kernel be just setting the encryption bit in the PTE so it can copy the
> old plaintext out?

I assume it is for the active -> non-active case, where the new boot needs
to know that the old memory is encrypted. But I have not read all the
patches yet, so I may be missing things.

> 
> Why do we need both 'active' and 'encryption_mask'?  How could it be
> that the hardware-enumerated 'encryption_mask' changes across a kexec()?

Leave this question to Tom..

Thanks
Dave


* Re: [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly
  2017-04-18 21:18 ` [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly Tom Lendacky
@ 2017-05-04 10:16   ` Borislav Petkov
  2017-05-04 14:39     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-04 10:16 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:18:22PM -0500, Tom Lendacky wrote:
> The boot data and command line data are present in memory in a decrypted
> state and are copied early in the boot process.  The early page fault
> support will map these areas as encrypted, so before attempting to copy
> them, add decrypted mappings so the data is accessed properly when copied.
> 
> For the initrd, encrypt this data in place. Since the future mapping of the
> initrd area will be mapped as encrypted the data will be accessed properly.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |   11 +++++
>  arch/x86/include/asm/pgtable.h     |    3 +
>  arch/x86/kernel/head64.c           |   30 ++++++++++++--
>  arch/x86/kernel/setup.c            |   10 +++++
>  arch/x86/mm/mem_encrypt.c          |   77 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 127 insertions(+), 4 deletions(-)

...

> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 603a166..a95800b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -115,6 +115,7 @@
>  #include <asm/microcode.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kaslr.h>
> +#include <asm/mem_encrypt.h>
>  
>  /*
>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
> @@ -374,6 +375,15 @@ static void __init reserve_initrd(void)
>  	    !ramdisk_image || !ramdisk_size)
>  		return;		/* No initrd provided by bootloader */
>  
> +	/*
> +	 * If SME is active, this memory will be marked encrypted by the
> +	 * kernel when it is accessed (including relocation). However, the
> +	 * ramdisk image was loaded decrypted by the bootloader, so make
> +	 * sure that it is encrypted before accessing it.
> +	 */
> +	if (sme_active())

That test is not needed here because __sme_early_enc_dec() already tests
sme_me_mask. There you should change that test to sme_active() instead.

> +		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
> +
>  	initrd_start = 0;
>  
>  	mapped_size = memblock_mem_size(max_pfn_mapped);

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-27  7:25     ` Dave Young
  2017-04-27 15:52       ` Dave Hansen
@ 2017-05-04 14:13       ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-04 14:13 UTC (permalink / raw)
  To: Dave Young, Dave Hansen
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Thomas Gleixner, Rik van Riel,
	Brijesh Singh, Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Matt Fleming, Joerg Roedel, Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 4/27/2017 2:25 AM, Dave Young wrote:
> On 04/21/17 at 02:55pm, Dave Hansen wrote:
>> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>>>
>>> A new directory will be created:
>>>   /sys/kernel/mm/sme/
>>>
>>> And two entries within the new directory:
>>>   /sys/kernel/mm/sme/active
>>>   /sys/kernel/mm/sme/encryption_mask
>>
>> Why do they care, and what will they be doing with this information?
>
> Since kdump will copy the old memory, it needs this to know whether the old
> memory was encrypted or not. With this sysfs file we can know the previous
> SME status and pass it to the kdump kernel, e.g. as a kernel parameter.
>
> Tom, have you got chance to try if it works or not?

Sorry, I haven't had a chance to test this yet.

Thanks,
Tom

>
> Thanks
> Dave
>


* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-27 15:52       ` Dave Hansen
  2017-04-28  5:32         ` Dave Young
@ 2017-05-04 14:17         ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-04 14:17 UTC (permalink / raw)
  To: Dave Hansen, Dave Young
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Thomas Gleixner, Rik van Riel,
	Brijesh Singh, Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Matt Fleming, Joerg Roedel, Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 4/27/2017 10:52 AM, Dave Hansen wrote:
> On 04/27/2017 12:25 AM, Dave Young wrote:
>> On 04/21/17 at 02:55pm, Dave Hansen wrote:
>>> On 04/18/2017 02:22 PM, Tom Lendacky wrote:
>>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>>> determine if SME is active.
>>>>
>>>> A new directory will be created:
>>>>   /sys/kernel/mm/sme/
>>>>
>>>> And two entries within the new directory:
>>>>   /sys/kernel/mm/sme/active
>>>>   /sys/kernel/mm/sme/encryption_mask
>>>
>>> Why do they care, and what will they be doing with this information?
>>
>> Since kdump will copy the old memory, it needs this to know whether the old
>> memory was encrypted or not. With this sysfs file we can know the previous
>> SME status and pass it to the kdump kernel, e.g. as a kernel parameter.
>>
>> Tom, have you got chance to try if it works or not?
>
> What will the kdump kernel do with it though?  We kexec() into that
> kernel so the SME keys will all be the same, right?  So, will the kdump

Yes, the SME key will be the same after a kexec.

> kernel be just setting the encryption bit in the PTE so it can copy the
> old plaintext out?

Yes, the idea would be to set the encryption bit in the PTE when mapping
and copying encrypted pages and not set it for unencrypted pages.
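
In other words, roughly (a sketch; the helper name is made up, and the
'was_encrypted' flag would come from the saved SME state of the crashed
kernel):

static void __iomem *map_old_mem_page(resource_size_t paddr,
				      bool was_encrypted)
{
	pgprot_t prot = PAGE_KERNEL;

	/* Apply the encryption bit only for pages the previous kernel
	 * had mapped encrypted. */
	if (was_encrypted)
		prot = __pgprot(pgprot_val(prot) | sme_me_mask);

	return ioremap_prot(paddr, PAGE_SIZE, pgprot_val(prot));
}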

>
> Why do we need both 'active' and 'encryption_mask'?  How could it be
> that the hardware-enumerated 'encryption_mask' changes across a kexec()?
>

We don't need both; I just added the 'encryption_mask' entry for
information. It won't change across a kexec, so I can remove it.

Thanks,
Tom


* Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-04-27 15:46   ` Borislav Petkov
@ 2017-05-04 14:24     ` Tom Lendacky
  2017-05-04 14:36       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-04 14:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 4/27/2017 10:46 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:17:27PM -0500, Tom Lendacky wrote:
>> Add support for Secure Memory Encryption (SME). This initial support
>> provides a Kconfig entry to build the SME support into the kernel and
>> defines the memory encryption mask that will be used in subsequent
>> patches to mark pages as encrypted.
>
> ...
>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> new file mode 100644
>> index 0000000..d5c4a2b
>> --- /dev/null
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -0,0 +1,42 @@
>> +/*
>> + * AMD Memory Encryption Support
>> + *
>> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>
> These ifdeffery closing #endif markers look strange:
>
>> +#ifndef __X86_MEM_ENCRYPT_H__
>> +#define __X86_MEM_ENCRYPT_H__
>> +
>> +#ifndef __ASSEMBLY__
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +
>> +extern unsigned long sme_me_mask;
>> +
>> +static inline bool sme_active(void)
>> +{
>> +	return !!sme_me_mask;
>> +}
>> +
>> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
>> +
>> +#ifndef sme_me_mask
>> +#define sme_me_mask	0UL
>> +
>> +static inline bool sme_active(void)
>> +{
>> +	return false;
>> +}
>> +#endif
>
> this endif is the sme_me_mask closing one and it has sme_active() in it.
> Shouldn't it be:
>
> #ifndef sme_me_mask
> #define sme_me_mask  0UL
> #endif
>
> and have sme_active below it, in the !CONFIG_AMD_MEM_ENCRYPT branch?
>
> The same thing is in include/linux/mem_encrypt.h

I did this so that the include order wouldn't cause issues (including
asm/mem_encrypt.h followed later by a linux/mem_encrypt.h include).
I can make this a bit clearer by having separate #defines for each
thing, e.g.:

#ifndef sme_me_mask
#define sme_me_mask 0UL
#endif

#ifndef sme_active
#define sme_active sme_active
static inline ...
#endif
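
(Where the "static inline ..." is the same stub as in the current
!CONFIG_AMD_MEM_ENCRYPT branch:

	static inline bool sme_active(void)
	{
		return false;
	}
)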

Is that better/clearer?

Thanks,
Tom

>


* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-04-27 16:12   ` Borislav Petkov
@ 2017-05-04 14:34     ` Tom Lendacky
  2017-05-04 17:01       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-04 14:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov



On 4/27/2017 11:12 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:17:54PM -0500, Tom Lendacky wrote:
>> Changes to the existing page table macros will allow the SME support to
>> be enabled in a simple fashion with minimal changes to files that use these
>> macros.  Since the memory encryption mask will now be part of the regular
>> pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
>> _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
>> without the encryption mask before SME becomes active.  Two new pgprot()
>> macros are defined to allow setting or clearing the page encryption mask.
>
> ...
>
>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>
>>  #ifndef __va
>> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
>> +#define __va(x)			((void *)(__sme_clr(x) + PAGE_OFFSET))
>>  #endif
>>
>>  #define __boot_va(x)		__va(x)
>> diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
>> index 7bd0099..fead0a5 100644
>> --- a/arch/x86/include/asm/page_types.h
>> +++ b/arch/x86/include/asm/page_types.h
>> @@ -15,7 +15,7 @@
>>  #define PUD_PAGE_SIZE		(_AC(1, UL) << PUD_SHIFT)
>>  #define PUD_PAGE_MASK		(~(PUD_PAGE_SIZE-1))
>>
>> -#define __PHYSICAL_MASK		((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
>> +#define __PHYSICAL_MASK		((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))
>
> That looks strange: poking an SME mask hole into a mask...?

I masked it out here based on a previous comment from Dave Hansen:

   http://marc.info/?l=linux-kernel&m=148778719826905&w=2

I could move the __sme_clr into the individual defines of:

PHYSICAL_PAGE_MASK, PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK

Either way works for me.
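
For reference, the effect I'm after (sketched here with the __sme_clr()
helper as it's used elsewhere in the series; the exact form may differ):

#define __sme_clr(x)		((x) & ~sme_me_mask)

/*
 * With the encryption bit cleared from __PHYSICAL_MASK, pfn-extraction
 * macros such as pmd_pfn_mask() never treat the C-bit as part of the
 * physical address.
 */
#define __PHYSICAL_MASK		((phys_addr_t)(__sme_clr((1ULL << __PHYSICAL_MASK_SHIFT) - 1)))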

Thanks,
Tom

>
>>  #define __VIRTUAL_MASK		((1UL << __VIRTUAL_MASK_SHIFT) - 1)
>>
>>  /* Cast *PAGE_MASK to a signed type so that it is sign-extended if
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-05-04 14:24     ` Tom Lendacky
@ 2017-05-04 14:36       ` Borislav Petkov
  2017-05-16 19:28         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-04 14:36 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Thu, May 04, 2017 at 09:24:11AM -0500, Tom Lendacky wrote:
> I did this so that the include order wouldn't cause issues (including
> asm/mem_encrypt.h followed later by a linux/mem_encrypt.h include).
> I can make this a bit clearer by having separate #defines for each
> thing, e.g.:
> 
> #ifndef sme_me_mask
> #define sme_me_mask 0UL
> #endif
> 
> #ifndef sme_active
> #define sme_active sme_active
> static inline ...
> #endif
> 
> Is that better/clearer?

I guess but where do we have to include both the asm/ and the linux/
version?

IOW, can we avoid these issues altogether by partitioning symbol
declarations differently among the headers?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly
  2017-05-04 10:16   ` Borislav Petkov
@ 2017-05-04 14:39     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-04 14:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/4/2017 5:16 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:18:22PM -0500, Tom Lendacky wrote:
>> The boot data and command line data are present in memory in a decrypted
>> state and are copied early in the boot process.  The early page fault
>> support will map these areas as encrypted, so before attempting to copy
>> them, add decrypted mappings so the data is accessed properly when copied.
>>
>> For the initrd, encrypt this data in place. Since the future mapping of the
>> initrd area will be encrypted, the data will be accessed properly.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   11 +++++
>>  arch/x86/include/asm/pgtable.h     |    3 +
>>  arch/x86/kernel/head64.c           |   30 ++++++++++++--
>>  arch/x86/kernel/setup.c            |   10 +++++
>>  arch/x86/mm/mem_encrypt.c          |   77 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 127 insertions(+), 4 deletions(-)
>
> ...
>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 603a166..a95800b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -115,6 +115,7 @@
>>  #include <asm/microcode.h>
>>  #include <asm/mmu_context.h>
>>  #include <asm/kaslr.h>
>> +#include <asm/mem_encrypt.h>
>>
>>  /*
>>   * max_low_pfn_mapped: highest direct mapped pfn under 4GB
>> @@ -374,6 +375,15 @@ static void __init reserve_initrd(void)
>>  	    !ramdisk_image || !ramdisk_size)
>>  		return;		/* No initrd provided by bootloader */
>>
>> +	/*
>> +	 * If SME is active, this memory will be marked encrypted by the
>> +	 * kernel when it is accessed (including relocation). However, the
>> +	 * ramdisk image was loaded decrypted by the bootloader, so make
>> +	 * sure that it is encrypted before accessing it.
>> +	 */
>> +	if (sme_active())
>
> That test is not needed here because __sme_early_enc_dec() already tests
> sme_me_mask. There you should change that test to sme_active() instead.

Yeah, I was probably thinking slightly ahead to SEV where the initrd
will already be encrypted and so we only want to do this for SME.
That change can come in the SEV support patches, though.

Thanks,
Tom

>
>> +		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
>> +
>>  	initrd_start = 0;
>>
>>  	mapped_size = memblock_mem_size(max_pfn_mapped);
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption
  2017-05-04 14:34     ` Tom Lendacky
@ 2017-05-04 17:01       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-04 17:01 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Thu, May 04, 2017 at 09:34:09AM -0500, Tom Lendacky wrote:
> I masked it out here based on a previous comment from Dave Hansen:
> 
>   http://marc.info/?l=linux-kernel&m=148778719826905&w=2
> 
> I could move the __sme_clr into the individual defines of:

Nah, it is fine as it is. I was just wondering...

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address
  2017-04-18 21:18 ` [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address Tom Lendacky
@ 2017-05-05 17:11   ` Borislav Petkov
  2017-05-06  7:48     ` Ard Biesheuvel
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-05 17:11 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:18:31PM -0500, Tom Lendacky wrote:
> Add a function that will return the E820 type associated with an address
> range.

...

> @@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
>  		 * coverage of the desired range exists:
>  		 */
>  		if (start >= end)
> -			return 1;
> +			return entry;
>  	}
> -	return 0;
> +
> +	return NULL;
> +}
> +
> +/*
> + * This function checks if the entire range <start,end> is mapped with type.
> + */
> +bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
> +{
> +	return __e820__mapped_all(start, end, type) ? 1 : 0;

	return !!__e820__mapped_all(start, end, type);

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address
  2017-05-05 17:11   ` Borislav Petkov
@ 2017-05-06  7:48     ` Ard Biesheuvel
  0 siblings, 0 replies; 126+ messages in thread
From: Ard Biesheuvel @ 2017-05-06  7:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, linux-arch, linux-efi, KVM devel mailing list,
	linux-doc, x86, kexec, linux-kernel, kasan-dev, linux-mm, iommu,
	Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5 May 2017 at 18:11, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Apr 18, 2017 at 04:18:31PM -0500, Tom Lendacky wrote:
>> Add a function that will return the E820 type associated with an address
>> range.
>
> ...
>
>> @@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
>>                * coverage of the desired range exists:
>>                */
>>               if (start >= end)
>> -                     return 1;
>> +                     return entry;
>>       }
>> -     return 0;
>> +
>> +     return NULL;
>> +}
>> +
>> +/*
>> + * This function checks if the entire range <start,end> is mapped with type.
>> + */
>> +bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
>> +{
>> +     return __e820__mapped_all(start, end, type) ? 1 : 0;
>
>         return !!__e820__mapped_all(start, end, type);
>

Even the !! double negation is redundant, given that the function returns bool.
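
I.e., simply (sketch):

bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
{
	/* The pointer converts to bool implicitly. */
	return __e820__mapped_all(start, end, type);
}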

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0
  2017-04-18 21:19 ` [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0 Tom Lendacky
@ 2017-05-07 17:18   ` Borislav Petkov
  2017-05-08 13:20     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-07 17:18 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:19:00PM -0500, Tom Lendacky wrote:
> The efi_mem_type() function currently returns a 0, which maps to
> EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
> the supplied physical address. Returning EFI_RESERVED_TYPE implies that
> a memmap entry exists, when it doesn't.  Instead of returning 0, change
> the function to return a negative error value when no memmap entry is
> found.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---

...

> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index cd768a1..a27bb3f 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -973,7 +973,7 @@ static inline void efi_esrt_init(void) { }
>  extern int efi_config_parse_tables(void *config_tables, int count, int sz,
>  				   efi_config_table_type_t *arch_tables);
>  extern u64 efi_get_iobase (void);
> -extern u32 efi_mem_type (unsigned long phys_addr);
> +extern int efi_mem_type (unsigned long phys_addr);

WARNING: space prohibited between function name and open parenthesis '('
#101: FILE: include/linux/efi.h:976:
+extern int efi_mem_type (unsigned long phys_addr);

Please integrate scripts/checkpatch.pl in your patch creation workflow.
Some of the warnings/errors *actually* make sense.

I know, the other function prototypes have a space too but that's not
our coding style. Looks like this trickled in from ia64, from looking at
arch/ia64/kernel/efi.c.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0
  2017-05-07 17:18   ` Borislav Petkov
@ 2017-05-08 13:20     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-08 13:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/7/2017 12:18 PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:00PM -0500, Tom Lendacky wrote:
>> The efi_mem_type() function currently returns a 0, which maps to
>> EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
>> the supplied physical address. Returning EFI_RESERVED_TYPE implies that
>> a memmap entry exists, when it doesn't.  Instead of returning 0, change
>> the function to return a negative error value when no memmap entry is
>> found.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>
> ...
>
>> diff --git a/include/linux/efi.h b/include/linux/efi.h
>> index cd768a1..a27bb3f 100644
>> --- a/include/linux/efi.h
>> +++ b/include/linux/efi.h
>> @@ -973,7 +973,7 @@ static inline void efi_esrt_init(void) { }
>>  extern int efi_config_parse_tables(void *config_tables, int count, int sz,
>>  				   efi_config_table_type_t *arch_tables);
>>  extern u64 efi_get_iobase (void);
>> -extern u32 efi_mem_type (unsigned long phys_addr);
>> +extern int efi_mem_type (unsigned long phys_addr);
>
> WARNING: space prohibited between function name and open parenthesis '('
> #101: FILE: include/linux/efi.h:976:
> +extern int efi_mem_type (unsigned long phys_addr);
>
> Please integrate scripts/checkpatch.pl in your patch creation workflow.
> Some of the warnings/errors *actually* make sense.

I do/did run scripts/checkpatch.pl against all my patches. In this case
I chose to keep the space in order to stay consistent with some of the
surrounding functions.  No problem though, I can remove the space.

Thanks,
Tom

>
> I know, the other function prototypes have a space too but that's not
> our coding style. Looks like this trickled in from ia64, from looking at
> arch/ia64/kernel/efi.c.
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 14/32] efi: Add an EFI table address match function
  2017-04-18 21:18 ` [PATCH v5 14/32] efi: Add an EFI table address match function Tom Lendacky
@ 2017-05-15 18:09   ` Borislav Petkov
  2017-05-16 21:53     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-15 18:09 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:18:48PM -0500, Tom Lendacky wrote:
> Add a function that will determine if a supplied physical address matches
> the address of an EFI table.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  drivers/firmware/efi/efi.c |   33 +++++++++++++++++++++++++++++++++
>  include/linux/efi.h        |    7 +++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index b372aad..8f606a3 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
>  };
>  EXPORT_SYMBOL(efi);
>  
> +static unsigned long *efi_tables[] = {
> +	&efi.mps,
> +	&efi.acpi,
> +	&efi.acpi20,
> +	&efi.smbios,
> +	&efi.smbios3,
> +	&efi.sal_systab,
> +	&efi.boot_info,
> +	&efi.hcdp,
> +	&efi.uga,
> +	&efi.uv_systab,
> +	&efi.fw_vendor,
> +	&efi.runtime,
> +	&efi.config_table,
> +	&efi.esrt,
> +	&efi.properties_table,
> +	&efi.mem_attr_table,
> +};
> +
>  static bool disable_runtime;
>  static int __init setup_noefi(char *arg)
>  {
> @@ -854,6 +873,20 @@ int efi_status_to_err(efi_status_t status)
>  	return err;
>  }
>  
> +bool efi_table_address_match(unsigned long phys_addr)

efi_is_table_address() reads easier/better in the code.
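
I'd expect the helper to be not much more than a walk over efi_tables[],
something like this (sketch):

bool efi_is_table_address(unsigned long phys_addr)
{
	unsigned int i;

	if (phys_addr == EFI_INVALID_TABLE_ADDR)
		return false;

	for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
		if (*(efi_tables[i]) == phys_addr)
			return true;

	return false;
}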

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-04-18 21:19 ` [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear Tom Lendacky
@ 2017-05-15 18:35   ` Borislav Petkov
  2017-05-17 18:54     ` Tom Lendacky
  2017-05-18 19:50     ` Matt Fleming
  0 siblings, 2 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-15 18:35 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted because UEFI/BIOS does not run with SME active. In order to access
> this data properly it needs to be mapped decrypted.
> 
> The early_memremap() support is updated to provide an arch specific

"Update early_memremap() to provide... "

> routine to modify the pagetable protection attributes before they are
> applied to the new mapping. This is used to remove the encryption mask
> for boot related data.
> 
> The memremap() support is updated to provide an arch specific routine

Ditto. Passive tone always reads harder than an active tone,
"doer"-sentence.

> to determine if RAM remapping is allowed.  RAM remapping will cause an
> encrypted mapping to be generated. By preventing RAM remapping,
> ioremap_cache() will be used instead, which will provide a decrypted
> mapping of the boot related data.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/io.h |    4 +
>  arch/x86/mm/ioremap.c     |  182 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/io.h        |    2 
>  kernel/memremap.c         |   20 ++++-
>  mm/early_ioremap.c        |   18 ++++
>  5 files changed, 219 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index 7afb0e2..75f2858 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -381,4 +381,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
>  #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
>  #endif
>  
> +extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size,
> +				       unsigned long flags);
> +#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
> +
>  #endif /* _ASM_X86_IO_H */
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 9bfcb1f..bce0604 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -13,6 +13,7 @@
>  #include <linux/slab.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mmiotrace.h>
> +#include <linux/efi.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/e820/api.h>
> @@ -21,6 +22,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/pgalloc.h>
>  #include <asm/pat.h>
> +#include <asm/setup.h>
>  
>  #include "physaddr.h"
>  
> @@ -419,6 +421,186 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>  	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>  }
>  
> +/*
> + * Examine the physical address to determine if it is an area of memory
> + * that should be mapped decrypted.  If the memory is not part of the
> + * kernel usable area it was accessed and created decrypted, so these
> + * areas should be mapped decrypted.
> + */
> +static bool memremap_should_map_decrypted(resource_size_t phys_addr,
> +					  unsigned long size)
> +{
> +	/* Check if the address is outside kernel usable area */
> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
> +	case E820_TYPE_RESERVED:
> +	case E820_TYPE_ACPI:
> +	case E820_TYPE_NVS:
> +	case E820_TYPE_UNUSABLE:
> +		return true;
> +	default:
> +		break;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Examine the physical address to determine if it is EFI data. Check
> + * it against the boot params structure and EFI tables and memory types.
> + */
> +static bool memremap_is_efi_data(resource_size_t phys_addr,
> +				 unsigned long size)
> +{
> +	u64 paddr;
> +
> +	/* Check if the address is part of EFI boot/runtime data */
> +	if (efi_enabled(EFI_BOOT)) {

Save indentation level:

	if (!efi_enabled(EFI_BOOT))
		return false;


> +		paddr = boot_params.efi_info.efi_memmap_hi;
> +		paddr <<= 32;
> +		paddr |= boot_params.efi_info.efi_memmap;
> +		if (phys_addr == paddr)
> +			return true;
> +
> +		paddr = boot_params.efi_info.efi_systab_hi;
> +		paddr <<= 32;
> +		paddr |= boot_params.efi_info.efi_systab;

So those two above look like could be two global vars which are
initialized somewhere in the EFI init path:

efi_memmap_phys and efi_systab_phys or so.

Matt ?

And then you won't need to create that paddr each time on the fly. I
mean, it's not a lot of instructions but still...

> +		if (phys_addr == paddr)
> +			return true;
> +
> +		if (efi_table_address_match(phys_addr))
> +			return true;
> +
> +		switch (efi_mem_type(phys_addr)) {
> +		case EFI_BOOT_SERVICES_DATA:
> +		case EFI_RUNTIME_SERVICES_DATA:
> +			return true;
> +		default:
> +			break;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Examine the physical address to determine if it is boot data by checking
> + * it against the boot params setup_data chain.
> + */
> +static bool memremap_is_setup_data(resource_size_t phys_addr,
> +				   unsigned long size)
> +{
> +	struct setup_data *data;
> +	u64 paddr, paddr_next;
> +
> +	paddr = boot_params.hdr.setup_data;
> +	while (paddr) {
> +		bool is_setup_data = false;

You don't need that bool:

static bool memremap_is_setup_data(resource_size_t phys_addr,
                                   unsigned long size)
{
        struct setup_data *data;
        u64 paddr, paddr_next;

        paddr = boot_params.hdr.setup_data;
        while (paddr) {
                if (phys_addr == paddr)
                        return true;

                data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC);

                paddr_next = data->next;

                if ((phys_addr > paddr) && (phys_addr < (paddr + data->len))) {
                        memunmap(data);
                        return true;
                }

                memunmap(data);

                paddr = paddr_next;
        }
        return false;
}

Flow is a bit clearer.

> +/*
> + * Examine the physical address to determine if it is boot data by checking
> + * it against the boot params setup_data chain (early boot version).
> + */
> +static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
> +						unsigned long size)
> +{
> +	struct setup_data *data;
> +	u64 paddr, paddr_next;
> +
> +	paddr = boot_params.hdr.setup_data;
> +	while (paddr) {
> +		bool is_setup_data = false;
> +
> +		if (phys_addr == paddr)
> +			return true;
> +
> +		data = early_memremap_decrypted(paddr, sizeof(*data));
> +
> +		paddr_next = data->next;
> +
> +		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
> +			is_setup_data = true;
> +
> +		early_memunmap(data, sizeof(*data));
> +
> +		if (is_setup_data)
> +			return true;
> +
> +		paddr = paddr_next;
> +	}
> +
> +	return false;
> +}

This one is begging to be unified with memremap_is_setup_data() to both
call a __ worker function.

> +
> +/*
> + * Architecture function to determine if RAM remap is allowed. By default, a
> + * RAM remap will map the data as encrypted. Determine if a RAM remap should
> + * not be done so that the data will be mapped decrypted.
> + */
> +bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size,
> +				unsigned long flags)

So this function doesn't do anything - it replies to a yes/no question.
So the name should not say "do" but sound like a question. Maybe:

	if (arch_memremap_can_remap( ... ))

or so...

> +{
> +	if (!sme_active())
> +		return true;
> +
> +	if (flags & MEMREMAP_ENC)
> +		return true;
> +
> +	if (flags & MEMREMAP_DEC)
> +		return false;

So this looks strange to me: both flags MEMREMAP_ENC and _DEC override
setup and EFI data checking. But we want to remap setup and EFI data
*always* decrypted because that data was not encrypted since, as you say,
firmware doesn't run with SME active.

So my simple logic says that EFI stuff should *always* be mapped DEC,
regardless of flags. Ditto for setup data. So that check below should
actually *override* the flags checks and go before them, no?

> +
> +	if (memremap_is_setup_data(phys_addr, size) ||
> +	    memremap_is_efi_data(phys_addr, size) ||
> +	    memremap_should_map_decrypted(phys_addr, size))
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
> + * Architecture override of __weak function to adjust the protection attributes
> + * used when remapping memory. By default, early_memremp() will map the data

early_memremAp() - a is missing.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data
  2017-04-18 21:19 ` [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data Tom Lendacky
@ 2017-05-16  8:36   ` Borislav Petkov
  2017-05-17 20:26     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-16  8:36 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:19:30PM -0500, Tom Lendacky wrote:
> The SMP MP-table is built by UEFI and placed in memory in a decrypted
> state. These tables are accessed using a mix of early_memremap(),
> early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
> to use early_memremap()/early_memunmap(). This allows for proper setting
> of the encryption mask so that the data can be successfully accessed when
> SME is active.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/mpparse.c |  102 +++++++++++++++++++++++++++++++--------------
>  1 file changed, 71 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
> index fd37f39..afbda41d 100644
> --- a/arch/x86/kernel/mpparse.c
> +++ b/arch/x86/kernel/mpparse.c
> @@ -429,7 +429,21 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
>  	}
>  }
>  
> -static struct mpf_intel *mpf_found;
> +static unsigned long mpf_base;
> +
> +static void __init unmap_mpf(struct mpf_intel *mpf)
> +{
> +	early_memunmap(mpf, sizeof(*mpf));
> +}
> +
> +static struct mpf_intel * __init map_mpf(unsigned long paddr)
> +{
> +	struct mpf_intel *mpf;
> +
> +	mpf = early_memremap(paddr, sizeof(*mpf));
> +
> +	return mpf;

	return early_memremap(paddr, sizeof(*mpf));

...

> @@ -842,25 +873,26 @@ static int __init update_mp_table(void)
>  	if (!enable_update_mptable)
>  		return 0;
>  
> -	mpf = mpf_found;
> -	if (!mpf)
> +	if (!mpf_base)
>  		return 0;
>  
> +	mpf = map_mpf(mpf_base);
> +
>  	/*
>  	 * Now see if we need to go further.
>  	 */
>  	if (mpf->feature1 != 0)

You're kidding, right? map_mpf() *can* return NULL.

Also, simplify that test:

	if (mpf->feature1)
		...


> -		return 0;
> +		goto do_unmap_mpf;
>  
>  	if (!mpf->physptr)
> -		return 0;
> +		goto do_unmap_mpf;
>  
> -	mpc = phys_to_virt(mpf->physptr);
> +	mpc = map_mpc(mpf->physptr);

Again: error checking !!!

You have other calls to early_memremap()/map_mpf() in this patch. Please
add error checking everywhere.
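
I.e. (sketch):

	mpf = map_mpf(mpf_base);
	if (!mpf)
		return 0;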

>  
>  	if (!smp_check_mpc(mpc, oem, str))
> -		return 0;
> +		goto do_unmap_mpc;
>  
> -	pr_info("mpf: %llx\n", (u64)virt_to_phys(mpf));
> +	pr_info("mpf: %llx\n", (u64)mpf_base);
>  	pr_info("physptr: %x\n", mpf->physptr);
>  
>  	if (mpc_new_phys && mpc->length > mpc_new_length) {
> @@ -878,21 +910,23 @@ static int __init update_mp_table(void)
>  		new = mpf_checksum((unsigned char *)mpc, mpc->length);
>  		if (old == new) {
>  			pr_info("mpc is readonly, please try alloc_mptable instead\n");
> -			return 0;
> +			goto do_unmap_mpc;
>  		}
>  		pr_info("use in-position replacing\n");
>  	} else {
>  		mpf->physptr = mpc_new_phys;
> -		mpc_new = phys_to_virt(mpc_new_phys);
> +		mpc_new = map_mpc(mpc_new_phys);

Ditto.

>  		memcpy(mpc_new, mpc, mpc->length);
> +		unmap_mpc(mpc);
>  		mpc = mpc_new;
>  		/* check if we can modify that */
>  		if (mpc_new_phys - mpf->physptr) {
>  			struct mpf_intel *mpf_new;
>  			/* steal 16 bytes from [0, 1k) */
>  			pr_info("mpf new: %x\n", 0x400 - 16);
> -			mpf_new = phys_to_virt(0x400 - 16);
> +			mpf_new = map_mpf(0x400 - 16);

Ditto.

>  			memcpy(mpf_new, mpf, 16);
> +			unmap_mpf(mpf);
>  			mpf = mpf_new;
>  			mpf->physptr = mpc_new_phys;
>  		}

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear
  2017-04-18 21:19 ` [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear Tom Lendacky
@ 2017-05-16 14:04   ` Borislav Petkov
  2017-05-19 19:52     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-16 14:04 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:19:42PM -0500, Tom Lendacky wrote:
> Persistent memory is expected to persist across reboots. The encryption
> key used by SME will change across reboots which will result in corrupted
> persistent memory.  Persistent memory is handed out by block devices
> through memory remapping functions, so be sure not to map this memory as
> encrypted.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/mm/ioremap.c |   31 ++++++++++++++++++++++++++++++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index bce0604..55317ba 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -425,17 +425,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>   * Examine the physical address to determine if it is an area of memory
>   * that should be mapped decrypted.  If the memory is not part of the
>   * kernel usable area it was accessed and created decrypted, so these
> - * areas should be mapped decrypted.
> + * areas should be mapped decrypted. And since the encryption key can
> + * change across reboots, persistent memory should also be mapped
> + * decrypted.
>   */
>  static bool memremap_should_map_decrypted(resource_size_t phys_addr,
>  					  unsigned long size)
>  {
> +	int is_pmem;
> +
> +	/*
> +	 * Check if the address is part of a persistent memory region.
> +	 * This check covers areas added by E820, EFI and ACPI.
> +	 */
> +	is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
> +				    IORES_DESC_PERSISTENT_MEMORY);
> +	if (is_pmem != REGION_DISJOINT)
> +		return true;
> +
> +	/*
> +	 * Check if the non-volatile attribute is set for an EFI
> +	 * reserved area.
> +	 */
> +	if (efi_enabled(EFI_BOOT)) {
> +		switch (efi_mem_type(phys_addr)) {
> +		case EFI_RESERVED_TYPE:
> +			if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
> +				return true;
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
>  	/* Check if the address is outside kernel usable area */
>  	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
>  	case E820_TYPE_RESERVED:
>  	case E820_TYPE_ACPI:
>  	case E820_TYPE_NVS:
>  	case E820_TYPE_UNUSABLE:
> +	case E820_TYPE_PRAM:

Can't you simply add:

	case E820_TYPE_PMEM:

here too and thus get rid of the region_intersects() thing above?

Because, for example, e820_type_to_iores_desc() maps E820_TYPE_PMEM to
IORES_DESC_PERSISTENT_MEMORY so those should be equivalent...
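
I.e. (sketch):

	/* Check if the address is outside kernel usable area */
	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
	case E820_TYPE_RESERVED:
	case E820_TYPE_ACPI:
	case E820_TYPE_NVS:
	case E820_TYPE_UNUSABLE:
	case E820_TYPE_PRAM:
	case E820_TYPE_PMEM:
		return true;
	default:
		break;
	}

	return false;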

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption
  2017-04-18 21:20 ` [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption Tom Lendacky
@ 2017-05-16 14:27   ` Borislav Petkov
  2017-05-19 19:54     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-16 14:27 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:20:10PM -0500, Tom Lendacky wrote:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create decrypted bounce buffers for use by these devices.

Use a verb in the subject:

Subject: x86, swiotlb: Add memory encryption support

or similar.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME
  2017-04-18 21:20 ` [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
@ 2017-05-16 14:52   ` Borislav Petkov
  2017-05-19 19:55     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-16 14:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:20:19PM -0500, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>  include/linux/dma-mapping.h        |   11 +++++++++++
>  include/linux/mem_encrypt.h        |    6 ++++++
>  lib/swiotlb.c                      |    3 +++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 0637b4b..b406df2 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>  	return !!sme_me_mask;
>  }
>  
> +static inline u64 sme_dma_mask(void)
> +{
> +	return ((u64)sme_me_mask << 1) - 1;
> +}
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
>  			      unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> @@ -50,6 +55,12 @@ static inline bool sme_active(void)
>  {
>  	return false;
>  }
> +
> +static inline u64 sme_dma_mask(void)
> +{
> +	return 0ULL;
> +}
> +
>  #endif
>  
>  static inline void __init sme_early_encrypt(resource_size_t paddr,
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 0977317..f825870 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -10,6 +10,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/kmemcheck.h>
>  #include <linux/bug.h>
> +#include <linux/mem_encrypt.h>
>  
>  /**
>   * List of possible attributes associated with a DMA mapping. The semantics
> @@ -577,6 +578,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  
>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>  		return -EIO;
> +
> +	if (sme_active() && (mask < sme_dma_mask()))
> +		dev_warn_ratelimited(dev,
> +				     "SME is active, device will require DMA bounce buffers\n");

Bah, no need to break that line - just let it stick out. Ditto for the
others.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings
  2017-04-18 21:20 ` [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings Tom Lendacky
@ 2017-05-16 17:35   ` Borislav Petkov
  2017-05-30 20:07     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-16 17:35 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:20:56PM -0500, Tom Lendacky wrote:
> Since video memory needs to be accessed decrypted, be sure that the
> memory encryption mask is not set for the video ranges.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/vga.h       |   13 +++++++++++++
>  arch/x86/mm/pageattr.c           |    2 ++
>  drivers/gpu/drm/drm_gem.c        |    2 ++
>  drivers/gpu/drm/drm_vm.c         |    4 ++++
>  drivers/gpu/drm/ttm/ttm_bo_vm.c  |    7 +++++--
>  drivers/gpu/drm/udl/udl_fb.c     |    4 ++++
>  drivers/video/fbdev/core/fbmem.c |   12 ++++++++++++
>  7 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
> index c4b9dc2..5c7567a 100644
> --- a/arch/x86/include/asm/vga.h
> +++ b/arch/x86/include/asm/vga.h
> @@ -7,12 +7,25 @@
>  #ifndef _ASM_X86_VGA_H
>  #define _ASM_X86_VGA_H
>  
> +#include <asm/cacheflush.h>
> +
>  /*
>   *	On the PC, we can just recalculate addresses and then
>   *	access the videoram directly without any black magic.
> + *	To support memory encryption however, we need to access
> + *	the videoram as decrypted memory.
>   */
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +#define VGA_MAP_MEM(x, s)					\
> +({								\
> +	unsigned long start = (unsigned long)phys_to_virt(x);	\
> +	set_memory_decrypted(start, (s) >> PAGE_SHIFT);		\
> +	start;							\
> +})
> +#else
>  #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
> +#endif

Can we push the check in and save us the ifdeffery?

#define VGA_MAP_MEM(x, s)                                       \
({                                                              \
        unsigned long start = (unsigned long)phys_to_virt(x);   \
                                                                \
        if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))                 \
                set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
                                                                \
        start;                                                  \
})

It does build here. :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-05-04 14:36       ` Borislav Petkov
@ 2017-05-16 19:28         ` Tom Lendacky
  2017-05-17  7:05           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-16 19:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/4/2017 9:36 AM, Borislav Petkov wrote:
> On Thu, May 04, 2017 at 09:24:11AM -0500, Tom Lendacky wrote:
>> I did this so that the include order wouldn't cause issues (including
>> asm/mem_encrypt.h followed later by a linux/mem_encrypt.h include).
>> I can make this a bit clearer by having separate #defines for each
>> thing, e.g.:
>>
>> #ifndef sme_me_mask
>> #define sme_me_mask 0UL
>> #endif
>>
>> #ifndef sme_active
>> #define sme_active sme_active
>> static inline ...
>> #endif
>>
>> Is that better/clearer?
>
> I guess but where do we have to include both the asm/ and the linux/
> version?

It's more about the sequence of the various includes.  For example,
init/do_mounts.c includes <linux/module.h> that eventually gets down
to <asm/pgtable_types.h> and then <asm/mem_encrypt.h>.  However, a
bit further down <linux/nfs_fs.h> is included which eventually gets
down to <linux/dma-mapping.h> and then <linux/mem_encrypt.h>.

>
> IOW, can we avoid these issues altogether by partitioning symbol
> declarations differently among the headers?

It's most problematic when CONFIG_AMD_MEM_ENCRYPT is not defined since
we never include an asm/ version from the linux/ path.  I could create
a mem_encrypt.h in include/asm-generic/ that contains the info that
is in the !CONFIG_AMD_MEM_ENCRYPT path of the linux/ version. Let me
look into that.
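
Roughly what I'm thinking of (a sketch only, header name and contents
to be confirmed):

/* include/asm-generic/mem_encrypt.h */
#ifndef __ASM_GENERIC_MEM_ENCRYPT_H__
#define __ASM_GENERIC_MEM_ENCRYPT_H__

#ifndef __ASSEMBLY__

#define sme_me_mask	0UL

static inline bool sme_active(void)
{
	return false;
}

#endif	/* __ASSEMBLY__ */

#endif	/* __ASM_GENERIC_MEM_ENCRYPT_H__ */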

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 14/32] efi: Add an EFI table address match function
  2017-05-15 18:09   ` Borislav Petkov
@ 2017-05-16 21:53     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-16 21:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/15/2017 1:09 PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:18:48PM -0500, Tom Lendacky wrote:
>> Add a function that will determine if a supplied physical address matches
>> the address of an EFI table.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  drivers/firmware/efi/efi.c |   33 +++++++++++++++++++++++++++++++++
>>  include/linux/efi.h        |    7 +++++++
>>  2 files changed, 40 insertions(+)
>>
>> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
>> index b372aad..8f606a3 100644
>> --- a/drivers/firmware/efi/efi.c
>> +++ b/drivers/firmware/efi/efi.c
>> @@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
>>  };
>>  EXPORT_SYMBOL(efi);
>>
>> +static unsigned long *efi_tables[] = {
>> +	&efi.mps,
>> +	&efi.acpi,
>> +	&efi.acpi20,
>> +	&efi.smbios,
>> +	&efi.smbios3,
>> +	&efi.sal_systab,
>> +	&efi.boot_info,
>> +	&efi.hcdp,
>> +	&efi.uga,
>> +	&efi.uv_systab,
>> +	&efi.fw_vendor,
>> +	&efi.runtime,
>> +	&efi.config_table,
>> +	&efi.esrt,
>> +	&efi.properties_table,
>> +	&efi.mem_attr_table,
>> +};
>> +
>>  static bool disable_runtime;
>>  static int __init setup_noefi(char *arg)
>>  {
>> @@ -854,6 +873,20 @@ int efi_status_to_err(efi_status_t status)
>>  	return err;
>>  }
>>
>> +bool efi_table_address_match(unsigned long phys_addr)
>
> efi_is_table_address() reads easier/better in the code.

Will do.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support
  2017-05-16 19:28         ` Tom Lendacky
@ 2017-05-17  7:05           ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-17  7:05 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 16, 2017 at 02:28:42PM -0500, Tom Lendacky wrote:
> It's most problematic when CONFIG_AMD_MEM_ENCRYPT is not defined since
> we never include an asm/ version from the linux/ path.  I could create
> a mem_encrypt.h in include/asm-generic/ that contains the info that
> is in the !CONFIG_AMD_MEM_ENCRYPT path of the linux/ version. Let me
> look into that.

So we need to keep asm/ and linux/ apart. The linux/ stuff is generic,
global, more or less. The asm/ is arch-specific. So they shouldn't be
overlapping wrt definitions, IMHO.

So asm-generic is the proper approach here because then you won't need
the ifndef fun.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-15 18:35   ` Borislav Petkov
@ 2017-05-17 18:54     ` Tom Lendacky
  2017-05-18  9:02       ` Borislav Petkov
  2017-05-18 19:50     ` Matt Fleming
  1 sibling, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-17 18:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/15/2017 1:35 PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
>> Boot data (such as EFI related data) is not encrypted when the system is
>> booted because UEFI/BIOS does not run with SME active. In order to access
>> this data properly it needs to be mapped decrypted.
>>
>> The early_memremap() support is updated to provide an arch specific
>
> "Update early_memremap() to provide... "

Will do.

>
>> routine to modify the pagetable protection attributes before they are
>> applied to the new mapping. This is used to remove the encryption mask
>> for boot related data.
>>
>> The memremap() support is updated to provide an arch specific routine
>
> Ditto. Passive tone always reads harder than an active tone,
> "doer"-sentence.

Ditto.

>
>> to determine if RAM remapping is allowed.  RAM remapping will cause an
>> encrypted mapping to be generated. By preventing RAM remapping,
>> ioremap_cache() will be used instead, which will provide a decrypted
>> mapping of the boot related data.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/io.h |    4 +
>>  arch/x86/mm/ioremap.c     |  182 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/io.h        |    2
>>  kernel/memremap.c         |   20 ++++-
>>  mm/early_ioremap.c        |   18 ++++
>>  5 files changed, 219 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 7afb0e2..75f2858 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -381,4 +381,8 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
>>  #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
>>  #endif
>>
>> +extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size,
>> +				       unsigned long flags);
>> +#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
>> +
>>  #endif /* _ASM_X86_IO_H */
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 9bfcb1f..bce0604 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mmiotrace.h>
>> +#include <linux/efi.h>
>>
>>  #include <asm/cacheflush.h>
>>  #include <asm/e820/api.h>
>> @@ -21,6 +22,7 @@
>>  #include <asm/tlbflush.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/pat.h>
>> +#include <asm/setup.h>
>>
>>  #include "physaddr.h"
>>
>> @@ -419,6 +421,186 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>>  	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>>  }
>>
>> +/*
>> + * Examine the physical address to determine if it is an area of memory
>> + * that should be mapped decrypted.  If the memory is not part of the
>> + * kernel usable area it was accessed and created decrypted, so these
>> + * areas should be mapped decrypted.
>> + */
>> +static bool memremap_should_map_decrypted(resource_size_t phys_addr,
>> +					  unsigned long size)
>> +{
>> +	/* Check if the address is outside kernel usable area */
>> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
>> +	case E820_TYPE_RESERVED:
>> +	case E820_TYPE_ACPI:
>> +	case E820_TYPE_NVS:
>> +	case E820_TYPE_UNUSABLE:
>> +		return true;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * Examine the physical address to determine if it is EFI data. Check
>> + * it against the boot params structure and EFI tables and memory types.
>> + */
>> +static bool memremap_is_efi_data(resource_size_t phys_addr,
>> +				 unsigned long size)
>> +{
>> +	u64 paddr;
>> +
>> +	/* Check if the address is part of EFI boot/runtime data */
>> +	if (efi_enabled(EFI_BOOT)) {
>
> Save indentation level:
>
> 	if (!efi_enabled(EFI_BOOT))
> 		return false;
>

I was worried about what the compiler might do when CONFIG_EFI is not set,
but it appears to take care of it. I'll double check though.

>
>> +		paddr = boot_params.efi_info.efi_memmap_hi;
>> +		paddr <<= 32;
>> +		paddr |= boot_params.efi_info.efi_memmap;
>> +		if (phys_addr == paddr)
>> +			return true;
>> +
>> +		paddr = boot_params.efi_info.efi_systab_hi;
>> +		paddr <<= 32;
>> +		paddr |= boot_params.efi_info.efi_systab;
>
> So those two above look like could be two global vars which are
> initialized somewhere in the EFI init path:
>
> efi_memmap_phys and efi_systab_phys or so.
>
> Matt ?
>
> And then you won't need to create that paddr each time on the fly. I
> mean, it's not a lot of instructions but still...
>
>> +		if (phys_addr == paddr)
>> +			return true;
>> +
>> +		if (efi_table_address_match(phys_addr))
>> +			return true;
>> +
>> +		switch (efi_mem_type(phys_addr)) {
>> +		case EFI_BOOT_SERVICES_DATA:
>> +		case EFI_RUNTIME_SERVICES_DATA:
>> +			return true;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * Examine the physical address to determine if it is boot data by checking
>> + * it against the boot params setup_data chain.
>> + */
>> +static bool memremap_is_setup_data(resource_size_t phys_addr,
>> +				   unsigned long size)
>> +{
>> +	struct setup_data *data;
>> +	u64 paddr, paddr_next;
>> +
>> +	paddr = boot_params.hdr.setup_data;
>> +	while (paddr) {
>> +		bool is_setup_data = false;
>
> You don't need that bool:
>
> static bool memremap_is_setup_data(resource_size_t phys_addr,
>                                    unsigned long size)
> {
>         struct setup_data *data;
>         u64 paddr, paddr_next;
>
>         paddr = boot_params.hdr.setup_data;
>         while (paddr) {
>                 if (phys_addr == paddr)
>                         return true;
>
>                 data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC);
>
>                 paddr_next = data->next;
>
>                 if ((phys_addr > paddr) && (phys_addr < (paddr + data->len))) {
>                         memunmap(data);
>                         return true;
>                 }
>
>                 memunmap(data);
>
>                 paddr = paddr_next;
>         }
>         return false;
> }
>
> Flow is a bit clearer.

I may introduce a length variable to capture data->len right after
paddr_next is set and then have just a single memunmap() call before
the if check.
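
I.e., something like (sketch):

	paddr = boot_params.hdr.setup_data;
	while (paddr) {
		u64 len;

		if (phys_addr == paddr)
			return true;

		data = memremap(paddr, sizeof(*data),
				MEMREMAP_WB | MEMREMAP_DEC);

		paddr_next = data->next;
		len = data->len;

		memunmap(data);

		if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
			return true;

		paddr = paddr_next;
	}

	return false;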

>
>> +/*
>> + * Examine the physical address to determine if it is boot data by checking
>> + * it against the boot params setup_data chain (early boot version).
>> + */
>> +static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>> +						unsigned long size)
>> +{
>> +	struct setup_data *data;
>> +	u64 paddr, paddr_next;
>> +
>> +	paddr = boot_params.hdr.setup_data;
>> +	while (paddr) {
>> +		bool is_setup_data = false;
>> +
>> +		if (phys_addr == paddr)
>> +			return true;
>> +
>> +		data = early_memremap_decrypted(paddr, sizeof(*data));
>> +
>> +		paddr_next = data->next;
>> +
>> +		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
>> +			is_setup_data = true;
>> +
>> +		early_memunmap(data, sizeof(*data));
>> +
>> +		if (is_setup_data)
>> +			return true;
>> +
>> +		paddr = paddr_next;
>> +	}
>> +
>> +	return false;
>> +}
>
> This one is begging to be unified with memremap_is_setup_data() to both
> call a __ worker function.

I tried that, but calling an "__init" function (early_memremap()) from
a non "__init" function generated warnings. I suppose I can pass in a
function for the map and unmap but that looks worse to me (also the
unmap functions take different arguments).

>
>> +
>> +/*
>> + * Architecture function to determine if RAM remap is allowed. By default, a
>> + * RAM remap will map the data as encrypted. Determine if a RAM remap should
>> + * not be done so that the data will be mapped decrypted.
>> + */
>> +bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size,
>> +				unsigned long flags)
>
> So this function doesn't do anything - it replies to a yes/no question.
> So the name should not say "do" but sound like a question. Maybe:
>
> 	if (arch_memremap_can_remap( ... ))
>
> or so...

Ok, I'll change that.

>
>> +{
>> +	if (!sme_active())
>> +		return true;
>> +
>> +	if (flags & MEMREMAP_ENC)
>> +		return true;
>> +
>> +	if (flags & MEMREMAP_DEC)
>> +		return false;
>
> So this looks strange to me: both flags MEMREMAP_ENC and _DEC override
> setup and efi data checking. But we want to remap setup and EFI  data
> *always* decrypted because that data was not encrypted as, as you say,
> firmware doesn't run with SME active.
>
> So my simple logic says that EFI stuff should *always* be mapped DEC,
> regardless of flags. Ditto for setup data. So that check below should
> actually *override* the flags checks and go before them, no?

This is like the chicken and the egg scenario. In order to determine if
an address is setup data I have to explicitly map the setup data chain
as decrypted. In order to do that I have to supply a flag to explicitly
map the data decrypted otherwise I wind up back in the
memremap_is_setup_data() function again and again and again...

>
>> +
>> +	if (memremap_is_setup_data(phys_addr, size) ||
>> +	    memremap_is_efi_data(phys_addr, size) ||
>> +	    memremap_should_map_decrypted(phys_addr, size))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * Architecture override of __weak function to adjust the protection attributes
>> + * used when remapping memory. By default, early_memremp() will map the data
>
> early_memremAp() - a is missing.

Got it.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-04-18 21:21 ` [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME Tom Lendacky
@ 2017-05-17 19:17   ` Borislav Petkov
  2017-05-19 20:45     ` Tom Lendacky
  2017-05-26  4:17   ` Xunlei Pang
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-17 19:17 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:21:21PM -0500, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
> 
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot into the kernel in the same manner
> as it was originally booted.
> 
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/init.h          |    1 +
>  arch/x86/include/asm/irqflags.h      |    5 +++++
>  arch/x86/include/asm/kexec.h         |    8 ++++++++
>  arch/x86/include/asm/pgtable_types.h |    1 +
>  arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
>  arch/x86/mm/ident_map.c              |   11 +++++++----
>  include/linux/kexec.h                |   14 ++++++++++++++
>  kernel/kexec_core.c                  |    7 +++++++
>  9 files changed, 101 insertions(+), 7 deletions(-)

...

> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>  	}
>  	pte = pte_offset_kernel(pmd, vaddr);
> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>  	return 0;
>  err:
>  	free_transition_pgtable(image);
> @@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  		.alloc_pgt_page	= alloc_pgt_page,
>  		.context	= image,
>  		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>  	};
>  	unsigned long mstart, mend;
>  	pgd_t *level4p;
> @@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
>  {
>  	kexec_mark_crashkres(false);
>  }
> +
> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
> +{
> +	int ret;
> +
> +	if (sme_active()) {

	if (!sme_active())
		return 0;

	/*
	 * If SME...


> +		/*
> +		 * If SME is active we need to be sure that kexec pages are
> +		 * not encrypted because when we boot to the new kernel the
> +		 * pages won't be accessed encrypted (initially).
> +		 */
> +		ret = set_memory_decrypted((unsigned long)vaddr, pages);
> +		if (ret)
> +			return ret;
> +
> +		if (gfp & __GFP_ZERO)
> +			memset(vaddr, 0, pages * PAGE_SIZE);

This function is called after alloc_pages() which already zeroes memory
when __GFP_ZERO is supplied.

If you need to clear the memory *after* set_memory_encrypted() happens,
then you should probably mask out __GFP_ZERO before the alloc_pages()
call so as not to do it twice.
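
I.e., something like this in the caller (completely untested sketch, and
assuming the hook ends up being called from kimage_alloc_pages() in
kernel/kexec_core.c):

	pages = alloc_pages(gfp_mask & ~__GFP_ZERO, order);
	if (!pages)
		return NULL;

	/* the arch hook decrypts the range and zeroes it afterwards */
	arch_kexec_post_alloc_pages(page_address(pages), 1 << order, gfp_mask);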

> +	}
> +
> +	return 0;
> +}
> +
> +void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
> +{
> +	if (sme_active()) {
> +		/*
> +		 * If SME is active we need to reset the pages back to being
> +		 * an encrypted mapping before freeing them.
> +		 */
> +		set_memory_encrypted((unsigned long)vaddr, pages);
> +	}
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0bb8842..f4e5de6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -24,6 +24,7 @@
>  #include <linux/cpuidle.h>
>  #include <trace/events/power.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/kexec.h>
>  #include <asm/cpu.h>
>  #include <asm/apic.h>
>  #include <asm/syscalls.h>
> @@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
>  	return ret;
>  }
>  #endif
> +
>  void stop_this_cpu(void *dummy)
>  {
> +	bool do_wbinvd_halt = false;
> +
> +	if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
> +		/*
> +		 * If we are performing a kexec and the processor supports
> +		 * SME then we need to clear out cache information before
> +		 * halting. With kexec, going from SME inactive to SME active
> +		 * requires clearing cache entries so that addresses without
> +		 * the encryption bit set don't corrupt the same physical
> +		 * address that has the encryption bit set when caches are
> +		 * flushed. Perform a wbinvd followed by a halt to achieve
> +		 * this.
> +		 */
> +		do_wbinvd_halt = true;
> +	}
> +
>  	local_irq_disable();
>  	/*
>  	 * Remove this CPU:
> @@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
>  	disable_local_APIC();
>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>  
> -	for (;;)
> -		halt();
> +	for (;;) {
> +		if (do_wbinvd_halt)
> +			native_wbinvd_halt();

No need for that native_wbinvd_halt() thing:

	for (;;) {
		if (do_wbinvd)
			wbinvd();

		halt();
	}

>  /*
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 04210a2..2c9fd3e 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
>  static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  			  unsigned long addr, unsigned long end)
>  {
> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;

You're already supplying a x86_mapping_info and thus you can init
kernpg_flag to default _KERNPG_TABLE and override it in the SME+kexec
case, as you already do. And this way you can simply do:

	set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));

here and in the other pagetable functions I've snipped below, and save
yourself some lines.
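
The other users of struct x86_mapping_info would then simply carry the
default, along the lines of (untested, purely illustrative):

	struct x86_mapping_info info = {
		.alloc_pgt_page	= alloc_pgt_page,
		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
		.kernpg_flag	= _KERNPG_TABLE,
	};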

...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data
  2017-05-16  8:36   ` Borislav Petkov
@ 2017-05-17 20:26     ` Tom Lendacky
  2017-05-18  9:03       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-17 20:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/16/2017 3:36 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:30PM -0500, Tom Lendacky wrote:
>> The SMP MP-table is built by UEFI and placed in memory in a decrypted
>> state. These tables are accessed using a mix of early_memremap(),
>> early_memunmap(), phys_to_virt() and virt_to_phys(). Change all accesses
>> to use early_memremap()/early_memunmap(). This allows for proper setting
>> of the encryption mask so that the data can be successfully accessed when
>> SME is active.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/mpparse.c |  102 +++++++++++++++++++++++++++++++--------------
>>  1 file changed, 71 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
>> index fd37f39..afbda41d 100644
>> --- a/arch/x86/kernel/mpparse.c
>> +++ b/arch/x86/kernel/mpparse.c
>> @@ -429,7 +429,21 @@ static inline void __init construct_default_ISA_mptable(int mpc_default_type)
>>  	}
>>  }
>>
>> -static struct mpf_intel *mpf_found;
>> +static unsigned long mpf_base;
>> +
>> +static void __init unmap_mpf(struct mpf_intel *mpf)
>> +{
>> +	early_memunmap(mpf, sizeof(*mpf));
>> +}
>> +
>> +static struct mpf_intel * __init map_mpf(unsigned long paddr)
>> +{
>> +	struct mpf_intel *mpf;
>> +
>> +	mpf = early_memremap(paddr, sizeof(*mpf));
>> +
>> +	return mpf;
>
> 	return early_memremap(paddr, sizeof(*mpf));
>

Ok.

> ...
>
>> @@ -842,25 +873,26 @@ static int __init update_mp_table(void)
>>  	if (!enable_update_mptable)
>>  		return 0;
>>
>> -	mpf = mpf_found;
>> -	if (!mpf)
>> +	if (!mpf_base)
>>  		return 0;
>>
>> +	mpf = map_mpf(mpf_base);
>> +
>>  	/*
>>  	 * Now see if we need to go further.
>>  	 */
>>  	if (mpf->feature1 != 0)
>
> You're kidding, right? map_mpf() *can* return NULL.

Ugh...  don't know how I forgot about that. Will fix everywhere.

>
> Also, simplify that test:
>
> 	if (mpf->feature1)
> 		...

Ok, I can do that but I hope no one says anything about it being
unrelated to the patch. :)

>
>
>> -		return 0;
>> +		goto do_unmap_mpf;
>>
>>  	if (!mpf->physptr)
>> -		return 0;
>> +		goto do_unmap_mpf;
>>
>> -	mpc = phys_to_virt(mpf->physptr);
>> +	mpc = map_mpc(mpf->physptr);
>
> Again: error checking !!!
>
> You have other calls to early_memremap()/map_mpf() in this patch. Please
> add error checking everywhere.

Yup.

>
>>
>>  	if (!smp_check_mpc(mpc, oem, str))
>> -		return 0;
>> +		goto do_unmap_mpc;
>>
>> -	pr_info("mpf: %llx\n", (u64)virt_to_phys(mpf));
>> +	pr_info("mpf: %llx\n", (u64)mpf_base);
>>  	pr_info("physptr: %x\n", mpf->physptr);
>>
>>  	if (mpc_new_phys && mpc->length > mpc_new_length) {
>> @@ -878,21 +910,23 @@ static int __init update_mp_table(void)
>>  		new = mpf_checksum((unsigned char *)mpc, mpc->length);
>>  		if (old == new) {
>>  			pr_info("mpc is readonly, please try alloc_mptable instead\n");
>> -			return 0;
>> +			goto do_unmap_mpc;
>>  		}
>>  		pr_info("use in-position replacing\n");
>>  	} else {
>>  		mpf->physptr = mpc_new_phys;
>> -		mpc_new = phys_to_virt(mpc_new_phys);
>> +		mpc_new = map_mpc(mpc_new_phys);
>
> Ditto.
>
>>  		memcpy(mpc_new, mpc, mpc->length);
>> +		unmap_mpc(mpc);
>>  		mpc = mpc_new;
>>  		/* check if we can modify that */
>>  		if (mpc_new_phys - mpf->physptr) {
>>  			struct mpf_intel *mpf_new;
>>  			/* steal 16 bytes from [0, 1k) */
>>  			pr_info("mpf new: %x\n", 0x400 - 16);
>> -			mpf_new = phys_to_virt(0x400 - 16);
>> +			mpf_new = map_mpf(0x400 - 16);
>
> Ditto.
>
>>  			memcpy(mpf_new, mpf, 16);
>> +			unmap_mpf(mpf);
>>  			mpf = mpf_new;
>>  			mpf->physptr = mpc_new_phys;
>>  		}
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-17 18:54     ` Tom Lendacky
@ 2017-05-18  9:02       ` Borislav Petkov
  2017-05-19 20:50         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-18  9:02 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Wed, May 17, 2017 at 01:54:39PM -0500, Tom Lendacky wrote:
> I was worried what the compiler might do when CONFIG_EFI is not set,
> but it appears to take care of it. I'll double check though.

There's an efi_enabled() !CONFIG_EFI version too, so it should be fine.
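
I.e., the !CONFIG_EFI case boils down to a stub roughly like this (from
memory, so double-check include/linux/efi.h):

	static inline bool efi_enabled(int feature)
	{
		return false;
	}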

> I may introduce a length variable to capture data->len right after
> paddr_next is set and then have just a single memunmap() call before
> the if check.

Yap.

> I tried that, but calling an "__init" function (early_memremap()) from
> a non "__init" function generated warnings. I suppose I can pass in a
> function for the map and unmap but that looks worse to me (also the
> unmap functions take different arguments).

No, the other way around: the __init function should call the non-init
one and you need the non-init one anyway for memremap_is_setup_data().

> This is like the chicken and the egg scenario. In order to determine if
> an address is setup data I have to explicitly map the setup data chain
> as decrypted. In order to do that I have to supply a flag to explicitly
> map the data decrypted otherwise I wind up back in the
> memremap_is_setup_data() function again and again and again...

Oh, fun.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data
  2017-05-17 20:26     ` Tom Lendacky
@ 2017-05-18  9:03       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-18  9:03 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Wed, May 17, 2017 at 03:26:58PM -0500, Tom Lendacky wrote:
> > Also, simplify that test:
> > 
> > 	if (mpf->feature1)
> > 		...
> 
> Ok, I can do that but I hope no one says anything about it being
> unrelated to the patch. :)

Bah, that's minor.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-04-18 21:21 ` [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place Tom Lendacky
@ 2017-05-18 12:46   ` Borislav Petkov
  2017-05-25 22:24     ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-18 12:46 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:21:49PM -0500, Tom Lendacky wrote:
> Add the support to encrypt the kernel in-place. This is done by creating
> new page mappings for the kernel - a decrypted write-protected mapping
> and an encrypted mapping. The kernel is encrypted by copying it through
> a temporary buffer.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |    6 +
>  arch/x86/mm/Makefile               |    2 
>  arch/x86/mm/mem_encrypt.c          |  262 ++++++++++++++++++++++++++++++++++++
>  arch/x86/mm/mem_encrypt_boot.S     |  151 +++++++++++++++++++++
>  4 files changed, 421 insertions(+)
>  create mode 100644 arch/x86/mm/mem_encrypt_boot.S
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index b406df2..8f6f9b4 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
>  	return ((u64)sme_me_mask << 1) - 1;
>  }
>  
> +void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
> +			 unsigned long decrypted_kernel_vaddr,
> +			 unsigned long kernel_len,
> +			 unsigned long encryption_wa,
> +			 unsigned long encryption_pgd);
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
>  			      unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 9e13841..0633142 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
>  obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>  obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
> +
> +obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 30b07a3..0ff41a4 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -24,6 +24,7 @@
>  #include <asm/setup.h>
>  #include <asm/bootparam.h>
>  #include <asm/cacheflush.h>
> +#include <asm/sections.h>
>  
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>  	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>  }
>  
> +void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,

static

> +			  unsigned long end)
> +{
> +	unsigned long addr = start;
> +	pgdval_t *pgd_p;
> +
> +	while (addr < end) {
> +		unsigned long pgd_end;
> +
> +		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
> +		if (pgd_end > end)
> +			pgd_end = end;
> +
> +		pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
> +		*pgd_p = 0;

Hmm, so this is a contiguous range from [start:end] which translates to
8-byte PGD pointers in the PGD page so you can simply memset that range,
no?

Instead of iterating over each one?
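
Something like this, perhaps (completely untested):

	unsigned long pgd_start = pgd_index(start);
	unsigned long pgd_count = pgd_index(end - 1) - pgd_start + 1;

	memset((pgdval_t *)pgd_base + pgd_start, 0,
	       pgd_count * sizeof(pgdval_t));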

> +
> +		addr = pgd_end;
> +	}
> +}
> +
> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PMD_FLAGS	(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
> +
> +static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
> +				     unsigned long vaddr, pmdval_t pmd_val)
> +{
> +	pgdval_t pgd, *pgd_p;
> +	pudval_t pud, *pud_p;
> +	pmdval_t pmd, *pmd_p;

You should use the enclosing type, not the underlying one. I.e.,

	pgd_t *pgd;
	pud_t *pud;
	...

and then the macros native_p*d_val(), p*d_offset() and so on. I say
native_* because we don't want to have any paravirt nastiness here.
I believe your previous version was using the proper interfaces.

And the kernel has gotten 5-level pagetables support in
the meantime, so this'll need to start at p4d AFAICT.
arch/x86/mm/fault.c::dump_pagetable() looks like a good example to stare
at.
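
I.e., the walk itself would look something along these lines (rough
sketch only - it ignores allocating the missing levels from
pgtable_area and the large-page checks):

	pgd_t *pgd = pgd_base + pgd_index(vaddr);
	p4d_t *p4d = p4d_offset(pgd, vaddr);
	pud_t *pud = pud_offset(p4d, vaddr);
	pmd_t *pmd = pmd_offset(pud, vaddr);

	if (!native_pmd_val(*pmd))
		native_set_pmd(pmd, native_make_pmd(pmd_val));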

> +	pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
> +	pgd = *pgd_p;
> +	if (pgd) {
> +		pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
> +	} else {
> +		pud_p = pgtable_area;
> +		memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
> +		pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
> +
> +		*pgd_p = (pgdval_t)pud_p + PGD_FLAGS;
> +	}
> +
> +	pud_p += pud_index(vaddr);
> +	pud = *pud_p;
> +	if (pud) {
> +		if (pud & _PAGE_PSE)
> +			goto out;
> +
> +		pmd_p = (pmdval_t *)(pud & ~PTE_FLAGS_MASK);
> +	} else {
> +		pmd_p = pgtable_area;
> +		memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
> +		pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
> +
> +		*pud_p = (pudval_t)pmd_p + PUD_FLAGS;
> +	}
> +
> +	pmd_p += pmd_index(vaddr);
> +	pmd = *pmd_p;
> +	if (!pmd || !(pmd & _PAGE_PSE))
> +		*pmd_p = pmd_val;
> +
> +out:
> +	return pgtable_area;
> +}
> +
> +static unsigned long __init sme_pgtable_calc(unsigned long len)
> +{
> +	unsigned long pud_tables, pmd_tables;
> +	unsigned long total = 0;
> +
> +	/*
> +	 * Perform a relatively simplistic calculation of the pagetable
> +	 * entries that are needed. The mappings will be covered by 2MB
> +	 * PMD entries so we can conservatively calculate the required
> +	 * number of PUD and PMD structures needed to perform the mappings.
> +	 * Incrementing the count for each covers the case where the
> +	 * addresses cross entries.
> +	 */
> +	pud_tables = ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE;
> +	pud_tables++;
> +	pmd_tables = ALIGN(len, PUD_SIZE) / PUD_SIZE;
> +	pmd_tables++;
> +
> +	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
> +	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
> +
> +	/*
> +	 * Now calculate the added pagetable structures needed to populate
> +	 * the new pagetables.
> +	 */

Nice commenting, helps following what's going on.

> +	pud_tables = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
> +	pmd_tables = ALIGN(total, PUD_SIZE) / PUD_SIZE;
> +
> +	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
> +	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
> +
> +	return total;
> +}
> +
>  void __init sme_encrypt_kernel(void)
>  {
> +	pgd_t *pgd;
> +	void *pgtable_area;
> +	unsigned long kernel_start, kernel_end, kernel_len;
> +	unsigned long workarea_start, workarea_end, workarea_len;
> +	unsigned long execute_start, execute_end, execute_len;
> +	unsigned long pgtable_area_len;
> +	unsigned long decrypted_base;
> +	unsigned long paddr, pmd_flags;


Please sort the function-local variable declarations in reverse christmas
tree order:

	<type> longest_variable_name;
	<type> shorter_var_name;
	<type> even_shorter;
	<type> i;

> +
> +	if (!sme_active())
> +		return;

...

> diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
> new file mode 100644
> index 0000000..fb58f9f
> --- /dev/null
> +++ b/arch/x86/mm/mem_encrypt_boot.S
> @@ -0,0 +1,151 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/linkage.h>
> +#include <asm/pgtable.h>
> +#include <asm/page.h>
> +#include <asm/processor-flags.h>
> +#include <asm/msr-index.h>
> +
> +	.text
> +	.code64
> +ENTRY(sme_encrypt_execute)
> +
> +	/*
> +	 * Entry parameters:
> +	 *   RDI - virtual address for the encrypted kernel mapping
> +	 *   RSI - virtual address for the decrypted kernel mapping
> +	 *   RDX - length of kernel
> +	 *   RCX - virtual address of the encryption workarea, including:
> +	 *     - stack page (PAGE_SIZE)
> +	 *     - encryption routine page (PAGE_SIZE)
> +	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
> +	 *    R8 - physical address of the pagetables to use for encryption
> +	 */
> +
> +	push	%rbp
> +	push	%r12
> +
> +	/* Set up a one page stack in the non-encrypted memory area */
> +	movq	%rsp, %rbp		/* Save current stack pointer */
> +	movq	%rcx, %rax		/* Workarea stack page */
> +	movq	%rax, %rsp		/* Set new stack pointer */
> +	addq	$PAGE_SIZE, %rsp	/* Stack grows from the bottom */
> +	addq	$PAGE_SIZE, %rax	/* Workarea encryption routine */
> +
> +	movq	%rdi, %r10		/* Encrypted kernel */
> +	movq	%rsi, %r11		/* Decrypted kernel */
> +	movq	%rdx, %r12		/* Kernel length */
> +
> +	/* Copy encryption routine into the workarea */
> +	movq	%rax, %rdi		/* Workarea encryption routine */
> +	leaq	.Lenc_start(%rip), %rsi	/* Encryption routine */
> +	movq	$(.Lenc_stop - .Lenc_start), %rcx	/* Encryption routine length */
> +	rep	movsb
> +
> +	/* Setup registers for call */
> +	movq	%r10, %rdi		/* Encrypted kernel */
> +	movq	%r11, %rsi		/* Decrypted kernel */
> +	movq	%r8, %rdx		/* Pagetables used for encryption */
> +	movq	%r12, %rcx		/* Kernel length */
> +	movq	%rax, %r8		/* Workarea encryption routine */
> +	addq	$PAGE_SIZE, %r8		/* Workarea intermediate copy buffer */
> +
> +	call	*%rax			/* Call the encryption routine */
> +
> +	movq	%rbp, %rsp		/* Restore original stack pointer */
> +
> +	pop	%r12
> +	pop	%rbp
> +
> +	ret
> +ENDPROC(sme_encrypt_execute)
> +
> +.Lenc_start:
> +ENTRY(sme_enc_routine)

A function called a "routine"? Why do we need the global symbol?
Nothing's referencing it AFAICT.

> +/*
> + * Routine used to encrypt kernel.
> + *   This routine must be run outside of the kernel proper since
> + *   the kernel will be encrypted during the process. So this
> + *   routine is defined here and then copied to an area outside
> + *   of the kernel where it will remain and run decrypted
> + *   during execution.
> + *
> + *   On entry the registers must be:
> + *     RDI - virtual address for the encrypted kernel mapping
> + *     RSI - virtual address for the decrypted kernel mapping
> + *     RDX - address of the pagetables to use for encryption
> + *     RCX - length of kernel
> + *      R8 - intermediate copy buffer
> + *
> + *     RAX - points to this routine
> + *
> + * The kernel will be encrypted by copying from the non-encrypted
> + * kernel space to an intermediate buffer and then copying from the
> + * intermediate buffer back to the encrypted kernel space. The physical
> + * addresses of the two kernel space mappings are the same which
> + * results in the kernel being encrypted "in place".
> + */
> +	/* Enable the new page tables */
> +	mov	%rdx, %cr3
> +
> +	/* Flush any global TLBs */
> +	mov	%cr4, %rdx
> +	andq	$~X86_CR4_PGE, %rdx
> +	mov	%rdx, %cr4
> +	orq	$X86_CR4_PGE, %rdx
> +	mov	%rdx, %cr4
> +
> +	/* Set the PAT register PA5 entry to write-protect */
> +	push	%rcx
> +	movl	$MSR_IA32_CR_PAT, %ecx
> +	rdmsr
> +	push	%rdx			/* Save original PAT value */
> +	andl	$0xffff00ff, %edx	/* Clear PA5 */
> +	orl	$0x00000500, %edx	/* Set PA5 to WP */

Maybe check first whether PA5 is already set correctly and avoid the
WRMSR and the restoring below too?

> +	wrmsr
> +	pop	%rdx			/* RDX contains original PAT value */
> +	pop	%rcx
> +
> +	movq	%rcx, %r9		/* Save kernel length */
> +	movq	%rdi, %r10		/* Save encrypted kernel address */
> +	movq	%rsi, %r11		/* Save decrypted kernel address */
> +
> +	wbinvd				/* Invalidate any cache entries */
> +
> +	/* Copy/encrypt 2MB at a time */
> +1:
> +	movq	%r11, %rsi		/* Source - decrypted kernel */
> +	movq	%r8, %rdi		/* Dest   - intermediate copy buffer */
> +	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
> +	rep	movsb

not movsQ?

> +	movq	%r8, %rsi		/* Source - intermediate copy buffer */
> +	movq	%r10, %rdi		/* Dest   - encrypted kernel */
> +	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
> +	rep	movsb
> +
> +	addq	$PMD_PAGE_SIZE, %r11
> +	addq	$PMD_PAGE_SIZE, %r10
> +	subq	$PMD_PAGE_SIZE, %r9	/* Kernel length decrement */
> +	jnz	1b			/* Kernel length not zero? */
> +
> +	/* Restore PAT register */
> +	push	%rdx			/* Save original PAT value */
> +	movl	$MSR_IA32_CR_PAT, %ecx
> +	rdmsr
> +	pop	%rdx			/* Restore original PAT value */
> +	wrmsr
> +
> +	ret
> +ENDPROC(sme_enc_routine)
> +.Lenc_stop:
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-04-18 21:22 ` [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption Tom Lendacky
  2017-04-21 21:55   ` Dave Hansen
@ 2017-05-18 17:01   ` Borislav Petkov
  2017-05-26  2:49     ` Dave Young
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-18 17:01 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> determine if SME is active.

But why do user-space tools need to know that?

I mean, when we load the kdump kernel, we do it with the first kernel,
with the kexec_load() syscall, AFAICT. And that code does a lot of
things during that init, like machine_kexec_prepare()->init_pgtable() to
prepare the ident mapping of the second kernel, for example.

What I'm aiming at is that the first kernel knows *exactly* whether SME
is enabled or not and doesn't need to tell the second one through some
sysfs entries - it can do that during loading.

So I don't think we need any userspace things at all...

Or?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-15 18:35   ` Borislav Petkov
  2017-05-17 18:54     ` Tom Lendacky
@ 2017-05-18 19:50     ` Matt Fleming
  2017-05-26 16:22       ` Tom Lendacky
  1 sibling, 1 reply; 126+ messages in thread
From: Matt Fleming @ 2017-05-18 19:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov,
	Ard Biesheuvel

On Mon, 15 May, at 08:35:17PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
>
> > +		paddr = boot_params.efi_info.efi_memmap_hi;
> > +		paddr <<= 32;
> > +		paddr |= boot_params.efi_info.efi_memmap;
> > +		if (phys_addr == paddr)
> > +			return true;
> > +
> > +		paddr = boot_params.efi_info.efi_systab_hi;
> > +		paddr <<= 32;
> > +		paddr |= boot_params.efi_info.efi_systab;
> 
> So those two above look like could be two global vars which are
> initialized somewhere in the EFI init path:
> 
> efi_memmap_phys and efi_systab_phys or so.
> 
> Matt ?
> 
> And then you won't need to create that paddr each time on the fly. I
> mean, it's not a lot of instructions but still...
 
We should already have the physical memmap address available in
'efi.memmap.phys_map'.

And the physical address of the system table should be in
'efi_phys.systab'. See efi_init().
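
So the lookup could presumably be reduced to something like this
(untested; note that efi_phys is currently static to
arch/x86/platform/efi/efi.c, so the systab address would need to be
exposed somehow):

	if (phys_addr == efi.memmap.phys_map)
		return true;

	if (phys_addr == efi_phys.systab)
		return true;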

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-04-18 21:22 ` [PATCH v5 32/32] x86/mm: Add support to make use of " Tom Lendacky
  2017-04-21 18:56   ` Tom Lendacky
@ 2017-05-19 11:27   ` Borislav Petkov
  2017-05-30 14:38     ` Tom Lendacky
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-19 11:27 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote:
> Add support to check if SME has been enabled and if memory encryption
> should be activated (checking of command line option based on the
> configuration of the default state).  If memory encryption is to be
> activated, then the encryption mask is set and the kernel is encrypted
> "in place."
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/head_64.S |    1 +
>  arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 80 insertions(+), 4 deletions(-)

...

> +unsigned long __init sme_enable(struct boot_params *bp)
>  {
> +	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> +	unsigned int eax, ebx, ecx, edx;
> +	unsigned long me_mask;
> +	bool active_by_default;
> +	char buffer[16];
> +	u64 msr;
> +
> +	/* Check for the SME support leaf */
> +	eax = 0x80000000;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (eax < 0x8000001f)
> +		goto out;
> +
> +	/*
> +	 * Check for the SME feature:
> +	 *   CPUID Fn8000_001F[EAX] - Bit 0
> +	 *     Secure Memory Encryption support
> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +	 *     Pagetable bit position used to indicate encryption
> +	 */
> +	eax = 0x8000001f;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (!(eax & 1))
> +		goto out;

<---- newline here.

> +	me_mask = 1UL << (ebx & 0x3f);
> +
> +	/* Check if SME is enabled */
> +	msr = __rdmsr(MSR_K8_SYSCFG);
> +	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +		goto out;
> +
> +	/*
> +	 * Fixups have not been applied to phys_base yet, so we must obtain
> +	 * the address of the SME command line option data in the following
> +	 * way.
> +	 */
> +	asm ("lea sme_cmdline_arg(%%rip), %0"
> +	     : "=r" (cmdline_arg)
> +	     : "p" (sme_cmdline_arg));
> +	asm ("lea sme_cmdline_on(%%rip), %0"
> +	     : "=r" (cmdline_on)
> +	     : "p" (sme_cmdline_on));
> +	asm ("lea sme_cmdline_off(%%rip), %0"
> +	     : "=r" (cmdline_off)
> +	     : "p" (sme_cmdline_off));
> +
> +	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> +		active_by_default = true;
> +	else
> +		active_by_default = false;
> +
> +	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
> +				     ((u64)bp->ext_cmd_line_ptr << 32));
> +
> +	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
> +
> +	if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
> +		sme_me_mask = me_mask;

Why doesn't simply

	if (!strncmp(buffer, "on", 2))
		...

work?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-04-21 18:56   ` Tom Lendacky
@ 2017-05-19 11:30     ` Borislav Petkov
  2017-05-19 20:16       ` Josh Poimboeuf
  2017-05-30 15:46       ` Tom Lendacky
  0 siblings, 2 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-19 11:30 UTC (permalink / raw)
  To: Tom Lendacky, Josh Poimboeuf
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote:
> On 4/18/2017 4:22 PM, Tom Lendacky wrote:
> > Add support to check if SME has been enabled and if memory encryption
> > should be activated (checking of command line option based on the
> > configuration of the default state).  If memory encryption is to be
> > activated, then the encryption mask is set and the kernel is encrypted
> > "in place."
> > 
> > Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> > ---
> >  arch/x86/kernel/head_64.S |    1 +
> >  arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 80 insertions(+), 4 deletions(-)
> > 
> 
> ...
> 
> > 
> > -unsigned long __init sme_enable(void)
> > +unsigned long __init sme_enable(struct boot_params *bp)
> >  {
> > +	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> > +	unsigned int eax, ebx, ecx, edx;
> > +	unsigned long me_mask;
> > +	bool active_by_default;
> > +	char buffer[16];
> 
> So it turns out that when KASLR is enabled (CONFIG_RANDOMIZE_BASE=y)
> the stack-protector support causes issues with this function because

What issues?

> it is called so early. I can get past it by adding:
> 
> CFLAGS_mem_encrypt.o := $(nostackp)
> 
> in the arch/x86/mm/Makefile, but that obviously eliminates the support
> for the whole file.  Would it be better to split out the sme_enable()
> and other boot routines into a separate file or just apply the
> $(nostackp) to the whole file?

Josh might have a better idea here... CCed.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear
  2017-05-16 14:04   ` Borislav Petkov
@ 2017-05-19 19:52     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 19:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/16/2017 9:04 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:19:42PM -0500, Tom Lendacky wrote:
>> Persistent memory is expected to persist across reboots. The encryption
>> key used by SME will change across reboots which will result in corrupted
>> persistent memory.  Persistent memory is handed out by block devices
>> through memory remapping functions, so be sure not to map this memory as
>> encrypted.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/mm/ioremap.c |   31 ++++++++++++++++++++++++++++++-
>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index bce0604..55317ba 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -425,17 +425,46 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>>   * Examine the physical address to determine if it is an area of memory
>>   * that should be mapped decrypted.  If the memory is not part of the
>>   * kernel usable area it was accessed and created decrypted, so these
>> - * areas should be mapped decrypted.
>> + * areas should be mapped decrypted. And since the encryption key can
>> + * change across reboots, persistent memory should also be mapped
>> + * decrypted.
>>   */
>>  static bool memremap_should_map_decrypted(resource_size_t phys_addr,
>>  					  unsigned long size)
>>  {
>> +	int is_pmem;
>> +
>> +	/*
>> +	 * Check if the address is part of a persistent memory region.
>> +	 * This check covers areas added by E820, EFI and ACPI.
>> +	 */
>> +	is_pmem = region_intersects(phys_addr, size, IORESOURCE_MEM,
>> +				    IORES_DESC_PERSISTENT_MEMORY);
>> +	if (is_pmem != REGION_DISJOINT)
>> +		return true;
>> +
>> +	/*
>> +	 * Check if the non-volatile attribute is set for an EFI
>> +	 * reserved area.
>> +	 */
>> +	if (efi_enabled(EFI_BOOT)) {
>> +		switch (efi_mem_type(phys_addr)) {
>> +		case EFI_RESERVED_TYPE:
>> +			if (efi_mem_attributes(phys_addr) & EFI_MEMORY_NV)
>> +				return true;
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>>  	/* Check if the address is outside kernel usable area */
>>  	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
>>  	case E820_TYPE_RESERVED:
>>  	case E820_TYPE_ACPI:
>>  	case E820_TYPE_NVS:
>>  	case E820_TYPE_UNUSABLE:
>> +	case E820_TYPE_PRAM:
>
> Can't you simply add:
>
> 	case E820_TYPE_PMEM:
>
> here too and thus get rid of the region_intersects() thing above?
>
> Because, for example, e820_type_to_iores_desc() maps E820_TYPE_PMEM to
> IORES_DESC_PERSISTENT_MEMORY so those should be equivalent...

I'll have to double-check this, but I believe that when persistent
memory is identified through the NFIT table it is added as a resource
but not as an e820 entry, so I can't rely on the type being
returned as E820_TYPE_PMEM by e820__get_entry_type().

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption
  2017-05-16 14:27   ` Borislav Petkov
@ 2017-05-19 19:54     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 19:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/16/2017 9:27 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:20:10PM -0500, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48 bits. SWIOTLB will be
>> initialized to create decrypted bounce buffers for use by these devices.
>
> Use a verb in the subject:
>
> Subject: x86, swiotlb: Add memory encryption support
>
> or similar.

Will do.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME
  2017-05-16 14:52   ` Borislav Petkov
@ 2017-05-19 19:55     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 19:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/16/2017 9:52 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:20:19PM -0500, Tom Lendacky wrote:
>> Add warnings to let the user know when bounce buffers are being used for
>> DMA when SME is active.  Since the bounce buffers are not in encrypted
>> memory, these notifications are to allow the user to determine some
>> appropriate action - if necessary.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>>  include/linux/dma-mapping.h        |   11 +++++++++++
>>  include/linux/mem_encrypt.h        |    6 ++++++
>>  lib/swiotlb.c                      |    3 +++
>>  4 files changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 0637b4b..b406df2 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>>  	return !!sme_me_mask;
>>  }
>>
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return ((u64)sme_me_mask << 1) - 1;
>> +}
>> +
>>  void __init sme_early_encrypt(resource_size_t paddr,
>>  			      unsigned long size);
>>  void __init sme_early_decrypt(resource_size_t paddr,
>> @@ -50,6 +55,12 @@ static inline bool sme_active(void)
>>  {
>>  	return false;
>>  }
>> +
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return 0ULL;
>> +}
>> +
>>  #endif
>>
>>  static inline void __init sme_early_encrypt(resource_size_t paddr,
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 0977317..f825870 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -10,6 +10,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/kmemcheck.h>
>>  #include <linux/bug.h>
>> +#include <linux/mem_encrypt.h>
>>
>>  /**
>>   * List of possible attributes associated with a DMA mapping. The semantics
>> @@ -577,6 +578,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>>
>>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>>  		return -EIO;
>> +
>> +	if (sme_active() && (mask < sme_dma_mask()))
>> +		dev_warn_ratelimited(dev,
>> +				     "SME is active, device will require DMA bounce buffers\n");
>
> Bah, no need to break that line - just let it stick out. Ditto for the
> others.

Ok.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-19 11:30     ` Borislav Petkov
@ 2017-05-19 20:16       ` Josh Poimboeuf
  2017-05-19 20:29         ` Borislav Petkov
  2017-05-30 15:48         ` Tom Lendacky
  2017-05-30 15:46       ` Tom Lendacky
  1 sibling, 2 replies; 126+ messages in thread
From: Josh Poimboeuf @ 2017-05-19 20:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote:
> > it is called so early. I can get past it by adding:
> > 
> > CFLAGS_mem_encrypt.o := $(nostackp)
> > 
> > in the arch/x86/mm/Makefile, but that obviously eliminates the support
> > for the whole file.  Would it be better to split out the sme_enable()
> > and other boot routines into a separate file or just apply the
> > $(nostackp) to the whole file?
> 
> Josh might have a better idea here... CCed.

I'm the stack validation guy, not the stack protection guy :-)

But there is a way to disable compiler options on a per-function basis
with the gcc __optimize__ function attribute.  For example:

  __attribute__((__optimize__("no-stack-protector")))
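
E.g. (sketch only), slapped on just the offending function:

  __attribute__((__optimize__("no-stack-protector")))
  unsigned long __init sme_enable(struct boot_params *bp)
  {
	...
  }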

-- 
Josh

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-19 20:16       ` Josh Poimboeuf
@ 2017-05-19 20:29         ` Borislav Petkov
  2017-05-30 15:48         ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-19 20:29 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, May 19, 2017 at 03:16:51PM -0500, Josh Poimboeuf wrote:
> I'm the stack validation guy, not the stack protection guy :-)

LOL. I thought you were *the* stacks guy. :-)))

But once you've validated it, you could protect it then too. :-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-17 19:17   ` Borislav Petkov
@ 2017-05-19 20:45     ` Tom Lendacky
  2017-05-19 20:58       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 20:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/17/2017 2:17 PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:21:21PM -0500, Tom Lendacky wrote:
>> Provide support so that kexec can be used to boot a kernel when SME is
>> enabled.
>>
>> Support is needed to allocate pages for kexec without encryption.  This
>> is needed in order to be able to reboot into the kernel in the same manner
>> as it was originally booted.
>>
>> Additionally, when shutting down all of the CPUs we need to be sure to
>> flush the caches and then halt. This is needed when booting from a state
>> where SME was not active into a state where SME is active (or vice-versa).
>> Without these steps, it is possible for cache lines to exist for the same
>> physical location but tagged both with and without the encryption bit. This
>> can cause random memory corruption when caches are flushed depending on
>> which cacheline is written last.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/init.h          |    1 +
>>  arch/x86/include/asm/irqflags.h      |    5 +++++
>>  arch/x86/include/asm/kexec.h         |    8 ++++++++
>>  arch/x86/include/asm/pgtable_types.h |    1 +
>>  arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
>>  arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
>>  arch/x86/mm/ident_map.c              |   11 +++++++----
>>  include/linux/kexec.h                |   14 ++++++++++++++
>>  kernel/kexec_core.c                  |    7 +++++++
>>  9 files changed, 101 insertions(+), 7 deletions(-)
>
> ...
>
>> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>>  		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>  	}
>>  	pte = pte_offset_kernel(pmd, vaddr);
>> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>>  	return 0;
>>  err:
>>  	free_transition_pgtable(image);
>> @@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>  		.alloc_pgt_page	= alloc_pgt_page,
>>  		.context	= image,
>>  		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
>> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>>  	};
>>  	unsigned long mstart, mend;
>>  	pgd_t *level4p;
>> @@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
>>  {
>>  	kexec_mark_crashkres(false);
>>  }
>> +
>> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>> +{
>> +	int ret;
>> +
>> +	if (sme_active()) {
>
> 	if (!sme_active())
> 		return 0;
>
> 	/*
> 	 * If SME...
>

Ok.

>
>> +		/*
>> +		 * If SME is active we need to be sure that kexec pages are
>> +		 * not encrypted because when we boot to the new kernel the
>> +		 * pages won't be accessed encrypted (initially).
>> +		 */
>> +		ret = set_memory_decrypted((unsigned long)vaddr, pages);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (gfp & __GFP_ZERO)
>> +			memset(vaddr, 0, pages * PAGE_SIZE);
>
> This function is called after alloc_pages() which already zeroes memory
> when __GFP_ZERO is supplied.
>
> If you need to clear the memory *after* set_memory_encrypted() happens,
> then you should probably mask out __GFP_ZERO before the alloc_pages()
> call so as not to do it twice.

I'll look into that.  I could put the memset() at the end of this
function so that it is done here no matter what.  And update the
default arch_kexec_post_alloc_pages() to also do the memset(). It
just hides the clearing of the pages a bit though by doing that.
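
Something along these lines (untested):

int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
{
	if (sme_active()) {
		int ret;

		/*
		 * If SME is active we need to be sure that kexec pages are
		 * not encrypted because when we boot to the new kernel the
		 * pages won't be accessed encrypted (initially).
		 */
		ret = set_memory_decrypted((unsigned long)vaddr, pages);
		if (ret)
			return ret;
	}

	/* Zero the pages here, whether SME is active or not */
	if (gfp & __GFP_ZERO)
		memset(vaddr, 0, pages * PAGE_SIZE);

	return 0;
}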

>
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>> +{
>> +	if (sme_active()) {
>> +		/*
>> +		 * If SME is active we need to reset the pages back to being
>> +		 * an encrypted mapping before freeing them.
>> +		 */
>> +		set_memory_encrypted((unsigned long)vaddr, pages);
>> +	}
>> +}
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 0bb8842..f4e5de6 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/cpuidle.h>
>>  #include <trace/events/power.h>
>>  #include <linux/hw_breakpoint.h>
>> +#include <linux/kexec.h>
>>  #include <asm/cpu.h>
>>  #include <asm/apic.h>
>>  #include <asm/syscalls.h>
>> @@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
>>  	return ret;
>>  }
>>  #endif
>> +
>>  void stop_this_cpu(void *dummy)
>>  {
>> +	bool do_wbinvd_halt = false;
>> +
>> +	if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
>> +		/*
>> +		 * If we are performing a kexec and the processor supports
>> +		 * SME then we need to clear out cache information before
>> +		 * halting. With kexec, going from SME inactive to SME active
>> +		 * requires clearing cache entries so that addresses without
>> +		 * the encryption bit set don't corrupt the same physical
>> +		 * address that has the encryption bit set when caches are
>> +		 * flushed. Perform a wbinvd followed by a halt to achieve
>> +		 * this.
>> +		 */
>> +		do_wbinvd_halt = true;
>> +	}
>> +
>>  	local_irq_disable();
>>  	/*
>>  	 * Remove this CPU:
>> @@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
>>  	disable_local_APIC();
>>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>>
>> -	for (;;)
>> -		halt();
>> +	for (;;) {
>> +		if (do_wbinvd_halt)
>> +			native_wbinvd_halt();
>
> No need for that native_wbinvd_halt() thing:
>
> 	for (;;) {
> 		if (do_wbinvd)
> 			wbinvd();
>
> 		halt();
> 	}
>

Actually there is.  The above will result in data in the cache because
halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
to the comment above where do_wbinvd_halt is set to true). I could make
this a native_wbinvd() and native_halt() as long as those are
guaranteed to never turn into function calls.  But never say never, so
that's why I created native_wbinvd_halt().
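
For reference, the helper is basically the wbinvd and hlt back to back
in a single asm statement so that nothing can dirty the cache in
between (paraphrasing from memory, not the exact hunk):

	static inline void native_wbinvd_halt(void)
	{
		asm volatile("wbinvd; hlt" : : : "memory");
	}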

Thanks,
Tom

>>  /*
>> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
>> index 04210a2..2c9fd3e 100644
>> --- a/arch/x86/mm/ident_map.c
>> +++ b/arch/x86/mm/ident_map.c
>> @@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
>>  static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>  			  unsigned long addr, unsigned long end)
>>  {
>> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>
> You're already supplying a x86_mapping_info and thus you can init
> kernpg_flag to default _KERNPG_TABLE and override it in the SME+kexec
> case, as you already do. And this way you can simply do:
>
> 	set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
>
> here and in the other pagetable functions I've snipped below, and save
> yourself some lines.

Ok, I'll check into that.
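If I understand the suggestion correctly it would boil down to
something like this (sketch only):

	/* In kernel_ident_mapping_init(), before walking the range: */
	if (!info->kernpg_flag)
		info->kernpg_flag = _KERNPG_TABLE;

and then the page table helpers can use the flag directly:

	set_pud(pud, __pud(__pa(pmd) | info->kernpg_flag));
	set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));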

Thanks,
Tom

>
> ...
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-18  9:02       ` Borislav Petkov
@ 2017-05-19 20:50         ` Tom Lendacky
  2017-05-21  7:16           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 20:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/18/2017 4:02 AM, Borislav Petkov wrote:
> On Wed, May 17, 2017 at 01:54:39PM -0500, Tom Lendacky wrote:
>> I was worried what the compiler might do when CONFIG_EFI is not set,
>> but it appears to take care of it. I'll double check though.
>
> There's a efi_enabled() !CONFIG_EFI version too, so should be fine.
>
>> I may introduce a length variable to capture data->len right after
>> paddr_next is set and then have just a single memunmap() call before
>> the if check.
>
> Yap.
>
>> I tried that, but calling an "__init" function (early_memremap()) from
>> a non "__init" function generated warnings. I suppose I can pass in a
>> function for the map and unmap but that looks worse to me (also the
>> unmap functions take different arguments).
>
> No, the other way around: the __init function should call the non-init
> one and you need the non-init one anyway for memremap_is_setup_data().
>

The "worker" function would do the loop over the setup data, but since
the setup data is mapped inside the loop I can't have the __init function
call the non-init one and still hope to consolidate the code.
Maybe I'm missing something here...

Thanks,
Tom

>> This is like the chicken and the egg scenario. In order to determine if
>> an address is setup data I have to explicitly map the setup data chain
>> as decrypted. In order to do that I have to supply a flag to explicitly
>> map the data decrypted otherwise I wind up back in the
>> memremap_is_setup_data() function again and again and again...
>
> Oh, fun.
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-19 20:45     ` Tom Lendacky
@ 2017-05-19 20:58       ` Borislav Petkov
  2017-05-19 21:07         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-19 20:58 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:
> Actually there is.  The above will result in data in the cache because
> halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
> to the comment above where do_wbinvd_halt is set to true). I could make
> this a native_wbinvd() and native_halt()

That's why we have the native_* versions - to bypass paravirt crap.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-19 20:58       ` Borislav Petkov
@ 2017-05-19 21:07         ` Tom Lendacky
  2017-05-19 21:28           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 21:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/19/2017 3:58 PM, Borislav Petkov wrote:
> On Fri, May 19, 2017 at 03:45:28PM -0500, Tom Lendacky wrote:
>> Actually there is.  The above will result in data in the cache because
>> halt() turns into a function call if CONFIG_PARAVIRT is defined (refer
>> to the comment above where do_wbinvd_halt is set to true). I could make
>> this a native_wbinvd() and native_halt()
>
> That's why we have the native_* versions - to bypass paravirt crap.

As long as those never change from static inline everything will be
fine. I can change it, but I really like how it explicitly indicates
what is needed in this case. Even if the helper stops being a static
inline, the fact that the two instructions are sequential within the
function still covers this case.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-19 21:07         ` Tom Lendacky
@ 2017-05-19 21:28           ` Borislav Petkov
  2017-05-19 21:38             ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-19 21:28 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:
> As long as those never change from static inline everything will be
> fine. I can change it, but I really like how it explicitly indicates

I know what you want to do. But you're practically defining a helper
which contains two arbitrary instructions which probably no one else
will need.

So how about we simplify this function even more. We don't need to pay
attention to kexec being in progress because we're halting anyway so who
cares how fast we halt.

Might have to state that in the comment below though, instead of what's
there now.

And for the exact same moot reason, we don't need to look at SME CPUID
feature - we can just as well WBINVD unconditionally.

void stop_this_cpu(void *dummy)
{
        local_irq_disable();
        /*
         * Remove this CPU:
         */
        set_cpu_online(smp_processor_id(), false);
        disable_local_APIC();
        mcheck_cpu_clear(this_cpu_ptr(&cpu_info));

        for (;;) {
                /*
                 * If we are performing a kexec and the processor supports
                 * SME then we need to clear out cache information before
                 * halting. With kexec, going from SME inactive to SME active
                 * requires clearing cache entries so that addresses without
                 * the encryption bit set don't corrupt the same physical
                 * address that has the encryption bit set when caches are
                 * flushed. Perform a wbinvd followed by a halt to achieve
                 * this.
                 */
                asm volatile("wbinvd; hlt" ::: "memory");
        }
}

How's that?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-19 21:28           ` Borislav Petkov
@ 2017-05-19 21:38             ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-19 21:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/19/2017 4:28 PM, Borislav Petkov wrote:
> On Fri, May 19, 2017 at 04:07:24PM -0500, Tom Lendacky wrote:
>> As long as those never change from static inline everything will be
>> fine. I can change it, but I really like how it explicitly indicates
>
> I know what you want to do. But you're practically defining a helper
> which contains two arbitrary instructions which probably no one else
> will need.
>
> So how about we simplify this function even more. We don't need to pay
> attention to kexec being in progress because we're halting anyway so who
> cares how fast we halt.
>
> Might have to state that in the comment below though, instead of what's
> there now.
>
> And for the exact same moot reason, we don't need to look at SME CPUID
> feature - we can just as well WBINVD unconditionally.
>
> void stop_this_cpu(void *dummy)
> {
>         local_irq_disable();
>         /*
>          * Remove this CPU:
>          */
>         set_cpu_online(smp_processor_id(), false);
>         disable_local_APIC();
>         mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>
>         for (;;) {
>                 /*
>                  * If we are performing a kexec and the processor supports
>                  * SME then we need to clear out cache information before
>                  * halting. With kexec, going from SME inactive to SME active
>                  * requires clearing cache entries so that addresses without
>                  * the encryption bit set don't corrupt the same physical
>                  * address that has the encryption bit set when caches are
>                  * flushed. Perform a wbinvd followed by a halt to achieve
>                  * this.
>                  */
>                 asm volatile("wbinvd; hlt" ::: "memory");
>         }
> }
>
> How's that?

I can live with that!

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-19 20:50         ` Tom Lendacky
@ 2017-05-21  7:16           ` Borislav Petkov
  2017-05-30 16:46             ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-21  7:16 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Fri, May 19, 2017 at 03:50:32PM -0500, Tom Lendacky wrote:
> The "worker" function would be doing the loop through the setup data,
> but since the setup data is mapped inside the loop I can't do the __init
> calling the non-init function and still hope to consolidate the code.
> Maybe I'm missing something here...

Hmm, I see what you mean. But the below change on top doesn't fire any
warnings here. Maybe your .config has something set which I don't...

---
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 55317ba3b6dc..199c983192ae 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -515,71 +515,50 @@ static bool memremap_is_efi_data(resource_size_t phys_addr,
  * Examine the physical address to determine if it is boot data by checking
  * it against the boot params setup_data chain.
  */
-static bool memremap_is_setup_data(resource_size_t phys_addr,
-				   unsigned long size)
+static bool
+__memremap_is_setup_data(resource_size_t phys_addr, unsigned long size, bool early)
 {
 	struct setup_data *data;
 	u64 paddr, paddr_next;
+	u32 len;
 
 	paddr = boot_params.hdr.setup_data;
 	while (paddr) {
-		bool is_setup_data = false;
 
 		if (phys_addr == paddr)
 			return true;
 
-		data = memremap(paddr, sizeof(*data),
-				MEMREMAP_WB | MEMREMAP_DEC);
+		if (early)
+			data = early_memremap_decrypted(paddr, sizeof(*data));
+		else
+			data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC);
 
 		paddr_next = data->next;
+		len = data->len;
 
-		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
-			is_setup_data = true;
+		if (early)
+			early_memunmap(data, sizeof(*data));
+		else
+			memunmap(data);
 
-		memunmap(data);
 
-		if (is_setup_data)
+		if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
 			return true;
 
 		paddr = paddr_next;
 	}
-
 	return false;
 }
 
-/*
- * Examine the physical address to determine if it is boot data by checking
- * it against the boot params setup_data chain (early boot version).
- */
 static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
 						unsigned long size)
 {
-	struct setup_data *data;
-	u64 paddr, paddr_next;
-
-	paddr = boot_params.hdr.setup_data;
-	while (paddr) {
-		bool is_setup_data = false;
-
-		if (phys_addr == paddr)
-			return true;
-
-		data = early_memremap_decrypted(paddr, sizeof(*data));
-
-		paddr_next = data->next;
-
-		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
-			is_setup_data = true;
-
-		early_memunmap(data, sizeof(*data));
-
-		if (is_setup_data)
-			return true;
-
-		paddr = paddr_next;
-	}
+	return __memremap_is_setup_data(phys_addr, size, true);
+}
 
-	return false;
+static bool memremap_is_setup_data(resource_size_t phys_addr, unsigned long size)
+{
+	return __memremap_is_setup_data(phys_addr, size, false);
 }
 
 /*

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-05-18 12:46   ` Borislav Petkov
@ 2017-05-25 22:24     ` Tom Lendacky
  2017-05-26 16:25       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-25 22:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/18/2017 7:46 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:21:49PM -0500, Tom Lendacky wrote:
>> Add the support to encrypt the kernel in-place. This is done by creating
>> new page mappings for the kernel - a decrypted write-protected mapping
>> and an encrypted mapping. The kernel is encrypted by copying it through
>> a temporary buffer.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |    6 +
>>  arch/x86/mm/Makefile               |    2
>>  arch/x86/mm/mem_encrypt.c          |  262 ++++++++++++++++++++++++++++++++++++
>>  arch/x86/mm/mem_encrypt_boot.S     |  151 +++++++++++++++++++++
>>  4 files changed, 421 insertions(+)
>>  create mode 100644 arch/x86/mm/mem_encrypt_boot.S
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index b406df2..8f6f9b4 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -31,6 +31,12 @@ static inline u64 sme_dma_mask(void)
>>  	return ((u64)sme_me_mask << 1) - 1;
>>  }
>>
>> +void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
>> +			 unsigned long decrypted_kernel_vaddr,
>> +			 unsigned long kernel_len,
>> +			 unsigned long encryption_wa,
>> +			 unsigned long encryption_pgd);
>> +
>>  void __init sme_early_encrypt(resource_size_t paddr,
>>  			      unsigned long size);
>>  void __init sme_early_decrypt(resource_size_t paddr,
>> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
>> index 9e13841..0633142 100644
>> --- a/arch/x86/mm/Makefile
>> +++ b/arch/x86/mm/Makefile
>> @@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
>>  obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
>>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>>  obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
>> +
>> +obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 30b07a3..0ff41a4 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -24,6 +24,7 @@
>>  #include <asm/setup.h>
>>  #include <asm/bootparam.h>
>>  #include <asm/cacheflush.h>
>> +#include <asm/sections.h>
>>
>>  /*
>>   * Since SME related variables are set early in the boot process they must
>> @@ -216,8 +217,269 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>>  	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>>  }
>>
>> +void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
>
> static

Yup.

>
>> +			  unsigned long end)
>> +{
>> +	unsigned long addr = start;
>> +	pgdval_t *pgd_p;
>> +
>> +	while (addr < end) {
>> +		unsigned long pgd_end;
>> +
>> +		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
>> +		if (pgd_end > end)
>> +			pgd_end = end;
>> +
>> +		pgd_p = (pgdval_t *)pgd_base + pgd_index(addr);
>> +		*pgd_p = 0;
>
> Hmm, so this is a contiguous range from [start:end] which translates to
> 8-byte PGD pointers in the PGD page so you can simply memset that range,
> no?
>
> Instead of iterating over each one?

I guess I could do that, but this will probably only end up clearing a
single PGD entry anyway since it's highly doubtful the address range
would cross a 512GB boundary.
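
For reference, the memset() version would be roughly (sketch, assuming
start < end):

static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
				 unsigned long end)
{
	pgd_t *pgd_p = pgd_base + pgd_index(start);
	unsigned long entries = pgd_index(end - 1) - pgd_index(start) + 1;

	memset(pgd_p, 0, entries * sizeof(*pgd_p));
}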

>
>> +
>> +		addr = pgd_end;
>> +	}
>> +}
>> +
>> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PMD_FLAGS	(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
>> +
>> +static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
>> +				     unsigned long vaddr, pmdval_t pmd_val)
>> +{
>> +	pgdval_t pgd, *pgd_p;
>> +	pudval_t pud, *pud_p;
>> +	pmdval_t pmd, *pmd_p;
>
> You should use the enclosing type, not the underlying one. I.e.,
>
> 	pgd_t *pgd;
> 	pud_t *pud;
> 	...
>
> and then the macros native_p*d_val(), p*d_offset() and so on. I say
> native_* because we don't want to have any paravirt nastyness here.
> I believe your previous version was using the proper interfaces.

I won't be able to use the p*d_offset() macros since they use __va()
and we're identity mapped during this time (which is why I would guess
the proposed changes for the 5-level pagetables in
arch/x86/kernel/head64.c, __startup_64, don't use these macros
either). I should be able to use the native_set_p*d() and others though,
I'll look into that.

>
> And the kernel has gotten 5-level pagetables support in
> the meantime, so this'll need to start at p4d AFAICT.
> arch/x86/mm/fault.c::dump_pagetable() looks like a good example to stare
> at.

Yeah, I accounted for that in the other parts of the code but I need
to do that here also.

>
>> +	pgd_p = (pgdval_t *)pgd_base + pgd_index(vaddr);
>> +	pgd = *pgd_p;
>> +	if (pgd) {
>> +		pud_p = (pudval_t *)(pgd & ~PTE_FLAGS_MASK);
>> +	} else {
>> +		pud_p = pgtable_area;
>> +		memset(pud_p, 0, sizeof(*pud_p) * PTRS_PER_PUD);
>> +		pgtable_area += sizeof(*pud_p) * PTRS_PER_PUD;
>> +
>> +		*pgd_p = (pgdval_t)pud_p + PGD_FLAGS;
>> +	}
>> +
>> +	pud_p += pud_index(vaddr);
>> +	pud = *pud_p;
>> +	if (pud) {
>> +		if (pud & _PAGE_PSE)
>> +			goto out;
>> +
>> +		pmd_p = (pmdval_t *)(pud & ~PTE_FLAGS_MASK);
>> +	} else {
>> +		pmd_p = pgtable_area;
>> +		memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
>> +		pgtable_area += sizeof(*pmd_p) * PTRS_PER_PMD;
>> +
>> +		*pud_p = (pudval_t)pmd_p + PUD_FLAGS;
>> +	}
>> +
>> +	pmd_p += pmd_index(vaddr);
>> +	pmd = *pmd_p;
>> +	if (!pmd || !(pmd & _PAGE_PSE))
>> +		*pmd_p = pmd_val;
>> +
>> +out:
>> +	return pgtable_area;
>> +}
>> +
>> +static unsigned long __init sme_pgtable_calc(unsigned long len)
>> +{
>> +	unsigned long pud_tables, pmd_tables;
>> +	unsigned long total = 0;
>> +
>> +	/*
>> +	 * Perform a relatively simplistic calculation of the pagetable
>> +	 * entries that are needed. That mappings will be covered by 2MB
>> +	 * PMD entries so we can conservatively calculate the required
>> +	 * number of PUD and PMD structures needed to perform the mappings.
>> +	 * Incrementing the count for each covers the case where the
>> +	 * addresses cross entries.
>> +	 */
>> +	pud_tables = ALIGN(len, PGDIR_SIZE) / PGDIR_SIZE;
>> +	pud_tables++;
>> +	pmd_tables = ALIGN(len, PUD_SIZE) / PUD_SIZE;
>> +	pmd_tables++;
>> +
>> +	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
>> +	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
>> +
>> +	/*
>> +	 * Now calculate the added pagetable structures needed to populate
>> +	 * the new pagetables.
>> +	 */
>
> Nice commenting, helps following what's going on.
>
>> +	pud_tables = ALIGN(total, PGDIR_SIZE) / PGDIR_SIZE;
>> +	pmd_tables = ALIGN(total, PUD_SIZE) / PUD_SIZE;
>> +
>> +	total += pud_tables * sizeof(pud_t) * PTRS_PER_PUD;
>> +	total += pmd_tables * sizeof(pmd_t) * PTRS_PER_PMD;
>> +
>> +	return total;
>> +}
>> +
>>  void __init sme_encrypt_kernel(void)
>>  {
>> +	pgd_t *pgd;
>> +	void *pgtable_area;
>> +	unsigned long kernel_start, kernel_end, kernel_len;
>> +	unsigned long workarea_start, workarea_end, workarea_len;
>> +	unsigned long execute_start, execute_end, execute_len;
>> +	unsigned long pgtable_area_len;
>> +	unsigned long decrypted_base;
>> +	unsigned long paddr, pmd_flags;
>
>
> Please sort function local variables declaration in a reverse christmas
> tree order:
>
> 	<type> longest_variable_name;
> 	<type> shorter_var_name;
> 	<type> even_shorter;
> 	<type> i;
>

Will do.

>> +
>> +	if (!sme_active())
>> +		return;
>
> ...
>
>> diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
>> new file mode 100644
>> index 0000000..fb58f9f
>> --- /dev/null
>> +++ b/arch/x86/mm/mem_encrypt_boot.S
>> @@ -0,0 +1,151 @@
>> +/*
>> + * AMD Memory Encryption Support
>> + *
>> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/linkage.h>
>> +#include <asm/pgtable.h>
>> +#include <asm/page.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/msr-index.h>
>> +
>> +	.text
>> +	.code64
>> +ENTRY(sme_encrypt_execute)
>> +
>> +	/*
>> +	 * Entry parameters:
>> +	 *   RDI - virtual address for the encrypted kernel mapping
>> +	 *   RSI - virtual address for the decrypted kernel mapping
>> +	 *   RDX - length of kernel
>> +	 *   RCX - virtual address of the encryption workarea, including:
>> +	 *     - stack page (PAGE_SIZE)
>> +	 *     - encryption routine page (PAGE_SIZE)
>> +	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
>> +	 *    R8 - physcial address of the pagetables to use for encryption
>> +	 */
>> +
>> +	push	%rbp
>> +	push	%r12
>> +
>> +	/* Set up a one page stack in the non-encrypted memory area */
>> +	movq	%rsp, %rbp		/* Save current stack pointer */
>> +	movq	%rcx, %rax		/* Workarea stack page */
>> +	movq	%rax, %rsp		/* Set new stack pointer */
>> +	addq	$PAGE_SIZE, %rsp	/* Stack grows from the bottom */
>> +	addq	$PAGE_SIZE, %rax	/* Workarea encryption routine */
>> +
>> +	movq	%rdi, %r10		/* Encrypted kernel */
>> +	movq	%rsi, %r11		/* Decrypted kernel */
>> +	movq	%rdx, %r12		/* Kernel length */
>> +
>> +	/* Copy encryption routine into the workarea */
>> +	movq	%rax, %rdi		/* Workarea encryption routine */
>> +	leaq	.Lenc_start(%rip), %rsi	/* Encryption routine */
>> +	movq	$(.Lenc_stop - .Lenc_start), %rcx	/* Encryption routine length */
>> +	rep	movsb
>> +
>> +	/* Setup registers for call */
>> +	movq	%r10, %rdi		/* Encrypted kernel */
>> +	movq	%r11, %rsi		/* Decrypted kernel */
>> +	movq	%r8, %rdx		/* Pagetables used for encryption */
>> +	movq	%r12, %rcx		/* Kernel length */
>> +	movq	%rax, %r8		/* Workarea encryption routine */
>> +	addq	$PAGE_SIZE, %r8		/* Workarea intermediate copy buffer */
>> +
>> +	call	*%rax			/* Call the encryption routine */
>> +
>> +	movq	%rbp, %rsp		/* Restore original stack pointer */
>> +
>> +	pop	%r12
>> +	pop	%rbp
>> +
>> +	ret
>> +ENDPROC(sme_encrypt_execute)
>> +
>> +.Lenc_start:
>> +ENTRY(sme_enc_routine)
>
> A function called a "routine"? Why do we need the global symbol?
> Nothing's referencing it AFAICT.

I can change the name. As for the use of ENTRY... without the
ENTRY/ENDPROC combination I was receiving a warning about a return
instruction outside of a callable function. It looks like I can just
define the "sme_enc_routine:" label with the ENDPROC and the warning
goes away and the global is avoided. It doesn't like the local labels
(.L...) so I'll use the new name.

>
>> +/*
>> + * Routine used to encrypt kernel.
>> + *   This routine must be run outside of the kernel proper since
>> + *   the kernel will be encrypted during the process. So this
>> + *   routine is defined here and then copied to an area outside
>> + *   of the kernel where it will remain and run decrypted
>> + *   during execution.
>> + *
>> + *   On entry the registers must be:
>> + *     RDI - virtual address for the encrypted kernel mapping
>> + *     RSI - virtual address for the decrypted kernel mapping
>> + *     RDX - address of the pagetables to use for encryption
>> + *     RCX - length of kernel
>> + *      R8 - intermediate copy buffer
>> + *
>> + *     RAX - points to this routine
>> + *
>> + * The kernel will be encrypted by copying from the non-encrypted
>> + * kernel space to an intermediate buffer and then copying from the
>> + * intermediate buffer back to the encrypted kernel space. The physical
>> + * addresses of the two kernel space mappings are the same which
>> + * results in the kernel being encrypted "in place".
>> + */
>> +	/* Enable the new page tables */
>> +	mov	%rdx, %cr3
>> +
>> +	/* Flush any global TLBs */
>> +	mov	%cr4, %rdx
>> +	andq	$~X86_CR4_PGE, %rdx
>> +	mov	%rdx, %cr4
>> +	orq	$X86_CR4_PGE, %rdx
>> +	mov	%rdx, %cr4
>> +
>> +	/* Set the PAT register PA5 entry to write-protect */
>> +	push	%rcx
>> +	movl	$MSR_IA32_CR_PAT, %ecx
>> +	rdmsr
>> +	push	%rdx			/* Save original PAT value */
>> +	andl	$0xffff00ff, %edx	/* Clear PA5 */
>> +	orl	$0x00000500, %edx	/* Set PA5 to WP */
>
> Maybe check first whether PA5 is already set correctly and avoid the
> WRMSR and the restoring below too?

In the overall scheme of things it's probably not that big a deal when
compared to everything that's about to happen below.

>
>> +	wrmsr
>> +	pop	%rdx			/* RDX contains original PAT value */
>> +	pop	%rcx
>> +
>> +	movq	%rcx, %r9		/* Save kernel length */
>> +	movq	%rdi, %r10		/* Save encrypted kernel address */
>> +	movq	%rsi, %r11		/* Save decrypted kernel address */
>> +
>> +	wbinvd				/* Invalidate any cache entries */
>> +
>> +	/* Copy/encrypt 2MB at a time */
>> +1:
>> +	movq	%r11, %rsi		/* Source - decrypted kernel */
>> +	movq	%r8, %rdi		/* Dest   - intermediate copy buffer */
>> +	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
>> +	rep	movsb
>
> not movsQ?

The hardware will try to optimize rep movsb into large chunks assuming
things are aligned, sizes are large enough, etc., so we don't have to
explicitly specify and set up for a rep movsq.

Thanks,
Tom

>
>> +	movq	%r8, %rsi		/* Source - intermediate copy buffer */
>> +	movq	%r10, %rdi		/* Dest   - encrypted kernel */
>> +	movq	$PMD_PAGE_SIZE, %rcx	/* 2MB length */
>> +	rep	movsb
>> +
>> +	addq	$PMD_PAGE_SIZE, %r11
>> +	addq	$PMD_PAGE_SIZE, %r10
>> +	subq	$PMD_PAGE_SIZE, %r9	/* Kernel length decrement */
>> +	jnz	1b			/* Kernel length not zero? */
>> +
>> +	/* Restore PAT register */
>> +	push	%rdx			/* Save original PAT value */
>> +	movl	$MSR_IA32_CR_PAT, %ecx
>> +	rdmsr
>> +	pop	%rdx			/* Restore original PAT value */
>> +	wrmsr
>> +
>> +	ret
>> +ENDPROC(sme_enc_routine)
>> +.Lenc_stop:
>>
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-05-18 17:01   ` Borislav Petkov
@ 2017-05-26  2:49     ` Dave Young
  2017-05-26  5:04       ` Xunlei Pang
  0 siblings, 1 reply; 126+ messages in thread
From: Dave Young @ 2017-05-26  2:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Thomas Gleixner, Dmitry Vyukov, xlpang

Ccing Xunlei; he is reading the patches to see what needs to be done for
kdump. There should still be several places to handle to make kdump work.

On 05/18/17 at 07:01pm, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
> > Add sysfs support for SME so that user-space utilities (kdump, etc.) can
> > determine if SME is active.
> 
> But why do user-space tools need to know that?
> 
> I mean, when we load the kdump kernel, we do it with the first kernel,
> with the kexec_load() syscall, AFAICT. And that code does a lot of
> things during that init, like machine_kexec_prepare()->init_pgtable() to
> prepare the ident mapping of the second kernel, for example.
> 
> What I'm aiming at is that the first kernel knows *exactly* whether SME
> is enabled or not and doesn't need to tell the second one through some
> sysfs entries - it can do that during loading.
> 
> So I don't think we need any userspace things at all...

If the kdump kernel can get the SME status from a hardware register then this
should not be necessary and this patch can be dropped.
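
E.g., the kdump kernel should be able to read the SME state directly
from the hardware, roughly the same way the boot code in this series
already does (sketch):

	u64 syscfg = __rdmsr(MSR_K8_SYSCFG);
	bool sme_enabled = syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT;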

Thanks
Dave

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-04-18 21:21 ` [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME Tom Lendacky
  2017-05-17 19:17   ` Borislav Petkov
@ 2017-05-26  4:17   ` Xunlei Pang
  2017-05-27  2:17     ` Dave Young
  2017-05-30 17:46     ` Tom Lendacky
  1 sibling, 2 replies; 126+ messages in thread
From: Xunlei Pang @ 2017-05-26  4:17 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Thomas Gleixner, Rik van Riel, Brijesh Singh, Toshimitsu Kani,
	Arnd Bergmann, Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko, Dave Young,
	Larry Woodman, Dmitry Vyukov

On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
>
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.

Hi Tom,

Looks like kdump will break; I didn't see similar handling for the kdump cases, see kernel:
    kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.

We need to support kdump with SME; the kdump kernel/initramfs/purgatory/elfcorehdr/etc.
are all loaded into the reserved memory (see crashkernel=X) by the userspace kexec-tools.
I think a straightforward way would be to mark the whole reserved memory range as not
encrypted before loading all the kexec segments for kdump; I guess we can handle this
easily in arch_kexec_unprotect_crashkres().
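
E.g., something along these lines (just a sketch on top of your patch,
using the existing crashk_res range):

void arch_kexec_unprotect_crashkres(void)
{
	kexec_mark_crashkres(false);

	/* Mark the whole crashkernel reserved range as not encrypted */
	if (sme_active())
		set_memory_decrypted((unsigned long)__va(crashk_res.start),
				     resource_size(&crashk_res) >> PAGE_SHIFT);
}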

Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be remapped to the
encrypted data.

Regards,
Xunlei

>
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/init.h          |    1 +
>  arch/x86/include/asm/irqflags.h      |    5 +++++
>  arch/x86/include/asm/kexec.h         |    8 ++++++++
>  arch/x86/include/asm/pgtable_types.h |    1 +
>  arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
>  arch/x86/mm/ident_map.c              |   11 +++++++----
>  include/linux/kexec.h                |   14 ++++++++++++++
>  kernel/kexec_core.c                  |    7 +++++++
>  9 files changed, 101 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..b2ec511 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>  	void *context;			 /* context for alloc_pgt_page */
>  	unsigned long pmd_flag;		 /* page flag for PMD entry */
>  	unsigned long offset;		 /* ident mapping offset */
> +	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>  };
>  
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index ac7692d..38b5920 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>  	asm volatile("hlt": : :"memory");
>  }
>  
> +static inline __cpuidle void native_wbinvd_halt(void)
> +{
> +	asm volatile("wbinvd; hlt" : : : "memory");
> +}
> +
>  #endif
>  
>  #ifdef CONFIG_PARAVIRT
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 70ef205..e8183ac 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>  	uint64_t r15;
>  	uint64_t rip;
>  };
> +
> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> +				       gfp_t gfp);
> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
> +
> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
> +
>  #endif
>  
>  typedef void crash_vmclear_fn(void);
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index ce8cb1c..0f326f4 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
>  #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
>  #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>  #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>  #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>  #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 085c3b3..11c0ca9 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>  	}
>  	pte = pte_offset_kernel(pmd, vaddr);
> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>  	return 0;
>  err:
>  	free_transition_pgtable(image);
> @@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  		.alloc_pgt_page	= alloc_pgt_page,
>  		.context	= image,
>  		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>  	};
>  	unsigned long mstart, mend;
>  	pgd_t *level4p;
> @@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
>  {
>  	kexec_mark_crashkres(false);
>  }
> +
> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
> +{
> +	int ret;
> +
> +	if (sme_active()) {
> +		/*
> +		 * If SME is active we need to be sure that kexec pages are
> +		 * not encrypted because when we boot to the new kernel the
> +		 * pages won't be accessed encrypted (initially).
> +		 */
> +		ret = set_memory_decrypted((unsigned long)vaddr, pages);
> +		if (ret)
> +			return ret;
> +
> +		if (gfp & __GFP_ZERO)
> +			memset(vaddr, 0, pages * PAGE_SIZE);
> +	}
> +
> +	return 0;
> +}
> +
> +void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
> +{
> +	if (sme_active()) {
> +		/*
> +		 * If SME is active we need to reset the pages back to being
> +		 * an encrypted mapping before freeing them.
> +		 */
> +		set_memory_encrypted((unsigned long)vaddr, pages);
> +	}
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0bb8842..f4e5de6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -24,6 +24,7 @@
>  #include <linux/cpuidle.h>
>  #include <trace/events/power.h>
>  #include <linux/hw_breakpoint.h>
> +#include <linux/kexec.h>
>  #include <asm/cpu.h>
>  #include <asm/apic.h>
>  #include <asm/syscalls.h>
> @@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
>  	return ret;
>  }
>  #endif
> +
>  void stop_this_cpu(void *dummy)
>  {
> +	bool do_wbinvd_halt = false;
> +
> +	if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
> +		/*
> +		 * If we are performing a kexec and the processor supports
> +		 * SME then we need to clear out cache information before
> +		 * halting. With kexec, going from SME inactive to SME active
> +		 * requires clearing cache entries so that addresses without
> +		 * the encryption bit set don't corrupt the same physical
> +		 * address that has the encryption bit set when caches are
> +		 * flushed. Perform a wbinvd followed by a halt to achieve
> +		 * this.
> +		 */
> +		do_wbinvd_halt = true;
> +	}
> +
>  	local_irq_disable();
>  	/*
>  	 * Remove this CPU:
> @@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
>  	disable_local_APIC();
>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>  
> -	for (;;)
> -		halt();
> +	for (;;) {
> +		if (do_wbinvd_halt)
> +			native_wbinvd_halt();
> +		else
> +			halt();
> +	}
>  }
>  
>  /*
> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> index 04210a2..2c9fd3e 100644
> --- a/arch/x86/mm/ident_map.c
> +++ b/arch/x86/mm/ident_map.c
> @@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
>  static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  			  unsigned long addr, unsigned long end)
>  {
> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>  	unsigned long next;
>  
>  	for (; addr < end; addr = next) {
> @@ -39,7 +40,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  		if (!pmd)
>  			return -ENOMEM;
>  		ident_pmd_init(info, pmd, addr, next);
> -		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
> +		set_pud(pud, __pud(__pa(pmd) | kernpg_flag));
>  	}
>  
>  	return 0;
> @@ -48,6 +49,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>  static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>  			  unsigned long addr, unsigned long end)
>  {
> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>  	unsigned long next;
>  
>  	for (; addr < end; addr = next) {
> @@ -67,7 +69,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>  		if (!pud)
>  			return -ENOMEM;
>  		ident_pud_init(info, pud, addr, next);
> -		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
> +		set_p4d(p4d, __p4d(__pa(pud) | kernpg_flag));
>  	}
>  
>  	return 0;
> @@ -76,6 +78,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>  			      unsigned long pstart, unsigned long pend)
>  {
> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>  	unsigned long addr = pstart + info->offset;
>  	unsigned long end = pend + info->offset;
>  	unsigned long next;
> @@ -104,14 +107,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>  		if (result)
>  			return result;
>  		if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
> -			set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
> +			set_pgd(pgd, __pgd(__pa(p4d) | kernpg_flag));
>  		} else {
>  			/*
>  			 * With p4d folded, pgd is equal to p4d.
>  			 * The pgd entry has to point to the pud page table in this case.
>  			 */
>  			pud_t *pud = pud_offset(p4d, 0);
> -			set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
> +			set_pgd(pgd, __pgd(__pa(pud) | kernpg_flag));
>  		}
>  	}
>  
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index d419d0e..1c76e3b 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -383,6 +383,20 @@ static inline void *boot_phys_to_virt(unsigned long entry)
>  	return phys_to_virt(boot_phys_to_phys(entry));
>  }
>  
> +#ifndef arch_kexec_post_alloc_pages
> +static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
> +					      gfp_t gfp)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#ifndef arch_kexec_pre_free_pages
> +static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
> +{
> +}
> +#endif
> +
>  #else /* !CONFIG_KEXEC_CORE */
>  struct pt_regs;
>  struct task_struct;
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index bfe62d5..bb5e7e3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -38,6 +38,7 @@
>  #include <linux/syscore_ops.h>
>  #include <linux/compiler.h>
>  #include <linux/hugetlb.h>
> +#include <linux/mem_encrypt.h>
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -315,6 +316,9 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  		count = 1 << order;
>  		for (i = 0; i < count; i++)
>  			SetPageReserved(pages + i);
> +
> +		arch_kexec_post_alloc_pages(page_address(pages), count,
> +					    gfp_mask);
>  	}
>  
>  	return pages;
> @@ -326,6 +330,9 @@ static void kimage_free_pages(struct page *page)
>  
>  	order = page_private(page);
>  	count = 1 << order;
> +
> +	arch_kexec_pre_free_pages(page_address(page), count);
> +
>  	for (i = 0; i < count; i++)
>  		ClearPageReserved(page + i);
>  	__free_pages(page, order);
>
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-05-26  2:49     ` Dave Young
@ 2017-05-26  5:04       ` Xunlei Pang
  2017-05-26 15:47         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Xunlei Pang @ 2017-05-26  5:04 UTC (permalink / raw)
  To: Dave Young, Borislav Petkov
  Cc: linux-efi, Brijesh Singh, Toshimitsu Kani,
	Radim Krčmář,
	Matt Fleming, x86, linux-mm, Alexander Potapenko, H. Peter Anvin,
	Larry Woodman, linux-arch, kvm, Jonathan Corbet, Joerg Roedel,
	linux-doc, kasan-dev, Ingo Molnar, Andrey Ryabinin, Tom Lendacky,
	Rik van Riel, Arnd Bergmann, Konrad Rzeszutek Wilk, xlpang,
	Andy Lutomirski, Thomas Gleixner, Dmitry Vyukov, kexec,
	linux-kernel, iommu, Michael S. Tsirkin, Paolo Bonzini

On 05/26/2017 at 10:49 AM, Dave Young wrote:
> Ccing Xunlei he is reading the patches see what need to be done for
> kdump. There should still be several places to handle to make kdump work.
>
> On 05/18/17 at 07:01pm, Borislav Petkov wrote:
>> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>> determine if SME is active.
>> But why do user-space tools need to know that?
>>
>> I mean, when we load the kdump kernel, we do it with the first kernel,
>> with the kexec_load() syscall, AFAICT. And that code does a lot of
>> things during that init, like machine_kexec_prepare()->init_pgtable() to
>> prepare the ident mapping of the second kernel, for example.
>>
>> What I'm aiming at is that the first kernel knows *exactly* whether SME
>> is enabled or not and doesn't need to tell the second one through some
>> sysfs entries - it can do that during loading.
>>
>> So I don't think we need any userspace things at all...
> If kdump kernel can get the SME status from hardware register then this
> should be not necessary and this patch can be dropped.

Yes, I also agree with dropping this one.

Regards,
Xunlei

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption
  2017-05-26  5:04       ` Xunlei Pang
@ 2017-05-26 15:47         ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-26 15:47 UTC (permalink / raw)
  To: xlpang, Dave Young, Borislav Petkov
  Cc: linux-efi, Brijesh Singh, Toshimitsu Kani,
	Radim Krčmář,
	Matt Fleming, x86, linux-mm, Alexander Potapenko, H. Peter Anvin,
	Larry Woodman, linux-arch, kvm, Jonathan Corbet, Joerg Roedel,
	linux-doc, kasan-dev, Ingo Molnar, Andrey Ryabinin, Rik van Riel,
	Arnd Bergmann, Konrad Rzeszutek Wilk, Andy Lutomirski,
	Thomas Gleixner, Dmitry Vyukov, kexec, linux-kernel, iommu,
	Michael S. Tsirkin, Paolo Bonzini

On 5/26/2017 12:04 AM, Xunlei Pang wrote:
> On 05/26/2017 at 10:49 AM, Dave Young wrote:
>> Ccing Xunlei he is reading the patches see what need to be done for
>> kdump. There should still be several places to handle to make kdump work.
>>
>> On 05/18/17 at 07:01pm, Borislav Petkov wrote:
>>> On Tue, Apr 18, 2017 at 04:22:12PM -0500, Tom Lendacky wrote:
>>>> Add sysfs support for SME so that user-space utilities (kdump, etc.) can
>>>> determine if SME is active.
>>> But why do user-space tools need to know that?
>>>
>>> I mean, when we load the kdump kernel, we do it with the first kernel,
>>> with the kexec_load() syscall, AFAICT. And that code does a lot of
>>> things during that init, like machine_kexec_prepare()->init_pgtable() to
>>> prepare the ident mapping of the second kernel, for example.
>>>
>>> What I'm aiming at is that the first kernel knows *exactly* whether SME
>>> is enabled or not and doesn't need to tell the second one through some
>>> sysfs entries - it can do that during loading.
>>>
>>> So I don't think we need any userspace things at all...
>> If kdump kernel can get the SME status from hardware register then this
>> should be not necessary and this patch can be dropped.
>
> Yes, I also agree with dropping this one.

Consensus is to drop, so it will be.

Thanks,
Tom

>
> Regards,
> Xunlei
>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-18 19:50     ` Matt Fleming
@ 2017-05-26 16:22       ` Tom Lendacky
  2017-05-26 16:35         ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-26 16:22 UTC (permalink / raw)
  To: Matt Fleming, Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov,
	Ard Biesheuvel

On 5/18/2017 2:50 PM, Matt Fleming wrote:
> On Mon, 15 May, at 08:35:17PM, Borislav Petkov wrote:
>> On Tue, Apr 18, 2017 at 04:19:21PM -0500, Tom Lendacky wrote:
>>
>>> +		paddr = boot_params.efi_info.efi_memmap_hi;
>>> +		paddr <<= 32;
>>> +		paddr |= boot_params.efi_info.efi_memmap;
>>> +		if (phys_addr == paddr)
>>> +			return true;
>>> +
>>> +		paddr = boot_params.efi_info.efi_systab_hi;
>>> +		paddr <<= 32;
>>> +		paddr |= boot_params.efi_info.efi_systab;
>>
>> So those two above look like could be two global vars which are
>> initialized somewhere in the EFI init path:
>>
>> efi_memmap_phys and efi_systab_phys or so.
>>
>> Matt ?
>>
>> And then you won't need to create that paddr each time on the fly. I
>> mean, it's not a lot of instructions but still...
>
> We should already have the physical memmap address available in
> 'efi.memmap.phys_map'.

Unfortunately memremap_is_efi_data() is called before the efi structure
gets initialized, so I can't use that value.

>
> And the physical address of the system table should be in
> 'efi_phys.systab'. See efi_init().

In addition to having the same issue as efi.memmap.phys_map, efi_phys has
the __initdata attribute, so it will be released/freed, which will cause
problems in checks performed afterwards.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-05-25 22:24     ` Tom Lendacky
@ 2017-05-26 16:25       ` Borislav Petkov
  2017-05-30 16:39         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-26 16:25 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Thu, May 25, 2017 at 05:24:27PM -0500, Tom Lendacky wrote:
> I guess I could do that, but this will probably only end up clearing a
> single PGD entry anyway since it's highly doubtful the address range
> would cross a 512GB boundary.

Or you can compute how many 512G-covering entries, i.e. PGD entries, there are
and clear just the right amount. :^)

> I can change the name. As for the use of ENTRY... without the
> ENTRY/ENDPROC combination I was receiving a warning about a return
> instruction outside of a callable function. It looks like I can just
> define the "sme_enc_routine:" label with the ENDPROC and the warning
> goes away and the global is avoided. It doesn't like the local labels
> (.L...) so I'll use the new name.

Is that warning from objtool or where does it come from?

How do I trigger it locally?

> The hardware will try to optimize rep movsb into large chunks assuming
> things are aligned, sizes are large enough, etc. so we don't have to
> explicitly specify and setup for a rep movsq.

I thought the hw does that for movsq too?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-26 16:22       ` Tom Lendacky
@ 2017-05-26 16:35         ` Borislav Petkov
  2017-05-30 17:47           ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-26 16:35 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Matt Fleming, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov,
	Ard Biesheuvel

On Fri, May 26, 2017 at 11:22:36AM -0500, Tom Lendacky wrote:
> In addition to the same issue as efi.memmap.phys_map, efi_phys has
> the __initdata attribute so it will be released/freed which will cause
> problems in checks performed afterwards.

Sounds to me like we should drop the __initdata attr and prepare them
much earlier for use by the SME code.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-26  4:17   ` Xunlei Pang
@ 2017-05-27  2:17     ` Dave Young
  2017-05-30 17:46     ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Dave Young @ 2017-05-27  2:17 UTC (permalink / raw)
  To: xlpang
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Thomas Gleixner,
	Rik van Riel, Brijesh Singh, Toshimitsu Kani, Arnd Bergmann,
	Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko,
	Larry Woodman, Dmitry Vyukov

On 05/26/17 at 12:17pm, Xunlei Pang wrote:
> On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
> > Provide support so that kexec can be used to boot a kernel when SME is
> > enabled.
> >
> > Support is needed to allocate pages for kexec without encryption.  This
> > is needed in order to be able to reboot in the kernel in the same manner
> > as originally booted.
> 
> Hi Tom,
> 
> Looks like kdump will break, I didn't see the similar handling for kdump cases, see kernel:
>     kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.
> 
> We need to support kdump with SME, kdump kernel/initramfs/purgatory/elfcorehdr/etc
> are all loaded into the reserved memory(see crashkernel=X) by userspace kexec-tools.

For kexec_load, it is loaded by kexec-tools; we also have the in-kernel loader
syscall kexec_file_load, which is handled in the kernel.

> I think a straightforward way would be to mark the whole reserved memory range without
> encryption before loading all the kexec segments for kdump, I guess we can handle this
> easily in arch_kexec_unprotect_crashkres().
> 
> Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be remapped to the
> encrypted data.

Tom, could you give kdump a try based on Xunlei's suggestion?
It is just based on a theoretical understanding of the patches; there could be
other issues when you work on it. Feel free to ask if we can help with
anything.

Thanks
Dave

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-19 11:27   ` Borislav Petkov
@ 2017-05-30 14:38     ` Tom Lendacky
  2017-05-30 14:55       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 14:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/19/2017 6:27 AM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:22:23PM -0500, Tom Lendacky wrote:
>> Add support to check if SME has been enabled and if memory encryption
>> should be activated (checking of command line option based on the
>> configuration of the default state).  If memory encryption is to be
>> activated, then the encryption mask is set and the kernel is encrypted
>> "in place."
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>   arch/x86/kernel/head_64.S |    1 +
>>   arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
>>   2 files changed, 80 insertions(+), 4 deletions(-)
> 
> ...
> 
>> +unsigned long __init sme_enable(struct boot_params *bp)
>>   {
>> +	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
>> +	unsigned int eax, ebx, ecx, edx;
>> +	unsigned long me_mask;
>> +	bool active_by_default;
>> +	char buffer[16];
>> +	u64 msr;
>> +
>> +	/* Check for the SME support leaf */
>> +	eax = 0x80000000;
>> +	ecx = 0;
>> +	native_cpuid(&eax, &ebx, &ecx, &edx);
>> +	if (eax < 0x8000001f)
>> +		goto out;
>> +
>> +	/*
>> +	 * Check for the SME feature:
>> +	 *   CPUID Fn8000_001F[EAX] - Bit 0
>> +	 *     Secure Memory Encryption support
>> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
>> +	 *     Pagetable bit position used to indicate encryption
>> +	 */
>> +	eax = 0x8000001f;
>> +	ecx = 0;
>> +	native_cpuid(&eax, &ebx, &ecx, &edx);
>> +	if (!(eax & 1))
>> +		goto out;
> 
> <---- newline here.
> 
>> +	me_mask = 1UL << (ebx & 0x3f);
>> +
>> +	/* Check if SME is enabled */
>> +	msr = __rdmsr(MSR_K8_SYSCFG);
>> +	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>> +		goto out;
>> +
>> +	/*
>> +	 * Fixups have not been applied to phys_base yet, so we must obtain
>> +	 * the address to the SME command line option data in the following
>> +	 * way.
>> +	 */
>> +	asm ("lea sme_cmdline_arg(%%rip), %0"
>> +	     : "=r" (cmdline_arg)
>> +	     : "p" (sme_cmdline_arg));
>> +	asm ("lea sme_cmdline_on(%%rip), %0"
>> +	     : "=r" (cmdline_on)
>> +	     : "p" (sme_cmdline_on));
>> +	asm ("lea sme_cmdline_off(%%rip), %0"
>> +	     : "=r" (cmdline_off)
>> +	     : "p" (sme_cmdline_off));
>> +
>> +	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
>> +		active_by_default = true;
>> +	else
>> +		active_by_default = false;
>> +
>> +	cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
>> +				     ((u64)bp->ext_cmd_line_ptr << 32));
>> +
>> +	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
>> +
>> +	if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)
>> +		sme_me_mask = me_mask;
> 
> Why doesn't simply
> 
> 	if (!strncmp(buffer, "on", 2))
> 		...
> 
> work?

In this case we're running identity mapped and the "on" constant ends up
as a kernel address (0xffffffff81...), which results in a segfault.
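
(To make the failure mode concrete -- purely an illustration, the addresses
are made up:

	/*
	 * Hypothetical illustration of the problem, not real code:
	 *
	 *	const char *p = "on";
	 *
	 * The compiler emits the address of "on" as an absolute kernel
	 * virtual address (e.g. 0xffffffff81xxxxxx).  At this point only
	 * the identity mapping (virt == phys) exists, so dereferencing p
	 * faults.  The RIP-relative lea in the patch instead yields an
	 * address relative to where the code is actually executing, which
	 * is identity mapped.
	 */
)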

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-30 14:38     ` Tom Lendacky
@ 2017-05-30 14:55       ` Borislav Petkov
  2017-05-30 15:37         ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-30 14:55 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote:
> In this case we're running identity mapped and the "on" constant ends up
> as kernel address (0xffffffff81...) which results in a segfault.

Would

	static const char *__on_str = "on";

	...

	if (!strncmp(buffer, __pa_nodebug(__on_str), 2))
		...

work?

__phys_addr_nodebug() seems to pay attention to phys_base and
PAGE_OFFSET and so on...

I'd like to avoid that rip-relative address finding in inline asm which
looks fragile to me.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-30 14:55       ` Borislav Petkov
@ 2017-05-30 15:37         ` Tom Lendacky
  2017-05-31  8:49           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 15:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/30/2017 9:55 AM, Borislav Petkov wrote:
> On Tue, May 30, 2017 at 09:38:36AM -0500, Tom Lendacky wrote:
>> In this case we're running identity mapped and the "on" constant ends up
>> as kernel address (0xffffffff81...) which results in a segfault.
> 
> Would
> 
> 	static const char *__on_str = "on";
> 
> 	...
> 
> 	if (!strncmp(buffer, __pa_nodebug(__on_str), 2))
> 		...
> 
> work?
> 
> __phys_addr_nodebug() seems to pay attention to phys_base and
> PAGE_OFFSET and so on...

Except that phys_base hasn't been adjusted yet so that doesn't work
either.
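
(For reference, __phys_addr_nodebug() is roughly the following, which is why
it still depends on phys_base having been fixed up:

	static inline unsigned long __phys_addr_nodebug(unsigned long x)
	{
		unsigned long y = x - __START_KERNEL_map;

		/* use the carry flag to determine if x was < __START_KERNEL_map */
		x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));

		return x;
	}

so __pa_nodebug() on a kernel-image address folds in phys_base, which has not
been adjusted yet at this point.)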

> 
> I'd like to avoid that rip-relative address finding in inline asm which
> looks fragile to me.

I can define the command line option and the "on" and "off" values as
character buffers in the function and initialize them on a per-character
basis (using a static string causes the same issues as referencing a
string constant), i.e.:

char cmdline_arg[] = {'m', 'e', 'm', '_', 'e', 'n', 'c', 'r', 'y', 'p', 't', '\0'};
char cmdline_off[] = {'o', 'f', 'f', '\0'};
char cmdline_on[] = {'o', 'n', '\0'};

It doesn't look the greatest, but it works and removes the need for the
rip-relative addressing.
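
(Roughly how those stack-local buffers would then be used inside sme_enable(),
assuming the cmdline_find_option() helper from the earlier patches -- just a
sketch, not the final code:

	char buffer[16];

	/* The array initializers are built on the stack, so no absolute
	 * (kernel virtual) string addresses are referenced while running
	 * identity mapped.
	 */
	cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));

	if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
		sme_me_mask = me_mask;
	else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
		sme_me_mask = 0;
)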

Thanks,
Tom

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-19 11:30     ` Borislav Petkov
  2017-05-19 20:16       ` Josh Poimboeuf
@ 2017-05-30 15:46       ` Tom Lendacky
  1 sibling, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 15:46 UTC (permalink / raw)
  To: Borislav Petkov, Josh Poimboeuf
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/19/2017 6:30 AM, Borislav Petkov wrote:
> On Fri, Apr 21, 2017 at 01:56:13PM -0500, Tom Lendacky wrote:
>> On 4/18/2017 4:22 PM, Tom Lendacky wrote:
>>> Add support to check if SME has been enabled and if memory encryption
>>> should be activated (checking of command line option based on the
>>> configuration of the default state).  If memory encryption is to be
>>> activated, then the encryption mask is set and the kernel is encrypted
>>> "in place."
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>> ---
>>>   arch/x86/kernel/head_64.S |    1 +
>>>   arch/x86/mm/mem_encrypt.c |   83 +++++++++++++++++++++++++++++++++++++++++++--
>>>   2 files changed, 80 insertions(+), 4 deletions(-)
>>>
>>
>> ...
>>
>>>
>>> -unsigned long __init sme_enable(void)
>>> +unsigned long __init sme_enable(struct boot_params *bp)
>>>   {
>>> +	const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
>>> +	unsigned int eax, ebx, ecx, edx;
>>> +	unsigned long me_mask;
>>> +	bool active_by_default;
>>> +	char buffer[16];
>>
>> So it turns out that when KASLR is enabled (CONFIG_RANDOMIZE_BASE=y)
>> the stack-protector support causes issues with this function because
> 
> What issues?

The stack protection support makes use of the gs segment register, and
at this point not everything is set up properly to allow it to work,
so it segfaults.
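
(Rough illustration of why it faults: with stack-protector enabled, gcc emits
something along these lines in the prologue/epilogue, and %gs does not point
at the per-cpu area yet this early, so the %gs:40 accesses go wrong:

	movq	%gs:40, %rax		# load the stack canary
	movq	%rax, 0x28(%rsp)
	...
	movq	0x28(%rsp), %rax
	xorq	%gs:40, %rax		# check it again on function exit
	jne	__stack_chk_fail
)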

Thanks,
Tom

> 
>> it is called so early. I can get past it by adding:
>>
>> CFLAGS_mem_encrypt.o := $(nostackp)
>>
>> in the arch/x86/mm/Makefile, but that obviously eliminates the support
>> for the whole file.  Would it be better to split out the sme_enable()
>> and other boot routines into a separate file or just apply the
>> $(nostackp) to the whole file?
> 
> Josh might have a better idea here... CCed.
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-19 20:16       ` Josh Poimboeuf
  2017-05-19 20:29         ` Borislav Petkov
@ 2017-05-30 15:48         ` Tom Lendacky
  2017-05-31  9:15           ` Borislav Petkov
  1 sibling, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 15:48 UTC (permalink / raw)
  To: Josh Poimboeuf, Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/19/2017 3:16 PM, Josh Poimboeuf wrote:
> On Fri, May 19, 2017 at 01:30:05PM +0200, Borislav Petkov wrote:
>>> it is called so early. I can get past it by adding:
>>>
>>> CFLAGS_mem_encrypt.o := $(nostackp)
>>>
>>> in the arch/x86/mm/Makefile, but that obviously eliminates the support
>>> for the whole file.  Would it be better to split out the sme_enable()
>>> and other boot routines into a separate file or just apply the
>>> $(nostackp) to the whole file?
>>
>> Josh might have a better idea here... CCed.
> 
> I'm the stack validation guy, not the stack protection guy :-)
> 
> But there is a way to disable compiler options on a per-function basis
> with the gcc __optimize__ function attribute.  For example:
> 
>    __attribute__((__optimize__("no-stack-protector")))
> 

I'll look at doing that instead of removing the support for the whole
file.
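
(A minimal sketch of what that per-function annotation could look like when
applied to sme_enable() -- assuming the attribute is accepted by the gcc
versions we care about; not the final form:

	/* Sketch only: keep stack-protector for the rest of mem_encrypt.c
	 * but disable it for the early, identity-mapped entry point.
	 */
	__attribute__((__optimize__("no-stack-protector")))
	unsigned long __init sme_enable(struct boot_params *bp)
	{
		/* ... body unchanged from the patch ... */
		return sme_me_mask;
	}
)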

Thanks,
Tom

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-05-26 16:25       ` Borislav Petkov
@ 2017-05-30 16:39         ` Tom Lendacky
  2017-05-31  9:51           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 16:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/26/2017 11:25 AM, Borislav Petkov wrote:
> On Thu, May 25, 2017 at 05:24:27PM -0500, Tom Lendacky wrote:
>> I guess I could do that, but this will probably only end up clearing a
>> single PGD entry anyway since it's highly doubtful the address range
>> would cross a 512GB boundary.
> 
> Or you can compute how many 512G-covering, i.e., PGD entries there are
> and clear just the right amount. :^)
> 
>> I can change the name. As for the use of ENTRY... without the
>> ENTRY/ENDPROC combination I was receiving a warning about a return
>> instruction outside of a callable function. It looks like I can just
>> define the "sme_enc_routine:" label with the ENDPROC and the warning
>> goes away and the global is avoided. It doesn't like the local labels
>> (.L...) so I'll use the new name.
> 
> Is that warning from objtool or where does it come from?

Yes, it's from objtool:

arch/x86/mm/mem_encrypt_boot.o: warning: objtool: .text+0xd2: return 
instruction outside of a callable function

> 
> How do I trigger it locally

I think having CONFIG_STACK_VALIDATION=y will trigger it.

> 
>> The hardware will try to optimize rep movsb into large chunks assuming
>> things are aligned, sizes are large enough, etc., so we don't have to
>> explicitly specify and set up for a rep movsq.
> 
> I thought the hw does that for movsq too?

It does.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-21  7:16           ` Borislav Petkov
@ 2017-05-30 16:46             ` Tom Lendacky
  2017-05-31 11:31               ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 16:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/21/2017 2:16 AM, Borislav Petkov wrote:
> On Fri, May 19, 2017 at 03:50:32PM -0500, Tom Lendacky wrote:
>> The "worker" function would be doing the loop through the setup data,
>> but since the setup data is mapped inside the loop I can't do the __init
>> calling the non-init function and still hope to consolidate the code.
>> Maybe I'm missing something here...
> 
> Hmm, I see what you mean. But the below change on top doesn't fire any
> warnings here. Maybe your .config has something set which I don't...

Check if you have CONFIG_DEBUG_SECTION_MISMATCH=y

Thanks,
Tom

> 
> ---
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 55317ba3b6dc..199c983192ae 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -515,71 +515,50 @@ static bool memremap_is_efi_data(resource_size_t phys_addr,
>    * Examine the physical address to determine if it is boot data by checking
>    * it against the boot params setup_data chain.
>    */
> -static bool memremap_is_setup_data(resource_size_t phys_addr,
> -				   unsigned long size)
> +static bool
> +__memremap_is_setup_data(resource_size_t phys_addr, unsigned long size, bool early)
>   {
>   	struct setup_data *data;
>   	u64 paddr, paddr_next;
> +	u32 len;
>   
>   	paddr = boot_params.hdr.setup_data;
>   	while (paddr) {
> -		bool is_setup_data = false;
>   
>   		if (phys_addr == paddr)
>   			return true;
>   
> -		data = memremap(paddr, sizeof(*data),
> -				MEMREMAP_WB | MEMREMAP_DEC);
> +		if (early)
> +			data = early_memremap_decrypted(paddr, sizeof(*data));
> +		else
> +			data = memremap(paddr, sizeof(*data), MEMREMAP_WB | MEMREMAP_DEC);
>   
>   		paddr_next = data->next;
> +		len = data->len;
>   
> -		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
> -			is_setup_data = true;
> +		if (early)
> +			early_memunmap(data, sizeof(*data));
> +		else
> +			memunmap(data);
>   
> -		memunmap(data);
>   
> -		if (is_setup_data)
> +		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
>   			return true;
>   
>   		paddr = paddr_next;
>   	}
> -
>   	return false;
>   }
>   
> -/*
> - * Examine the physical address to determine if it is boot data by checking
> - * it against the boot params setup_data chain (early boot version).
> - */
>   static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>   						unsigned long size)
>   {
> -	struct setup_data *data;
> -	u64 paddr, paddr_next;
> -
> -	paddr = boot_params.hdr.setup_data;
> -	while (paddr) {
> -		bool is_setup_data = false;
> -
> -		if (phys_addr == paddr)
> -			return true;
> -
> -		data = early_memremap_decrypted(paddr, sizeof(*data));
> -
> -		paddr_next = data->next;
> -
> -		if ((phys_addr > paddr) && (phys_addr < (paddr + data->len)))
> -			is_setup_data = true;
> -
> -		early_memunmap(data, sizeof(*data));
> -
> -		if (is_setup_data)
> -			return true;
> -
> -		paddr = paddr_next;
> -	}
> +	return __memremap_is_setup_data(phys_addr, size, true);
> +}
>   
> -	return false;
> +static bool memremap_is_setup_data(resource_size_t phys_addr, unsigned long size)
> +{
> +	return __memremap_is_setup_data(phys_addr, size, false);
>   }
>   
>   /*
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-26  4:17   ` Xunlei Pang
  2017-05-27  2:17     ` Dave Young
@ 2017-05-30 17:46     ` Tom Lendacky
  2017-05-31 10:01       ` Borislav Petkov
  2017-05-31 15:03       ` Xunlei Pang
  1 sibling, 2 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 17:46 UTC (permalink / raw)
  To: xlpang, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Thomas Gleixner, Rik van Riel, Brijesh Singh, Toshimitsu Kani,
	Arnd Bergmann, Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko, Dave Young,
	Larry Woodman, Dmitry Vyukov

On 5/25/2017 11:17 PM, Xunlei Pang wrote:
> On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
>> Provide support so that kexec can be used to boot a kernel when SME is
>> enabled.
>>
>> Support is needed to allocate pages for kexec without encryption.  This
>> is needed in order to be able to reboot into the kernel in the same manner
>> as originally booted.
> 
> Hi Tom,
> 
> Looks like kdump will break, I didn't see the similar handling for kdump cases, see kernel:
>      kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.
>
> We need to support kdump with SME, kdump kernel/initramfs/purgatory/elfcorehdr/etc
> are all loaded into the reserved memory(see crashkernel=X) by userspace kexec-tools.
> I think a straightforward way would be to mark the whole reserved memory range without
> encryption before loading all the kexec segments for kdump, I guess we can handle this
> easily in arch_kexec_unprotect_crashkres().

Yes, that would work.
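
(A rough, untested sketch of what that could look like -- purely illustrative,
assuming set_memory_decrypted()/set_memory_encrypted() can be applied to the
crashkernel reservation and that crashk_res describes it:

	void arch_kexec_unprotect_crashkres(void)
	{
		unsigned long pages;

		if (sme_active()) {
			/* Make the whole crashkernel reservation decrypted
			 * before the kdump segments are loaded into it.
			 */
			pages = (crashk_res.end - crashk_res.start + 1) >> PAGE_SHIFT;
			set_memory_decrypted((unsigned long)__va(crashk_res.start), pages);
		}

		kexec_mark_crashkres(false);
	}

with the inverse done in arch_kexec_protect_crashkres().)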

> 
> Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be remapped to the
> encrypted data.

This is an area that I'm not familiar with, so I don't completely
understand the flow with regard to where/when/how the ELF headers are
copied and what needs to be done.

Can you elaborate a bit on this?

Thanks,
Tom

> 
> Regards,
> Xunlei
> 
>>
>> Additionally, when shutting down all of the CPUs we need to be sure to
>> flush the caches and then halt. This is needed when booting from a state
>> where SME was not active into a state where SME is active (or vice-versa).
>> Without these steps, it is possible for cache lines to exist for the same
>> physical location but tagged both with and without the encryption bit. This
>> can cause random memory corruption when caches are flushed depending on
>> which cacheline is written last.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>   arch/x86/include/asm/init.h          |    1 +
>>   arch/x86/include/asm/irqflags.h      |    5 +++++
>>   arch/x86/include/asm/kexec.h         |    8 ++++++++
>>   arch/x86/include/asm/pgtable_types.h |    1 +
>>   arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
>>   arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
>>   arch/x86/mm/ident_map.c              |   11 +++++++----
>>   include/linux/kexec.h                |   14 ++++++++++++++
>>   kernel/kexec_core.c                  |    7 +++++++
>>   9 files changed, 101 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>> index 737da62..b2ec511 100644
>> --- a/arch/x86/include/asm/init.h
>> +++ b/arch/x86/include/asm/init.h
>> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>>   	void *context;			 /* context for alloc_pgt_page */
>>   	unsigned long pmd_flag;		 /* page flag for PMD entry */
>>   	unsigned long offset;		 /* ident mapping offset */
>> +	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>>   };
>>   
>>   int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>> index ac7692d..38b5920 100644
>> --- a/arch/x86/include/asm/irqflags.h
>> +++ b/arch/x86/include/asm/irqflags.h
>> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>>   	asm volatile("hlt": : :"memory");
>>   }
>>   
>> +static inline __cpuidle void native_wbinvd_halt(void)
>> +{
>> +	asm volatile("wbinvd; hlt" : : : "memory");
>> +}
>> +
>>   #endif
>>   
>>   #ifdef CONFIG_PARAVIRT
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 70ef205..e8183ac 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>>   	uint64_t r15;
>>   	uint64_t rip;
>>   };
>> +
>> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
>> +				       gfp_t gfp);
>> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
>> +
>> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
>> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
>> +
>>   #endif
>>   
>>   typedef void crash_vmclear_fn(void);
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index ce8cb1c..0f326f4 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -213,6 +213,7 @@ enum page_cache_mode {
>>   #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
>>   #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>>   #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
>> +#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
>>   #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>>   #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>>   #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 085c3b3..11c0ca9 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>>   		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>   	}
>>   	pte = pte_offset_kernel(pmd, vaddr);
>> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>>   	return 0;
>>   err:
>>   	free_transition_pgtable(image);
>> @@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>   		.alloc_pgt_page	= alloc_pgt_page,
>>   		.context	= image,
>>   		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
>> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>>   	};
>>   	unsigned long mstart, mend;
>>   	pgd_t *level4p;
>> @@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
>>   {
>>   	kexec_mark_crashkres(false);
>>   }
>> +
>> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>> +{
>> +	int ret;
>> +
>> +	if (sme_active()) {
>> +		/*
>> +		 * If SME is active we need to be sure that kexec pages are
>> +		 * not encrypted because when we boot to the new kernel the
>> +		 * pages won't be accessed encrypted (initially).
>> +		 */
>> +		ret = set_memory_decrypted((unsigned long)vaddr, pages);
>> +		if (ret)
>> +			return ret;
>> +
>> +		if (gfp & __GFP_ZERO)
>> +			memset(vaddr, 0, pages * PAGE_SIZE);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>> +{
>> +	if (sme_active()) {
>> +		/*
>> +		 * If SME is active we need to reset the pages back to being
>> +		 * an encrypted mapping before freeing them.
>> +		 */
>> +		set_memory_encrypted((unsigned long)vaddr, pages);
>> +	}
>> +}
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 0bb8842..f4e5de6 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -24,6 +24,7 @@
>>   #include <linux/cpuidle.h>
>>   #include <trace/events/power.h>
>>   #include <linux/hw_breakpoint.h>
>> +#include <linux/kexec.h>
>>   #include <asm/cpu.h>
>>   #include <asm/apic.h>
>>   #include <asm/syscalls.h>
>> @@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
>>   	return ret;
>>   }
>>   #endif
>> +
>>   void stop_this_cpu(void *dummy)
>>   {
>> +	bool do_wbinvd_halt = false;
>> +
>> +	if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
>> +		/*
>> +		 * If we are performing a kexec and the processor supports
>> +		 * SME then we need to clear out cache information before
>> +		 * halting. With kexec, going from SME inactive to SME active
>> +		 * requires clearing cache entries so that addresses without
>> +		 * the encryption bit set don't corrupt the same physical
>> +		 * address that has the encryption bit set when caches are
>> +		 * flushed. Perform a wbinvd followed by a halt to achieve
>> +		 * this.
>> +		 */
>> +		do_wbinvd_halt = true;
>> +	}
>> +
>>   	local_irq_disable();
>>   	/*
>>   	 * Remove this CPU:
>> @@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
>>   	disable_local_APIC();
>>   	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>>   
>> -	for (;;)
>> -		halt();
>> +	for (;;) {
>> +		if (do_wbinvd_halt)
>> +			native_wbinvd_halt();
>> +		else
>> +			halt();
>> +	}
>>   }
>>   
>>   /*
>> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
>> index 04210a2..2c9fd3e 100644
>> --- a/arch/x86/mm/ident_map.c
>> +++ b/arch/x86/mm/ident_map.c
>> @@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
>>   static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>   			  unsigned long addr, unsigned long end)
>>   {
>> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>   	unsigned long next;
>>   
>>   	for (; addr < end; addr = next) {
>> @@ -39,7 +40,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>   		if (!pmd)
>>   			return -ENOMEM;
>>   		ident_pmd_init(info, pmd, addr, next);
>> -		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
>> +		set_pud(pud, __pud(__pa(pmd) | kernpg_flag));
>>   	}
>>   
>>   	return 0;
>> @@ -48,6 +49,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>   static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>   			  unsigned long addr, unsigned long end)
>>   {
>> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>   	unsigned long next;
>>   
>>   	for (; addr < end; addr = next) {
>> @@ -67,7 +69,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>   		if (!pud)
>>   			return -ENOMEM;
>>   		ident_pud_init(info, pud, addr, next);
>> -		set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
>> +		set_p4d(p4d, __p4d(__pa(pud) | kernpg_flag));
>>   	}
>>   
>>   	return 0;
>> @@ -76,6 +78,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>   int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>   			      unsigned long pstart, unsigned long pend)
>>   {
>> +	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>   	unsigned long addr = pstart + info->offset;
>>   	unsigned long end = pend + info->offset;
>>   	unsigned long next;
>> @@ -104,14 +107,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>   		if (result)
>>   			return result;
>>   		if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
>> -			set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
>> +			set_pgd(pgd, __pgd(__pa(p4d) | kernpg_flag));
>>   		} else {
>>   			/*
>>   			 * With p4d folded, pgd is equal to p4d.
>>   			 * The pgd entry has to point to the pud page table in this case.
>>   			 */
>>   			pud_t *pud = pud_offset(p4d, 0);
>> -			set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
>> +			set_pgd(pgd, __pgd(__pa(pud) | kernpg_flag));
>>   		}
>>   	}
>>   
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index d419d0e..1c76e3b 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -383,6 +383,20 @@ static inline void *boot_phys_to_virt(unsigned long entry)
>>   	return phys_to_virt(boot_phys_to_phys(entry));
>>   }
>>   
>> +#ifndef arch_kexec_post_alloc_pages
>> +static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
>> +					      gfp_t gfp)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#ifndef arch_kexec_pre_free_pages
>> +static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>> +{
>> +}
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index bfe62d5..bb5e7e3 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -38,6 +38,7 @@
>>   #include <linux/syscore_ops.h>
>>   #include <linux/compiler.h>
>>   #include <linux/hugetlb.h>
>> +#include <linux/mem_encrypt.h>
>>   
>>   #include <asm/page.h>
>>   #include <asm/sections.h>
>> @@ -315,6 +316,9 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>>   		count = 1 << order;
>>   		for (i = 0; i < count; i++)
>>   			SetPageReserved(pages + i);
>> +
>> +		arch_kexec_post_alloc_pages(page_address(pages), count,
>> +					    gfp_mask);
>>   	}
>>   
>>   	return pages;
>> @@ -326,6 +330,9 @@ static void kimage_free_pages(struct page *page)
>>   
>>   	order = page_private(page);
>>   	count = 1 << order;
>> +
>> +	arch_kexec_pre_free_pages(page_address(page), count);
>> +
>>   	for (i = 0; i < count; i++)
>>   		ClearPageReserved(page + i);
>>   	__free_pages(page, order);
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-26 16:35         ` Borislav Petkov
@ 2017-05-30 17:47           ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 17:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Matt Fleming, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov,
	Ard Biesheuvel

On 5/26/2017 11:35 AM, Borislav Petkov wrote:
> On Fri, May 26, 2017 at 11:22:36AM -0500, Tom Lendacky wrote:
>> In addition to the same issue as efi.memmap.phys_map, efi_phys has
>> the __initdata attribute so it will be released/freed which will cause
>> problems in checks performed afterwards.
> 
> Sounds to me like we should drop the __initdata attr and prepare them
> much earlier for use by the SME code.

Probably something we can look at for a follow-on patch.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings
  2017-05-16 17:35   ` Borislav Petkov
@ 2017-05-30 20:07     ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-30 20:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/16/2017 12:35 PM, Borislav Petkov wrote:
> On Tue, Apr 18, 2017 at 04:20:56PM -0500, Tom Lendacky wrote:
>> Since video memory needs to be accessed decrypted, be sure that the
>> memory encryption mask is not set for the video ranges.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>   arch/x86/include/asm/vga.h       |   13 +++++++++++++
>>   arch/x86/mm/pageattr.c           |    2 ++
>>   drivers/gpu/drm/drm_gem.c        |    2 ++
>>   drivers/gpu/drm/drm_vm.c         |    4 ++++
>>   drivers/gpu/drm/ttm/ttm_bo_vm.c  |    7 +++++--
>>   drivers/gpu/drm/udl/udl_fb.c     |    4 ++++
>>   drivers/video/fbdev/core/fbmem.c |   12 ++++++++++++
>>   7 files changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
>> index c4b9dc2..5c7567a 100644
>> --- a/arch/x86/include/asm/vga.h
>> +++ b/arch/x86/include/asm/vga.h
>> @@ -7,12 +7,25 @@
>>   #ifndef _ASM_X86_VGA_H
>>   #define _ASM_X86_VGA_H
>>   
>> +#include <asm/cacheflush.h>
>> +
>>   /*
>>    *	On the PC, we can just recalculate addresses and then
>>    *	access the videoram directly without any black magic.
>> + *	To support memory encryption however, we need to access
>> + *	the videoram as decrypted memory.
>>    */
>>   
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +#define VGA_MAP_MEM(x, s)					\
>> +({								\
>> +	unsigned long start = (unsigned long)phys_to_virt(x);	\
>> +	set_memory_decrypted(start, (s) >> PAGE_SHIFT);		\
>> +	start;							\
>> +})
>> +#else
>>   #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
>> +#endif
> 
> Can we push the check in and save us the ifdeffery?
> 
> #define VGA_MAP_MEM(x, s)                                       \
> ({                                                              \
>          unsigned long start = (unsigned long)phys_to_virt(x);   \
>                                                                  \
>          if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))                 \
>                  set_memory_decrypted(start, (s) >> PAGE_SHIFT); \
>                                                                  \
>          start;                                                  \
> })
> 
> It does build here. :)
> 

That works for me and it's a lot cleaner.  I'll make the change.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-30 15:37         ` Tom Lendacky
@ 2017-05-31  8:49           ` Borislav Petkov
  2017-05-31 13:37             ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31  8:49 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 30, 2017 at 10:37:03AM -0500, Tom Lendacky wrote:
> I can define the command line option and the "on" and "off" values as
> character buffers in the function and initialize them on a per character
> basis (using a static string causes the same issues as referencing a
> string constant), i.e.:
> 
> char cmdline_arg[] = {'m', 'e', 'm', '_', 'e', 'n', 'c', 'r', 'y', 'p', 't', '\0'};
> char cmdline_off[] = {'o', 'f', 'f', '\0'};
> char cmdline_on[] = {'o', 'n', '\0'};
> 
> It doesn't look the greatest, but it works and removes the need for the
> rip-relative addressing.

Well, I'm not thrilled about this one either. It's like being between a
rock and a hard place. :-\

On the one hand, we need the encryption mask before we do the fixups and
OTOH we need to do the fixups in order to access the strings properly.
Yuck.

Well, the only thing I can think of right now is maybe define
"mem_encrypt=" at the end of head_64.S and pass it in from asm to
sme_enable() and then do the "on"/"off" comparison with local char
buffers. That could make it less ugly...
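
(Sketching that suggestion, just to see how it would hang together -- this is
hypothetical and untested:

	/* head_64.S, sketch only: */
		leaq	sme_cmdline_arg(%rip), %rsi	/* 2nd arg for sme_enable() */
		call	sme_enable

	sme_cmdline_arg:
		.asciz	"mem_encrypt"

	/* and in C: */
	unsigned long __init sme_enable(struct boot_params *bp, const char *cmdline_arg);

The RIP-relative lea is then done in the .S file, where it is the natural
idiom, and sme_enable() only ever sees an already-usable pointer.)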

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-30 15:48         ` Tom Lendacky
@ 2017-05-31  9:15           ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31  9:15 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Josh Poimboeuf, linux-arch, linux-efi, kvm, linux-doc, x86,
	kexec, linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 30, 2017 at 10:48:27AM -0500, Tom Lendacky wrote:
> I'll look at doing that instead of removing the support for the whole
> file.

Right, so I don't think the stack protector is even ready that early -
we do set it up later:

        /* Set up %gs.
         *
         * The base of %gs always points to the bottom of the irqstack
         * union.  If the stack protector canary is enabled, it is
         * located at %gs:40.  Note that, on SMP, the boot cpu uses
         * init data section till per cpu areas are set up.
         */
        movl    $MSR_GS_BASE,%ecx
        movl    initial_gs(%rip),%eax
        movl    initial_gs+4(%rip),%edx
        wrmsr

so I think marking the function "no-stack-protector" is the only option
right now. We can always look at fixing that later.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-05-30 16:39         ` Tom Lendacky
@ 2017-05-31  9:51           ` Borislav Petkov
  2017-05-31 13:12             ` Tom Lendacky
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31  9:51 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 30, 2017 at 11:39:07AM -0500, Tom Lendacky wrote:
> Yes, it's from objtool:
> 
> arch/x86/mm/mem_encrypt_boot.o: warning: objtool: .text+0xd2: return
> instruction outside of a callable function

Oh, well, let's make it a global symbol then. Who knows, we might have
to live-patch it someday :-)

---
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
index fb58f9f953e3..7720b0050840 100644
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -47,9 +47,9 @@ ENTRY(sme_encrypt_execute)
 	movq	%rdx, %r12		/* Kernel length */
 
 	/* Copy encryption routine into the workarea */
-	movq	%rax, %rdi		/* Workarea encryption routine */
-	leaq	.Lenc_start(%rip), %rsi	/* Encryption routine */
-	movq	$(.Lenc_stop - .Lenc_start), %rcx	/* Encryption routine length */
+	movq	%rax, %rdi				/* Workarea encryption routine */
+	leaq	__enc_copy(%rip), %rsi			/* Encryption routine */
+	movq	$(.L__enc_copy_end - __enc_copy), %rcx	/* Encryption routine length */
 	rep	movsb
 
 	/* Setup registers for call */
@@ -70,8 +70,7 @@ ENTRY(sme_encrypt_execute)
 	ret
 ENDPROC(sme_encrypt_execute)
 
-.Lenc_start:
-ENTRY(sme_enc_routine)
+ENTRY(__enc_copy)
 /*
  * Routine used to encrypt kernel.
  *   This routine must be run outside of the kernel proper since
@@ -147,5 +146,5 @@ ENTRY(sme_enc_routine)
 	wrmsr
 
 	ret
-ENDPROC(sme_enc_routine)
-.Lenc_stop:
+.L__enc_copy_end:
+ENDPROC(__enc_copy)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-30 17:46     ` Tom Lendacky
@ 2017-05-31 10:01       ` Borislav Petkov
  2017-05-31 15:03       ` Xunlei Pang
  1 sibling, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31 10:01 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: xlpang, linux-arch, linux-efi, kvm, linux-doc, x86, kexec,
	linux-kernel, kasan-dev, linux-mm, iommu, Thomas Gleixner,
	Rik van Riel, Brijesh Singh, Toshimitsu Kani, Arnd Bergmann,
	Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Alexander Potapenko, Dave Young, Larry Woodman,
	Dmitry Vyukov

On Tue, May 30, 2017 at 12:46:14PM -0500, Tom Lendacky wrote:
> This is an area that I'm not familiar with, so I don't completely
> understand the flow in regards to where/when/how the ELF headers are
> copied and what needs to be done.

So my suggestion is still to put kexec/kdump on the backburner for now
and concentrate on the 30-ish patchset first. Once they're done, we can
start dealing with it. Ditto with the IOMMU side of things. One thing at
a time.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear
  2017-05-30 16:46             ` Tom Lendacky
@ 2017-05-31 11:31               ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31 11:31 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Tue, May 30, 2017 at 11:46:52AM -0500, Tom Lendacky wrote:
> Check if you have CONFIG_DEBUG_SECTION_MISMATCH=y

$ grep MISM .config
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_SECTION_MISMATCH_WARN_ONLY=y

Still no joy.

Can you give me your .config?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place
  2017-05-31  9:51           ` Borislav Petkov
@ 2017-05-31 13:12             ` Tom Lendacky
  0 siblings, 0 replies; 126+ messages in thread
From: Tom Lendacky @ 2017-05-31 13:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/31/2017 4:51 AM, Borislav Petkov wrote:
> On Tue, May 30, 2017 at 11:39:07AM -0500, Tom Lendacky wrote:
>> Yes, it's from objtool:
>>
>> arch/x86/mm/mem_encrypt_boot.o: warning: objtool: .text+0xd2: return
>> instruction outside of a callable function
> 
> Oh, well, let's make it a global symbol then. Who knows, we might have
> to live-patch it someday :-)

Can do.

Thanks,
Tom

> 
> ---
> diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
> index fb58f9f953e3..7720b0050840 100644
> --- a/arch/x86/mm/mem_encrypt_boot.S
> +++ b/arch/x86/mm/mem_encrypt_boot.S
> @@ -47,9 +47,9 @@ ENTRY(sme_encrypt_execute)
>   	movq	%rdx, %r12		/* Kernel length */
>   
>   	/* Copy encryption routine into the workarea */
> -	movq	%rax, %rdi		/* Workarea encryption routine */
> -	leaq	.Lenc_start(%rip), %rsi	/* Encryption routine */
> -	movq	$(.Lenc_stop - .Lenc_start), %rcx	/* Encryption routine length */
> +	movq	%rax, %rdi				/* Workarea encryption routine */
> +	leaq	__enc_copy(%rip), %rsi			/* Encryption routine */
> +	movq	$(.L__enc_copy_end - __enc_copy), %rcx	/* Encryption routine length */
>   	rep	movsb
>   
>   	/* Setup registers for call */
> @@ -70,8 +70,7 @@ ENTRY(sme_encrypt_execute)
>   	ret
>   ENDPROC(sme_encrypt_execute)
>   
> -.Lenc_start:
> -ENTRY(sme_enc_routine)
> +ENTRY(__enc_copy)
>   /*
>    * Routine used to encrypt kernel.
>    *   This routine must be run outside of the kernel proper since
> @@ -147,5 +146,5 @@ ENTRY(sme_enc_routine)
>   	wrmsr
>   
>   	ret
> -ENDPROC(sme_enc_routine)
> -.Lenc_stop:
> +.L__enc_copy_end:
> +ENDPROC(__enc_copy)
> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-31  8:49           ` Borislav Petkov
@ 2017-05-31 13:37             ` Tom Lendacky
  2017-05-31 14:12               ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Tom Lendacky @ 2017-05-31 13:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On 5/31/2017 3:49 AM, Borislav Petkov wrote:
> On Tue, May 30, 2017 at 10:37:03AM -0500, Tom Lendacky wrote:
>> I can define the command line option and the "on" and "off" values as
>> character buffers in the function and initialize them on a per character
>> basis (using a static string causes the same issues as referencing a
>> string constant), i.e.:
>>
>> char cmdline_arg[] = {'m', 'e', 'm', '_', 'e', 'n', 'c', 'r', 'y', 'p', 't', '\0'};
>> char cmdline_off[] = {'o', 'f', 'f', '\0'};
>> char cmdline_on[] = {'o', 'n', '\0'};
>>
>> It doesn't look the greatest, but it works and removes the need for the
>> rip-relative addressing.
> 
> Well, I'm not thrilled about this one either. It's like being between a
> rock and a hard place. :-\
> 
> On the one hand, we need the encryption mask before we do the fixups and
> OTOH we need to do the fixups in order to access the strings properly.
> Yuck.
> 
> Well, the only thing I can think of right now is maybe define
> "mem_encrypt=" at the end of head_64.S and pass it in from asm to
> sme_enable() and then do the "on"/"off" comparison with local char
> buffers. That could make it less ugly...

I like keeping the command line option and the values together. It may
not look the greatest but I like it more than defining the command line
option in head_64.S and passing it in as an argument.

OTOH, I don't think the rip-relative addressing was that bad, I can
always go back to that...

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 32/32] x86/mm: Add support to make use of Secure Memory Encryption
  2017-05-31 13:37             ` Tom Lendacky
@ 2017-05-31 14:12               ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31 14:12 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Larry Woodman, Brijesh Singh, Ingo Molnar,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Alexander Potapenko, Dave Young, Thomas Gleixner, Dmitry Vyukov

On Wed, May 31, 2017 at 08:37:50AM -0500, Tom Lendacky wrote:
> I like keeping the command line option and the values together. It may
> not look the greatest but I like it more than defining the command line
> option in head_64.S and passing it in as an argument.
> 
> OTOH, I don't think the rip-relative addressing was that bad, I can
> always go back to that...

Yeah, no nice solution here. Having gone full circle, the rip-relative
thing doesn't look all that bad, all of a sudden. I'd let you decide
what to do...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-30 17:46     ` Tom Lendacky
  2017-05-31 10:01       ` Borislav Petkov
@ 2017-05-31 15:03       ` Xunlei Pang
  2017-05-31 15:48         ` Borislav Petkov
  1 sibling, 1 reply; 126+ messages in thread
From: Xunlei Pang @ 2017-05-31 15:03 UTC (permalink / raw)
  To: Tom Lendacky, xlpang, linux-arch, linux-efi, kvm, linux-doc, x86,
	kexec, linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Thomas Gleixner, Rik van Riel, Brijesh Singh, Toshimitsu Kani,
	Arnd Bergmann, Jonathan Corbet, Matt Fleming, Joerg Roedel,
	Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Paolo Bonzini, Alexander Potapenko, Dave Young,
	Larry Woodman, Dmitry Vyukov

On 05/31/2017 at 01:46 AM, Tom Lendacky wrote:
> On 5/25/2017 11:17 PM, Xunlei Pang wrote:
>> On 04/19/2017 at 05:21 AM, Tom Lendacky wrote:
>>> Provide support so that kexec can be used to boot a kernel when SME is
>>> enabled.
>>>
>>> Support is needed to allocate pages for kexec without encryption.  This
>>> is needed in order to be able to reboot into the kernel in the same manner
>>> as originally booted.
>>
>> Hi Tom,
>>
>> Looks like kdump will break, I didn't see the similar handling for kdump cases, see kernel:
>>      kimage_alloc_crash_control_pages(), kimage_load_crash_segment(), etc.
>>
>> We need to support kdump with SME, kdump kernel/initramfs/purgatory/elfcorehdr/etc
>> are all loaded into the reserved memory(see crashkernel=X) by userspace kexec-tools.
>> I think a straightforward way would be to mark the whole reserved memory range without
>> encryption before loading all the kexec segments for kdump, I guess we can handle this
>> easily in arch_kexec_unprotect_crashkres().
>
> Yes, that would work.
>
>>
>> Moreover, now that "elfcorehdr=X" is left as decrypted, it needs to be remapped to the
>> encrypted data.
>
> This is an area that I'm not familiar with, so I don't completely
> understand the flow in regards to where/when/how the ELF headers are
> copied and what needs to be done.
>
> Can you elaborate a bit on this?

"elfcorehdr" is generated by userspace kexec-tools(git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git), it's
actually ELF CORE header data(elf header, PT_LOAD/PT_NOTE program header), see kexec/crashdump-elf.c::FUNC().

For the kdump case, it is put into the reserved crash memory allocated by kexec-tools, and the corresponding
start address of that reserved memory is passed to the kdump kernel via "elfcorehdr="; please see the kernel functions
setup_elfcorehdr() and vmcore_init() for how it is parsed by the kdump kernel.
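
(For reference, the kdump kernel's side of this is roughly the early_param
hook below -- quoted from memory, so please double-check against the tree:

	static int __init setup_elfcorehdr(char *arg)
	{
		char *end;

		if (!arg)
			return -EINVAL;
		elfcorehdr_addr = memparse(arg, &end);
		if (*end == '@') {
			elfcorehdr_size = elfcorehdr_addr;
			elfcorehdr_addr = memparse(end + 1, &end);
		}
		return end > arg ? 0 : -EINVAL;
	}
	early_param("elfcorehdr", setup_elfcorehdr);

The interesting part for SME is then how the region at elfcorehdr_addr gets
mapped by the vmcore code -- whether that mapping needs the encryption mask
cleared or set.)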

Regards,
Xunlei

>>
>>>
>>> Additionally, when shutting down all of the CPUs we need to be sure to
>>> flush the caches and then halt. This is needed when booting from a state
>>> where SME was not active into a state where SME is active (or vice-versa).
>>> Without these steps, it is possible for cache lines to exist for the same
>>> physical location but tagged both with and without the encryption bit. This
>>> can cause random memory corruption when caches are flushed depending on
>>> which cacheline is written last.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>>> ---
>>>   arch/x86/include/asm/init.h          |    1 +
>>>   arch/x86/include/asm/irqflags.h      |    5 +++++
>>>   arch/x86/include/asm/kexec.h         |    8 ++++++++
>>>   arch/x86/include/asm/pgtable_types.h |    1 +
>>>   arch/x86/kernel/machine_kexec_64.c   |   35 +++++++++++++++++++++++++++++++++-
>>>   arch/x86/kernel/process.c            |   26 +++++++++++++++++++++++--
>>>   arch/x86/mm/ident_map.c              |   11 +++++++----
>>>   include/linux/kexec.h                |   14 ++++++++++++++
>>>   kernel/kexec_core.c                  |    7 +++++++
>>>   9 files changed, 101 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>>> index 737da62..b2ec511 100644
>>> --- a/arch/x86/include/asm/init.h
>>> +++ b/arch/x86/include/asm/init.h
>>> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>>>       void *context;             /* context for alloc_pgt_page */
>>>       unsigned long pmd_flag;         /* page flag for PMD entry */
>>>       unsigned long offset;         /* ident mapping offset */
>>> +    unsigned long kernpg_flag;     /* kernel pagetable flag override */
>>>   };
>>>     int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
>>> index ac7692d..38b5920 100644
>>> --- a/arch/x86/include/asm/irqflags.h
>>> +++ b/arch/x86/include/asm/irqflags.h
>>> @@ -58,6 +58,11 @@ static inline __cpuidle void native_halt(void)
>>>       asm volatile("hlt": : :"memory");
>>>   }
>>>   +static inline __cpuidle void native_wbinvd_halt(void)
>>> +{
>>> +    asm volatile("wbinvd; hlt" : : : "memory");
>>> +}
>>> +
>>>   #endif
>>>     #ifdef CONFIG_PARAVIRT
>>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>>> index 70ef205..e8183ac 100644
>>> --- a/arch/x86/include/asm/kexec.h
>>> +++ b/arch/x86/include/asm/kexec.h
>>> @@ -207,6 +207,14 @@ struct kexec_entry64_regs {
>>>       uint64_t r15;
>>>       uint64_t rip;
>>>   };
>>> +
>>> +extern int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
>>> +                       gfp_t gfp);
>>> +#define arch_kexec_post_alloc_pages arch_kexec_post_alloc_pages
>>> +
>>> +extern void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages);
>>> +#define arch_kexec_pre_free_pages arch_kexec_pre_free_pages
>>> +
>>>   #endif
>>>     typedef void crash_vmclear_fn(void);
>>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>>> index ce8cb1c..0f326f4 100644
>>> --- a/arch/x86/include/asm/pgtable_types.h
>>> +++ b/arch/x86/include/asm/pgtable_types.h
>>> @@ -213,6 +213,7 @@ enum page_cache_mode {
>>>   #define PAGE_KERNEL        __pgprot(__PAGE_KERNEL | _PAGE_ENC)
>>>   #define PAGE_KERNEL_RO        __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>>>   #define PAGE_KERNEL_EXEC    __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
>>> +#define PAGE_KERNEL_EXEC_NOENC    __pgprot(__PAGE_KERNEL_EXEC)
>>>   #define PAGE_KERNEL_RX        __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>>>   #define PAGE_KERNEL_NOCACHE    __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>>>   #define PAGE_KERNEL_LARGE    __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
>>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>>> index 085c3b3..11c0ca9 100644
>>> --- a/arch/x86/kernel/machine_kexec_64.c
>>> +++ b/arch/x86/kernel/machine_kexec_64.c
>>> @@ -86,7 +86,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>>>           set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>>       }
>>>       pte = pte_offset_kernel(pmd, vaddr);
>>> -    set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>>> +    set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>>>       return 0;
>>>   err:
>>>       free_transition_pgtable(image);
>>> @@ -114,6 +114,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>>           .alloc_pgt_page    = alloc_pgt_page,
>>>           .context    = image,
>>>           .pmd_flag    = __PAGE_KERNEL_LARGE_EXEC,
>>> +        .kernpg_flag    = _KERNPG_TABLE_NOENC,
>>>       };
>>>       unsigned long mstart, mend;
>>>       pgd_t *level4p;
>>> @@ -597,3 +598,35 @@ void arch_kexec_unprotect_crashkres(void)
>>>   {
>>>       kexec_mark_crashkres(false);
>>>   }
>>> +
>>> +int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (sme_active()) {
>>> +        /*
>>> +         * If SME is active we need to be sure that kexec pages are
>>> +         * not encrypted because when we boot to the new kernel the
>>> +         * pages won't be accessed encrypted (initially).
>>> +         */
>>> +        ret = set_memory_decrypted((unsigned long)vaddr, pages);
>>> +        if (ret)
>>> +            return ret;
>>> +
>>> +        if (gfp & __GFP_ZERO)
>>> +            memset(vaddr, 0, pages * PAGE_SIZE);
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>>> +{
>>> +    if (sme_active()) {
>>> +        /*
>>> +         * If SME is active we need to reset the pages back to being
>>> +         * an encrypted mapping before freeing them.
>>> +         */
>>> +        set_memory_encrypted((unsigned long)vaddr, pages);
>>> +    }
>>> +}
>>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>> index 0bb8842..f4e5de6 100644
>>> --- a/arch/x86/kernel/process.c
>>> +++ b/arch/x86/kernel/process.c
>>> @@ -24,6 +24,7 @@
>>>   #include <linux/cpuidle.h>
>>>   #include <trace/events/power.h>
>>>   #include <linux/hw_breakpoint.h>
>>> +#include <linux/kexec.h>
>>>   #include <asm/cpu.h>
>>>   #include <asm/apic.h>
>>>   #include <asm/syscalls.h>
>>> @@ -355,8 +356,25 @@ bool xen_set_default_idle(void)
>>>       return ret;
>>>   }
>>>   #endif
>>> +
>>>   void stop_this_cpu(void *dummy)
>>>   {
>>> +    bool do_wbinvd_halt = false;
>>> +
>>> +    if (kexec_in_progress && boot_cpu_has(X86_FEATURE_SME)) {
>>> +        /*
>>> +         * If we are performing a kexec and the processor supports
>>> +         * SME then we need to clear out cache information before
>>> +         * halting. With kexec, going from SME inactive to SME active
>>> +         * requires clearing cache entries so that addresses without
>>> +         * the encryption bit set don't corrupt the same physical
>>> +         * address that has the encryption bit set when caches are
>>> +         * flushed. Perform a wbinvd followed by a halt to achieve
>>> +         * this.
>>> +         */
>>> +        do_wbinvd_halt = true;
>>> +    }
>>> +
>>>       local_irq_disable();
>>>       /*
>>>        * Remove this CPU:
>>> @@ -365,8 +383,12 @@ void stop_this_cpu(void *dummy)
>>>       disable_local_APIC();
>>>       mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>>>  
>>> -    for (;;)
>>> -        halt();
>>> +    for (;;) {
>>> +        if (do_wbinvd_halt)
>>> +            native_wbinvd_halt();
>>> +        else
>>> +            halt();
>>> +    }
>>>   }
>>>  
>>>   /*
>>> diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
>>> index 04210a2..2c9fd3e 100644
>>> --- a/arch/x86/mm/ident_map.c
>>> +++ b/arch/x86/mm/ident_map.c
>>> @@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
>>>   static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>>                 unsigned long addr, unsigned long end)
>>>   {
>>> +    unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>>       unsigned long next;
>>>  
>>>       for (; addr < end; addr = next) {
>>> @@ -39,7 +40,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>>           if (!pmd)
>>>               return -ENOMEM;
>>>           ident_pmd_init(info, pmd, addr, next);
>>> -        set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
>>> +        set_pud(pud, __pud(__pa(pmd) | kernpg_flag));
>>>       }
>>>  
>>>       return 0;
>>> @@ -48,6 +49,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
>>>   static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>>                 unsigned long addr, unsigned long end)
>>>   {
>>> +    unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>>       unsigned long next;
>>>  
>>>       for (; addr < end; addr = next) {
>>> @@ -67,7 +69,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>>           if (!pud)
>>>               return -ENOMEM;
>>>           ident_pud_init(info, pud, addr, next);
>>> -        set_p4d(p4d, __p4d(__pa(pud) | _KERNPG_TABLE));
>>> +        set_p4d(p4d, __p4d(__pa(pud) | kernpg_flag));
>>>       }
>>>  
>>>       return 0;
>>> @@ -76,6 +78,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
>>>   int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>>                     unsigned long pstart, unsigned long pend)
>>>   {
>>> +    unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
>>>       unsigned long addr = pstart + info->offset;
>>>       unsigned long end = pend + info->offset;
>>>       unsigned long next;
>>> @@ -104,14 +107,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>>>           if (result)
>>>               return result;
>>>           if (IS_ENABLED(CONFIG_X86_5LEVEL)) {
>>> -            set_pgd(pgd, __pgd(__pa(p4d) | _KERNPG_TABLE));
>>> +            set_pgd(pgd, __pgd(__pa(p4d) | kernpg_flag));
>>>           } else {
>>>               /*
>>>                * With p4d folded, pgd is equal to p4d.
>>>                * The pgd entry has to point to the pud page table in this case.
>>>                */
>>>               pud_t *pud = pud_offset(p4d, 0);
>>> -            set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
>>> +            set_pgd(pgd, __pgd(__pa(pud) | kernpg_flag));
>>>           }
>>>       }
>>>  
>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>> index d419d0e..1c76e3b 100644
>>> --- a/include/linux/kexec.h
>>> +++ b/include/linux/kexec.h
>>> @@ -383,6 +383,20 @@ static inline void *boot_phys_to_virt(unsigned long entry)
>>>       return phys_to_virt(boot_phys_to_phys(entry));
>>>   }
>>>  
>>> +#ifndef arch_kexec_post_alloc_pages
>>> +static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages,
>>> +                          gfp_t gfp)
>>> +{
>>> +    return 0;
>>> +}
>>> +#endif
>>> +
>>> +#ifndef arch_kexec_pre_free_pages
>>> +static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>   #else /* !CONFIG_KEXEC_CORE */
>>>   struct pt_regs;
>>>   struct task_struct;
>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>> index bfe62d5..bb5e7e3 100644
>>> --- a/kernel/kexec_core.c
>>> +++ b/kernel/kexec_core.c
>>> @@ -38,6 +38,7 @@
>>>   #include <linux/syscore_ops.h>
>>>   #include <linux/compiler.h>
>>>   #include <linux/hugetlb.h>
>>> +#include <linux/mem_encrypt.h>
>>>  
>>>   #include <asm/page.h>
>>>   #include <asm/sections.h>
>>> @@ -315,6 +316,9 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>>>           count = 1 << order;
>>>           for (i = 0; i < count; i++)
>>>               SetPageReserved(pages + i);
>>> +
>>> +        arch_kexec_post_alloc_pages(page_address(pages), count,
>>> +                        gfp_mask);
>>>       }
>>>  
>>>       return pages;
>>> @@ -326,6 +330,9 @@ static void kimage_free_pages(struct page *page)
>>>  
>>>     order = page_private(page);
>>>       count = 1 << order;
>>> +
>>> +    arch_kexec_pre_free_pages(page_address(page), count);
>>> +
>>>       for (i = 0; i < count; i++)
>>>           ClearPageReserved(page + i);
>>>       __free_pages(page, order);
>>>
>>>
>>> _______________________________________________
>>> kexec mailing list
>>> kexec@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/kexec
>>

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME
  2017-05-31 15:03       ` Xunlei Pang
@ 2017-05-31 15:48         ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2017-05-31 15:48 UTC (permalink / raw)
  To: xlpang, Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, kexec, linux-kernel,
	kasan-dev, linux-mm, iommu, Thomas Gleixner, Rik van Riel,
	Brijesh Singh, Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Matt Fleming, Joerg Roedel, Radim Krčmář,
	Konrad Rzeszutek Wilk, Andrey Ryabinin, Ingo Molnar,
	Michael S. Tsirkin, Andy Lutomirski, H. Peter Anvin,
	Paolo Bonzini, Alexander Potapenko, Dave Young, Larry Woodman,
	Dmitry Vyukov

On Wed, May 31, 2017 at 11:03:52PM +0800, Xunlei Pang wrote:
> For the kdump case, it will be put in reserved crash memory allocated by
> kexec-tools, and the start address of that reserved crash memory is passed
> to the kdump kernel via "elfcorehdr="; see the kernel functions
> setup_elfcorehdr() and vmcore_init() for how the kdump kernel parses it.

... which could be a great way to pass the SME status to the second
kernel without any funky sysfs games.
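
For reference, the consumer side Xunlei points at is just an early_param
hook. The sketch below is loosely based on kernel/crash_dump.c (trimmed and
not authoritative) and shows how the kdump kernel picks up the physical
address from "elfcorehdr=[size@]addr":

    /* Filled in from the command line; ELFCORE_ADDR_MAX means "not a kdump boot". */
    unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;
    unsigned long long elfcorehdr_size;

    /* Parse "elfcorehdr=[size@]addr" from the kdump kernel's command line. */
    static int __init setup_elfcorehdr(char *arg)
    {
            char *end;

            if (!arg)
                    return -EINVAL;

            elfcorehdr_addr = memparse(arg, &end);
            if (*end == '@') {
                    /* "size@addr" form: the first number was the size. */
                    elfcorehdr_size = elfcorehdr_addr;
                    elfcorehdr_addr = memparse(end + 1, &end);
            }
            return end > arg ? 0 : -EINVAL;
    }
    early_param("elfcorehdr", setup_elfcorehdr);

vmcore_init() later reads the ELF headers from that physical address, so a
flag describing the first kernel's SME state could ride along in the same
reserved region rather than going through sysfs.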

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 126+ messages in thread

end of thread, other threads:[~2017-05-31 15:49 UTC | newest]

Thread overview: 126+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-18 21:16 [PATCH v5 00/32] x86: Secure Memory Encryption (AMD) Tom Lendacky
2017-04-18 21:16 ` [PATCH v5 01/32] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
2017-04-19  9:02   ` Borislav Petkov
2017-04-19 14:23     ` Tom Lendacky
2017-04-19 15:38       ` Borislav Petkov
2017-04-19  9:52   ` David Howells
2017-04-18 21:16 ` [PATCH v5 02/32] x86/mm/pat: Set write-protect cache mode for full PAT support Tom Lendacky
2017-04-18 21:16 ` [PATCH v5 03/32] x86, mpparse, x86/acpi, x86/PCI, SFI: Use memremap for RAM mappings Tom Lendacky
2017-04-18 21:17 ` [PATCH v5 04/32] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature Tom Lendacky
2017-04-18 21:17 ` [PATCH v5 05/32] x86/CPU/AMD: Handle SME reduction in physical address size Tom Lendacky
2017-04-20 16:59   ` Borislav Petkov
2017-04-20 17:29     ` Tom Lendacky
2017-04-20 18:52       ` Borislav Petkov
2017-04-18 21:17 ` [PATCH v5 06/32] x86/mm: Add Secure Memory Encryption (SME) support Tom Lendacky
2017-04-27 15:46   ` Borislav Petkov
2017-05-04 14:24     ` Tom Lendacky
2017-05-04 14:36       ` Borislav Petkov
2017-05-16 19:28         ` Tom Lendacky
2017-05-17  7:05           ` Borislav Petkov
2017-04-18 21:17 ` [PATCH v5 07/32] x86/mm: Add support to enable SME in early boot processing Tom Lendacky
2017-04-21 14:55   ` Borislav Petkov
2017-04-21 21:40     ` Tom Lendacky
2017-04-18 21:17 ` [PATCH v5 08/32] x86/mm: Simplify p[g4um]d_page() macros Tom Lendacky
2017-04-18 21:17 ` [PATCH v5 09/32] x86/mm: Provide general kernel support for memory encryption Tom Lendacky
2017-04-21 21:52   ` Dave Hansen
2017-04-24 15:53     ` Tom Lendacky
2017-04-24 15:57       ` Dave Hansen
2017-04-24 16:10         ` Tom Lendacky
2017-04-27 16:12   ` Borislav Petkov
2017-05-04 14:34     ` Tom Lendacky
2017-05-04 17:01       ` Borislav Petkov
2017-04-18 21:18 ` [PATCH v5 10/32] x86/mm: Extend early_memremap() support with additional attrs Tom Lendacky
2017-04-18 21:18 ` [PATCH v5 11/32] x86/mm: Add support for early encrypt/decrypt of memory Tom Lendacky
2017-04-18 21:18 ` [PATCH v5 12/32] x86/mm: Insure that boot memory areas are mapped properly Tom Lendacky
2017-05-04 10:16   ` Borislav Petkov
2017-05-04 14:39     ` Tom Lendacky
2017-04-18 21:18 ` [PATCH v5 13/32] x86/boot/e820: Add support to determine the E820 type of an address Tom Lendacky
2017-05-05 17:11   ` Borislav Petkov
2017-05-06  7:48     ` Ard Biesheuvel
2017-04-18 21:18 ` [PATCH v5 14/32] efi: Add an EFI table address match function Tom Lendacky
2017-05-15 18:09   ` Borislav Petkov
2017-05-16 21:53     ` Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 15/32] efi: Update efi_mem_type() to return an error rather than 0 Tom Lendacky
2017-05-07 17:18   ` Borislav Petkov
2017-05-08 13:20     ` Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 16/32] x86/efi: Update EFI pagetable creation to work with SME Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 17/32] x86/mm: Add support to access boot related data in the clear Tom Lendacky
2017-05-15 18:35   ` Borislav Petkov
2017-05-17 18:54     ` Tom Lendacky
2017-05-18  9:02       ` Borislav Petkov
2017-05-19 20:50         ` Tom Lendacky
2017-05-21  7:16           ` Borislav Petkov
2017-05-30 16:46             ` Tom Lendacky
2017-05-31 11:31               ` Borislav Petkov
2017-05-18 19:50     ` Matt Fleming
2017-05-26 16:22       ` Tom Lendacky
2017-05-26 16:35         ` Borislav Petkov
2017-05-30 17:47           ` Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 18/32] x86, mpparse: Use memremap to map the mpf and mpc data Tom Lendacky
2017-05-16  8:36   ` Borislav Petkov
2017-05-17 20:26     ` Tom Lendacky
2017-05-18  9:03       ` Borislav Petkov
2017-04-18 21:19 ` [PATCH v5 19/32] x86/mm: Add support to access persistent memory in the clear Tom Lendacky
2017-05-16 14:04   ` Borislav Petkov
2017-05-19 19:52     ` Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 20/32] x86/mm: Add support for changing the memory encryption attribute Tom Lendacky
2017-04-18 21:19 ` [PATCH v5 21/32] x86, realmode: Decrypt trampoline area if memory encryption is active Tom Lendacky
2017-04-18 21:20 ` [PATCH v5 22/32] x86, swiotlb: DMA support for memory encryption Tom Lendacky
2017-05-16 14:27   ` Borislav Petkov
2017-05-19 19:54     ` Tom Lendacky
2017-04-18 21:20 ` [PATCH v5 23/32] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
2017-05-16 14:52   ` Borislav Petkov
2017-05-19 19:55     ` Tom Lendacky
2017-04-18 21:20 ` [PATCH v5 24/32] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
2017-04-18 21:20 ` [PATCH v5 25/32] x86, realmode: Check for memory encryption on the APs Tom Lendacky
2017-04-18 21:20 ` [PATCH v5 26/32] x86, drm, fbdev: Do not specify encrypted memory for video mappings Tom Lendacky
2017-05-16 17:35   ` Borislav Petkov
2017-05-30 20:07     ` Tom Lendacky
2017-04-18 21:21 ` [PATCH v5 27/32] kvm: x86: svm: Enable Secure Memory Encryption within KVM Tom Lendacky
2017-04-18 21:21 ` [PATCH v5 28/32] x86/mm, kexec: Allow kexec to be used with SME Tom Lendacky
2017-05-17 19:17   ` Borislav Petkov
2017-05-19 20:45     ` Tom Lendacky
2017-05-19 20:58       ` Borislav Petkov
2017-05-19 21:07         ` Tom Lendacky
2017-05-19 21:28           ` Borislav Petkov
2017-05-19 21:38             ` Tom Lendacky
2017-05-26  4:17   ` Xunlei Pang
2017-05-27  2:17     ` Dave Young
2017-05-30 17:46     ` Tom Lendacky
2017-05-31 10:01       ` Borislav Petkov
2017-05-31 15:03       ` Xunlei Pang
2017-05-31 15:48         ` Borislav Petkov
2017-04-18 21:21 ` [PATCH v5 29/32] x86/mm: Add support to encrypt the kernel in-place Tom Lendacky
2017-05-18 12:46   ` Borislav Petkov
2017-05-25 22:24     ` Tom Lendacky
2017-05-26 16:25       ` Borislav Petkov
2017-05-30 16:39         ` Tom Lendacky
2017-05-31  9:51           ` Borislav Petkov
2017-05-31 13:12             ` Tom Lendacky
2017-04-18 21:22 ` [PATCH v5 30/32] x86/boot: Add early cmdline parsing for options with arguments Tom Lendacky
2017-04-18 21:22 ` [PATCH v5 31/32] x86: Add sysfs support for Secure Memory Encryption Tom Lendacky
2017-04-21 21:55   ` Dave Hansen
2017-04-27  7:25     ` Dave Young
2017-04-27 15:52       ` Dave Hansen
2017-04-28  5:32         ` Dave Young
2017-05-04 14:17         ` Tom Lendacky
2017-05-04 14:13       ` Tom Lendacky
2017-05-18 17:01   ` Borislav Petkov
2017-05-26  2:49     ` Dave Young
2017-05-26  5:04       ` Xunlei Pang
2017-05-26 15:47         ` Tom Lendacky
2017-04-18 21:22 ` [PATCH v5 32/32] x86/mm: Add support to make use of " Tom Lendacky
2017-04-21 18:56   ` Tom Lendacky
2017-05-19 11:30     ` Borislav Petkov
2017-05-19 20:16       ` Josh Poimboeuf
2017-05-19 20:29         ` Borislav Petkov
2017-05-30 15:48         ` Tom Lendacky
2017-05-31  9:15           ` Borislav Petkov
2017-05-30 15:46       ` Tom Lendacky
2017-05-19 11:27   ` Borislav Petkov
2017-05-30 14:38     ` Tom Lendacky
2017-05-30 14:55       ` Borislav Petkov
2017-05-30 15:37         ` Tom Lendacky
2017-05-31  8:49           ` Borislav Petkov
2017-05-31 13:37             ` Tom Lendacky
2017-05-31 14:12               ` Borislav Petkov
