* [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
@ 2017-02-16 15:41 Tom Lendacky
  2017-02-16 15:42 ` [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
                   ` (29 more replies)
  0 siblings, 30 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:41 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This RFC patch series provides support for AMD's new Secure Memory
Encryption (SME) feature.

SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can be found in the links below.

The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.
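
To make the detection flow concrete, here is a minimal, illustrative C
fragment (not the series' actual code; the real logic is added in patches
03-06). It assumes kernel context (cpuid() from <asm/processor.h>, rdmsrl()
from <asm/msr.h>) and the MSR_K8_SYSCFG_MEM_ENCRYPT definition that is
introduced later in this series:

        unsigned int eax, ebx, ecx, edx;
        unsigned long me_mask = 0;
        u64 syscfg;

        /* CPUID 0x8000001f: EAX[0] = SME supported, EBX[5:0] = encryption bit */
        cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
        if (eax & 0x01) {
                /* BIOS must have set the memory encryption enable bit (bit 23) */
                rdmsrl(MSR_K8_SYSCFG, syscfg);
                if (syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT)
                        me_mask = 1UL << (ebx & 0x3f);
        }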

The approach that this patch series takes is to encrypt everything possible
starting early in the boot, where the kernel itself is encrypted in place.
Using the page table macros, the encryption mask can be incorporated into all
page table entries and page allocations. By updating the protection map,
userspace allocations are also marked encrypted. Certain data must be
accounted for as having been placed in memory before SME was enabled (EFI,
initrd, etc.) and accessed (i.e. mapped) as unencrypted data.

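The page protection helpers introduced in patch 07 (pgprot_encrypted() and
pgprot_decrypted()) are the building blocks for this. As a rough, illustrative
sketch of their effect (not code taken verbatim from the series):

        /*
         * Fold the SME mask into a kernel protection value, or strip it
         * for data that was placed in memory before SME was enabled
         * (e.g. boot data), so that data is mapped unencrypted.
         */
        pgprot_t enc_prot = pgprot_encrypted(PAGE_KERNEL); /* sme_me_mask ORed in */
        pgprot_t dec_prot = pgprot_decrypted(PAGE_KERNEL); /* sme_me_mask cleared */
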
This patch series is a precursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.

The following links provide additional detail:

AMD Memory Encryption whitepaper:
   http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
   http://support.amd.com/TechDocs/24593.pdf
   SME is section 7.10
   SEV is section 15.34

This patch series is based on the master branch of tip.
  Commit a27cb9e1b2b4 ("Merge branch 'WIP.sched/core'")

---

Still to do: IOMMU enablement support

Changes since v3:
- Broke out some of the patches into smaller individual patches
- Updated Documentation
- Added a message to indicate why the IOMMU was disabled
- Updated CPU feature support for SME by taking into account whether
  BIOS has enabled SME
- Eliminated redundant functions
- Added some warning messages for DMA usage of bounce buffers when SME
  is active
- Added support for persistent memory
- Added support to determine when setup data is being mapped and be sure
  to map it un-encrypted
- Added CONFIG support to set the default action of whether to activate
  SME if it is supported/enabled
- Added support for (re)booting with kexec

Changes since v2:
- Updated Documentation
- Make the encryption mask available outside of arch/x86 through a
  standard include file
- Conversion of assembler routines to C where possible (not everything
  could be converted, e.g. the routine that does the actual encryption
  needs to be copied into a safe location and it is difficult to
  determine the actual length of the function in order to copy it)
- Fix SME feature use of scattered CPUID feature
- Creation of SME specific functions for things like encrypting
  the setup data, ramdisk, etc.
- New take on early_memremap / memremap encryption support
- Additional support for accessing video buffers (fbdev/gpu) as
  un-encrypted
- Disable IOMMU for now - need to investigate further in relation to
  how it needs to be programmed relative to accessing physical memory

Changes since v1:
- Added Documentation.
- Removed AMD vendor check for setting the PAT write protect mode
- Updated naming of trampoline flag for SME as well as moving of the
  SME check to before paging is enabled.
- Change to early_memremap to identify the data being mapped as either
  boot data or kernel data.  The idea being that boot data will have
  been placed in memory as un-encrypted data and would need to be accessed
  as such.
- Updated debugfs support for the bootparams to access the data properly.
- Do not set the SYSCFG[MEME] bit, only check it.  The setting of the
  MemEncryptionModeEn bit results in a reduction of physical address size
  of the processor.  It is possible that BIOS could have configured
  resources into a range that will now not be addressable.  To prevent this,
  rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
  encryption support in the kernel.

Tom Lendacky (28):
      x86: Documentation for AMD Secure Memory Encryption (SME)
      x86: Set the write-protect cache mode for full PAT support
      x86: Add the Secure Memory Encryption CPU feature
      x86: Handle reduction in physical address size with SME
      x86: Add Secure Memory Encryption (SME) support
      x86: Add support to enable SME during early boot processing
      x86: Provide general kernel support for memory encryption
      x86: Extend the early_memremap support with additional attrs
      x86: Add support for early encryption/decryption of memory
      x86: Insure that boot memory areas are mapped properly
      x86: Add support to determine the E820 type of an address
      efi: Add an EFI table address match function
      efi: Update efi_mem_type() to return defined EFI mem types
      Add support to access boot related data in the clear
      Add support to access persistent memory in the clear
      x86: Add support for changing memory encryption attribute
      x86: Decrypt trampoline area if memory encryption is active
      x86: DMA support for memory encryption
      swiotlb: Add warnings for use of bounce buffers with SME
      iommu/amd: Disable AMD IOMMU if memory encryption is active
      x86: Check for memory encryption on the APs
      x86: Do not specify encrypted memory for video mappings
      x86/kvm: Enable Secure Memory Encryption of nested page tables
      x86: Access the setup data through debugfs decrypted
      x86: Access the setup data through sysfs decrypted
      x86: Allow kexec to be used with SME
      x86: Add support to encrypt the kernel in-place
      x86: Add support to make use of Secure Memory Encryption


 Documentation/admin-guide/kernel-parameters.txt |   11 +
 Documentation/x86/amd-memory-encryption.txt     |   57 ++++
 arch/x86/Kconfig                                |   26 ++
 arch/x86/boot/compressed/pagetable.c            |    7 +
 arch/x86/include/asm/cacheflush.h               |    5 
 arch/x86/include/asm/cpufeature.h               |    7 -
 arch/x86/include/asm/cpufeatures.h              |    5 
 arch/x86/include/asm/disabled-features.h        |    3 
 arch/x86/include/asm/dma-mapping.h              |    5 
 arch/x86/include/asm/e820/api.h                 |    2 
 arch/x86/include/asm/e820/types.h               |    2 
 arch/x86/include/asm/fixmap.h                   |   20 +
 arch/x86/include/asm/init.h                     |    1 
 arch/x86/include/asm/io.h                       |    3 
 arch/x86/include/asm/kvm_host.h                 |    3 
 arch/x86/include/asm/mem_encrypt.h              |  108 ++++++++
 arch/x86/include/asm/msr-index.h                |    2 
 arch/x86/include/asm/page.h                     |    4 
 arch/x86/include/asm/pgtable.h                  |   26 +-
 arch/x86/include/asm/pgtable_types.h            |   54 +++-
 arch/x86/include/asm/processor.h                |    3 
 arch/x86/include/asm/realmode.h                 |   12 +
 arch/x86/include/asm/required-features.h        |    3 
 arch/x86/include/asm/setup.h                    |    8 +
 arch/x86/include/asm/vga.h                      |   13 +
 arch/x86/kernel/Makefile                        |    3 
 arch/x86/kernel/cpu/common.c                    |   23 ++
 arch/x86/kernel/e820.c                          |   26 ++
 arch/x86/kernel/espfix_64.c                     |    2 
 arch/x86/kernel/head64.c                        |   46 +++
 arch/x86/kernel/head_64.S                       |   65 ++++-
 arch/x86/kernel/kdebugfs.c                      |   30 +-
 arch/x86/kernel/ksysfs.c                        |   27 +-
 arch/x86/kernel/machine_kexec_64.c              |    3 
 arch/x86/kernel/mem_encrypt_boot.S              |  156 ++++++++++++
 arch/x86/kernel/mem_encrypt_init.c              |  310 +++++++++++++++++++++++
 arch/x86/kernel/pci-dma.c                       |   11 +
 arch/x86/kernel/pci-nommu.c                     |    2 
 arch/x86/kernel/pci-swiotlb.c                   |    8 -
 arch/x86/kernel/process.c                       |   43 +++
 arch/x86/kernel/setup.c                         |   43 +++
 arch/x86/kernel/smp.c                           |    4 
 arch/x86/kvm/mmu.c                              |    8 -
 arch/x86/kvm/vmx.c                              |    3 
 arch/x86/kvm/x86.c                              |    3 
 arch/x86/mm/Makefile                            |    1 
 arch/x86/mm/ident_map.c                         |    6 
 arch/x86/mm/ioremap.c                           |  157 ++++++++++++
 arch/x86/mm/kasan_init_64.c                     |    4 
 arch/x86/mm/mem_encrypt.c                       |  218 ++++++++++++++++
 arch/x86/mm/pageattr.c                          |   71 +++++
 arch/x86/mm/pat.c                               |    6 
 arch/x86/platform/efi/efi.c                     |    4 
 arch/x86/platform/efi/efi_64.c                  |   16 +
 arch/x86/realmode/init.c                        |   16 +
 arch/x86/realmode/rm/trampoline_64.S            |   17 +
 drivers/firmware/efi/efi.c                      |   33 ++
 drivers/gpu/drm/drm_gem.c                       |    2 
 drivers/gpu/drm/drm_vm.c                        |    4 
 drivers/gpu/drm/ttm/ttm_bo_vm.c                 |    7 -
 drivers/gpu/drm/udl/udl_fb.c                    |    4 
 drivers/iommu/amd_iommu_init.c                  |    7 +
 drivers/video/fbdev/core/fbmem.c                |   12 +
 include/asm-generic/early_ioremap.h             |    2 
 include/asm-generic/pgtable.h                   |    8 +
 include/linux/dma-mapping.h                     |   11 +
 include/linux/efi.h                             |    7 +
 include/linux/mem_encrypt.h                     |   53 ++++
 include/linux/swiotlb.h                         |    1 
 init/main.c                                     |   13 +
 kernel/kexec_core.c                             |   24 ++
 kernel/memremap.c                               |   11 +
 lib/swiotlb.c                                   |   59 ++++
 mm/early_ioremap.c                              |   28 ++
 74 files changed, 1880 insertions(+), 128 deletions(-)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
 create mode 100644 arch/x86/kernel/mem_encrypt_init.c
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 include/linux/mem_encrypt.h

-- 
Tom Lendacky


* [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
@ 2017-02-16 15:42 ` Tom Lendacky
  2017-02-16 17:56   ` Borislav Petkov
  2017-02-16 15:42 ` [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support Tom Lendacky
                   ` (28 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:42 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   11 ++++
 Documentation/x86/amd-memory-encryption.txt     |   57 +++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 Documentation/x86/amd-memory-encryption.txt

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 110745e..91c40fa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2145,6 +2145,17 @@
 			memory contents and reserves bad memory
 			regions that are detected.
 
+	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
+			Valid arguments: on, off
+			Default (depends on kernel configuration option):
+			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
+			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
+			mem_encrypt=on:		Activate SME
+			mem_encrypt=off:	Do not activate SME
+
+			Refer to the SME documentation for details on when
+			memory encryption can be activated.
+
 	mem_sleep_default=	[SUSPEND] Default system suspend mode:
 			s2idle  - Suspend-To-Idle
 			shallow - Power-On Suspend or equivalent (if supported)
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..0938e89
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,57 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables.  A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM.  SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below how to determine the position of the bit).  The encryption bit can be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+	0x8000001f[eax]:
+		Bit[0] indicates support for SME
+	0x8000001f[ebx]:
+		Bit[5:0]  pagetable bit number used to activate memory
+			  encryption
+		Bit[11:6] reduction in physical address space, in bits, when
+			  memory encryption is enabled (this only affects system
+			  physical addresses, not guest physical addresses)
+
+If support for SME is present, MSR 0xc0010010 (SYS_CFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+	0xc0010010:
+		Bit[23]   0 = memory encryption features are disabled
+			  1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system.  If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+The state of SME in the Linux kernel can be described as follows:
+	- Supported:
+	  The CPU supports SME (determined through CPUID instruction).
+
+	- Enabled:
+	  Supported and bit 23 of the SYS_CFG MSR is set.
+
+	- Active:
+	  Supported, Enabled and the Linux kernel is actively applying
+	  the encryption bit to page table entries (the SME mask in the
+	  kernel is non-zero).
+
+SME can also be enabled and activated in the BIOS. If SME is enabled and
+activated in the BIOS, then all memory accesses will be encrypted and it will
+not be necessary to activate the Linux memory encryption support.  If the BIOS
+merely enables SME (sets bit 23 of the SYS_CFG MSR), then Linux can activate
+memory encryption.  However, if BIOS does not enable SME, then Linux will not
+attempt to activate memory encryption, even if configured to do so by default
+or if the mem_encrypt=on command line parameter is specified.

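As a worked example of the CPUID 0x8000001f fields documented above (the
numbers here are hypothetical, chosen only for illustration): if EBX[5:0]
reports bit position 47 and EBX[11:6] reports a 5-bit reduction on a
processor that otherwise supports 48 physical address bits, then:

        unsigned int ebx = (5 << 6) | 47;       /* Bit[11:6] = 5, Bit[5:0] = 47 */
        unsigned long enc_mask = 1UL << (ebx & 0x3f);      /* 0x0000800000000000 */
        unsigned int phys_bits = 48 - ((ebx >> 6) & 0x3f); /* 48 - 5 = 43 bits   */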

* [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
  2017-02-16 15:42 ` [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
@ 2017-02-16 15:42 ` Tom Lendacky
  2017-02-17 11:07   ` Borislav Petkov
  2017-02-16 15:42 ` [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature Tom Lendacky
                   ` (27 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:42 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/mm/pat.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 9b78685..6753d9c 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -295,7 +295,7 @@ static void init_cache_modes(void)
  * pat_init - Initialize PAT MSR and PAT table
  *
  * This function initializes PAT MSR and PAT table with an OS-defined value
- * to enable additional cache attributes, WC and WT.
+ * to enable additional cache attributes, WC, WT and WP.
  *
  * This function must be called on all CPUs using the specific sequence of
  * operations defined in Intel SDM. mtrr_rendezvous_handler() provides this
@@ -356,7 +356,7 @@ void pat_init(void)
 		 *      010    2    UC-: _PAGE_CACHE_MODE_UC_MINUS
 		 *      011    3    UC : _PAGE_CACHE_MODE_UC
 		 *      100    4    WB : Reserved
-		 *      101    5    WC : Reserved
+		 *      101    5    WP : _PAGE_CACHE_MODE_WP
 		 *      110    6    UC-: Reserved
 		 *      111    7    WT : _PAGE_CACHE_MODE_WT
 		 *
@@ -364,7 +364,7 @@ void pat_init(void)
 		 * corresponding types in the presence of PAT errata.
 		 */
 		pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
-		      PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+		      PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
 	}
 
 	if (!boot_cpu_done) {


* [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
  2017-02-16 15:42 ` [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
  2017-02-16 15:42 ` [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support Tom Lendacky
@ 2017-02-16 15:42 ` Tom Lendacky
  2017-02-16 18:13   ` Borislav Petkov
  2017-02-16 15:42 ` [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME Tom Lendacky
                   ` (26 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:42 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Update the CPU features to include identifying and reporting on the
Secure Memory Encryption (SME) feature.  SME is identified by CPUID
0x8000001f, but requires BIOS support to enable it (set bit 23 of
SYS_CFG MSR).  Only show the SME feature as available if reported by
CPUID and enabled by BIOS.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cpufeature.h        |    7 +++++--
 arch/x86/include/asm/cpufeatures.h       |    5 ++++-
 arch/x86/include/asm/disabled-features.h |    3 ++-
 arch/x86/include/asm/msr-index.h         |    2 ++
 arch/x86/include/asm/required-features.h |    3 ++-
 arch/x86/kernel/cpu/common.c             |   19 +++++++++++++++++++
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index d59c15c..ea2de6a 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -28,6 +28,7 @@ enum cpuid_leafs
 	CPUID_8000_000A_EDX,
 	CPUID_7_ECX,
 	CPUID_8000_0007_EBX,
+	CPUID_8000_001F_EAX,
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
@@ -78,8 +79,9 @@ enum cpuid_leafs
 	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) ||	\
+	   CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) ||	\
 	   REQUIRED_MASK_CHECK					  ||	\
-	   BUILD_BUG_ON_ZERO(NCAPINTS != 18))
+	   BUILD_BUG_ON_ZERO(NCAPINTS != 19))
 
 #define DISABLED_MASK_BIT_SET(feature_bit)				\
 	 ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK,  0, feature_bit) ||	\
@@ -100,8 +102,9 @@ enum cpuid_leafs
 	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) ||	\
 	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) ||	\
+	   CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) ||	\
 	   DISABLED_MASK_CHECK					  ||	\
-	   BUILD_BUG_ON_ZERO(NCAPINTS != 18))
+	   BUILD_BUG_ON_ZERO(NCAPINTS != 19))
 
 #define cpu_has(c, bit)							\
 	(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :	\
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d45ab4b..331fb81 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -12,7 +12,7 @@
 /*
  * Defines x86 CPU feature bits
  */
-#define NCAPINTS	18	/* N 32-bit words worth of info */
+#define NCAPINTS	19	/* N 32-bit words worth of info */
 #define NBUGINTS	1	/* N 32-bit bug flags */
 
 /*
@@ -296,6 +296,9 @@
 #define X86_FEATURE_SUCCOR	(17*32+1) /* Uncorrectable error containment and recovery */
 #define X86_FEATURE_SMCA	(17*32+3) /* Scalable MCA */
 
+/* AMD-defined CPU features, CPUID level 0x8000001f (eax), word 18 */
+#define X86_FEATURE_SME		(18*32+0) /* Secure Memory Encryption */
+
 /*
  * BUG word(s)
  */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 85599ad..8b45e08 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -57,6 +57,7 @@
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE)
 #define DISABLED_MASK17	0
-#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
+#define DISABLED_MASK18	0
+#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
 
 #endif /* _ASM_X86_DISABLED_FEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 00293a9..e2d0503 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -339,6 +339,8 @@
 #define MSR_K8_TOP_MEM1			0xc001001a
 #define MSR_K8_TOP_MEM2			0xc001001d
 #define MSR_K8_SYSCFG			0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT	23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT	BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
 #define MSR_K8_INT_PENDING_MSG		0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h
index fac9a5c..6847d85 100644
--- a/arch/x86/include/asm/required-features.h
+++ b/arch/x86/include/asm/required-features.h
@@ -100,6 +100,7 @@
 #define REQUIRED_MASK15	0
 #define REQUIRED_MASK16	0
 #define REQUIRED_MASK17	0
-#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18)
+#define REQUIRED_MASK18	0
+#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
 
 #endif /* _ASM_X86_REQUIRED_FEATURES_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c188ae5..b33bc06 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -763,6 +763,25 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 	if (c->extended_cpuid_level >= 0x8000000a)
 		c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
 
+	if (c->extended_cpuid_level >= 0x8000001f) {
+		cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+
+		/* SME feature support */
+		if ((c->x86_vendor == X86_VENDOR_AMD) && (eax & 0x01)) {
+			u64 msr;
+
+			/*
+			 * For SME, BIOS support is required. If BIOS has not
+			 * enabled SME don't advertise the feature.
+			 */
+			rdmsrl(MSR_K8_SYSCFG, msr);
+			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+				eax &= ~0x01;
+		}
+
+		c->x86_capability[CPUID_8000_001F_EAX] = eax;
+	}
+
 	init_scattered_cpuid_features(c);
 
 	/*


* [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (2 preceding siblings ...)
  2017-02-16 15:42 ` [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature Tom Lendacky
@ 2017-02-16 15:42 ` Tom Lendacky
  2017-02-17 11:04   ` Borislav Petkov
  2017-02-16 15:43 ` [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support Tom Lendacky
                   ` (25 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:42 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

When Secure Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/cpu/common.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b33bc06..358208d7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -771,11 +771,15 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 			u64 msr;
 
 			/*
-			 * For SME, BIOS support is required. If BIOS has not
-			 * enabled SME don't advertise the feature.
+			 * For SME, BIOS support is required. If BIOS has
+			 * enabled SME adjust x86_phys_bits by the SME
+			 * physical address space reduction value. If BIOS
+			 * has not enabled SME don't advertise the feature.
 			 */
 			rdmsrl(MSR_K8_SYSCFG, msr);
-			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+			if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
+				c->x86_phys_bits -= (ebx >> 6) & 0x3f;
+			else
 				eax &= ~0x01;
 		}
 


* [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (3 preceding siblings ...)
  2017-02-16 15:42 ` [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME Tom Lendacky
@ 2017-02-16 15:43 ` Tom Lendacky
  2017-02-17 12:00   ` Borislav Petkov
  2017-02-25 15:29   ` Borislav Petkov
  2017-02-16 15:43 ` [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing Tom Lendacky
                   ` (24 subsequent siblings)
  29 siblings, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:43 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/Kconfig                   |   22 +++++++++++++++++++
 arch/x86/include/asm/mem_encrypt.h |   42 ++++++++++++++++++++++++++++++++++++
 arch/x86/mm/Makefile               |    1 +
 arch/x86/mm/mem_encrypt.c          |   21 ++++++++++++++++++
 include/linux/mem_encrypt.h        |   37 ++++++++++++++++++++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 arch/x86/include/asm/mem_encrypt.h
 create mode 100644 arch/x86/mm/mem_encrypt.c
 create mode 100644 include/linux/mem_encrypt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f8fbfc5..a3b8c71 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1395,6 +1395,28 @@ config X86_DIRECT_GBPAGES
 	  supports them), so don't confuse the user by printing
 	  that we have them enabled.
 
+config AMD_MEM_ENCRYPT
+	bool "AMD Secure Memory Encryption (SME) support"
+	depends on X86_64 && CPU_SUP_AMD
+	---help---
+	  Say yes to enable support for the encryption of system memory.
+	  This requires an AMD processor that supports Secure Memory
+	  Encryption (SME).
+
+config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
+	bool "Activate AMD Secure Memory Encryption (SME) by default"
+	default y
+	depends on AMD_MEM_ENCRYPT
+	---help---
+	  Say yes to have system memory encrypted by default if running on
+	  an AMD processor that supports Secure Memory Encryption (SME).
+
+	  If set to Y, then the encryption of system memory can be
+	  deactivated with the mem_encrypt=off command line option.
+
+	  If set to N, then the encryption of system memory can be
+	  activated with the mem_encrypt=on command line option.
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 0000000..ccc53b0
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,42 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+static inline bool sme_active(void)
+{
+	return (sme_me_mask) ? true : false;
+}
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+#ifndef sme_me_mask
+#define sme_me_mask	0UL
+
+static inline bool sme_active(void)
+{
+	return false;
+}
+#endif
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif	/* __ASSEMBLY__ */
+
+#endif	/* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 96d2b84..44d4d21 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX)	+= mpx.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
 
+obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 0000000..b99d469
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+/*
+ * Since SME related variables are set early in the boot process they must
+ * reside in the .data section so as not to be zeroed out when the .bss
+ * section is later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 0000000..14a7b9f
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,37 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+#include <asm/mem_encrypt.h>
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+#ifndef sme_me_mask
+#define sme_me_mask	0UL
+
+static inline bool sme_active(void)
+{
+	return false;
+}
+#endif
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif	/* __ASSEMBLY__ */
+
+#endif	/* __MEM_ENCRYPT_H__ */


* [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (4 preceding siblings ...)
  2017-02-16 15:43 ` [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support Tom Lendacky
@ 2017-02-16 15:43 ` Tom Lendacky
  2017-02-20 12:51   ` Borislav Petkov
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
                   ` (23 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:43 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds support to the early boot code to use Secure Memory
Encryption (SME).  Support is added to update the early pagetables with
the memory encryption mask and to encrypt the kernel in place.

The routines to set the encryption mask and perform the encryption are
stub routines for now with full function to be added in a later patch.

A new file, arch/x86/kernel/mem_encrypt_init.c, is introduced to avoid
adding #ifdefs within arch/x86/kernel/head_64.S and allow
arch/x86/mm/mem_encrypt.c to be removed from the build if SME is not
configured. The mem_encrypt_init.c file will contain the necessary #ifdefs
to allow head_64.S to successfully build and call the SME routines.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/Makefile           |    2 +
 arch/x86/kernel/head_64.S          |   46 ++++++++++++++++++++++++++++++++-
 arch/x86/kernel/mem_encrypt_init.c |   50 ++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/mem_encrypt_init.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index bdcdb3b..33af80a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -140,4 +140,6 @@ ifeq ($(CONFIG_X86_64),y)
 
 	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
 	obj-y				+= vsmp_64.o
+
+	obj-y				+= mem_encrypt_init.o
 endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index b467b14..4f8201b 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -91,6 +91,23 @@ startup_64:
 	jnz	bad_address
 
 	/*
+	 * Enable Secure Memory Encryption (SME), if supported and enabled.
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 * Save the returned mask in %r12 for later use.
+	 */
+	push	%rsi
+	call	sme_enable
+	pop	%rsi
+	movq	%rax, %r12
+
+	/*
+	 * Add the memory encryption mask to %rbp to include it in the page
+	 * table fixups.
+	 */
+	addq	%r12, %rbp
+
+	/*
 	 * Fixup the physical addresses in the page table
 	 */
 	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
@@ -113,6 +130,7 @@ startup_64:
 	shrq	$PGDIR_SHIFT, %rax
 
 	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+	addq	%r12, %rdx
 	movq	%rdx, 0(%rbx,%rax,8)
 	movq	%rdx, 8(%rbx,%rax,8)
 
@@ -129,6 +147,7 @@ startup_64:
 	movq	%rdi, %rax
 	shrq	$PMD_SHIFT, %rdi
 	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+	addq	%r12, %rax
 	leaq	(_end - 1)(%rip), %rcx
 	shrq	$PMD_SHIFT, %rcx
 	subq	%rdi, %rcx
@@ -162,11 +181,25 @@ startup_64:
 	cmp	%r8, %rdi
 	jne	1b
 
-	/* Fixup phys_base */
+	/*
+	 * Fixup phys_base - remove the memory encryption mask from %rbp
+	 * to obtain the true physical address.
+	 */
+	subq	%r12, %rbp
 	addq	%rbp, phys_base(%rip)
 
+	/*
+	 * Encrypt the kernel if SME is active.
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 */
+	push	%rsi
+	call	sme_encrypt_kernel
+	pop	%rsi
+
 .Lskip_fixup:
 	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
+	addq	%r12, %rax
 	jmp 1f
 ENTRY(secondary_startup_64)
 	/*
@@ -186,7 +219,16 @@ ENTRY(secondary_startup_64)
 	/* Sanitize CPU configuration */
 	call verify_cpu
 
-	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+	/*
+	 * Get the SME encryption mask.
+	 * The real_mode_data address is in %rsi and that register can be
+	 * clobbered by the called function so be sure to save it.
+	 */
+	push	%rsi
+	call	sme_get_me_mask
+	pop	%rsi
+
+	addq	$(init_level4_pgt - __START_KERNEL_map), %rax
 1:
 
 	/* Enable PAE mode and PGE */
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
new file mode 100644
index 0000000..25af15d
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -0,0 +1,50 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+#include <linux/mem_encrypt.h>
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_get_me_mask(void)
+{
+	return sme_me_mask;
+}
+
+unsigned long __init sme_enable(void)
+{
+	return sme_me_mask;
+}
+
+#else	/* !CONFIG_AMD_MEM_ENCRYPT */
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_get_me_mask(void)
+{
+	return 0;
+}
+
+unsigned long __init sme_enable(void)
+{
+	return 0;
+}
+
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */


* [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (5 preceding siblings ...)
  2017-02-16 15:43 ` [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing Tom Lendacky
@ 2017-02-16 15:43 ` Tom Lendacky
  2017-02-20 15:21   ` Borislav Petkov
                     ` (3 more replies)
  2017-02-16 15:43 ` [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs Tom Lendacky
                   ` (22 subsequent siblings)
  29 siblings, 4 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:43 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Adding general kernel support for memory encryption includes:
- Modify and create some page table macros to include the Secure Memory
  Encryption (SME) memory encryption mask
- Modify and create some macros for calculating physical and virtual
  memory addresses
- Provide an SME initialization routine to update the protection map with
  the memory encryption mask so that it is used by default
- #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/boot/compressed/pagetable.c |    7 +++++
 arch/x86/include/asm/fixmap.h        |    7 +++++
 arch/x86/include/asm/mem_encrypt.h   |   14 +++++++++++
 arch/x86/include/asm/page.h          |    4 ++-
 arch/x86/include/asm/pgtable.h       |   26 ++++++++++++++------
 arch/x86/include/asm/pgtable_types.h |   45 ++++++++++++++++++++++------------
 arch/x86/include/asm/processor.h     |    3 ++
 arch/x86/kernel/espfix_64.c          |    2 +-
 arch/x86/kernel/head64.c             |   12 ++++++++-
 arch/x86/kernel/head_64.S            |   18 +++++++-------
 arch/x86/mm/kasan_init_64.c          |    4 ++-
 arch/x86/mm/mem_encrypt.c            |   20 +++++++++++++++
 arch/x86/mm/pageattr.c               |    3 ++
 include/asm-generic/pgtable.h        |    8 ++++++
 14 files changed, 133 insertions(+), 40 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 56589d0..411c443 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
 #define __pa(x)  ((unsigned long)(x))
 #define __va(x)  ((void *)((unsigned long)(x)))
 
+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
 #include "misc.h"
 
 /* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 8554f96..83e91f0 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -153,6 +153,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
 }
 #endif
 
+/*
+ * Fixmap settings used with memory encryption
+ *   - FIXMAP_PAGE_NOCACHE is used for MMIO so make sure the memory
+ *     encryption mask is not part of the page attributes
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
 #include <asm-generic/fixmap.h>
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index ccc53b0..547989d 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,6 +15,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/init.h>
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
@@ -24,6 +26,11 @@ static inline bool sme_active(void)
 	return (sme_me_mask) ? true : false;
 }
 
+void __init sme_early_init(void);
+
+#define __sme_pa(x)		(__pa((x)) | sme_me_mask)
+#define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
+
 #else	/* !CONFIG_AMD_MEM_ENCRYPT */
 
 #ifndef sme_me_mask
@@ -35,6 +42,13 @@ static inline bool sme_active(void)
 }
 #endif
 
+static inline void __init sme_early_init(void)
+{
+}
+
+#define __sme_pa		__pa
+#define __sme_pa_nodebug	__pa_nodebug
+
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif	/* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index cf8f619..b1f7bf6 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -15,6 +15,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/mem_encrypt.h>
+
 struct page;
 
 #include <linux/range.h>
@@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
 #ifndef __va
-#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
+#define __va(x)			((void *)(((unsigned long)(x) & ~sme_me_mask) + PAGE_OFFSET))
 #endif
 
 #define __boot_va(x)		__va(x)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2d81161..b41caab 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -3,6 +3,7 @@
 
 #include <asm/page.h>
 #include <asm/pgtable_types.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * Macro to mark a page protection value as UC-
@@ -13,6 +14,12 @@
 		     cachemode2protval(_PAGE_CACHE_MODE_UC_MINUS)))	\
 	 : (prot))
 
+/*
+ * Macros to add or remove encryption attribute
+ */
+#define pgprot_encrypted(prot)	__pgprot(pgprot_val(prot) | sme_me_mask)
+#define pgprot_decrypted(prot)	__pgprot(pgprot_val(prot) & ~sme_me_mask)
+
 #ifndef __ASSEMBLY__
 #include <asm/x86_init.h>
 
@@ -153,17 +160,22 @@ static inline int pte_special(pte_t pte)
 
 static inline unsigned long pte_pfn(pte_t pte)
 {
-	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+	return (pte_val(pte) & ~sme_me_mask & PTE_PFN_MASK) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pmd_pfn(pmd_t pmd)
 {
-	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+	return (pmd_val(pmd) & ~sme_me_mask & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
 }
 
 static inline unsigned long pud_pfn(pud_t pud)
 {
-	return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+	return (pud_val(pud) & ~sme_me_mask & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+}
+
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+	return (pgd_val(pgd) & ~sme_me_mask) >> PAGE_SHIFT;
 }
 
 #define pte_page(pte)	pfn_to_page(pte_pfn(pte))
@@ -563,8 +575,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pmd_page(pmd)		\
-	pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
 
 /*
  * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -632,8 +643,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pud_page(pud)		\
-	pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud)	pfn_to_page(pud_pfn(pud))
 
 /* Find an entry in the second-level page table.. */
 static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -673,7 +683,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
  * Currently stuck as a macro due to indirect forward reference to
  * linux/mmzone.h's __section_mem_map_addr() definition:
  */
-#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))
 
 /* to find an entry in a page-table-directory. */
 static inline unsigned long pud_index(unsigned long address)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 8b4de22..500fc60 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -2,7 +2,9 @@
 #define _ASM_X86_PGTABLE_DEFS_H
 
 #include <linux/const.h>
+
 #include <asm/page_types.h>
+#include <asm/mem_encrypt.h>
 
 #define FIRST_USER_ADDRESS	0UL
 
@@ -121,10 +123,10 @@
 
 #define _PAGE_PROTNONE	(_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
 
-#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
-			 _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
-			 _PAGE_DIRTY)
+#define _PAGE_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
+				 _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _KERNPG_TABLE_NOENC	(_PAGE_PRESENT | _PAGE_RW |		\
+				 _PAGE_ACCESSED | _PAGE_DIRTY)
 
 /*
  * Set of bits not changed in pte_modify.  The pte's
@@ -191,18 +193,29 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_IO		(__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE)
 
-#define PAGE_KERNEL			__pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO			__pgprot(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC		__pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX			__pgprot(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_NOCACHE		__pgprot(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_LARGE		__pgprot(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_EXEC		__pgprot(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL		__pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VVAR		__pgprot(__PAGE_KERNEL_VVAR)
-
-#define PAGE_KERNEL_IO			__pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE		__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#ifndef __ASSEMBLY__
+
+#define _PAGE_ENC	(_AT(pteval_t, sme_me_mask))
+
+#define _PAGE_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |	\
+			 _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC)
+#define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
+			 _PAGE_DIRTY | _PAGE_ENC)
+
+#define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC	__pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VSYSCALL	__pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR	__pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO		__pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE	__pgprot(__PAGE_KERNEL_IO_NOCACHE)
+
+#endif	/* __ASSEMBLY__ */
 
 /*         xwr */
 #define __P000	PAGE_NONE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index e6cfe7b..86da9a4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@
 #include <asm/nops.h>
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
+#include <asm/mem_encrypt.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -240,7 +241,7 @@ static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
 
 static inline void load_cr3(pgd_t *pgdir)
 {
-	write_cr3(__pa(pgdir));
+	write_cr3(__sme_pa(pgdir));
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 04f89ca..51566d7 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -193,7 +193,7 @@ void init_espfix_ap(int cpu)
 
 	pte_p = pte_offset_kernel(&pmd, addr);
 	stack_page = page_address(alloc_pages_node(node, GFP_KERNEL, 0));
-	pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
+	pte = __pte(__pa(stack_page) | ((__PAGE_KERNEL_RO | _PAGE_ENC) & ptemask));
 	for (n = 0; n < ESPFIX_PTE_CLONES; n++)
 		set_pte(&pte_p[n*PTE_STRIDE], pte);
 
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index baa0e7b..182a4c7 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -28,6 +28,7 @@
 #include <asm/bootparam_utils.h>
 #include <asm/microcode.h>
 #include <asm/kasan.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * Manage page tables very early on.
@@ -42,7 +43,7 @@ static void __init reset_early_page_tables(void)
 {
 	memset(early_level4_pgt, 0, sizeof(pgd_t)*(PTRS_PER_PGD-1));
 	next_early_pgt = 0;
-	write_cr3(__pa_nodebug(early_level4_pgt));
+	write_cr3(__sme_pa_nodebug(early_level4_pgt));
 }
 
 /* Create a new PMD entry */
@@ -54,7 +55,7 @@ int __init early_make_pgtable(unsigned long address)
 	pmdval_t pmd, *pmd_p;
 
 	/* Invalid address or early pgt is done ?  */
-	if (physaddr >= MAXMEM || read_cr3() != __pa_nodebug(early_level4_pgt))
+	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
 		return -1;
 
 again:
@@ -157,6 +158,13 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
 
 	clear_page(init_level4_pgt);
 
+	/*
+	 * SME support may update early_pmd_flags to include the memory
+	 * encryption mask, so it needs to be called before anything
+	 * that may generate a page fault.
+	 */
+	sme_early_init();
+
 	kasan_early_init();
 
 	for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4f8201b..edd2f14 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -129,7 +129,7 @@ startup_64:
 	movq	%rdi, %rax
 	shrq	$PGDIR_SHIFT, %rax
 
-	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
+	leaq	(PAGE_SIZE + _KERNPG_TABLE_NOENC)(%rbx), %rdx
 	addq	%r12, %rdx
 	movq	%rdx, 0(%rbx,%rax,8)
 	movq	%rdx, 8(%rbx,%rax,8)
@@ -463,7 +463,7 @@ GLOBAL(name)
 	__INITDATA
 NEXT_PAGE(early_level4_pgt)
 	.fill	511,8,0
-	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(early_dynamic_pgts)
 	.fill	512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -475,15 +475,15 @@ NEXT_PAGE(init_level4_pgt)
 	.fill	512,8,0
 #else
 NEXT_PAGE(init_level4_pgt)
-	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.org    init_level4_pgt + L4_PAGE_OFFSET*8, 0
-	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.org    init_level4_pgt + L4_START_KERNEL*8, 0
 	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
-	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad   level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level3_ident_pgt)
-	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+	.quad	level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
 	.fill	511, 8, 0
 NEXT_PAGE(level2_ident_pgt)
 	/* Since I easily can, map the first 1G.
@@ -495,8 +495,8 @@ NEXT_PAGE(level2_ident_pgt)
 NEXT_PAGE(level3_kernel_pgt)
 	.fill	L3_START_KERNEL,8,0
 	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
-	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
-	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.quad	level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 
 NEXT_PAGE(level2_kernel_pgt)
 	/*
@@ -514,7 +514,7 @@ NEXT_PAGE(level2_kernel_pgt)
 
 NEXT_PAGE(level2_fixmap_pgt)
 	.fill	506,8,0
-	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+	.quad	level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
 	/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
 	.fill	5,8,0
 
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 66d2017..072a70a 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -70,7 +70,7 @@ static int kasan_die_handler(struct notifier_block *self,
 void __init kasan_early_init(void)
 {
 	int i;
-	pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;
+	pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL | _PAGE_ENC;
 	pmdval_t pmd_val = __pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE;
 	pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;
 
@@ -132,7 +132,7 @@ void __init kasan_init(void)
 	 */
 	memset(kasan_zero_page, 0, PAGE_SIZE);
 	for (i = 0; i < PTRS_PER_PTE; i++) {
-		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
+		pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
 		set_pte(&kasan_zero_pte[i], pte);
 	}
 	/* Flush TLBs again to be sure that write protection applied. */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b99d469..d71df97 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -11,6 +11,10 @@
  */
 
 #include <linux/linkage.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+
+extern pmdval_t early_pmd_flags;
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -19,3 +23,19 @@
  */
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
+
+void __init sme_early_init(void)
+{
+	unsigned int i;
+
+	if (!sme_me_mask)
+		return;
+
+	early_pmd_flags |= sme_me_mask;
+
+	__supported_pte_mask |= sme_me_mask;
+
+	/* Update the protection map with memory encryption mask */
+	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
+		protection_map[i] = pgprot_encrypted(protection_map[i]);
+}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a57e8e0..91c5c63 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1987,6 +1987,9 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 	if (!(page_flags & _PAGE_RW))
 		cpa.mask_clr = __pgprot(_PAGE_RW);
 
+	if (!(page_flags & _PAGE_ENC))
+		cpa.mask_clr = __pgprot(pgprot_val(cpa.mask_clr) | _PAGE_ENC);
+
 	cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);
 
 	retval = __change_page_attr_set_clr(&cpa, 0);
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 18af2bc..4a24451 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -314,6 +314,14 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
 #define pgprot_device pgprot_noncached
 #endif
 
+#ifndef pgprot_encrypted
+#define pgprot_encrypted(prot)	(prot)
+#endif
+
+#ifndef pgprot_decrypted
+#define pgprot_decrypted(prot)	(prot)
+#endif
+
 #ifndef pgprot_modify
 #define pgprot_modify pgprot_modify
 static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (6 preceding siblings ...)
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
@ 2017-02-16 15:43 ` Tom Lendacky
  2017-02-20 15:43   ` Borislav Petkov
  2017-02-16 15:43 ` [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory Tom Lendacky
                   ` (21 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:43 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Extend the early_memremap support to allow the specification of encrypted
and decrypted mappings, with and without write-protection. The use of
write-protection is necessary when encrypting data "in place". The
write-protect attribute is considered cacheable for loads, but not
stores. This implies that the hardware will never give the core a
dirty line with this memtype.
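
As a rough illustration (not part of this patch), in-place encryption maps
the same physical range twice - once write-protected in its current format
and once in the desired format - and bounces the data through a temporary
buffer. A minimal sketch, assuming a single page and hypothetical names
(example_encrypt_page, buf):

	static char buf[PAGE_SIZE] __aligned(PAGE_SIZE);

	static void __init example_encrypt_page(resource_size_t paddr)
	{
		void *src, *dst;

		/* Current (decrypted) contents, mapped write-protected */
		src = early_memremap_decrypted_wp(paddr, PAGE_SIZE);

		/* Same physical page, mapped encrypted */
		dst = early_memremap_encrypted(paddr, PAGE_SIZE);

		if (!src || !dst)
			return;

		/* Bounce through an intermediate buffer to avoid corruption */
		memcpy(buf, src, PAGE_SIZE);
		memcpy(dst, buf, PAGE_SIZE);

		early_memunmap(dst, PAGE_SIZE);
		early_memunmap(src, PAGE_SIZE);
	}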

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/Kconfig                     |    4 +++
 arch/x86/include/asm/fixmap.h        |   13 ++++++++++
 arch/x86/include/asm/pgtable_types.h |    8 ++++++
 arch/x86/mm/ioremap.c                |   44 ++++++++++++++++++++++++++++++++++
 include/asm-generic/early_ioremap.h  |    2 ++
 mm/early_ioremap.c                   |   10 ++++++++
 6 files changed, 81 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a3b8c71..581eae4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1417,6 +1417,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	  If set to N, then the encryption of system memory can be
 	  activated with the mem_encrypt=on command line option.
 
+config ARCH_USE_MEMREMAP_PROT
+	def_bool y
+	depends on AMD_MEM_ENCRYPT
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 83e91f0..8233373 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -160,6 +160,19 @@ static inline void __set_fixmap(enum fixed_addresses idx,
  */
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
 
+/*
+ * Early memremap routines used for in-place encryption. The mappings created
+ * by these routines are intended to be used as temporary mappings.
+ */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+				      unsigned long size);
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+					 unsigned long size);
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+				      unsigned long size);
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+					 unsigned long size);
+
 #include <asm-generic/fixmap.h>
 
 #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 500fc60..f00e70f 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {
 
 #define _PAGE_CACHE_MASK	(_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 #define _PAGE_NOCACHE		(cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP		(cachemode2protval(_PAGE_CACHE_MODE_WP))
 
 #define PAGE_NONE	__pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
 #define PAGE_SHARED	__pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
 #define __PAGE_KERNEL_VVAR		(__PAGE_KERNEL_RO | _PAGE_USER)
 #define __PAGE_KERNEL_LARGE		(__PAGE_KERNEL | _PAGE_PSE)
 #define __PAGE_KERNEL_LARGE_EXEC	(__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP		(__PAGE_KERNEL | _PAGE_CACHE_WP)
 
 #define __PAGE_KERNEL_IO		(__PAGE_KERNEL)
 #define __PAGE_KERNEL_IO_NOCACHE	(__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
 #define _KERNPG_TABLE	(_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED |	\
 			 _PAGE_DIRTY | _PAGE_ENC)
 
+#define __PAGE_KERNEL_ENC	(__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP	(__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_NOENC	(__PAGE_KERNEL)
+#define __PAGE_KERNEL_NOENC_WP	(__PAGE_KERNEL_WP)
+
 #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c43b6b3..2385e70 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -419,6 +419,50 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+/* Remap memory with encryption */
+void __init *early_memremap_encrypted(resource_size_t phys_addr,
+				      unsigned long size)
+{
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/*
+ * Remap memory with encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_encrypted_wp(resource_size_t phys_addr,
+					 unsigned long size)
+{
+	/* Be sure the write-protect PAT entry is set for write-protect */
+	if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+		return NULL;
+
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC_WP);
+}
+
+/* Remap memory without encryption */
+void __init *early_memremap_decrypted(resource_size_t phys_addr,
+				      unsigned long size)
+{
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC);
+}
+
+/*
+ * Remap memory without encryption and write-protected - cannot be called
+ * before pat_init() is called
+ */
+void __init *early_memremap_decrypted_wp(resource_size_t phys_addr,
+					 unsigned long size)
+{
+	/* Be sure the write-protect PAT entry is set for write-protect */
+	if (__pte2cachemode_tbl[_PAGE_CACHE_MODE_WP] != _PAGE_CACHE_MODE_WP)
+		return NULL;
+
+	return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_NOENC_WP);
+}
+#endif	/* CONFIG_ARCH_USE_MEMREMAP_PROT */
+
 static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;
 
 static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
diff --git a/include/asm-generic/early_ioremap.h b/include/asm-generic/early_ioremap.h
index 734ad4d..2edef8d 100644
--- a/include/asm-generic/early_ioremap.h
+++ b/include/asm-generic/early_ioremap.h
@@ -13,6 +13,8 @@ extern void *early_memremap(resource_size_t phys_addr,
 			    unsigned long size);
 extern void *early_memremap_ro(resource_size_t phys_addr,
 			       unsigned long size);
+extern void *early_memremap_prot(resource_size_t phys_addr,
+				 unsigned long size, unsigned long prot_val);
 extern void early_iounmap(void __iomem *addr, unsigned long size);
 extern void early_memunmap(void *addr, unsigned long size);
 
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index 6d5717b..d7d30da 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -226,6 +226,16 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
 }
 #endif
 
+#ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
+void __init *
+early_memremap_prot(resource_size_t phys_addr, unsigned long size,
+		    unsigned long prot_val)
+{
+	return (__force void *)__early_ioremap(phys_addr, size,
+					       __pgprot(prot_val));
+}
+#endif
+
 #define MAX_MAP_CHUNK	(NR_FIX_BTMAPS << PAGE_SHIFT)
 
 void __init copy_from_early_mem(void *dest, phys_addr_t src, unsigned long size)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (7 preceding siblings ...)
  2017-02-16 15:43 ` [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs Tom Lendacky
@ 2017-02-16 15:43 ` Tom Lendacky
  2017-02-20 18:22   ` Borislav Petkov
  2017-02-16 15:44 ` [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly Tom Lendacky
                   ` (20 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:43 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add support to encrypt or decrypt data in place during the early stages of
booting the kernel. This does not change the memory encryption attribute -
it is used to ensure that data present in either an encrypted or decrypted
memory area is in the proper state (for example, the initrd will have been
loaded by the boot loader decrypted, but the memory that it resides in is
marked as encrypted).

The early_memremap support is enhanced to specify encrypted and decrypted
mappings with and without write-protection. The use of write-protection is
necessary when encrypting data "in place". The write-protect attribute is
considered cacheable for loads, but not stores. This implies that the
hardware will never give the core a dirty line with this memtype.
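
For example, a later patch in this series uses this interface when reserving
the initrd, which the boot loader loaded decrypted into memory that the
kernel will map encrypted:

	/* In reserve_initrd(), before the initrd contents are touched */
	if (sme_active())
		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);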

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   15 +++++++
 arch/x86/mm/mem_encrypt.c          |   79 ++++++++++++++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 547989d..3c9052c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
 	return (sme_me_mask) ? true : false;
 }
 
+void __init sme_early_encrypt(resource_size_t paddr,
+			      unsigned long size);
+void __init sme_early_decrypt(resource_size_t paddr,
+			      unsigned long size);
+
 void __init sme_early_init(void);
 
 #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
@@ -42,6 +47,16 @@ static inline bool sme_active(void)
 }
 #endif
 
+static inline void __init sme_early_encrypt(resource_size_t paddr,
+					    unsigned long size)
+{
+}
+
+static inline void __init sme_early_decrypt(resource_size_t paddr,
+					    unsigned long size)
+{
+}
+
 static inline void __init sme_early_init(void)
 {
 }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index d71df97..ac3565c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
 #include <linux/init.h>
 #include <linux/mm.h>
 
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
 extern pmdval_t early_pmd_flags;
 
 /*
@@ -24,6 +27,82 @@
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as either encrypted or decrypted but the contents
+ * are currently not in the desired state.
+ *
+ * This routine follows the steps outlined in the AMD64 Architecture
+ * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
+ */
+static void __init __sme_early_enc_dec(resource_size_t paddr,
+				       unsigned long size, bool enc)
+{
+	void *src, *dst;
+	size_t len;
+
+	if (!sme_me_mask)
+		return;
+
+	local_flush_tlb();
+	wbinvd();
+
+	/*
+	 * There is a limited number of early mapping slots, so map (at most)
+	 * one page at a time.
+	 */
+	while (size) {
+		len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+		/*
+		 * Create write protected mappings for the current format
+		 * of the memory.
+		 */
+		src = enc ? early_memremap_decrypted_wp(paddr, len) :
+			    early_memremap_encrypted_wp(paddr, len);
+
+		/*
+		 * Create mappings for the desired format of the memory.
+		 */
+		dst = enc ? early_memremap_encrypted(paddr, len) :
+			    early_memremap_decrypted(paddr, len);
+
+		/*
+		 * If a mapping can't be obtained to perform the operation,
+		 * then eventual access of that area in the desired mode
+		 * will cause a crash.
+		 */
+		BUG_ON(!src || !dst);
+
+		/*
+		 * Use a temporary buffer, of cache-line multiple size, to
+		 * avoid data corruption as documented in the APM.
+		 */
+		memcpy(sme_early_buffer, src, len);
+		memcpy(dst, sme_early_buffer, len);
+
+		early_memunmap(dst, len);
+		early_memunmap(src, len);
+
+		paddr += len;
+		size -= len;
+	}
+}
+
+void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
+{
+	__sme_early_enc_dec(paddr, size, true);
+}
+
+void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
+{
+	__sme_early_enc_dec(paddr, size, false);
+}
+
 void __init sme_early_init(void)
 {
 	unsigned int i;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (8 preceding siblings ...)
  2017-02-16 15:43 ` [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory Tom Lendacky
@ 2017-02-16 15:44 ` Tom Lendacky
  2017-02-20 19:45   ` Borislav Petkov
  2017-02-16 15:44 ` [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address Tom Lendacky
                   ` (19 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:44 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

The boot data and command line data are present in memory in a decrypted
state and are copied early in the boot process.  The early page fault
support will map these areas as encrypted, so before attempting to copy
them, add decrypted mappings so the data is accessed properly when copied.

For the initrd, encrypt this data in place. Since the initrd area will later
be mapped as encrypted, the data will then be accessed properly.
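
The resulting copy flow, condensed from the diff below, looks roughly like:

	sme_map_bootdata(real_mode_data);	/* decrypted mappings + TLB flush */

	memcpy(&boot_params, real_mode_data, sizeof(boot_params));
	/* ... copy the command line via __va(cmd_line_ptr) ... */

	sme_unmap_bootdata(real_mode_data);	/* remove mappings + TLB flush */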

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++++
 arch/x86/kernel/head64.c           |   34 +++++++++++++++--
 arch/x86/kernel/setup.c            |   10 +++++
 arch/x86/mm/mem_encrypt.c          |   74 ++++++++++++++++++++++++++++++++++++
 4 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 3c9052c..e2b7364 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -31,6 +31,9 @@ void __init sme_early_encrypt(resource_size_t paddr,
 void __init sme_early_decrypt(resource_size_t paddr,
 			      unsigned long size);
 
+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_unmap_bootdata(char *real_mode_data);
+
 void __init sme_early_init(void);
 
 #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
@@ -57,6 +60,14 @@ static inline void __init sme_early_decrypt(resource_size_t paddr,
 {
 }
 
+static inline void __init sme_map_bootdata(char *real_mode_data)
+{
+}
+
+static inline void __init sme_unmap_bootdata(char *real_mode_data)
+{
+}
+
 static inline void __init sme_early_init(void)
 {
 }
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 182a4c7..03f8e74 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -46,13 +46,18 @@ static void __init reset_early_page_tables(void)
 	write_cr3(__sme_pa_nodebug(early_level4_pgt));
 }
 
+void __init __early_pgtable_flush(void)
+{
+	write_cr3(__sme_pa_nodebug(early_level4_pgt));
+}
+
 /* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
 	unsigned long physaddr = address - __PAGE_OFFSET;
 	pgdval_t pgd, *pgd_p;
 	pudval_t pud, *pud_p;
-	pmdval_t pmd, *pmd_p;
+	pmdval_t *pmd_p;
 
 	/* Invalid address or early pgt is done ?  */
 	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
@@ -94,12 +99,21 @@ int __init early_make_pgtable(unsigned long address)
 		memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
 		*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
 	}
-	pmd = (physaddr & PMD_MASK) + early_pmd_flags;
 	pmd_p[pmd_index(address)] = pmd;
 
 	return 0;
 }
 
+int __init early_make_pgtable(unsigned long address)
+{
+	unsigned long physaddr = address - __PAGE_OFFSET;
+	pmdval_t pmd;
+
+	pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+	return __early_make_pgtable(address, pmd);
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
@@ -122,6 +136,12 @@ static void __init copy_bootdata(char *real_mode_data)
 	char * command_line;
 	unsigned long cmd_line_ptr;
 
+	/*
+	 * If SME is active, this will create decrypted mappings of the
+	 * boot data in advance of the copy operations.
+	 */
+	sme_map_bootdata(real_mode_data);
+
 	memcpy(&boot_params, real_mode_data, sizeof boot_params);
 	sanitize_boot_params(&boot_params);
 	cmd_line_ptr = get_cmd_line_ptr();
@@ -129,6 +149,14 @@ static void __init copy_bootdata(char *real_mode_data)
 		command_line = __va(cmd_line_ptr);
 		memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 	}
+
+	/*
+	 * The old boot data is no longer needed and won't be reserved,
+	 * freeing up that memory for use by the system. If SME is active,
+	 * we need to remove the mappings that were created so that the
+	 * memory doesn't remain mapped as decrypted.
+	 */
+	sme_unmap_bootdata(real_mode_data);
 }
 
 asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cab13f7..bd5b9a7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -114,6 +114,7 @@
 #include <asm/microcode.h>
 #include <asm/mmu_context.h>
 #include <asm/kaslr.h>
+#include <asm/mem_encrypt.h>
 
 /*
  * max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -376,6 +377,15 @@ static void __init reserve_initrd(void)
 	    !ramdisk_image || !ramdisk_size)
 		return;		/* No initrd provided by bootloader */
 
+	/*
+	 * If SME is active, this memory will be marked encrypted by the
+	 * kernel when it is accessed (including relocation). However, the
+	 * ramdisk image was loaded decrypted by the bootloader, so make
+	 * sure that it is encrypted before accessing it.
+	 */
+	if (sme_active())
+		sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+
 	initrd_start = 0;
 
 	mapped_size = memblock_mem_size(max_pfn_mapped);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ac3565c..ec548e9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,8 +16,12 @@
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>
 
 extern pmdval_t early_pmd_flags;
+int __init __early_make_pgtable(unsigned long, pmdval_t);
+void __init __early_pgtable_flush(void);
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -103,6 +107,76 @@ void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
 	__sme_early_enc_dec(paddr, size, false);
 }
 
+static void __init __sme_early_map_unmap_mem(void *vaddr, unsigned long size,
+					     bool map)
+{
+	unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+	pmdval_t pmd_flags, pmd;
+
+	/* Use early_pmd_flags but remove the encryption mask */
+	pmd_flags = early_pmd_flags & ~sme_me_mask;
+
+	do {
+		pmd = map ? (paddr & PMD_MASK) + pmd_flags : 0;
+		__early_make_pgtable((unsigned long)vaddr, pmd);
+
+		vaddr += PMD_SIZE;
+		paddr += PMD_SIZE;
+		size = (size <= PMD_SIZE) ? 0 : size - PMD_SIZE;
+	} while (size);
+}
+
+static void __init __sme_map_unmap_bootdata(char *real_mode_data, bool map)
+{
+	struct boot_params *boot_data;
+	unsigned long cmdline_paddr;
+
+	__sme_early_map_unmap_mem(real_mode_data, sizeof(boot_params), map);
+	boot_data = (struct boot_params *)real_mode_data;
+
+	/*
+	 * Determine the command line address only after having established
+	 * the decrypted mapping.
+	 */
+	cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+			((u64)boot_data->ext_cmd_line_ptr << 32);
+
+	if (cmdline_paddr)
+		__sme_early_map_unmap_mem(__va(cmdline_paddr),
+					  COMMAND_LINE_SIZE, map);
+}
+
+void __init sme_unmap_bootdata(char *real_mode_data)
+{
+	/* If SME is not active, the bootdata is in the correct state */
+	if (!sme_active())
+		return;
+
+	/*
+	 * The bootdata and command line aren't needed anymore so clear
+	 * any mapping of them.
+	 */
+	__sme_map_unmap_bootdata(real_mode_data, false);
+
+	__early_pgtable_flush();
+}
+
+void __init sme_map_bootdata(char *real_mode_data)
+{
+	/* If SME is not active, the bootdata is in the correct state */
+	if (!sme_active())
+		return;
+
+	/*
+	 * The bootdata and command line will not be encrypted, so they
+	 * need to be mapped as decrypted memory so they can be copied
+	 * properly.
+	 */
+	__sme_map_unmap_bootdata(real_mode_data, true);
+
+	__early_pgtable_flush();
+}
+
 void __init sme_early_init(void)
 {
 	unsigned int i;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (9 preceding siblings ...)
  2017-02-16 15:44 ` [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly Tom Lendacky
@ 2017-02-16 15:44 ` Tom Lendacky
  2017-02-20 20:09   ` Borislav Petkov
  2017-02-16 15:44 ` [RFC PATCH v4 12/28] efi: Add an EFI table address match function Tom Lendacky
                   ` (18 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:44 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds support to return the E820 type associated with an address
range.
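
A later patch in this series uses this to decide whether an address range
lies outside the kernel usable area; a condensed sketch of that check:

	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
	case E820_TYPE_RESERVED:
	case E820_TYPE_ACPI:
	case E820_TYPE_NVS:
	case E820_TYPE_UNUSABLE:
		return false;	/* not normal kernel RAM, map decrypted */
	default:
		break;
	}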

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/e820/api.h   |    2 ++
 arch/x86/include/asm/e820/types.h |    2 ++
 arch/x86/kernel/e820.c            |   26 +++++++++++++++++++++++---
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 8e0f8b8..7c1bdc9 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -38,6 +38,8 @@
 extern void e820__reallocate_tables(void);
 extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
+extern enum e820_type e820__get_entry_type(u64 start, u64 end);
+
 /*
  * Returns true iff the specified range [start,end) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
index 4adeed0..bf49591 100644
--- a/arch/x86/include/asm/e820/types.h
+++ b/arch/x86/include/asm/e820/types.h
@@ -7,6 +7,8 @@
  * These are the E820 types known to the kernel:
  */
 enum e820_type {
+	E820_TYPE_INVALID	= 0,
+
 	E820_TYPE_RAM		= 1,
 	E820_TYPE_RESERVED	= 2,
 	E820_TYPE_ACPI		= 3,
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 6e9b26f..2ee7ee2 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -84,7 +84,8 @@ bool e820__mapped_any(u64 start, u64 end, enum e820_type type)
  * Note: this function only works correctly once the E820 table is sorted and
  * not-overlapping (at least for the range specified), which is the case normally.
  */
-bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+static struct e820_entry *__e820__mapped_all(u64 start, u64 end,
+					     enum e820_type type)
 {
 	int i;
 
@@ -110,9 +111,28 @@ bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
 		 * coverage of the desired range exists:
 		 */
 		if (start >= end)
-			return 1;
+			return entry;
 	}
-	return 0;
+
+	return NULL;
+}
+
+/*
+ * This function checks if the entire range <start,end> is mapped with type.
+ */
+bool __init e820__mapped_all(u64 start, u64 end, enum e820_type type)
+{
+	return __e820__mapped_all(start, end, type) ? 1 : 0;
+}
+
+/*
+ * This function returns the type associated with the range <start,end>.
+ */
+enum e820_type e820__get_entry_type(u64 start, u64 end)
+{
+	struct e820_entry *entry = __e820__mapped_all(start, end, 0);
+
+	return entry ? entry->type : E820_TYPE_INVALID;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 12/28] efi: Add an EFI table address match function
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (10 preceding siblings ...)
  2017-02-16 15:44 ` [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address Tom Lendacky
@ 2017-02-16 15:44 ` Tom Lendacky
  2017-02-16 15:44 ` [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types Tom Lendacky
                   ` (17 subsequent siblings)
  29 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:44 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds support that will determine if a supplied physical address
matches the address of an EFI table.
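
The intended consumer is the memremap path added later in the series, which
treats any address that matches one of the registered EFI tables as boot
data; a minimal sketch of such a check:

	/* In a helper deciding whether phys_addr holds boot data */
	if (efi_table_address_match(phys_addr))
		return true;	/* one of the registered EFI tables */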

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/firmware/efi/efi.c |   33 +++++++++++++++++++++++++++++++++
 include/linux/efi.h        |    7 +++++++
 2 files changed, 40 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index e7d4040..e8dbcdf 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,25 @@ struct efi __read_mostly efi = {
 };
 EXPORT_SYMBOL(efi);
 
+static unsigned long *efi_tables[] = {
+	&efi.mps,
+	&efi.acpi,
+	&efi.acpi20,
+	&efi.smbios,
+	&efi.smbios3,
+	&efi.sal_systab,
+	&efi.boot_info,
+	&efi.hcdp,
+	&efi.uga,
+	&efi.uv_systab,
+	&efi.fw_vendor,
+	&efi.runtime,
+	&efi.config_table,
+	&efi.esrt,
+	&efi.properties_table,
+	&efi.mem_attr_table,
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -855,6 +874,20 @@ int efi_status_to_err(efi_status_t status)
 	return err;
 }
 
+bool efi_table_address_match(unsigned long phys_addr)
+{
+	unsigned int i;
+
+	if (phys_addr == EFI_INVALID_TABLE_ADDR)
+		return false;
+
+	for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+		if (*(efi_tables[i]) == phys_addr)
+			return true;
+
+	return false;
+}
+
 #ifdef CONFIG_KEXEC
 static int update_efi_random_seed(struct notifier_block *nb,
 				  unsigned long code, void *unused)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 94d34e0..7694b23 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1079,6 +1079,8 @@ static inline bool efi_enabled(int feature)
 	return test_bit(feature, &efi.flags) != 0;
 }
 extern void efi_reboot(enum reboot_mode reboot_mode, const char *__unused);
+
+extern bool efi_table_address_match(unsigned long phys_addr);
 #else
 static inline bool efi_enabled(int feature)
 {
@@ -1092,6 +1094,11 @@ static inline bool efi_enabled(int feature)
 {
 	return false;
 }
+
+static inline bool efi_table_address_match(unsigned long phys_addr)
+{
+	return false;
+}
 #endif
 
 extern int efi_status_to_err(efi_status_t status);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (11 preceding siblings ...)
  2017-02-16 15:44 ` [RFC PATCH v4 12/28] efi: Add an EFI table address match function Tom Lendacky
@ 2017-02-16 15:44 ` Tom Lendacky
  2017-02-21 12:05   ` Matt Fleming
  2017-02-16 15:45 ` [RFC PATCH v4 14/28] Add support to access boot related data in the clear Tom Lendacky
                   ` (16 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:44 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Update efi_mem_type() to return EFI_RESERVED_TYPE instead of a
hardcoded 0.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/platform/efi/efi.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index a15cf81..6407103 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1037,7 +1037,7 @@ u32 efi_mem_type(unsigned long phys_addr)
 	efi_memory_desc_t *md;
 
 	if (!efi_enabled(EFI_MEMMAP))
-		return 0;
+		return EFI_RESERVED_TYPE;
 
 	for_each_efi_memory_desc(md) {
 		if ((md->phys_addr <= phys_addr) &&
@@ -1045,7 +1045,7 @@ u32 efi_mem_type(unsigned long phys_addr)
 				  (md->num_pages << EFI_PAGE_SHIFT))))
 			return md->type;
 	}
-	return 0;
+	return EFI_RESERVED_TYPE;
 }
 
 static int __init arch_parse_efi_cmdline(char *str)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (12 preceding siblings ...)
  2017-02-16 15:44 ` [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types Tom Lendacky
@ 2017-02-16 15:45 ` Tom Lendacky
  2017-02-21 15:06   ` Borislav Petkov
  2017-03-08  6:55   ` Dave Young
  2017-02-16 15:45 ` [RFC PATCH v4 15/28] Add support to access persistent memory " Tom Lendacky
                   ` (15 subsequent siblings)
  29 siblings, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:45 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Boot data (such as EFI related data) is not encrypted when the system is
booted and needs to be mapped decrypted. Add support to apply the proper
attributes to the EFI page tables, and to the early_memremap and memremap
APIs, so that the type of data being accessed can be identified and the
proper encryption attribute applied.
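
Callers of early_memremap()/memremap() do not need to change; the protection
adjustment happens underneath them. For instance, the setup_data walk keeps
using early_memremap() and now transparently gets a decrypted mapping (sketch
of the unchanged caller):

	data = early_memremap(pa_data, sizeof(*data));	/* decrypted under SME */
	data_len = data->len + sizeof(struct setup_data);
	pa_next = data->next;
	early_memunmap(data, sizeof(*data));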

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/io.h      |    3 +
 arch/x86/include/asm/setup.h   |    8 +++
 arch/x86/kernel/setup.c        |   33 ++++++++++++
 arch/x86/mm/ioremap.c          |  111 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/platform/efi/efi_64.c |   16 ++++--
 kernel/memremap.c              |   11 ++++
 mm/early_ioremap.c             |   18 +++++-
 7 files changed, 192 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 7afb0e2..833f7cc 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -381,4 +381,7 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size);
+#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
+
 #endif /* _ASM_X86_IO_H */
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index ac1d5da..99998d9 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -63,6 +63,14 @@ static inline void x86_ce4100_early_setup(void) { }
 #include <asm/espfix.h>
 #include <linux/kernel.h>
 
+struct setup_data_attrs {
+	u64 paddr;
+	unsigned long size;
+};
+
+extern struct setup_data_attrs setup_data_list[];
+extern unsigned int setup_data_list_count;
+
 /*
  * This is set up by the setup-routine at boot-time
  */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bd5b9a7..d2234bf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -148,6 +148,9 @@ int default_check_phys_apicid_present(int phys_apicid)
 
 struct boot_params boot_params;
 
+struct setup_data_attrs setup_data_list[32];
+unsigned int setup_data_list_count;
+
 /*
  * Machine setup..
  */
@@ -419,6 +422,32 @@ static void __init reserve_initrd(void)
 }
 #endif /* CONFIG_BLK_DEV_INITRD */
 
+static void __init update_setup_data_list(u64 pa_data, unsigned long size)
+{
+	unsigned int i;
+
+	for (i = 0; i < setup_data_list_count; i++) {
+		if (setup_data_list[i].paddr != pa_data)
+			continue;
+
+		setup_data_list[i].size = size;
+		break;
+	}
+}
+
+static void __init add_to_setup_data_list(u64 pa_data, unsigned long size)
+{
+	if (!sme_active())
+		return;
+
+	if (!WARN(setup_data_list_count == ARRAY_SIZE(setup_data_list),
+		  "exceeded maximum setup data list slots")) {
+		setup_data_list[setup_data_list_count].paddr = pa_data;
+		setup_data_list[setup_data_list_count].size = size;
+		setup_data_list_count++;
+	}
+}
+
 static void __init parse_setup_data(void)
 {
 	struct setup_data *data;
@@ -428,12 +457,16 @@ static void __init parse_setup_data(void)
 	while (pa_data) {
 		u32 data_len, data_type;
 
+		add_to_setup_data_list(pa_data, sizeof(*data));
+
 		data = early_memremap(pa_data, sizeof(*data));
 		data_len = data->len + sizeof(struct setup_data);
 		data_type = data->type;
 		pa_next = data->next;
 		early_memunmap(data, sizeof(*data));
 
+		update_setup_data_list(pa_data, data_len);
+
 		switch (data_type) {
 		case SETUP_E820_EXT:
 			e820__memory_setup_extended(pa_data, data_len);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 2385e70..b0ff6bc 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/mmiotrace.h>
+#include <linux/efi.h>
 
 #include <asm/cacheflush.h>
 #include <asm/e820/api.h>
@@ -21,6 +22,7 @@
 #include <asm/tlbflush.h>
 #include <asm/pgalloc.h>
 #include <asm/pat.h>
+#include <asm/setup.h>
 
 #include "physaddr.h"
 
@@ -419,6 +421,115 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
 }
 
+/*
+ * Examine the physical address to determine if it is boot data. Check
+ * it against the boot params structure and EFI tables.
+ */
+static bool memremap_is_setup_data(resource_size_t phys_addr,
+				   unsigned long size)
+{
+	unsigned int i;
+	u64 paddr;
+
+	for (i = 0; i < setup_data_list_count; i++) {
+		if (phys_addr < setup_data_list[i].paddr)
+			continue;
+
+		if (phys_addr >= (setup_data_list[i].paddr +
+				  setup_data_list[i].size))
+			continue;
+
+		/* Address is within setup data range */
+		return true;
+	}
+
+	paddr = boot_params.efi_info.efi_memmap_hi;
+	paddr <<= 32;
+	paddr |= boot_params.efi_info.efi_memmap;
+	if (phys_addr == paddr)
+		return true;
+
+	paddr = boot_params.efi_info.efi_systab_hi;
+	paddr <<= 32;
+	paddr |= boot_params.efi_info.efi_systab;
+	if (phys_addr == paddr)
+		return true;
+
+	if (efi_table_address_match(phys_addr))
+		return true;
+
+	return false;
+}
+
+/*
+ * This function determines if an address should be mapped encrypted.
+ * Boot setup data, EFI data and E820 areas are checked in making this
+ * determination.
+ */
+static bool memremap_should_map_encrypted(resource_size_t phys_addr,
+					  unsigned long size)
+{
+	/*
+	 * If SME is not active, return true:
+	 *   - For early_memremap_pgprot_adjust(), returning true or false
+	 *     results in the same protection value
+	 *   - For arch_memremap_do_ram_remap(), returning true will allow
+	 *     the RAM remap to occur instead of falling back to ioremap()
+	 */
+	if (!sme_active())
+		return true;
+
+	/* Check if the address is part of the setup data */
+	if (memremap_is_setup_data(phys_addr, size))
+		return false;
+
+	/* Check if the address is part of EFI boot/runtime data */
+	switch (efi_mem_type(phys_addr)) {
+	case EFI_BOOT_SERVICES_DATA:
+	case EFI_RUNTIME_SERVICES_DATA:
+		return false;
+	default:
+		break;
+	}
+
+	/* Check if the address is outside kernel usable area */
+	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
+	case E820_TYPE_RESERVED:
+	case E820_TYPE_ACPI:
+	case E820_TYPE_NVS:
+	case E820_TYPE_UNUSABLE:
+		return false;
+	default:
+		break;
+	}
+
+	return true;
+}
+
+/*
+ * Architecture function to determine if RAM remap is allowed.
+ */
+bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
+{
+	return memremap_should_map_encrypted(phys_addr, size);
+}
+
+/*
+ * Architecture override of __weak function to adjust the protection attributes
+ * used when remapping memory.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+					     unsigned long size,
+					     pgprot_t prot)
+{
+	if (memremap_should_map_encrypted(phys_addr, size))
+		prot = pgprot_encrypted(prot);
+	else
+		prot = pgprot_decrypted(prot);
+
+	return prot;
+}
+
 #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 2ee7694..2d8674d 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -243,7 +243,7 @@ void efi_sync_low_kernel_mappings(void)
 
 int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 {
-	unsigned long pfn, text;
+	unsigned long pfn, text, pf;
 	struct page *page;
 	unsigned npages;
 	pgd_t *pgd;
@@ -251,7 +251,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	if (efi_enabled(EFI_OLD_MEMMAP))
 		return 0;
 
-	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+	/*
+	 * Since the PGD is encrypted, set the encryption mask so that when
+	 * this value is loaded into cr3 the PGD will be decrypted during
+	 * the pagetable walk.
+	 */
+	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
 	pgd = efi_pgd;
 
 	/*
@@ -261,7 +267,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	 * phys_efi_set_virtual_address_map().
 	 */
 	pfn = pa_memmap >> PAGE_SHIFT;
-	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+	pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
+	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
 		pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
 		return 1;
 	}
@@ -304,7 +311,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 	text = __pa(_text);
 	pfn = text >> PAGE_SHIFT;
 
-	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+	pf = _PAGE_RW | _PAGE_ENC;
+	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
 		pr_err("Failed to map kernel text 1:1\n");
 		return 1;
 	}
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 9ecedc2..d7a27ea 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,13 +34,22 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
 }
 #endif
 
+#ifndef arch_memremap_do_ram_remap
+static bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size)
+{
+	return true;
+}
+#endif
+
 static void *try_ram_remap(resource_size_t offset, size_t size)
 {
 	unsigned long pfn = PHYS_PFN(offset);
 
 	/* In the simple case just return the existing linear address */
-	if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+	if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+	    arch_memremap_do_ram_remap(offset, size))
 		return __va(offset);
+
 	return NULL; /* fallback to arch_memremap_wb */
 }
 
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d7d30da..b1dd4a9 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ static int __init early_ioremap_debug_setup(char *str)
 
 static int after_paging_init __initdata;
 
+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+						    unsigned long size,
+						    pgprot_t prot)
+{
+	return prot;
+}
+
 void __init __weak early_ioremap_shutdown(void)
 {
 }
@@ -215,14 +222,19 @@ void __init early_iounmap(void __iomem *addr, unsigned long size)
 void __init *
 early_memremap(resource_size_t phys_addr, unsigned long size)
 {
-	return (__force void *)__early_ioremap(phys_addr, size,
-					       FIXMAP_PAGE_NORMAL);
+	pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+						     FIXMAP_PAGE_NORMAL);
+
+	return (__force void *)__early_ioremap(phys_addr, size, prot);
 }
 #ifdef FIXMAP_PAGE_RO
 void __init *
 early_memremap_ro(resource_size_t phys_addr, unsigned long size)
 {
-	return (__force void *)__early_ioremap(phys_addr, size, FIXMAP_PAGE_RO);
+	pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+						     FIXMAP_PAGE_RO);
+
+	return (__force void *)__early_ioremap(phys_addr, size, prot);
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 15/28] Add support to access persistent memory in the clear
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (13 preceding siblings ...)
  2017-02-16 15:45 ` [RFC PATCH v4 14/28] Add support to access boot related data in the clear Tom Lendacky
@ 2017-02-16 15:45 ` Tom Lendacky
  2017-03-17 22:58   ` Elliott, Robert (Persistent Memory)
  2017-02-16 15:45 ` [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute Tom Lendacky
                   ` (14 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:45 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Persistent memory is expected to persist across reboots. The encryption
key used by SME will change across reboots, which would result in corrupted
persistent memory if it were mapped encrypted. Persistent memory is handed
out by block devices through memory remapping functions, so be sure not to
map this memory as encrypted.
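
Persistent memory drivers obtain their mappings through the memremap()
family of functions, which is why the change below keeps those mappings
decrypted. A sketch of such a mapping call (the resource usage shown here
is illustrative, not taken from a specific driver):

	/* e.g. in a pmem driver - this mapping must not come back encrypted */
	addr = memremap(res->start, resource_size(res), MEMREMAP_WB);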

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/mm/ioremap.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index b0ff6bc..c6cb921 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -498,6 +498,8 @@ static bool memremap_should_map_encrypted(resource_size_t phys_addr,
 	case E820_TYPE_ACPI:
 	case E820_TYPE_NVS:
 	case E820_TYPE_UNUSABLE:
+	case E820_TYPE_PMEM:
+	case E820_TYPE_PRAM:
 		return false;
 	default:
 		break;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (14 preceding siblings ...)
  2017-02-16 15:45 ` [RFC PATCH v4 15/28] Add support to access persistent memory " Tom Lendacky
@ 2017-02-16 15:45 ` Tom Lendacky
  2017-02-22 18:52   ` Borislav Petkov
  2017-02-16 15:45 ` [RFC PATCH v4 17/28] x86: Decrypt trampoline area if memory encryption is active Tom Lendacky
                   ` (13 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:45 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add support for changing the memory encryption attribute for one or more
memory pages.
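
The next patch in the series is one consumer, using the new interface to make
the real-mode trampoline shareable with APs. The general pattern (vaddr and
npages are placeholders):

	/* Clear the encryption attribute on an existing mapping ... */
	set_memory_decrypted((unsigned long)vaddr, npages);

	/* ... and restore it when the area no longer needs to be shared */
	set_memory_encrypted((unsigned long)vaddr, npages);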

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cacheflush.h |    3 ++
 arch/x86/mm/pageattr.c            |   66 +++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 872877d..33ae60a 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -12,6 +12,7 @@
  * Executability : eXeutable, NoteXecutable
  * Read/Write    : ReadOnly, ReadWrite
  * Presence      : NotPresent
+ * Encryption    : Encrypted, Decrypted
  *
  * Within a category, the attributes are mutually exclusive.
  *
@@ -47,6 +48,8 @@
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 91c5c63..9710f5c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1742,6 +1742,72 @@ int set_memory_4k(unsigned long addr, int numpages)
 					__pgprot(0), 1, 0, NULL);
 }
 
+static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+	struct cpa_data cpa;
+	unsigned long start;
+	int ret;
+
+	/* Nothing to do if the _PAGE_ENC attribute is zero */
+	if (_PAGE_ENC == 0)
+		return 0;
+
+	/* Save original start address since it will be modified */
+	start = addr;
+
+	memset(&cpa, 0, sizeof(cpa));
+	cpa.vaddr = &addr;
+	cpa.numpages = numpages;
+	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
+	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
+	cpa.pgd = init_mm.pgd;
+
+	/* Should not be working on unaligned addresses */
+	if (WARN_ONCE(*cpa.vaddr & ~PAGE_MASK,
+		      "misaligned address: %#lx\n", *cpa.vaddr))
+		*cpa.vaddr &= PAGE_MASK;
+
+	/* Must avoid aliasing mappings in the highmem code */
+	kmap_flush_unused();
+	vm_unmap_aliases();
+
+	/*
+	 * Before changing the encryption attribute, we need to flush caches.
+	 */
+	if (static_cpu_has(X86_FEATURE_CLFLUSH))
+		cpa_flush_range(start, numpages, 1);
+	else
+		cpa_flush_all(1);
+
+	ret = __change_page_attr_set_clr(&cpa, 1);
+
+	/*
+	 * After changing the encryption attribute, we need to flush TLBs
+	 * again in case any speculative TLB caching occurred (but no need
+	 * to flush caches again).  We could just use cpa_flush_all(), but
+	 * in case TLB flushing gets optimized in the cpa_flush_range()
+	 * path use the same logic as above.
+	 */
+	if (static_cpu_has(X86_FEATURE_CLFLUSH))
+		cpa_flush_range(start, numpages, 0);
+	else
+		cpa_flush_all(0);
+
+	return ret;
+}
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_enc_dec(addr, numpages, true);
+}
+EXPORT_SYMBOL(set_memory_encrypted);
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+	return __set_memory_enc_dec(addr, numpages, false);
+}
+EXPORT_SYMBOL(set_memory_decrypted);
+
 int set_pages_uc(struct page *page, int numpages)
 {
 	unsigned long addr = (unsigned long)page_address(page);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 17/28] x86: Decrypt trampoline area if memory encryption is active
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (15 preceding siblings ...)
  2017-02-16 15:45 ` [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute Tom Lendacky
@ 2017-02-16 15:45 ` Tom Lendacky
  2017-02-16 15:46 ` [RFC PATCH v4 18/28] x86: DMA support for memory encryption Tom Lendacky
                   ` (12 subsequent siblings)
  29 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:45 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/realmode/init.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 5db706f1..21d7506 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,8 @@
 #include <asm/pgtable.h>
 #include <asm/realmode.h>
 #include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>
+#include <asm/cacheflush.h>
 
 struct real_mode_header *real_mode_header;
 u32 *trampoline_cr4_features;
@@ -130,6 +132,16 @@ static void __init set_real_mode_permissions(void)
 	unsigned long text_start =
 		(unsigned long) __va(real_mode_header->text_start);
 
+	/*
+	 * If SME is active, the trampoline area will need to be in
+	 * decrypted memory in order to bring up other processors
+	 * successfully.
+	 */
+	if (sme_active()) {
+		sme_early_decrypt(__pa(base), size);
+		set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+	}
+
 	set_memory_nx((unsigned long) base, size >> PAGE_SHIFT);
 	set_memory_ro((unsigned long) base, ro_size >> PAGE_SHIFT);
 	set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 18/28] x86: DMA support for memory encryption
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (16 preceding siblings ...)
  2017-02-16 15:45 ` [RFC PATCH v4 17/28] x86: Decrypt trampoline area if memory encryption is active Tom Lendacky
@ 2017-02-16 15:46 ` Tom Lendacky
  2017-02-25 17:10   ` Borislav Petkov
  2017-02-16 15:46 ` [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
                   ` (11 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:46 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48 bits. SWIOTLB will be
initialized to create decrypted bounce buffers for use by these devices.
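
With this change the DMA address carries the encryption bit, so the
translation helpers reduce to masking; a minimal sketch of the round trip:

	dma_addr_t dma = phys_to_dma(dev, paddr);	/* paddr | sme_me_mask */
	phys_addr_t pa = dma_to_phys(dev, dma);		/* dma & ~sme_me_mask  */

	/*
	 * A device whose DMA mask cannot reach the encryption bit position
	 * is routed through the decrypted SWIOTLB bounce buffers instead.
	 */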

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/dma-mapping.h |    5 ++-
 arch/x86/include/asm/mem_encrypt.h |    5 +++
 arch/x86/kernel/pci-dma.c          |   11 +++++--
 arch/x86/kernel/pci-nommu.c        |    2 +
 arch/x86/kernel/pci-swiotlb.c      |    8 ++++-
 arch/x86/mm/mem_encrypt.c          |   22 ++++++++++++++
 include/linux/swiotlb.h            |    1 +
 init/main.c                        |   13 ++++++++
 lib/swiotlb.c                      |   56 +++++++++++++++++++++++++++++++-----
 9 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
 #include <asm/io.h>
 #include <asm/swiotlb.h>
 #include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>
 
 #ifdef CONFIG_ISA
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -69,12 +70,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
 
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-	return paddr;
+	return paddr | sme_me_mask;
 }
 
 static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
 {
-	return daddr;
+	return daddr & ~sme_me_mask;
 }
 #endif /* CONFIG_X86_DMA_REMAP */
 
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index e2b7364..87e816f 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -36,6 +36,11 @@ void __init sme_early_decrypt(resource_size_t paddr,
 
 void __init sme_early_init(void);
 
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
+
 #define __sme_pa(x)		(__pa((x)) | sme_me_mask)
 #define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
 
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30c377..0ce28df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -92,9 +92,12 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 	/* CMA can be used only in the context which permits sleeping */
 	if (gfpflags_allow_blocking(flag)) {
 		page = dma_alloc_from_contiguous(dev, count, get_order(size));
-		if (page && page_to_phys(page) + size > dma_mask) {
-			dma_release_from_contiguous(dev, page, count);
-			page = NULL;
+		if (page) {
+			addr = phys_to_dma(dev, page_to_phys(page));
+			if (addr + size > dma_mask) {
+				dma_release_from_contiguous(dev, page, count);
+				page = NULL;
+			}
 		}
 	}
 	/* fallback */
@@ -103,7 +106,7 @@ void *dma_generic_alloc_coherent(struct device *dev, size_t size,
 	if (!page)
 		return NULL;
 
-	addr = page_to_phys(page);
+	addr = phys_to_dma(dev, page_to_phys(page));
 	if (addr + size > dma_mask) {
 		__free_pages(page, get_order(size));
 
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 00e71ce..922c10d 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
 				 enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	dma_addr_t bus = page_to_phys(page) + offset;
+	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
 	WARN_ON(size == 0);
 	if (!check_addr("map_single", dev, bus, size))
 		return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index 410efb2..a0677a9 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
 #include <asm/dma.h>
 #include <asm/xen/swiotlb-xen.h>
 #include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
 int swiotlb __read_mostly;
 
 void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,11 +66,13 @@ void x86_swiotlb_free_coherent(struct device *dev, size_t size,
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
  *
  * This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
  */
 int __init pci_swiotlb_detect_override(void)
 {
-	if (swiotlb_force == SWIOTLB_FORCE)
+	if ((swiotlb_force == SWIOTLB_FORCE) || sme_active())
 		swiotlb = 1;
 
 	return swiotlb;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ec548e9..a46bcf4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -13,11 +13,14 @@
 #include <linux/linkage.h>
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
 #include <asm/setup.h>
 #include <asm/bootparam.h>
+#include <asm/cacheflush.h>
 
 extern pmdval_t early_pmd_flags;
 int __init __early_make_pgtable(unsigned long, pmdval_t);
@@ -192,3 +195,22 @@ void __init sme_early_init(void)
 	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
 		protection_map[i] = pgprot_encrypted(protection_map[i]);
 }
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+	if (!sme_me_mask)
+		return;
+
+	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
+	swiotlb_update_mem_attributes();
+}
+
+void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+	WARN(PAGE_ALIGN(size) != size,
+	     "size is not page aligned (%#lx)\n", size);
+
+	/* Make the SWIOTLB buffer area decrypted */
+	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
+}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4ee479f..15e7160 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -35,6 +35,7 @@ enum swiotlb_force {
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_update_mem_attributes(void);
 
 /*
  * Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index 8222caa..ba13f8f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -466,6 +466,10 @@ void __init __weak thread_stack_cache_init(void)
 }
 #endif
 
+void __init __weak mem_encrypt_init(void)
+{
+}
+
 /*
  * Set up kernel memory allocators
  */
@@ -614,6 +618,15 @@ asmlinkage __visible void __init start_kernel(void)
 	 */
 	locking_selftest();
 
+	/*
+	 * This needs to be called before any devices perform DMA
+	 * operations that might use the swiotlb bounce buffers.
+	 * This call will mark the bounce buffers as decrypted so
+	 * that their usage will not cause "plain-text" data to be
+	 * decrypted when accessed.
+	 */
+	mem_encrypt_init();
+
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start && !initrd_below_start_ok &&
 	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index a8d74a7..c463067 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/gfp.h>
 #include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
 	return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+				      phys_addr_t address)
+{
+	return phys_to_dma(hwdev, address) & ~sme_me_mask;
+}
+
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
 				      volatile void *address)
@@ -183,6 +195,31 @@ void swiotlb_print_info(void)
 	       bytes >> 20, vstart, vend - 1);
 }
 
+/*
+ * Early SWIOTLB allocation may be too early to allow an architecture to
+ * perform the desired operations.  This function allows the architecture to
+ * call SWIOTLB when the operations are possible.  This function needs to be
+ * called before the SWIOTLB memory is used.
+ */
+void __init swiotlb_update_mem_attributes(void)
+{
+	void *vaddr;
+	unsigned long bytes;
+
+	if (no_iotlb_memory || late_alloc)
+		return;
+
+	vaddr = phys_to_virt(io_tlb_start);
+	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+	swiotlb_set_mem_attributes(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+
+	vaddr = phys_to_virt(io_tlb_overflow_buffer);
+	bytes = PAGE_ALIGN(io_tlb_overflow);
+	swiotlb_set_mem_attributes(vaddr, bytes);
+	memset(vaddr, 0, bytes);
+}
+
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
 	void *v_overflow_buffer;
@@ -320,6 +357,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	io_tlb_start = virt_to_phys(tlb);
 	io_tlb_end = io_tlb_start + bytes;
 
+	swiotlb_set_mem_attributes(tlb, bytes);
 	memset(tlb, 0, bytes);
 
 	/*
@@ -330,6 +368,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	if (!v_overflow_buffer)
 		goto cleanup2;
 
+	swiotlb_set_mem_attributes(v_overflow_buffer, io_tlb_overflow);
+	memset(v_overflow_buffer, 0, io_tlb_overflow);
 	io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
 
 	/*
@@ -581,7 +621,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 		return SWIOTLB_MAP_ERROR;
 	}
 
-	start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+	start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
 	return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size,
 				      dir, attrs);
 }
@@ -702,7 +742,7 @@ void swiotlb_tbl_sync_single(struct device *hwdev, phys_addr_t tlb_addr,
 			goto err_warn;
 
 		ret = phys_to_virt(paddr);
-		dev_addr = phys_to_dma(hwdev, paddr);
+		dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
 
 		/* Confirm address can be DMA'd by device */
 		if (dev_addr + size - 1 > dma_mask) {
@@ -812,10 +852,10 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	map = map_single(dev, phys, size, dir, attrs);
 	if (map == SWIOTLB_MAP_ERROR) {
 		swiotlb_full(dev, size, dir, 1);
-		return phys_to_dma(dev, io_tlb_overflow_buffer);
+		return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 	}
 
-	dev_addr = phys_to_dma(dev, map);
+	dev_addr = swiotlb_phys_to_dma(dev, map);
 
 	/* Ensure that the address returned is DMA'ble */
 	if (dma_capable(dev, dev_addr, size))
@@ -824,7 +864,7 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
 	attrs |= DMA_ATTR_SKIP_CPU_SYNC;
 	swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
 
-	return phys_to_dma(dev, io_tlb_overflow_buffer);
+	return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
 }
 EXPORT_SYMBOL_GPL(swiotlb_map_page);
 
@@ -958,7 +998,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 				sg_dma_len(sgl) = 0;
 				return 0;
 			}
-			sg->dma_address = phys_to_dma(hwdev, map);
+			sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
 		} else
 			sg->dma_address = dev_addr;
 		sg_dma_len(sg) = sg->length;
@@ -1026,7 +1066,7 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 int
 swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
 {
-	return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+	return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
 }
 EXPORT_SYMBOL(swiotlb_dma_mapping_error);
 
@@ -1039,6 +1079,6 @@ void swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 int
 swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-	return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+	return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
 EXPORT_SYMBOL(swiotlb_dma_supported);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (17 preceding siblings ...)
  2017-02-16 15:46 ` [RFC PATCH v4 18/28] x86: DMA support for memory encryption Tom Lendacky
@ 2017-02-16 15:46 ` Tom Lendacky
  2017-02-17 15:59   ` Konrad Rzeszutek Wilk
  2017-02-27 17:52   ` Borislav Petkov
  2017-02-16 15:46 ` [RFC PATCH v4 20/28] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
                   ` (10 subsequent siblings)
  29 siblings, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:46 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add warnings to let the user know when bounce buffers are being used for
DMA when SME is active.  Since the bounce buffers are not in encrypted
memory, these notifications allow the user to decide whether any
action needs to be taken.
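
A stand-alone sketch of the new sme_dma_mask() check (illustration only,
assuming the C-bit is bit 47): a device mask below the 48-bit value returned
by sme_dma_mask() is what triggers the new dev_warn():

#include <stdio.h>

int main(void)
{
	/* Assumed bit-47 C-bit; the kernel derives the real position from CPUID */
	unsigned long long sme_me_mask = 1ULL << 47;

	/* Mirrors sme_dma_mask() in this patch: smallest mask covering the C-bit */
	unsigned long long sme_dma_mask = (sme_me_mask << 1) - 1;

	unsigned long long dev_mask_32 = (1ULL << 32) - 1;
	unsigned long long dev_mask_64 = ~0ULL;

	printf("sme_dma_mask = %#llx (a 48-bit mask)\n", sme_dma_mask);
	printf("32-bit device warns? %s\n", dev_mask_32 < sme_dma_mask ? "yes" : "no");
	printf("64-bit device warns? %s\n", dev_mask_64 < sme_dma_mask ? "yes" : "no");
	return 0;
}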

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
 include/linux/dma-mapping.h        |   11 +++++++++++
 include/linux/mem_encrypt.h        |    6 ++++++
 lib/swiotlb.c                      |    3 +++
 4 files changed, 31 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 87e816f..5a17f1b 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,11 @@ static inline bool sme_active(void)
 	return (sme_me_mask) ? true : false;
 }
 
+static inline u64 sme_dma_mask(void)
+{
+	return ((u64)sme_me_mask << 1) - 1;
+}
+
 void __init sme_early_encrypt(resource_size_t paddr,
 			      unsigned long size);
 void __init sme_early_decrypt(resource_size_t paddr,
@@ -53,6 +58,12 @@ static inline bool sme_active(void)
 {
 	return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+	return 0ULL;
+}
+
 #endif
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 10c5a17..130bef7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -10,6 +10,7 @@
 #include <linux/scatterlist.h>
 #include <linux/kmemcheck.h>
 #include <linux/bug.h>
+#include <linux/mem_encrypt.h>
 
 /**
  * List of possible attributes associated with a DMA mapping. The semantics
@@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
 
 	if (!dev->dma_mask || !dma_supported(dev, mask))
 		return -EIO;
+
+	if (sme_active() && (mask < sme_dma_mask()))
+		dev_warn(dev,
+			 "SME is active, device will require DMA bounce buffers\n");
+
 	*dev->dma_mask = mask;
 	return 0;
 }
@@ -576,6 +582,11 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
 	if (!dma_supported(dev, mask))
 		return -EIO;
+
+	if (sme_active() && (mask < sme_dma_mask()))
+		dev_warn(dev,
+			 "SME is active, device will require DMA bounce buffers\n");
+
 	dev->coherent_dma_mask = mask;
 	return 0;
 }
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 14a7b9f..6829ff1 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -28,6 +28,12 @@ static inline bool sme_active(void)
 {
 	return false;
 }
+
+static inline u64 sme_dma_mask(void)
+{
+	return 0ULL;
+}
+
 #endif
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index c463067..aff9353 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -509,6 +509,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	if (no_iotlb_memory)
 		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
 
+	WARN_ONCE(sme_active(),
+		  "SME is active and system is using DMA bounce buffers\n");
+
 	mask = dma_get_seg_boundary(hwdev);
 
 	tbl_dma_addr &= mask;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 20/28] iommu/amd: Disable AMD IOMMU if memory encryption is active
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (18 preceding siblings ...)
  2017-02-16 15:46 ` [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
@ 2017-02-16 15:46 ` Tom Lendacky
  2017-02-16 15:46 ` [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs Tom Lendacky
                   ` (9 subsequent siblings)
  29 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:46 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

For now, disable the AMD IOMMU if memory encryption is active. A future
patch will re-enable the IOMMU once it fully supports memory encryption.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/iommu/amd_iommu_init.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6799cf9..6df2dd5 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -29,6 +29,7 @@
 #include <linux/export.h>
 #include <linux/iommu.h>
 #include <linux/kmemleak.h>
+#include <linux/mem_encrypt.h>
 #include <asm/pci-direct.h>
 #include <asm/iommu.h>
 #include <asm/gart.h>
@@ -2544,6 +2545,12 @@ int __init amd_iommu_detect(void)
 	if (amd_iommu_disabled)
 		return -ENODEV;
 
+	/* For now, disable the IOMMU if SME is active */
+	if (sme_active()) {
+		pr_notice("AMD-Vi: SME is active, disabling the IOMMU\n");
+		return -ENODEV;
+	}
+
 	ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
 	if (ret)
 		return ret;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (19 preceding siblings ...)
  2017-02-16 15:46 ` [RFC PATCH v4 20/28] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
@ 2017-02-16 15:46 ` Tom Lendacky
  2017-02-27 18:17   ` Borislav Petkov
  2017-02-16 15:47 ` [RFC PATCH v4 22/28] x86: Do not specify encrypted memory for video mappings Tom Lendacky
                   ` (8 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:46 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP, then set the SYSCFG MSR bit to enable
memory encryption on that AP and allow the AP to continue its startup.
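
For reference, a C rendering of what the added trampoline assembly does
(sketch only: the 0xc0010010 MSR number and the bit-23 MemEncryptionModeEn
position are assumptions here, and rdmsr/wrmsr are stubbed so the flow can be
followed outside the kernel):

#include <stdio.h>

#define MSR_K8_SYSCFG			0xc0010010	/* assumed MSR number */
#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT	23		/* assumed bit position */

static unsigned long long fake_syscfg;

static unsigned long long rdmsr_stub(unsigned int msr) { (void)msr; return fake_syscfg; }
static void wrmsr_stub(unsigned int msr, unsigned long long val) { (void)msr; fake_syscfg = val; }

/* C rendering of the trampoline_64.S check added in this patch */
static void ap_check_sme(int kernel_sme_active)
{
	unsigned long long syscfg;

	if (!kernel_sme_active)		/* TH_FLAGS_SME_ACTIVE not set */
		return;

	syscfg = rdmsr_stub(MSR_K8_SYSCFG);
	if (syscfg & (1ULL << MSR_K8_SYSCFG_MEM_ENCRYPT_BIT))
		return;			/* BIOS already enabled SME on this AP */

	/* Safe to enable: mirror the BSP's setting before continuing startup */
	wrmsr_stub(MSR_K8_SYSCFG, syscfg | (1ULL << MSR_K8_SYSCFG_MEM_ENCRYPT_BIT));
}

int main(void)
{
	ap_check_sme(1);
	printf("SYSCFG after AP check: %#llx\n", fake_syscfg);
	return 0;
}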

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/realmode.h      |   12 ++++++++++++
 arch/x86/realmode/init.c             |    4 ++++
 arch/x86/realmode/rm/trampoline_64.S |   17 +++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..4f7ef53 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
 #ifndef _ARCH_X86_REALMODE_H
 #define _ARCH_X86_REALMODE_H
 
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * in the CONFIG_X86_64 variant.
+ */
+#define TH_FLAGS_SME_ACTIVE_BIT		0
+#define TH_FLAGS_SME_ACTIVE		BIT(TH_FLAGS_SME_ACTIVE_BIT)
+
+#ifndef __ASSEMBLY__
+
 #include <linux/types.h>
 #include <asm/io.h>
 
@@ -38,6 +47,7 @@ struct trampoline_header {
 	u64 start;
 	u64 efer;
 	u32 cr4;
+	u32 flags;
 #endif
 };
 
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
 void set_real_mode_mem(phys_addr_t mem, size_t size);
 void reserve_real_mode(void);
 
+#endif /* __ASSEMBLY__ */
+
 #endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 21d7506..5010089 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -102,6 +102,10 @@ static void __init setup_real_mode(void)
 	trampoline_cr4_features = &trampoline_header->cr4;
 	*trampoline_cr4_features = mmu_cr4_features;
 
+	trampoline_header->flags = 0;
+	if (sme_active())
+		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
+
 	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
 	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
 	trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..a88c3d1 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
 #include <asm/msr.h>
 #include <asm/segment.h>
 #include <asm/processor-flags.h>
+#include <asm/realmode.h>
 #include "realmode.h"
 
 	.text
@@ -92,6 +93,21 @@ ENTRY(startup_32)
 	movl	%edx, %fs
 	movl	%edx, %gs
 
+	/* Check for memory encryption support */
+	bt	$TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
+	jnc	.Ldone
+	movl	$MSR_K8_SYSCFG, %ecx
+	rdmsr
+	bts	$MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+	jc	.Ldone
+
+	/*
+	 * Memory encryption is enabled but the SME enable bit for this
+	 * CPU has not been set.  It is safe to set it, so do so.
+	 */
+	wrmsr
+.Ldone:
+
 	movl	pa_tr_cr4, %eax
 	movl	%eax, %cr4		# Enable PAE mode
 
@@ -147,6 +163,7 @@ GLOBAL(trampoline_header)
 	tr_start:		.space	8
 	GLOBAL(tr_efer)		.space	8
 	GLOBAL(tr_cr4)		.space	4
+	GLOBAL(tr_flags)	.space	4
 END(trampoline_header)
 
 #include "trampoline_common.S"

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 22/28] x86: Do not specify encrypted memory for video mappings
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (20 preceding siblings ...)
  2017-02-16 15:46 ` [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs Tom Lendacky
@ 2017-02-16 15:47 ` Tom Lendacky
  2017-02-16 15:47 ` [RFC PATCH v4 23/28] x86/kvm: Enable Secure Memory Encryption of nested page tables Tom Lendacky
                   ` (7 subsequent siblings)
  29 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:47 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Since video memory needs to be accessed decrypted, be sure that the
memory encryption mask is not set for the video ranges.
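
The common pattern in these hunks is clearing the encryption mask from the
page protection before the mapping is set up. A minimal model of that step
(not kernel code; the bit-47 C-bit and the pgprot_model_t type are assumptions
for illustration only):

#include <stdio.h>

typedef unsigned long long pgprot_model_t;

static const pgprot_model_t sme_me_mask_model = 1ULL << 47;	/* assumed C-bit */

static pgprot_model_t pgprot_decrypted_model(pgprot_model_t prot)
{
	return prot & ~sme_me_mask_model;	/* drop the encryption bit for this mapping */
}

int main(void)
{
	pgprot_model_t prot = sme_me_mask_model | 0x63;	/* example flags with C-bit set */

	printf("before: %#llx\n", prot);
	printf("after : %#llx\n", pgprot_decrypted_model(prot));
	return 0;
}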

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/vga.h       |   13 +++++++++++++
 drivers/gpu/drm/drm_gem.c        |    2 ++
 drivers/gpu/drm/drm_vm.c         |    4 ++++
 drivers/gpu/drm/ttm/ttm_bo_vm.c  |    7 +++++--
 drivers/gpu/drm/udl/udl_fb.c     |    4 ++++
 drivers/video/fbdev/core/fbmem.c |   12 ++++++++++++
 6 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..5c7567a 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,25 @@
 #ifndef _ASM_X86_VGA_H
 #define _ASM_X86_VGA_H
 
+#include <asm/cacheflush.h>
+
 /*
  *	On the PC, we can just recalculate addresses and then
  *	access the videoram directly without any black magic.
+ *	To support memory encryption however, we need to access
+ *	the videoram as decrypted memory.
  */
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VGA_MAP_MEM(x, s)					\
+({								\
+	unsigned long start = (unsigned long)phys_to_virt(x);	\
+	set_memory_decrypted(start, (s) >> PAGE_SHIFT);		\
+	start;							\
+})
+#else
 #define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#endif
 
 #define vga_readb(x) (*(x))
 #define vga_writeb(x, y) (*(y) = (x))
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 465bacd..f9b3be0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
 #include <linux/pagemap.h>
 #include <linux/shmem_fs.h>
 #include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
 #include <drm/drmP.h>
 #include <drm/drm_vma_manager.h>
 #include <drm/drm_gem.h>
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
 	vma->vm_ops = dev->driver->gem_vm_ops;
 	vma->vm_private_data = obj;
 	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
 	/* Take a ref for this mapping of the object, so that the fault
 	 * handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index bd311c7..f0fc52a 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
 #include <linux/efi.h>
 #include <linux/slab.h>
 #endif
+#include <linux/mem_encrypt.h>
 #include <asm/pgtable.h>
 #include "drm_internal.h"
 #include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
 	pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
+	/* We don't want graphics memory to be mapped encrypted */
+	tmp = pgprot_decrypted(tmp);
+
 #if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
 	if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
 		tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 68ef993..09f2e73 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
 #include <linux/rbtree.h>
 #include <linux/module.h>
 #include <linux/uaccess.h>
+#include <linux/mem_encrypt.h>
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
@@ -218,9 +219,11 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * first page.
 	 */
 	for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
-		if (bo->mem.bus.is_iomem)
+		if (bo->mem.bus.is_iomem) {
+			/* Iomem should not be marked encrypted */
+			cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
 			pfn = ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT) + page_offset;
-		else {
+		} else {
 			page = ttm->pages[page_offset];
 			if (unlikely(!page && i == 0)) {
 				retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 167f42c..2207ec0 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/fb.h>
 #include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
@@ -169,6 +170,9 @@ static int udl_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
 	pr_notice("mmap() framebuffer addr:%lu size:%lu\n",
 		  pos, size);
 
+	/* We don't want the framebuffer to be mapped encrypted */
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
 	while (size > 0) {
 		page = vmalloc_to_pfn((void *)pos);
 		if (remap_pfn_range(vma, start, page, PAGE_SIZE, PAGE_SHARED))
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 76c1ad9..b895e60 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -32,6 +32,7 @@
 #include <linux/device.h>
 #include <linux/efi.h>
 #include <linux/fb.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/fb.h>
 
@@ -1405,6 +1406,12 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
 	mutex_lock(&info->mm_lock);
 	if (fb->fb_mmap) {
 		int res;
+
+		/*
+		 * The framebuffer needs to be accessed decrypted, be sure
+		 * SME protection is removed ahead of the call
+		 */
+		vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 		res = fb->fb_mmap(info, vma);
 		mutex_unlock(&info->mm_lock);
 		return res;
@@ -1430,6 +1437,11 @@ static long fb_compat_ioctl(struct file *file, unsigned int cmd,
 	mutex_unlock(&info->mm_lock);
 
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	/*
+	 * The framebuffer needs to be accessed decrypted, be sure
+	 * SME protection is removed
+	 */
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 	fb_pgprotect(file, vma, start);
 
 	return vm_iomap_memory(vma, start, len);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 23/28] x86/kvm: Enable Secure Memory Encryption of nested page tables
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (21 preceding siblings ...)
  2017-02-16 15:47 ` [RFC PATCH v4 22/28] x86: Do not specify encrypted memory for video mappings Tom Lendacky
@ 2017-02-16 15:47 ` Tom Lendacky
  2017-02-16 15:47 ` [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted Tom Lendacky
                   ` (6 subsequent siblings)
  29 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:47 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Update the KVM support to include the memory encryption mask when creating
and using nested page tables.
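
Illustration only (assuming a bit-47 C-bit): with sme_me_mask passed through
kvm_mmu_set_mask_ptes(), set_spte() ORs the mask into every nested/shadow
page table entry so guest memory is accessed encrypted:

#include <stdio.h>

int main(void)
{
	unsigned long long shadow_me_mask = 1ULL << 47;	/* assumed C-bit */
	unsigned long long pfn = 0x1234;
	unsigned long long spte;

	/* Mirrors the set_spte() change: OR the encryption mask into the entry */
	spte = (pfn << 12) | shadow_me_mask;

	printf("nested PTE: %#llx (physical %#llx, accessed encrypted)\n",
	       spte, pfn << 12ULL);
	return 0;
}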

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/kvm_host.h |    3 ++-
 arch/x86/kvm/mmu.c              |    8 ++++++--
 arch/x86/kvm/vmx.c              |    3 ++-
 arch/x86/kvm/x86.c              |    3 ++-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a7066dc..37326b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1050,7 +1050,8 @@ struct kvm_arch_async_pf {
 void kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask);
+		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
+		u64 me_mask);
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d8d235b..46f246c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -123,7 +123,7 @@ enum {
 					    * PT32_LEVEL_BITS))) - 1))
 
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
-			| shadow_x_mask | shadow_nx_mask)
+			| shadow_x_mask | shadow_nx_mask | shadow_me_mask)
 
 #define ACC_EXEC_MASK    1
 #define ACC_WRITE_MASK   PT_WRITABLE_MASK
@@ -178,6 +178,7 @@ struct kvm_shadow_walk_iterator {
 static u64 __read_mostly shadow_dirty_mask;
 static u64 __read_mostly shadow_mmio_mask;
 static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;
 
 static void mmu_spte_set(u64 *sptep, u64 spte);
 static void mmu_free_roots(struct kvm_vcpu *vcpu);
@@ -285,7 +286,8 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
 }
 
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
-		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask)
+		u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
+		u64 me_mask)
 {
 	shadow_user_mask = user_mask;
 	shadow_accessed_mask = accessed_mask;
@@ -293,6 +295,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 	shadow_nx_mask = nx_mask;
 	shadow_x_mask = x_mask;
 	shadow_present_mask = p_mask;
+	shadow_me_mask = me_mask;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
 
@@ -2546,6 +2549,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		pte_access &= ~ACC_WRITE_MASK;
 
 	spte |= (u64)pfn << PAGE_SHIFT;
+	spte |= shadow_me_mask;
 
 	if (pte_access & ACC_WRITE_MASK) {
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a236dec..fac3c27 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6703,7 +6703,8 @@ static __init int hardware_setup(void)
 			(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
 			0ull, VMX_EPT_EXECUTABLE_MASK,
 			cpu_has_vmx_ept_execute_only() ?
-				      0ull : VMX_EPT_READABLE_MASK);
+				      0ull : VMX_EPT_READABLE_MASK,
+			0ull);
 		ept_set_mmio_spte_mask();
 		kvm_enable_tdp();
 	} else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a719783..9e6a593 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -67,6 +67,7 @@
 #include <asm/pvclock.h>
 #include <asm/div64.h>
 #include <asm/irq_remapping.h>
+#include <asm/mem_encrypt.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -6027,7 +6028,7 @@ int kvm_arch_init(void *opaque)
 
 	kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
 			PT_DIRTY_MASK, PT64_NX_MASK, 0,
-			PT_PRESENT_MASK);
+			PT_PRESENT_MASK, sme_me_mask);
 	kvm_timer_init();
 
 	perf_register_guest_info_callbacks(&kvm_guest_cbs);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (22 preceding siblings ...)
  2017-02-16 15:47 ` [RFC PATCH v4 23/28] x86/kvm: Enable Secure Memory Encryption of nested page tables Tom Lendacky
@ 2017-02-16 15:47 ` Tom Lendacky
  2017-03-08  7:04   ` Dave Young
  2017-02-16 15:47 ` [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted Tom Lendacky
                   ` (5 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:47 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Use memremap() to map the setup data.  This simplifies the code and lets
memremap() decide whether the range can be mapped as RAM or whether a
fallback to ioremap_cache() is needed (removing the explicit PageHighMem
check).
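
For reference, the general shape of the memremap() pattern this patch
switches to (read_phys_range() is a hypothetical helper used only to show the
map/copy/unmap sequence, not part of the patch):

#include <linux/io.h>
#include <linux/errno.h>
#include <linux/string.h>

/* Sketch: map a physical range as ordinary RAM when possible and always unmap */
static int read_phys_range(phys_addr_t pa, void *buf, size_t len)
{
	void *p;

	p = memremap(pa, len, MEMREMAP_WB);
	if (!p)
		return -ENXIO;

	memcpy(buf, p, len);
	memunmap(p);

	return 0;
}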

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/kdebugfs.c |   30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index bdb83e4..c3d354d 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -48,17 +48,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
 
 	pa = node->paddr + sizeof(struct setup_data) + pos;
 	pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
-	if (PageHighMem(pg)) {
-		p = ioremap_cache(pa, count);
-		if (!p)
-			return -ENXIO;
-	} else
-		p = __va(pa);
+	p = memremap(pa, count, MEMREMAP_WB);
+	if (!p)
+		return -ENXIO;
 
 	remain = copy_to_user(user_buf, p, count);
 
-	if (PageHighMem(pg))
-		iounmap(p);
+	memunmap(p);
 
 	if (remain)
 		return -EFAULT;
@@ -127,15 +123,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 		}
 
 		pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
-		if (PageHighMem(pg)) {
-			data = ioremap_cache(pa_data, sizeof(*data));
-			if (!data) {
-				kfree(node);
-				error = -ENXIO;
-				goto err_dir;
-			}
-		} else
-			data = __va(pa_data);
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
+		if (!data) {
+			kfree(node);
+			error = -ENXIO;
+			goto err_dir;
+		}
 
 		node->paddr = pa_data;
 		node->type = data->type;
@@ -143,8 +136,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
 		error = create_setup_data_node(d, no, node);
 		pa_data = data->next;
 
-		if (PageHighMem(pg))
-			iounmap(data);
+		memunmap(data);
 		if (error)
 			goto err_dir;
 		no++;

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (23 preceding siblings ...)
  2017-02-16 15:47 ` [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted Tom Lendacky
@ 2017-02-16 15:47 ` Tom Lendacky
  2017-03-08  7:09   ` Dave Young
  2017-02-16 15:47 ` [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME Tom Lendacky
                   ` (4 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:47 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Use memremap() to map the setup data.  This lets memremap() decide whether
the range can be mapped as RAM or whether a fallback to ioremap_cache() is
needed (similar to the setup data debugfs support).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/ksysfs.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
index 4afc67f..d653b3e 100644
--- a/arch/x86/kernel/ksysfs.c
+++ b/arch/x86/kernel/ksysfs.c
@@ -16,6 +16,7 @@
 #include <linux/stat.h>
 #include <linux/slab.h>
 #include <linux/mm.h>
+#include <linux/io.h>
 
 #include <asm/io.h>
 #include <asm/setup.h>
@@ -79,12 +80,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
 			*paddr = pa_data;
 			return 0;
 		}
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data)
 			return -ENOMEM;
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -97,17 +98,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
 	u64 pa_data = boot_params.hdr.setup_data;
 
 	while (pa_data) {
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data)
 			return -ENOMEM;
 		if (nr == i) {
 			*size = data->len;
-			iounmap(data);
+			memunmap(data);
 			return 0;
 		}
 
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 		i++;
 	}
 	return -EINVAL;
@@ -127,12 +128,12 @@ static ssize_t type_show(struct kobject *kobj,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
 	if (!data)
 		return -ENOMEM;
 
 	ret = sprintf(buf, "0x%x\n", data->type);
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -154,7 +155,7 @@ static ssize_t setup_data_data_read(struct file *fp,
 	ret = get_setup_data_paddr(nr, &paddr);
 	if (ret)
 		return ret;
-	data = ioremap_cache(paddr, sizeof(*data));
+	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
 	if (!data)
 		return -ENOMEM;
 
@@ -170,15 +171,15 @@ static ssize_t setup_data_data_read(struct file *fp,
 		goto out;
 
 	ret = count;
-	p = ioremap_cache(paddr + sizeof(*data), data->len);
+	p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
 	if (!p) {
 		ret = -ENOMEM;
 		goto out;
 	}
 	memcpy(buf, p + off, count);
-	iounmap(p);
+	memunmap(p);
 out:
-	iounmap(data);
+	memunmap(data);
 	return ret;
 }
 
@@ -250,13 +251,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
 	*nr = 0;
 	while (pa_data) {
 		*nr += 1;
-		data = ioremap_cache(pa_data, sizeof(*data));
+		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
 		if (!data) {
 			ret = -ENOMEM;
 			goto out;
 		}
 		pa_data = data->next;
-		iounmap(data);
+		memunmap(data);
 	}
 
 out:

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (24 preceding siblings ...)
  2017-02-16 15:47 ` [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted Tom Lendacky
@ 2017-02-16 15:47 ` Tom Lendacky
  2017-02-17 15:57   ` Konrad Rzeszutek Wilk
  2017-02-28 10:35   ` Borislav Petkov
  2017-02-16 15:48 ` [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place Tom Lendacky
                   ` (3 subsequent siblings)
  29 siblings, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:47 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Provide support so that kexec can be used to boot a kernel when SME is
enabled.

Support is needed to allocate pages for kexec without encryption.  This
is needed so that the new kernel can be booted in the same manner as the
kernel was originally booted.

Additionally, when shutting down all of the CPUs we need to be sure to
disable caches, flush the caches and then halt. This is needed when booting
from a state where SME was not active into a state where SME is active.
Without these steps, it is possible for cache lines to exist for the same
physical location but tagged both with and without the encryption bit. This
can cause random memory corruption when caches are flushed depending on
which cacheline is written last.
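
A sketch of the per-CPU stop ordering this relies on (illustration only;
stop_cpu_for_sme_kexec() is a hypothetical name and the real logic lives in
stop_this_cpu() in this patch):

#include <linux/irqflags.h>
#include <asm/special_insns.h>
#include <asm/processor-flags.h>

/*
 * Stop filling the cache first, then flush it, then halt, so no cache line
 * for a given physical address can survive with the "wrong" encryption tag
 * into the kexec'ed kernel.
 */
static void stop_cpu_for_sme_kexec(void)
{
	local_irq_disable();			/* no more work on this CPU */
	write_cr0(read_cr0() | X86_CR0_CD);	/* CD=1: stop allocating new cache lines */
	wbinvd();				/* write back and invalidate the caches */
	for (;;)
		native_halt();
}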

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/include/asm/cacheflush.h    |    2 ++
 arch/x86/include/asm/init.h          |    1 +
 arch/x86/include/asm/mem_encrypt.h   |   10 ++++++++
 arch/x86/include/asm/pgtable_types.h |    1 +
 arch/x86/kernel/machine_kexec_64.c   |    3 ++
 arch/x86/kernel/process.c            |   43 +++++++++++++++++++++++++++++++++-
 arch/x86/kernel/smp.c                |    4 ++-
 arch/x86/mm/ident_map.c              |    6 +++--
 arch/x86/mm/pageattr.c               |    2 ++
 include/linux/mem_encrypt.h          |   10 ++++++++
 kernel/kexec_core.c                  |   24 +++++++++++++++++++
 11 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 33ae60a..2180cd5 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -48,8 +48,10 @@
 int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+#endif
 
 int set_memory_array_uc(unsigned long *addr, int addrinarray);
 int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 737da62..b2ec511 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -6,6 +6,7 @@ struct x86_mapping_info {
 	void *context;			 /* context for alloc_pgt_page */
 	unsigned long pmd_flag;		 /* page flag for PMD entry */
 	unsigned long offset;		 /* ident mapping offset */
+	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
 };
 
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 5a17f1b..1fd5426 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -64,6 +64,16 @@ static inline u64 sme_dma_mask(void)
 	return 0ULL;
 }
 
+static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
+{
+	return 0;
+}
+
+static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
+{
+	return 0;
+}
+
 #endif
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index f00e70f..456c5cc 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -213,6 +213,7 @@ enum page_cache_mode {
 #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
 #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
 #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
 #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
 #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
 #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 307b1f4..b01648c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -76,7 +76,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
 		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
 	}
 	pte = pte_offset_kernel(pmd, vaddr);
-	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
 	return 0;
 err:
 	free_transition_pgtable(image);
@@ -104,6 +104,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 		.alloc_pgt_page	= alloc_pgt_page,
 		.context	= image,
 		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
+		.kernpg_flag	= _KERNPG_TABLE_NOENC,
 	};
 	unsigned long mstart, mend;
 	pgd_t *level4p;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3ed869c..9b01261 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -279,8 +279,43 @@ bool xen_set_default_idle(void)
 	return ret;
 }
 #endif
-void stop_this_cpu(void *dummy)
+
+static bool is_smt_thread(int cpu)
 {
+#ifdef CONFIG_SCHED_SMT
+	if (cpumask_test_cpu(smp_processor_id(), cpu_smt_mask(cpu)))
+		return true;
+#endif
+	return false;
+}
+
+void stop_this_cpu(void *data)
+{
+	atomic_t *stopping_cpu = data;
+	bool do_cache_disable = false;
+	bool do_wbinvd = false;
+
+	if (stopping_cpu) {
+		int stopping_id = atomic_read(stopping_cpu);
+		struct cpuinfo_x86 *c = &cpu_data(stopping_id);
+
+		/*
+		 * If the processor supports SME then we need to clear
+		 * out cache information before halting it because we could
+		 * be performing a kexec. With kexec, going from SME
+		 * inactive to SME active requires clearing cache entries
+		 * so that addresses without the encryption bit set don't
+		 * corrupt the same physical address that has the encryption
+		 * bit set when caches are flushed. If this is not an SMT
+		 * thread of the stopping CPU then we disable caching at this
+		 * point to keep the cache clean.
+		 */
+		if (cpu_has(c, X86_FEATURE_SME)) {
+			do_cache_disable = !is_smt_thread(stopping_id);
+			do_wbinvd = true;
+		}
+	}
+
 	local_irq_disable();
 	/*
 	 * Remove this CPU:
@@ -289,6 +324,12 @@ void stop_this_cpu(void *dummy)
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
+	if (do_cache_disable)
+		write_cr0(read_cr0() | X86_CR0_CD);
+
+	if (do_wbinvd)
+		wbinvd();
+
 	for (;;)
 		halt();
 }
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index d3c66a1..64b2cda 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -162,7 +162,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 	if (raw_smp_processor_id() == atomic_read(&stopping_cpu))
 		return NMI_HANDLED;
 
-	stop_this_cpu(NULL);
+	stop_this_cpu(&stopping_cpu);
 
 	return NMI_HANDLED;
 }
@@ -174,7 +174,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
 asmlinkage __visible void smp_reboot_interrupt(void)
 {
 	ipi_entering_ack_irq();
-	stop_this_cpu(NULL);
+	stop_this_cpu(&stopping_cpu);
 	irq_exit();
 }
 
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 4473cb4..3e7da84 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -20,6 +20,7 @@ static void ident_pmd_init(struct x86_mapping_info *info, pmd_t *pmd_page,
 static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 			  unsigned long addr, unsigned long end)
 {
+	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
 	unsigned long next;
 
 	for (; addr < end; addr = next) {
@@ -39,7 +40,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 		if (!pmd)
 			return -ENOMEM;
 		ident_pmd_init(info, pmd, addr, next);
-		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+		set_pud(pud, __pud(__pa(pmd) | kernpg_flag));
 	}
 
 	return 0;
@@ -48,6 +49,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 			      unsigned long pstart, unsigned long pend)
 {
+	unsigned long kernpg_flag = info->kernpg_flag ? : _KERNPG_TABLE;
 	unsigned long addr = pstart + info->offset;
 	unsigned long end = pend + info->offset;
 	unsigned long next;
@@ -75,7 +77,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
 		result = ident_pud_init(info, pud, addr, next);
 		if (result)
 			return result;
-		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+		set_pgd(pgd, __pgd(__pa(pud) | kernpg_flag));
 	}
 
 	return 0;
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 9710f5c..46cc89d 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1742,6 +1742,7 @@ int set_memory_4k(unsigned long addr, int numpages)
 					__pgprot(0), 1, 0, NULL);
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
 {
 	struct cpa_data cpa;
@@ -1807,6 +1808,7 @@ int set_memory_decrypted(unsigned long addr, int numpages)
 	return __set_memory_enc_dec(addr, numpages, false);
 }
 EXPORT_SYMBOL(set_memory_decrypted);
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
 
 int set_pages_uc(struct page *page, int numpages)
 {
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 6829ff1..913cf80 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -34,6 +34,16 @@ static inline u64 sme_dma_mask(void)
 	return 0ULL;
 }
 
+static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
+{
+	return 0;
+}
+
+static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
+{
+	return 0;
+}
+
 #endif
 
 #endif	/* CONFIG_AMD_MEM_ENCRYPT */
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5617cc4..ab62f41 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -38,6 +38,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/compiler.h>
 #include <linux/hugetlb.h>
+#include <linux/mem_encrypt.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -315,6 +316,18 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		count = 1 << order;
 		for (i = 0; i < count; i++)
 			SetPageReserved(pages + i);
+
+		/*
+		 * If SME is active we need to be sure that kexec pages are
+		 * not encrypted because when we boot to the new kernel the
+		 * pages won't be accessed encrypted (initially).
+		 */
+		if (sme_active()) {
+			void *vaddr = page_address(pages);
+
+			set_memory_decrypted((unsigned long)vaddr, count);
+			memset(vaddr, 0, count * PAGE_SIZE);
+		}
 	}
 
 	return pages;
@@ -326,6 +339,17 @@ static void kimage_free_pages(struct page *page)
 
 	order = page_private(page);
 	count = 1 << order;
+
+	/*
+	 * If SME is active we need to reset the pages back to being an
+	 * encrypted mapping before freeing them.
+	 */
+	if (sme_active()) {
+		void *vaddr = page_address(page);
+
+		set_memory_encrypted((unsigned long)vaddr, count);
+	}
+
 	for (i = 0; i < count; i++)
 		ClearPageReserved(page + i);
 	__free_pages(page, order);

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (25 preceding siblings ...)
  2017-02-16 15:47 ` [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME Tom Lendacky
@ 2017-02-16 15:48 ` Tom Lendacky
  2017-03-01 17:36   ` Borislav Petkov
  2017-02-16 15:48 ` [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption Tom Lendacky
                   ` (2 subsequent siblings)
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:48 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds support to encrypt the kernel in-place. This is
done by creating new page mappings for the kernel - a decrypted
write-protected mapping and an encrypted mapping. The kernel is encrypted
by copying it through a temporary buffer.
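
A C model of the copy loop in mem_encrypt_boot.S (illustration only;
encrypt_in_place_model() is a hypothetical name): enc and dec are two virtual
mappings of the same physical kernel image, so copying dec -> buffer -> enc
rewrites the kernel through the encrypted mapping, 2MB at a time:

#include <string.h>
#include <stddef.h>

#define PMD_PAGE_SIZE_MODEL	(2UL * 1024 * 1024)

/*
 * Reads through dec return plain text, writes through enc store cipher text;
 * both map the same physical pages, so this encrypts the kernel in place.
 * The length is assumed to be a multiple of 2MB, as in the assembly routine.
 */
static void encrypt_in_place_model(char *enc, const char *dec,
				   size_t len, char *buf)
{
	size_t off;

	for (off = 0; off < len; off += PMD_PAGE_SIZE_MODEL) {
		memcpy(buf, dec + off, PMD_PAGE_SIZE_MODEL);
		memcpy(enc + off, buf, PMD_PAGE_SIZE_MODEL);
	}
}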

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/Makefile           |    1 
 arch/x86/kernel/mem_encrypt_boot.S |  156 +++++++++++++++++++++++++++++
 arch/x86/kernel/mem_encrypt_init.c |  191 ++++++++++++++++++++++++++++++++++++
 3 files changed, 348 insertions(+)
 create mode 100644 arch/x86/kernel/mem_encrypt_boot.S

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 33af80a..dc3ed84 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -142,4 +142,5 @@ ifeq ($(CONFIG_X86_64),y)
 	obj-y				+= vsmp_64.o
 
 	obj-y				+= mem_encrypt_init.o
+	obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
 endif
diff --git a/arch/x86/kernel/mem_encrypt_boot.S b/arch/x86/kernel/mem_encrypt_boot.S
new file mode 100644
index 0000000..58e1756
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_boot.S
@@ -0,0 +1,156 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+	.text
+	.code64
+ENTRY(sme_encrypt_execute)
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	/*
+	 * Entry parameters:
+	 *   RDI - virtual address for the encrypted kernel mapping
+	 *   RSI - virtual address for the decrypted kernel mapping
+	 *   RDX - length of kernel
+	 *   RCX - address of the encryption workarea
+	 *     - stack page (PAGE_SIZE)
+	 *     - encryption routine page (PAGE_SIZE)
+	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
+	 *    R8 - address of the pagetables to use for encryption
+	 */
+
+	/* Set up a one page stack in the non-encrypted memory area */
+	movq	%rcx, %rax
+	addq	$PAGE_SIZE, %rax
+	movq	%rsp, %rbp
+	movq	%rax, %rsp
+	push	%rbp
+
+	push	%r12
+	push	%r13
+
+	movq	%rdi, %r10
+	movq	%rsi, %r11
+	movq	%rdx, %r12
+	movq	%rcx, %r13
+
+	/* Copy encryption routine into the workarea */
+	movq	%rax, %rdi
+	leaq	.Lencrypt_start(%rip), %rsi
+	movq	$(.Lencrypt_stop - .Lencrypt_start), %rcx
+	rep	movsb
+
+	/* Setup registers for call */
+	movq	%r10, %rdi
+	movq	%r11, %rsi
+	movq	%r8, %rdx
+	movq	%r12, %rcx
+	movq	%rax, %r8
+	addq	$PAGE_SIZE, %r8
+
+	/* Call the encryption routine */
+	call	*%rax
+
+	pop	%r13
+	pop	%r12
+
+	pop	%rsp			/* Restore original stack pointer */
+.Lencrypt_exit:
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
+
+	ret
+ENDPROC(sme_encrypt_execute)
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/*
+ * Routine used to encrypt kernel.
+ *   This routine must be run outside of the kernel proper since
+ *   the kernel will be encrypted during the process. So this
+ *   routine is defined here and then copied to an area outside
+ *   of the kernel where it will remain and run decrypted
+ *   during execution.
+ *
+ *   On entry the registers must be:
+ *     RDI - virtual address for the encrypted kernel mapping
+ *     RSI - virtual address for the decrypted kernel mapping
+ *     RDX - address of the pagetables to use for encryption
+ *     RCX - length of kernel
+ *      R8 - intermediate copy buffer
+ *
+ *     RAX - points to this routine
+ *
+ * The kernel will be encrypted by copying from the non-encrypted
+ * kernel space to an intermediate buffer and then copying from the
+ * intermediate buffer back to the encrypted kernel space. The physical
+ * addresses of the two kernel space mappings are the same which
+ * results in the kernel being encrypted "in place".
+ */
+.Lencrypt_start:
+	/* Enable the new page tables */
+	mov	%rdx, %cr3
+
+	/* Flush any global TLBs */
+	mov	%cr4, %rdx
+	andq	$~X86_CR4_PGE, %rdx
+	mov	%rdx, %cr4
+	orq	$X86_CR4_PGE, %rdx
+	mov	%rdx, %cr4
+
+	/* Set the PAT register PA5 entry to write-protect */
+	push	%rcx
+	movl	$MSR_IA32_CR_PAT, %ecx
+	rdmsr
+	push	%rdx			/* Save original PAT value */
+	andl	$0xffff00ff, %edx	/* Clear PA5 */
+	orl	$0x00000500, %edx	/* Set PA5 to WP */
+	wrmsr
+	pop	%rdx			/* RDX contains original PAT value */
+	pop	%rcx
+
+	movq	%rcx, %r9		/* Save length */
+	movq	%rdi, %r10		/* Save destination address */
+	movq	%rsi, %r11		/* Save source address */
+
+	wbinvd				/* Invalidate any cache entries */
+
+	/* Copy/encrypt 2MB at a time */
+1:
+	movq	%r11, %rsi
+	movq	%r8, %rdi
+	movq	$PMD_PAGE_SIZE, %rcx
+	rep	movsb
+
+	movq	%r8, %rsi
+	movq	%r10, %rdi
+	movq	$PMD_PAGE_SIZE, %rcx
+	rep	movsb
+
+	addq	$PMD_PAGE_SIZE, %r11
+	addq	$PMD_PAGE_SIZE, %r10
+	subq	$PMD_PAGE_SIZE, %r9
+	jnz	1b
+
+	/* Restore PAT register */
+	push	%rdx
+	movl	$MSR_IA32_CR_PAT, %ecx
+	rdmsr
+	pop	%rdx
+	wrmsr
+
+	ret
+.Lencrypt_stop:
+#endif	/* CONFIG_AMD_MEM_ENCRYPT */
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 25af15d..07cbb90 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -16,9 +16,200 @@
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 #include <linux/mem_encrypt.h>
+#include <linux/mm.h>
+
+#include <asm/sections.h>
+
+extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
+				void *, pgd_t *);
+
+#define PGD_FLAGS	_KERNPG_TABLE_NOENC
+#define PUD_FLAGS	_KERNPG_TABLE_NOENC
+#define PMD_FLAGS	__PAGE_KERNEL_LARGE_EXEC
+
+static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
+				      void *vaddr, pmdval_t pmd_val)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pgd += pgd_index((unsigned long)vaddr);
+	if (pgd_none(*pgd)) {
+		pud = next_page;
+		memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
+		native_set_pgd(pgd,
+			       native_make_pgd((unsigned long)pud + PGD_FLAGS));
+		next_page += sizeof(*pud) * PTRS_PER_PUD;
+	} else {
+		pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
+	}
+
+	pud += pud_index((unsigned long)vaddr);
+	if (pud_none(*pud)) {
+		pmd = next_page;
+		memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
+		native_set_pud(pud,
+			       native_make_pud((unsigned long)pmd + PUD_FLAGS));
+		next_page += sizeof(*pmd) * PTRS_PER_PMD;
+	} else {
+		pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
+	}
+
+	pmd += pmd_index((unsigned long)vaddr);
+	if (pmd_none(*pmd) || !pmd_large(*pmd))
+		native_set_pmd(pmd, native_make_pmd(pmd_val));
+
+	return next_page;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long start,
+					     unsigned long end)
+{
+	unsigned long addr, total;
+
+	total = 0;
+	addr = start;
+	while (addr < end) {
+		unsigned long pgd_end;
+
+		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+		if (pgd_end > end)
+			pgd_end = end;
+
+		total += sizeof(pud_t) * PTRS_PER_PUD * 2;
+
+		while (addr < pgd_end) {
+			unsigned long pud_end;
+
+			pud_end = (addr & PUD_MASK) + PUD_SIZE;
+			if (pud_end > end)
+				pud_end = end;
+
+			total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
+
+			addr = pud_end;
+		}
+
+		addr = pgd_end;
+	}
+	total += sizeof(pgd_t) * PTRS_PER_PGD;
+
+	return total;
+}
 
 void __init sme_encrypt_kernel(void)
 {
+	pgd_t *pgd;
+	void *workarea, *next_page, *vaddr;
+	unsigned long kern_start, kern_end, kern_len;
+	unsigned long index, paddr, pmd_flags;
+	unsigned long exec_size, full_size;
+
+	/* If SME is not active then no need to prepare */
+	if (!sme_active())
+		return;
+
+	/* Set the workarea to be after the kernel */
+	workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+
+	/*
+	 * Prepare for encrypting the kernel by building new pagetables with
+	 * the necessary attributes needed to encrypt the kernel in place.
+	 *
+	 *   One range of virtual addresses will map the memory occupied
+	 *   by the kernel as encrypted.
+	 *
+	 *   Another range of virtual addresses will map the memory occupied
+	 *   by the kernel as decrypted and write-protected.
+	 *
+	 *     The use of write-protect attribute will prevent any of the
+	 *     memory from being cached.
+	 */
+
+	/* Physical address gives us the identity mapped virtual address */
+	kern_start = __pa_symbol(_text);
+	kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;
+	kern_len = kern_end - kern_start + 1;
+
+	/*
+	 * Calculate required number of workarea bytes needed:
+	 *   executable encryption area size:
+	 *     stack page (PAGE_SIZE)
+	 *     encryption routine page (PAGE_SIZE)
+	 *     intermediate copy buffer (PMD_PAGE_SIZE)
+	 *   pagetable structures for workarea (in case not currently mapped)
+	 *   pagetable structures for the encryption of the kernel
+	 */
+	exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+
+	full_size = exec_size;
+	full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
+		     sizeof(pmd_t) * PTRS_PER_PMD;
+	full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
+
+	next_page = workarea + exec_size;
+
+	/* Make sure the current pagetables have entries for the workarea */
+	pgd = (pgd_t *)native_read_cr3();
+	paddr = (unsigned long)workarea;
+	while (paddr < (unsigned long)workarea + full_size) {
+		vaddr = (void *)paddr;
+		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+					      paddr + PMD_FLAGS);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+	native_write_cr3(native_read_cr3());
+
+	/* Calculate a PGD index to be used for the decrypted mapping */
+	index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
+	index <<= PGDIR_SHIFT;
+
+	/* Set and clear the PGD */
+	pgd = next_page;
+	memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+	next_page += sizeof(*pgd) * PTRS_PER_PGD;
+
+	/* Add encrypted (identity) mappings for the kernel */
+	pmd_flags = PMD_FLAGS | _PAGE_ENC;
+	paddr = kern_start;
+	while (paddr < kern_end) {
+		vaddr = (void *)paddr;
+		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+					      paddr + pmd_flags);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/* Add decrypted (non-identity) mappings for the kernel */
+	pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+	paddr = kern_start;
+	while (paddr < kern_end) {
+		vaddr = (void *)(paddr + index);
+		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+					      paddr + pmd_flags);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/* Add the workarea to both mappings */
+	paddr = kern_end + 1;
+	while (paddr < (kern_end + exec_size)) {
+		vaddr = (void *)paddr;
+		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+					      paddr + PMD_FLAGS);
+
+		vaddr = (void *)(paddr + index);
+		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+					      paddr + PMD_FLAGS);
+
+		paddr += PMD_PAGE_SIZE;
+	}
+
+	/* Perform the encryption */
+	sme_encrypt_execute(kern_start, kern_start + index, kern_len,
+			    workarea, pgd);
+
 }
 
 unsigned long __init sme_get_me_mask(void)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (26 preceding siblings ...)
  2017-02-16 15:48 ` [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place Tom Lendacky
@ 2017-02-16 15:48 ` Tom Lendacky
  2017-03-01 18:40   ` Borislav Petkov
  2017-02-18 18:12 ` [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Borislav Petkov
  2017-03-01  9:17 ` Dave Young
  29 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 15:48 UTC (permalink / raw)
  To: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

This patch adds support to check if SME has been enabled and whether
memory encryption should be activated (by checking the mem_encrypt=
command line option against the configured default state).  If memory
encryption is to be activated, then the encryption mask is set and the
kernel is encrypted "in place."

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kernel/head_64.S          |    1 +
 arch/x86/kernel/mem_encrypt_init.c |   71 +++++++++++++++++++++++++++++++++++-
 arch/x86/mm/mem_encrypt.c          |    2 +
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index edd2f14..e6820e7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -97,6 +97,7 @@ startup_64:
 	 * Save the returned mask in %r12 for later use.
 	 */
 	push	%rsi
+	movq	%rsi, %rdi
 	call	sme_enable
 	pop	%rsi
 	movq	%rax, %r12
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 07cbb90..35c5e3d 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -19,6 +19,12 @@
 #include <linux/mm.h>
 
 #include <asm/sections.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>
+
+static char sme_cmdline_arg_on[] __initdata = "mem_encrypt=on";
+static char sme_cmdline_arg_off[] __initdata = "mem_encrypt=off";
 
 extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
 				void *, pgd_t *);
@@ -217,8 +223,71 @@ unsigned long __init sme_get_me_mask(void)
 	return sme_me_mask;
 }
 
-unsigned long __init sme_enable(void)
+unsigned long __init sme_enable(void *boot_data)
 {
+	struct boot_params *bp = boot_data;
+	unsigned int eax, ebx, ecx, edx;
+	unsigned long cmdline_ptr;
+	bool enable_if_found;
+	void *cmdline_arg;
+	u64 msr;
+
+	/* Check for an AMD processor */
+	eax = 0;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if ((ebx != 0x68747541) || (edx != 0x69746e65) || (ecx != 0x444d4163))
+		goto out;
+
+	/* Check for the SME support leaf */
+	eax = 0x80000000;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if (eax < 0x8000001f)
+		goto out;
+
+	/*
+	 * Check for the SME feature:
+	 *   CPUID Fn8000_001F[EAX] - Bit 0
+	 *     Secure Memory Encryption support
+	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
+	 *     Pagetable bit position used to indicate encryption
+	 */
+	eax = 0x8000001f;
+	ecx = 0;
+	native_cpuid(&eax, &ebx, &ecx, &edx);
+	if (!(eax & 1))
+		goto out;
+
+	/* Check if SME is enabled */
+	msr = native_read_msr(MSR_K8_SYSCFG);
+	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+		goto out;
+
+	/*
+	 * Fixups have not been applied to phys_base yet, so we must obtain
+	 * the address of the SME command line option in the following way.
+	 */
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT)) {
+		asm ("lea sme_cmdline_arg_off(%%rip), %0"
+		     : "=r" (cmdline_arg)
+		     : "p" (sme_cmdline_arg_off));
+		enable_if_found = false;
+	} else {
+		asm ("lea sme_cmdline_arg_on(%%rip), %0"
+		     : "=r" (cmdline_arg)
+		     : "p" (sme_cmdline_arg_on));
+		enable_if_found = true;
+	}
+
+	cmdline_ptr = bp->hdr.cmd_line_ptr | ((u64)bp->ext_cmd_line_ptr << 32);
+
+	if (cmdline_find_option_bool((char *)cmdline_ptr, cmdline_arg))
+		sme_me_mask = enable_if_found ? 1UL << (ebx & 0x3f) : 0;
+	else
+		sme_me_mask = enable_if_found ? 0 : 1UL << (ebx & 0x3f);
+
+out:
 	return sme_me_mask;
 }
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a46bcf4..c5062e1 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -204,6 +204,8 @@ void __init mem_encrypt_init(void)
 
 	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
 	swiotlb_update_mem_attributes();
+
+	pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
 void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-02-16 15:42 ` [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
@ 2017-02-16 17:56   ` Borislav Petkov
  2017-02-16 19:48     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-16 17:56 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Ok, this time detailed review :-)

On Thu, Feb 16, 2017 at 09:42:11AM -0600, Tom Lendacky wrote:
> This patch adds a Documenation entry to decribe the AMD Secure Memory
> Encryption (SME) feature.

Please introduce a spellchecker into your patch creation workflow. I see
two typos in one line.

Also, never start patch commit messages with "This patch" - we know it
is this patch. Always write a doer-sentences explaining the why, not the
what. Something like:

"Add a SME and mem_encrypt= kernel parameter documentation."

for example.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |   11 ++++
>  Documentation/x86/amd-memory-encryption.txt     |   57 +++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 110745e..91c40fa 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2145,6 +2145,17 @@
>  			memory contents and reserves bad memory
>  			regions that are detected.
>  
> +	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
> +			Valid arguments: on, off
> +			Default (depends on kernel configuration option):
> +			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
> +			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
> +			mem_encrypt=on:		Activate SME
> +			mem_encrypt=off:	Do not activate SME
> +
> +			Refer to the SME documentation for details on when

"Refer to Documentation/x86/amd-memory-encryption.txt .."

> +			memory encryption can be activated.
> +
>  	mem_sleep_default=	[SUSPEND] Default system suspend mode:
>  			s2idle  - Suspend-To-Idle
>  			shallow - Power-On Suspend or equivalent (if supported)
> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
> new file mode 100644
> index 0000000..0938e89
> --- /dev/null
> +++ b/Documentation/x86/amd-memory-encryption.txt
> @@ -0,0 +1,57 @@
> +Secure Memory Encryption (SME) is a feature found on AMD processors.
> +
> +SME provides the ability to mark individual pages of memory as encrypted using
> +the standard x86 page tables.  A page that is marked encrypted will be
> +automatically decrypted when read from DRAM and encrypted when written to
> +DRAM.  SME can therefore be used to protect the contents of DRAM from physical
> +attacks on the system.
> +
> +A page is encrypted when a page table entry has the encryption bit set (see
> +below how to determine the position of the bit).  The encryption bit can be

"... how to determine its position)."

> +specified in the cr3 register, allowing the PGD table to be encrypted. Each
> +successive level of page tables can also be encrypted.
> +
> +Support for SME can be determined through the CPUID instruction. The CPUID
> +function 0x8000001f reports information related to SME:
> +
> +	0x8000001f[eax]:
> +		Bit[0] indicates support for SME
> +	0x8000001f[ebx]:
> +		Bit[5:0]  pagetable bit number used to activate memory
> +			  encryption

s/Bit/Bits/

> +		Bit[11:6] reduction in physical address space, in bits, when

Ditto.

> +			  memory encryption is enabled (this only affects system
> +			  physical addresses, not guest physical addresses)
> +
> +If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to

Let's use the kernel's define name MSR_K8_SYSCFG to avoid ambiguity.

> +determine if SME is enabled and/or to enable memory encryption:
> +
> +	0xc0010010:
> +		Bit[23]   0 = memory encryption features are disabled
> +			  1 = memory encryption features are enabled
> +
> +Linux relies on BIOS to set this bit if BIOS has determined that the reduction
> +in the physical address space as a result of enabling memory encryption (see
> +CPUID information above) will not conflict with the address space resource
> +requirements for the system.  If this bit is not set upon Linux startup then
> +Linux itself will not set it and memory encryption will not be possible.
> +
> +The state of SME in the Linux kernel can be documented as follows:
> +	- Supported:
> +	  The CPU supports SME (determined through CPUID instruction).
> +
> +	- Enabled:
> +	  Supported and bit 23 of the SYS_CFG MSR is set.

Ditto.

> +
> +	- Active:
> +	  Supported, Enabled and the Linux kernel is actively applying
> +	  the encryption bit to page table entries (the SME mask in the
> +	  kernel is non-zero).
> +
> +SME can also be enabled and activated in the BIOS. If SME is enabled and
> +activated in the BIOS, then all memory accesses will be encrypted and it will
> +not be necessary to activate the Linux memory encryption support.  If the BIOS
> +merely enables SME (sets bit 23 of the SYS_CFG MSR), then Linux can activate
> +memory encryption.

"... This is done by supplying mem_encrypt=on on the kernel command line.
Alternatively, if the kernel should enable SME by default, set
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y."

> However, if BIOS does not enable SME, then Linux will not
> +attempt to activate memory encryption, even if configured to do so by default

will not attempt or will not be able to?

> +or the mem_encrypt=on command line parameter is specified.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature
  2017-02-16 15:42 ` [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature Tom Lendacky
@ 2017-02-16 18:13   ` Borislav Petkov
  2017-02-16 19:42     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-16 18:13 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:42:36AM -0600, Tom Lendacky wrote:
> Update the CPU features to include identifying and reporting on the
> Secure Memory Encryption (SME) feature.  SME is identified by CPUID
> 0x8000001f, but requires BIOS support to enable it (set bit 23 of
> SYS_CFG MSR).  Only show the SME feature as available if reported by
> CPUID and enabled by BIOS.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/cpufeature.h        |    7 +++++--
>  arch/x86/include/asm/cpufeatures.h       |    5 ++++-
>  arch/x86/include/asm/disabled-features.h |    3 ++-
>  arch/x86/include/asm/msr-index.h         |    2 ++
>  arch/x86/include/asm/required-features.h |    3 ++-
>  arch/x86/kernel/cpu/common.c             |   19 +++++++++++++++++++
>  6 files changed, 34 insertions(+), 5 deletions(-)

What happened here?

You had it already:

https://lkml.kernel.org/r/20161110003459.3280.25796.stgit@tlendack-t1.amdoffice.net

The bit in get_cpu_cap() with checking the MSR you can add at the end of
init_amd() for example.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature
  2017-02-16 18:13   ` Borislav Petkov
@ 2017-02-16 19:42     ` Tom Lendacky
  2017-02-16 20:06       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 19:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/2017 12:13 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:42:36AM -0600, Tom Lendacky wrote:
>> Update the CPU features to include identifying and reporting on the
>> Secure Memory Encryption (SME) feature.  SME is identified by CPUID
>> 0x8000001f, but requires BIOS support to enable it (set bit 23 of
>> SYS_CFG MSR).  Only show the SME feature as available if reported by
>> CPUID and enabled by BIOS.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/cpufeature.h        |    7 +++++--
>>  arch/x86/include/asm/cpufeatures.h       |    5 ++++-
>>  arch/x86/include/asm/disabled-features.h |    3 ++-
>>  arch/x86/include/asm/msr-index.h         |    2 ++
>>  arch/x86/include/asm/required-features.h |    3 ++-
>>  arch/x86/kernel/cpu/common.c             |   19 +++++++++++++++++++
>>  6 files changed, 34 insertions(+), 5 deletions(-)
> 
> What happened here?
> 
> You had it already:
> 
> https://lkml.kernel.org/r/20161110003459.3280.25796.stgit@tlendack-t1.amdoffice.net
> 
> The bit in get_cpu_cap() with checking the MSR you can add at the end of
> init_amd() for example.

I realize it's a bit more code and expands the changes but I thought it
would be a bit clearer as to what was going on this way. And then the
follow-on patch for the physical address reduction goes in nicely, too.

If you prefer I stay with the scattered feature approach and then clear
the bit based on the MSR at the end of init_amd() I can do that. I'm
not attached to either method.

Thanks,
Tom

> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME)
  2017-02-16 17:56   ` Borislav Petkov
@ 2017-02-16 19:48     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-16 19:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/2017 11:56 AM, Borislav Petkov wrote:
> Ok, this time detailed review :-)
> 
> On Thu, Feb 16, 2017 at 09:42:11AM -0600, Tom Lendacky wrote:
>> This patch adds a Documenation entry to decribe the AMD Secure Memory
>> Encryption (SME) feature.
> 
> Please introduce a spellchecker into your patch creation workflow. I see
> two typos in one line.
> 
> Also, never start patch commit messages with "This patch" - we know it
> is this patch. Always write a doer-sentences explaining the why, not the
> what. Something like:
> 
> "Add a SME and mem_encrypt= kernel parameter documentation."
> 
> for example.

Ok, will do.

> 
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt |   11 ++++
>>  Documentation/x86/amd-memory-encryption.txt     |   57 +++++++++++++++++++++++
>>  2 files changed, 68 insertions(+)
>>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 110745e..91c40fa 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2145,6 +2145,17 @@
>>  			memory contents and reserves bad memory
>>  			regions that are detected.
>>  
>> +	mem_encrypt=	[X86-64] AMD Secure Memory Encryption (SME) control
>> +			Valid arguments: on, off
>> +			Default (depends on kernel configuration option):
>> +			  on  (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y)
>> +			  off (CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n)
>> +			mem_encrypt=on:		Activate SME
>> +			mem_encrypt=off:	Do not activate SME
>> +
>> +			Refer to the SME documentation for details on when
> 
> "Refer to Documentation/x86/amd-memory-encryption.txt .."

Ok.

> 
>> +			memory encryption can be activated.
>> +
>>  	mem_sleep_default=	[SUSPEND] Default system suspend mode:
>>  			s2idle  - Suspend-To-Idle
>>  			shallow - Power-On Suspend or equivalent (if supported)
>> diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
>> new file mode 100644
>> index 0000000..0938e89
>> --- /dev/null
>> +++ b/Documentation/x86/amd-memory-encryption.txt
>> @@ -0,0 +1,57 @@
>> +Secure Memory Encryption (SME) is a feature found on AMD processors.
>> +
>> +SME provides the ability to mark individual pages of memory as encrypted using
>> +the standard x86 page tables.  A page that is marked encrypted will be
>> +automatically decrypted when read from DRAM and encrypted when written to
>> +DRAM.  SME can therefore be used to protect the contents of DRAM from physical
>> +attacks on the system.
>> +
>> +A page is encrypted when a page table entry has the encryption bit set (see
>> +below how to determine the position of the bit).  The encryption bit can be
> 
> "... how to determine its position)."

Ok.

> 
>> +specified in the cr3 register, allowing the PGD table to be encrypted. Each
>> +successive level of page tables can also be encrypted.
>> +
>> +Support for SME can be determined through the CPUID instruction. The CPUID
>> +function 0x8000001f reports information related to SME:
>> +
>> +	0x8000001f[eax]:
>> +		Bit[0] indicates support for SME
>> +	0x8000001f[ebx]:
>> +		Bit[5:0]  pagetable bit number used to activate memory
>> +			  encryption
> 
> s/Bit/Bits/

Ok.

> 
>> +		Bit[11:6] reduction in physical address space, in bits, when
> 
> Ditto.
> 
>> +			  memory encryption is enabled (this only affects system
>> +			  physical addresses, not guest physical addresses)
>> +
>> +If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to
> 
> Let's use the kernel's define name MSR_K8_SYSCFG to avoid ambiguity.

Will do.

> 
>> +determine if SME is enabled and/or to enable memory encryption:
>> +
>> +	0xc0010010:
>> +		Bit[23]   0 = memory encryption features are disabled
>> +			  1 = memory encryption features are enabled
>> +
>> +Linux relies on BIOS to set this bit if BIOS has determined that the reduction
>> +in the physical address space as a result of enabling memory encryption (see
>> +CPUID information above) will not conflict with the address space resource
>> +requirements for the system.  If this bit is not set upon Linux startup then
>> +Linux itself will not set it and memory encryption will not be possible.
>> +
>> +The state of SME in the Linux kernel can be documented as follows:
>> +	- Supported:
>> +	  The CPU supports SME (determined through CPUID instruction).
>> +
>> +	- Enabled:
>> +	  Supported and bit 23 of the SYS_CFG MSR is set.
> 
> Ditto.
> 
>> +
>> +	- Active:
>> +	  Supported, Enabled and the Linux kernel is actively applying
>> +	  the encryption bit to page table entries (the SME mask in the
>> +	  kernel is non-zero).
>> +
>> +SME can also be enabled and activated in the BIOS. If SME is enabled and
>> +activated in the BIOS, then all memory accesses will be encrypted and it will
>> +not be necessary to activate the Linux memory encryption support.  If the BIOS
>> +merely enables SME (sets bit 23 of the SYS_CFG MSR), then Linux can activate
>> +memory encryption.
> 
> "... This is done by supplying mem_encrypt=on on the kernel command line.
> Alternatively, if the kernel should enable SME by default, set
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y."

Yup, much clearer.

> 
>> However, if BIOS does not enable SME, then Linux will not
>> +attempt to activate memory encryption, even if configured to do so by default
> 
> will not attempt or will not be able to?

Probably closer to will not be able to right now.  I'll update that.

Thanks,
Tom

> 
>> +or the mem_encrypt=on command line parameter is specified.
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature
  2017-02-16 19:42     ` Tom Lendacky
@ 2017-02-16 20:06       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-16 20:06 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 01:42:13PM -0600, Tom Lendacky wrote:
> I realize it's a bit more code and expands the changes but I thought it
> would be a bit clearer as to what was going on this way. And then the
> follow on patch for the physical address reduction goes in nicely, too.

Well, the code from the next patch should go to AMD-specific place like
arch/x86/kernel/cpu/amd.c anyway, where you don't have to do vendor
checks.

> If you prefer I stay with the scattered feature approach and then clear
> the bit based on the MSR at the end of init_amd() I can do that. I'm
> not attached to either method.

Yes please. We should keep the whole X86_FEATURE machinery from
exploding in size. Especially since CPUID_0x8000001f is not a leaf we're
going to be adding the majority of bits from, it doesn't warrant a
separate ->x86_capability array element.

  [If it does later, we can always move it to a separate element. ]

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME
  2017-02-16 15:42 ` [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME Tom Lendacky
@ 2017-02-17 11:04   ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-17 11:04 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:42:54AM -0600, Tom Lendacky wrote:
> When System Memory Encryption (SME) is enabled, the physical address
> space is reduced. Adjust the x86_phys_bits value to reflect this
> reduction.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/cpu/common.c |   10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index b33bc06..358208d7 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -771,11 +771,15 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
>  			u64 msr;
>  
>  			/*
> -			 * For SME, BIOS support is required. If BIOS has not
> -			 * enabled SME don't advertise the feature.
> +			 * For SME, BIOS support is required. If BIOS has
> +			 * enabled SME adjust x86_phys_bits by the SME
> +			 * physical address space reduction value. If BIOS
> +			 * has not enabled SME don't advertise the feature.
>  			 */
>  			rdmsrl(MSR_K8_SYSCFG, msr);
> -			if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +			if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)
> +				c->x86_phys_bits -= (ebx >> 6) & 0x3f;
> +			else
>  				eax &= ~0x01;

Right, as I mentioned yesterday, this should go to arch/x86/kernel/cpu/amd.c

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support
  2017-02-16 15:42 ` [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support Tom Lendacky
@ 2017-02-17 11:07   ` Borislav Petkov
  2017-02-17 15:56     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-17 11:07 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:42:25AM -0600, Tom Lendacky wrote:
> For processors that support PAT, set the write-protect cache mode
> (_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (x05).
> 
> Acked-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

Just a nit:

Subject should have "x86/mm/pat: " prefix but that can be fixed when
applying.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support
  2017-02-16 15:43 ` [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support Tom Lendacky
@ 2017-02-17 12:00   ` Borislav Petkov
  2017-02-25 15:29   ` Borislav Petkov
  1 sibling, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-17 12:00 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:07AM -0600, Tom Lendacky wrote:
> Add support for Secure Memory Encryption (SME). This initial support
> provides a Kconfig entry to build the SME support into the kernel and
> defines the memory encryption mask that will be used in subsequent
> patches to mark pages as encrypted.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/Kconfig                   |   22 +++++++++++++++++++
>  arch/x86/include/asm/mem_encrypt.h |   42 ++++++++++++++++++++++++++++++++++++
>  arch/x86/mm/Makefile               |    1 +
>  arch/x86/mm/mem_encrypt.c          |   21 ++++++++++++++++++
>  include/linux/mem_encrypt.h        |   37 ++++++++++++++++++++++++++++++++
>  5 files changed, 123 insertions(+)
>  create mode 100644 arch/x86/include/asm/mem_encrypt.h
>  create mode 100644 arch/x86/mm/mem_encrypt.c
>  create mode 100644 include/linux/mem_encrypt.h
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index f8fbfc5..a3b8c71 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1395,6 +1395,28 @@ config X86_DIRECT_GBPAGES
>  	  supports them), so don't confuse the user by printing
>  	  that we have them enabled.
>  
> +config AMD_MEM_ENCRYPT
> +	bool "AMD Secure Memory Encryption (SME) support"
> +	depends on X86_64 && CPU_SUP_AMD
> +	---help---
> +	  Say yes to enable support for the encryption of system memory.
> +	  This requires an AMD processor that supports Secure Memory
> +	  Encryption (SME).
> +
> +config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> +	bool "Activate AMD Secure Memory Encryption (SME) by default"
> +	default y
> +	depends on AMD_MEM_ENCRYPT
> +	---help---
> +	  Say yes to have system memory encrypted by default if running on
> +	  an AMD processor that supports Secure Memory Encryption (SME).
> +
> +	  If set to Y, then the encryption of system memory can be
> +	  deactivated with the mem_encrypt=off command line option.
> +
> +	  If set to N, then the encryption of system memory can be
> +	  activated with the mem_encrypt=on command line option.

Good.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support
  2017-02-17 11:07   ` Borislav Petkov
@ 2017-02-17 15:56     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-17 15:56 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/17/2017 5:07 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:42:25AM -0600, Tom Lendacky wrote:
>> For processors that support PAT, set the write-protect cache mode
>> (_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (x05).
>>
>> Acked-by: Borislav Petkov <bp@suse.de>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>
> Just a nit:
>
> Subject should have "x86/mm/pat: " prefix but that can be fixed when
> applying.

I'll go through the series and verify/fix the prefix for each patch.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-16 15:47 ` [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME Tom Lendacky
@ 2017-02-17 15:57   ` Konrad Rzeszutek Wilk
  2017-02-17 16:43     ` Tom Lendacky
  2017-02-28 10:35   ` Borislav Petkov
  1 sibling, 1 reply; 111+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-02-17 15:57 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.

Is the point of kexec and kdump to ehh, dump memory ? But if the
rest of the memory is encrypted you won't get much, will you?

Would it make sense to include some printk to the user if they
are setting up kdump that they won't get anything out of it?

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-16 15:46 ` [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
@ 2017-02-17 15:59   ` Konrad Rzeszutek Wilk
  2017-02-17 16:51     ` Tom Lendacky
  2017-02-27 17:52   ` Borislav Petkov
  1 sibling, 1 reply; 111+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-02-17 15:59 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:46:19AM -0600, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>  include/linux/dma-mapping.h        |   11 +++++++++++
>  include/linux/mem_encrypt.h        |    6 ++++++
>  lib/swiotlb.c                      |    3 +++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 87e816f..5a17f1b 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>  	return (sme_me_mask) ? true : false;
>  }
>  
> +static inline u64 sme_dma_mask(void)
> +{
> +	return ((u64)sme_me_mask << 1) - 1;
> +}
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
>  			      unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> @@ -53,6 +58,12 @@ static inline bool sme_active(void)
>  {
>  	return false;
>  }
> +
> +static inline u64 sme_dma_mask(void)
> +{
> +	return 0ULL;
> +}
> +
>  #endif
>  
>  static inline void __init sme_early_encrypt(resource_size_t paddr,
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 10c5a17..130bef7 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -10,6 +10,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/kmemcheck.h>
>  #include <linux/bug.h>
> +#include <linux/mem_encrypt.h>
>  
>  /**
>   * List of possible attributes associated with a DMA mapping. The semantics
> @@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  
>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>  		return -EIO;
> +
> +	if (sme_active() && (mask < sme_dma_mask()))
> +		dev_warn(dev,
> +			 "SME is active, device will require DMA bounce buffers\n");

You can make it one line. But I am wondering if you should use
printk_ratelimit as this may fill the console up.

> +
>  	*dev->dma_mask = mask;
>  	return 0;
>  }
> @@ -576,6 +582,11 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
>  {
>  	if (!dma_supported(dev, mask))
>  		return -EIO;
> +
> +	if (sme_active() && (mask < sme_dma_mask()))
> +		dev_warn(dev,
> +			 "SME is active, device will require DMA bounce buffers\n");

Ditto.
> +
>  	dev->coherent_dma_mask = mask;
>  	return 0;
>  }
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 14a7b9f..6829ff1 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -28,6 +28,12 @@ static inline bool sme_active(void)
>  {
>  	return false;
>  }
> +
> +static inline u64 sme_dma_mask(void)
> +{
> +	return 0ULL;
> +}
> +
>  #endif
>  
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index c463067..aff9353 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -509,6 +509,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>  	if (no_iotlb_memory)
>  		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
>  
> +	WARN_ONCE(sme_active(),
> +		  "SME is active and system is using DMA bounce buffers\n");

How does that help?

As in what can the user do with this?
> +
>  	mask = dma_get_seg_boundary(hwdev);
>  
>  	tbl_dma_addr &= mask;
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-17 15:57   ` Konrad Rzeszutek Wilk
@ 2017-02-17 16:43     ` Tom Lendacky
  2017-03-01  9:25       ` Dave Young
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-17 16:43 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
>> Provide support so that kexec can be used to boot a kernel when SME is
>> enabled.
>
> Is the point of kexec and kdump to ehh, dump memory ? But if the
> rest of the memory is encrypted you won't get much, will you?

Kexec can be used to reboot a system without going back through BIOS.
So you can use kexec without using kdump.

For kdump, just taking a quick look, the option to enable memory
encryption can be provided on the crash kernel command line and then
the crash kernel would be able to copy the memory decrypted if the
pagetable is set up properly. It looks like currently ioremap_cache()
is used to map the old memory page.  That might be able to be changed
to a memremap() so that the encryption bit is set in the mapping. That
will mean that memory that is not marked encrypted (EFI tables, swiotlb
memory, etc) would not be read correctly.
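
Roughly, the change would look something like this (untested sketch,
assuming the copy path is still copy_oldmem_page() in
arch/x86/kernel/crash_dump_64.c; error handling trimmed):

	/* was: vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE); */
	vaddr = memremap(pfn << PAGE_SHIFT, PAGE_SIZE, MEMREMAP_WB);
	if (!vaddr)
		return -ENOMEM;

	/* The direct-map attributes now carry the encryption bit */
	memcpy(buf, vaddr + offset, csize);

	memunmap(vaddr);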

>
> Would it make sense to include some printk to the user if they
> are setting up kdump that they won't get anything out of it?

Probably a good idea to add something like that.

Thanks,
Tom

>
> Thanks.
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-17 15:59   ` Konrad Rzeszutek Wilk
@ 2017-02-17 16:51     ` Tom Lendacky
  2017-03-02 17:01       ` Paolo Bonzini
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-17 16:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/17/2017 9:59 AM, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 16, 2017 at 09:46:19AM -0600, Tom Lendacky wrote:
>> Add warnings to let the user know when bounce buffers are being used for
>> DMA when SME is active.  Since the bounce buffers are not in encrypted
>> memory, these notifications are to allow the user to determine some
>> appropriate action - if necessary.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>>  include/linux/dma-mapping.h        |   11 +++++++++++
>>  include/linux/mem_encrypt.h        |    6 ++++++
>>  lib/swiotlb.c                      |    3 +++
>>  4 files changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 87e816f..5a17f1b 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>>  	return (sme_me_mask) ? true : false;
>>  }
>>
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return ((u64)sme_me_mask << 1) - 1;
>> +}
>> +
>>  void __init sme_early_encrypt(resource_size_t paddr,
>>  			      unsigned long size);
>>  void __init sme_early_decrypt(resource_size_t paddr,
>> @@ -53,6 +58,12 @@ static inline bool sme_active(void)
>>  {
>>  	return false;
>>  }
>> +
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return 0ULL;
>> +}
>> +
>>  #endif
>>
>>  static inline void __init sme_early_encrypt(resource_size_t paddr,
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 10c5a17..130bef7 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -10,6 +10,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/kmemcheck.h>
>>  #include <linux/bug.h>
>> +#include <linux/mem_encrypt.h>
>>
>>  /**
>>   * List of possible attributes associated with a DMA mapping. The semantics
>> @@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>>
>>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>>  		return -EIO;
>> +
>> +	if (sme_active() && (mask < sme_dma_mask()))
>> +		dev_warn(dev,
>> +			 "SME is active, device will require DMA bounce buffers\n");
>
> You can make it one line. But I am wondering if you should use
> printk_ratelimit as this may fill the console up.

I thought the use of dma_set_mask() was mostly a one time probe/setup
thing so I didn't think we would get that many of these messages. If
dma_set_mask() is called much more often than that I can change this
to a printk_ratelimit().  I'll look into it further.
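
If it does turn out to be noisy, something like this (untested) would
throttle the messages while still identifying the device:

	if (sme_active() && (mask < sme_dma_mask()))
		dev_warn_ratelimited(dev,
				     "SME is active, device will require DMA bounce buffers\n");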

>
>> +
>>  	*dev->dma_mask = mask;
>>  	return 0;
>>  }
>> @@ -576,6 +582,11 @@ static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
>>  {
>>  	if (!dma_supported(dev, mask))
>>  		return -EIO;
>> +
>> +	if (sme_active() && (mask < sme_dma_mask()))
>> +		dev_warn(dev,
>> +			 "SME is active, device will require DMA bounce buffers\n");
>
> Ditto.
>> +
>>  	dev->coherent_dma_mask = mask;
>>  	return 0;
>>  }
>> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
>> index 14a7b9f..6829ff1 100644
>> --- a/include/linux/mem_encrypt.h
>> +++ b/include/linux/mem_encrypt.h
>> @@ -28,6 +28,12 @@ static inline bool sme_active(void)
>>  {
>>  	return false;
>>  }
>> +
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return 0ULL;
>> +}
>> +
>>  #endif
>>
>>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> index c463067..aff9353 100644
>> --- a/lib/swiotlb.c
>> +++ b/lib/swiotlb.c
>> @@ -509,6 +509,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>  	if (no_iotlb_memory)
>>  		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
>>
>> +	WARN_ONCE(sme_active(),
>> +		  "SME is active and system is using DMA bounce buffers\n");
>
> How does that help?
>
> As in what can the user do with this?

It's meant just to notify the user about the condition. The user could
then decide to use an alternative device that supports a greater DMA
range (I can probably change it to a dev_warn_once() so that a device
is identified).  I would be nice if I could issue this message once per
device that experienced this.  I didn't see anything that would do
that, though.
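
For now, dev_warn_once() at least names a device in the one message
that does get emitted (though it still fires only once globally),
e.g. (untested):

	dev_warn_once(hwdev,
		      "SME is active and system is using DMA bounce buffers\n");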

Thanks,
Tom

>> +
>>  	mask = dma_get_seg_boundary(hwdev);
>>
>>  	tbl_dma_addr &= mask;
>>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (27 preceding siblings ...)
  2017-02-16 15:48 ` [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption Tom Lendacky
@ 2017-02-18 18:12 ` Borislav Petkov
  2017-02-21 15:09   ` Tom Lendacky
  2017-02-21 17:42   ` Rik van Riel
  2017-03-01  9:17 ` Dave Young
  29 siblings, 2 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-18 18:12 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:41:59AM -0600, Tom Lendacky wrote:
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
>  create mode 100644 arch/x86/include/asm/mem_encrypt.h
>  create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
>  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
>  create mode 100644 arch/x86/mm/mem_encrypt.c

I don't see anything standing in the way of merging those last two and
having a single:

arch/x86/kernel/mem_encrypt.c

with all functionality in there with ifdeffery around it so
that sme_encrypt_kernel() et all are still visible in the
!CONFIG_AMD_MEM_ENCRYPT case.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing
  2017-02-16 15:43 ` [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing Tom Lendacky
@ 2017-02-20 12:51   ` Borislav Petkov
  2017-02-21 14:55     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 12:51 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:19AM -0600, Tom Lendacky wrote:
> This patch adds support to the early boot code to use Secure Memory
> Encryption (SME).  Support is added to update the early pagetables with
> the memory encryption mask and to encrypt the kernel in place.
> 
> The routines to set the encryption mask and perform the encryption are
> stub routines for now with full function to be added in a later patch.

s/full function/functionality/

> A new file, arch/x86/kernel/mem_encrypt_init.c, is introduced to avoid
> adding #ifdefs within arch/x86/kernel/head_64.S and allow
> arch/x86/mm/mem_encrypt.c to be removed from the build if SME is not
> configured. The mem_encrypt_init.c file will contain the necessary #ifdefs
> to allow head_64.S to successfully build and call the SME routines.

That paragraph is superfluous.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/Makefile           |    2 +
>  arch/x86/kernel/head_64.S          |   46 ++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/mem_encrypt_init.c |   50 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 96 insertions(+), 2 deletions(-)
>  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
> 
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index bdcdb3b..33af80a 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -140,4 +140,6 @@ ifeq ($(CONFIG_X86_64),y)
>  
>  	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
>  	obj-y				+= vsmp_64.o
> +
> +	obj-y				+= mem_encrypt_init.o
>  endif
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index b467b14..4f8201b 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -91,6 +91,23 @@ startup_64:
>  	jnz	bad_address
>  
>  	/*
> +	 * Enable Secure Memory Encryption (SME), if supported and enabled.
> +	 * The real_mode_data address is in %rsi and that register can be
> +	 * clobbered by the called function so be sure to save it.
> +	 * Save the returned mask in %r12 for later use.
> +	 */
> +	push	%rsi
> +	call	sme_enable
> +	pop	%rsi
> +	movq	%rax, %r12
> +
> +	/*
> +	 * Add the memory encryption mask to %rbp to include it in the page
> +	 * table fixups.
> +	 */
> +	addq	%r12, %rbp
> +
> +	/*
>  	 * Fixup the physical addresses in the page table
>  	 */
>  	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
> @@ -113,6 +130,7 @@ startup_64:
>  	shrq	$PGDIR_SHIFT, %rax
>  
>  	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
> +	addq	%r12, %rdx
>  	movq	%rdx, 0(%rbx,%rax,8)
>  	movq	%rdx, 8(%rbx,%rax,8)
>  
> @@ -129,6 +147,7 @@ startup_64:
>  	movq	%rdi, %rax
>  	shrq	$PMD_SHIFT, %rdi
>  	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
> +	addq	%r12, %rax
>  	leaq	(_end - 1)(%rip), %rcx
>  	shrq	$PMD_SHIFT, %rcx
>  	subq	%rdi, %rcx
> @@ -162,11 +181,25 @@ startup_64:
>  	cmp	%r8, %rdi
>  	jne	1b
>  
> -	/* Fixup phys_base */
> +	/*
> +	 * Fixup phys_base - remove the memory encryption mask from %rbp
> +	 * to obtain the true physical address.
> +	 */
> +	subq	%r12, %rbp
>  	addq	%rbp, phys_base(%rip)
>  
> +	/*
> +	 * Encrypt the kernel if SME is active.
> +	 * The real_mode_data address is in %rsi and that register can be
> +	 * clobbered by the called function so be sure to save it.
> +	 */
> +	push	%rsi
> +	call	sme_encrypt_kernel
> +	pop	%rsi
> +
>  .Lskip_fixup:

So if we land on this label because we can skip the fixup due to %rbp
being 0, we will skip sme_encrypt_kernel() too.

I think you need to move the .Lskip_fixup label above the
sme_encrypt_kernel call.

>  	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
> +	addq	%r12, %rax
>  	jmp 1f
>  ENTRY(secondary_startup_64)
>  	/*
> @@ -186,7 +219,16 @@ ENTRY(secondary_startup_64)
>  	/* Sanitize CPU configuration */
>  	call verify_cpu
>  
> -	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
> +	/*
> +	 * Get the SME encryption mask.
> +	 * The real_mode_data address is in %rsi and that register can be
> +	 * clobbered by the called function so be sure to save it.

You can say here that sme_get_me_mask puts the mask in %rax, that's why
we do ADD below and not MOV. I know, it is very explicit but this is
boot asm and I'd prefer for it to be absolutely clear.

> +	 */
> +	push	%rsi
> +	call	sme_get_me_mask
> +	pop	%rsi
> +
> +	addq	$(init_level4_pgt - __START_KERNEL_map), %rax
>  1:

...

> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
> +
> +void __init sme_encrypt_kernel(void)
> +{
> +}
> +
> +unsigned long __init sme_get_me_mask(void)
> +{
> +	return 0;
> +}
> +
> +unsigned long __init sme_enable(void)
> +{
> +	return 0;
> +}

Do that:

void __init sme_encrypt_kernel(void)            { }
unsigned long __init sme_get_me_mask(void)      { return 0; }
unsigned long __init sme_enable(void)           { return 0; }

to save some lines.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
@ 2017-02-20 15:21   ` Borislav Petkov
  2017-02-21 17:18     ` Tom Lendacky
  2017-02-20 18:38   ` Borislav Petkov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 15:21 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:32AM -0600, Tom Lendacky wrote:
> Adding general kernel support for memory encryption includes:
> - Modify and create some page table macros to include the Secure Memory
>   Encryption (SME) memory encryption mask

Let's not write it like some technical document: "Secure Memory
Encryption (SME) mask" is perfectly fine.

> - Modify and create some macros for calculating physical and virtual
>   memory addresses
> - Provide an SME initialization routine to update the protection map with
>   the memory encryption mask so that it is used by default
> - #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path

These bulletpoints talk about the "what" this patch does but they should
talk about the "why".

For example, it doesn't say why we're using _KERNPG_TABLE_NOENC when
building the initial pagetable and that would be an interesting piece of
information.

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/boot/compressed/pagetable.c |    7 +++++
>  arch/x86/include/asm/fixmap.h        |    7 +++++
>  arch/x86/include/asm/mem_encrypt.h   |   14 +++++++++++
>  arch/x86/include/asm/page.h          |    4 ++-
>  arch/x86/include/asm/pgtable.h       |   26 ++++++++++++++------
>  arch/x86/include/asm/pgtable_types.h |   45 ++++++++++++++++++++++------------
>  arch/x86/include/asm/processor.h     |    3 ++
>  arch/x86/kernel/espfix_64.c          |    2 +-
>  arch/x86/kernel/head64.c             |   12 ++++++++-
>  arch/x86/kernel/head_64.S            |   18 +++++++-------
>  arch/x86/mm/kasan_init_64.c          |    4 ++-
>  arch/x86/mm/mem_encrypt.c            |   20 +++++++++++++++
>  arch/x86/mm/pageattr.c               |    3 ++
>  include/asm-generic/pgtable.h        |    8 ++++++
>  14 files changed, 133 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
> index 56589d0..411c443 100644
> --- a/arch/x86/boot/compressed/pagetable.c
> +++ b/arch/x86/boot/compressed/pagetable.c
> @@ -15,6 +15,13 @@
>  #define __pa(x)  ((unsigned long)(x))
>  #define __va(x)  ((void *)((unsigned long)(x)))
>  
> +/*
> + * The pgtable.h and mm/ident_map.c includes make use of the SME related
> + * information which is not used in the compressed image support. Un-define
> + * the SME support to avoid any compile and link errors.
> + */
> +#undef CONFIG_AMD_MEM_ENCRYPT
> +
>  #include "misc.h"
>  
>  /* These actually do the work of building the kernel identity maps. */
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index 8554f96..83e91f0 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -153,6 +153,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
>  }
>  #endif
>  
> +/*
> + * Fixmap settings used with memory encryption
> + *   - FIXMAP_PAGE_NOCACHE is used for MMIO so make sure the memory
> + *     encryption mask is not part of the page attributes

Make that a regular sentence.

> + */
> +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
> +
>  #include <asm-generic/fixmap.h>
>  
>  #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index ccc53b0..547989d 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -15,6 +15,8 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include <linux/init.h>
> +
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  
>  extern unsigned long sme_me_mask;
> @@ -24,6 +26,11 @@ static inline bool sme_active(void)
>  	return (sme_me_mask) ? true : false;
>  }
>  
> +void __init sme_early_init(void);
> +
> +#define __sme_pa(x)		(__pa((x)) | sme_me_mask)
> +#define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)

Right, I know we did talk about those but in looking more into the
future, you'd have to go educate people to use the __sme_pa* variants.
Otherwise, we'd have to go and fix up code on AMD SME machines because
someone used the __pa_* variants where they should have been using the
__sme_pa_* variants.

IOW, should we simply put sme_me_mask in the actual __pa* macro
definitions?

Or are we saying that the __sme_pa* versions you have above are
the special ones and we need them only in a handful of places like
load_cr3(), for example...? And the __pa_* ones should return the
physical address without the SME mask because callers don't need it?

> +
>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>  
>  #ifndef sme_me_mask
> @@ -35,6 +42,13 @@ static inline bool sme_active(void)
>  }
>  #endif
>  
> +static inline void __init sme_early_init(void)
> +{
> +}
> +
> +#define __sme_pa		__pa
> +#define __sme_pa_nodebug	__pa_nodebug
> +
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>  
>  #endif	/* __ASSEMBLY__ */
> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
> index cf8f619..b1f7bf6 100644
> --- a/arch/x86/include/asm/page.h
> +++ b/arch/x86/include/asm/page.h
> @@ -15,6 +15,8 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include <asm/mem_encrypt.h>
> +
>  struct page;
>  
>  #include <linux/range.h>
> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>  
>  #ifndef __va
> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
> +#define __va(x)			((void *)(((unsigned long)(x) & ~sme_me_mask) + PAGE_OFFSET))

You have a bunch of places where you remove the enc mask:

	address & ~sme_me_mask

so you could do:

#define __sme_unmask(x)		((unsigned long)(x) & ~sme_me_mask)

and use it everywhere. "unmask" is what I could think of, there should
be a better, short name for it...
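
Then, e.g., the __va() hunk above simply becomes (sketch):

	#define __sme_unmask(x)		((unsigned long)(x) & ~sme_me_mask)
	#define __va(x)			((void *)(__sme_unmask(x) + PAGE_OFFSET))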

>  #endif
>  
>  #define __boot_va(x)		__va(x)
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 2d81161..b41caab 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -3,6 +3,7 @@

...

> @@ -563,8 +575,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>   * Currently stuck as a macro due to indirect forward reference to
>   * linux/mmzone.h's __section_mem_map_addr() definition:
>   */
> -#define pmd_page(pmd)		\
> -	pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
> +#define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
>  
>  /*
>   * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
> @@ -632,8 +643,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
>   * Currently stuck as a macro due to indirect forward reference to
>   * linux/mmzone.h's __section_mem_map_addr() definition:
>   */
> -#define pud_page(pud)		\
> -	pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
> +#define pud_page(pud)	pfn_to_page(pud_pfn(pud))
>  
>  /* Find an entry in the second-level page table.. */
>  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
> @@ -673,7 +683,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
>   * Currently stuck as a macro due to indirect forward reference to
>   * linux/mmzone.h's __section_mem_map_addr() definition:
>   */
> -#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
> +#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))
>  
>  /* to find an entry in a page-table-directory. */
>  static inline unsigned long pud_index(unsigned long address)

This conversion to *_pfn() is an unrelated cleanup. Pls carve it out and
put it in the front of the patchset as a separate patch.

...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index b99d469..d71df97 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -11,6 +11,10 @@
>   */
>  
>  #include <linux/linkage.h>
> +#include <linux/init.h>
> +#include <linux/mm.h>
> +
> +extern pmdval_t early_pmd_flags;

WARNING: externs should be avoided in .c files
#476: FILE: arch/x86/mm/mem_encrypt.c:17:
+extern pmdval_t early_pmd_flags;

>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -19,3 +23,19 @@
>   */
>  unsigned long sme_me_mask __section(.data) = 0;
>  EXPORT_SYMBOL_GPL(sme_me_mask);
> +
> +void __init sme_early_init(void)
> +{
> +	unsigned int i;
> +
> +	if (!sme_me_mask)
> +		return;
> +
> +	early_pmd_flags |= sme_me_mask;
> +
> +	__supported_pte_mask |= sme_me_mask;
> +
> +	/* Update the protection map with memory encryption mask */
> +	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
> +		protection_map[i] = pgprot_encrypted(protection_map[i]);
> +}

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs
  2017-02-16 15:43 ` [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs Tom Lendacky
@ 2017-02-20 15:43   ` Borislav Petkov
  2017-02-22 15:42     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 15:43 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:48AM -0600, Tom Lendacky wrote:
> Add to the early_memremap support to be able to specify encrypted and

early_memremap()

Please append "()" to function names in your commit messages text.

> decrypted mappings with and without write-protection. The use of
> write-protection is necessary when encrypting data "in place". The
> write-protect attribute is considered cacheable for loads, but not
> stores. This implies that the hardware will never give the core a
> dirty line with this memtype.

By "hardware will never give" you mean that WP writes won't land dirty
in the cache but will go out to mem and when some other core needs them,
they will have to come from memory?

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/Kconfig                     |    4 +++
>  arch/x86/include/asm/fixmap.h        |   13 ++++++++++
>  arch/x86/include/asm/pgtable_types.h |    8 ++++++
>  arch/x86/mm/ioremap.c                |   44 ++++++++++++++++++++++++++++++++++
>  include/asm-generic/early_ioremap.h  |    2 ++
>  mm/early_ioremap.c                   |   10 ++++++++
>  6 files changed, 81 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a3b8c71..581eae4 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1417,6 +1417,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
>  	  If set to N, then the encryption of system memory can be
>  	  activated with the mem_encrypt=on command line option.
>  
> +config ARCH_USE_MEMREMAP_PROT
> +	def_bool y
> +	depends on AMD_MEM_ENCRYPT

Why do we need this?

IOW, all those helpers below will end up being defined unconditionally,
in practice. Think distro kernels. Then saving the couple of bytes is
not really worth the overhead.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory
  2017-02-16 15:43 ` [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory Tom Lendacky
@ 2017-02-20 18:22   ` Borislav Petkov
  2017-02-22 15:48     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 18:22 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:58AM -0600, Tom Lendacky wrote:
> Add support to be able to either encrypt or decrypt data in place during
> the early stages of booting the kernel. This does not change the memory
> encryption attribute - it is used for ensuring that data present in either
> an encrypted or decrypted memory area is in the proper state (for example
> the initrd will have been loaded by the boot loader and will not be
> encrypted, but the memory that it resides in is marked as encrypted).
> 
> The early_memmap support is enhanced to specify encrypted and decrypted
> mappings with and without write-protection. The use of write-protection is
> necessary when encrypting data "in place". The write-protect attribute is
> considered cacheable for loads, but not stores. This implies that the
> hardware will never give the core a dirty line with this memtype.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |   15 +++++++
>  arch/x86/mm/mem_encrypt.c          |   79 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 94 insertions(+)

...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index d71df97..ac3565c 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -14,6 +14,9 @@
>  #include <linux/init.h>
>  #include <linux/mm.h>
>  
> +#include <asm/tlbflush.h>
> +#include <asm/fixmap.h>
> +
>  extern pmdval_t early_pmd_flags;
>  
>  /*
> @@ -24,6 +27,82 @@
>  unsigned long sme_me_mask __section(.data) = 0;
>  EXPORT_SYMBOL_GPL(sme_me_mask);
>  
> +/* Buffer used for early in-place encryption by BSP, no locking needed */
> +static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +/*
> + * This routine does not change the underlying encryption setting of the
> + * page(s) that map this memory. It assumes that eventually the memory is
> + * meant to be accessed as either encrypted or decrypted but the contents
> + * are currently not in the desired stated.

				       state.

> + *
> + * This routine follows the steps outlined in the AMD64 Architecture
> + * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
> + */
> +static void __init __sme_early_enc_dec(resource_size_t paddr,
> +				       unsigned long size, bool enc)
> +{
> +	void *src, *dst;
> +	size_t len;
> +
> +	if (!sme_me_mask)
> +		return;
> +
> +	local_flush_tlb();
> +	wbinvd();
> +
> +	/*
> +	 * There are limited number of early mapping slots, so map (at most)
> +	 * one page at time.
> +	 */
> +	while (size) {
> +		len = min_t(size_t, sizeof(sme_early_buffer), size);
> +
> +		/*
> +		 * Create write protected mappings for the current format

			  write-protected

> +		 * of the memory.
> +		 */
> +		src = enc ? early_memremap_decrypted_wp(paddr, len) :
> +			    early_memremap_encrypted_wp(paddr, len);
> +
> +		/*
> +		 * Create mappings for the desired format of the memory.
> +		 */

That comment can go - you already say that in the previous one.

> +		dst = enc ? early_memremap_encrypted(paddr, len) :
> +			    early_memremap_decrypted(paddr, len);

Btw, looking at this again, it seems to me that if you write it this
way:

                if (enc) {
                        src = early_memremap_decrypted_wp(paddr, len);
                        dst = early_memremap_encrypted(paddr, len);
                } else {
                        src = early_memremap_encrypted_wp(paddr, len);
                        dst = early_memremap_decrypted(paddr, len);
                }

it might become even more readable. Anyway, just an idea - your decision
which is better.

> +
> +		/*
> +		 * If a mapping can't be obtained to perform the operation,
> +		 * then eventual access of that area will in the desired

s/will //

> +		 * mode will cause a crash.
> +		 */
> +		BUG_ON(!src || !dst);
> +
> +		/*
> +		 * Use a temporary buffer, of cache-line multiple size, to
> +		 * avoid data corruption as documented in the APM.
> +		 */
> +		memcpy(sme_early_buffer, src, len);
> +		memcpy(dst, sme_early_buffer, len);
> +
> +		early_memunmap(dst, len);
> +		early_memunmap(src, len);
> +
> +		paddr += len;
> +		size -= len;
> +	}
> +}
> +
> +void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
> +{
> +	__sme_early_enc_dec(paddr, size, true);
> +}
> +
> +void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
> +{
> +	__sme_early_enc_dec(paddr, size, false);
> +}
> +
>  void __init sme_early_init(void)
>  {
>  	unsigned int i;
> 
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
  2017-02-20 15:21   ` Borislav Petkov
@ 2017-02-20 18:38   ` Borislav Petkov
  2017-02-22 16:43     ` Tom Lendacky
  2017-02-22 18:13   ` Dave Hansen
  2017-02-22 18:13   ` Dave Hansen
  3 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 18:38 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:32AM -0600, Tom Lendacky wrote:
> Adding general kernel support for memory encryption includes:
> - Modify and create some page table macros to include the Secure Memory
>   Encryption (SME) memory encryption mask
> - Modify and create some macros for calculating physical and virtual
>   memory addresses
> - Provide an SME initialization routine to update the protection map with
>   the memory encryption mask so that it is used by default
> - #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

...

> +#define __sme_pa(x)		(__pa((x)) | sme_me_mask)
> +#define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
> +
>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>  
>  #ifndef sme_me_mask
> @@ -35,6 +42,13 @@ static inline bool sme_active(void)
>  }
>  #endif
>  
> +static inline void __init sme_early_init(void)
> +{
> +}
> +
> +#define __sme_pa		__pa
> +#define __sme_pa_nodebug	__pa_nodebug

One more thing - in the !CONFIG_AMD_MEM_ENCRYPT case, sme_me_mask is 0
so you don't need to define __sme_pa* again.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly
  2017-02-16 15:44 ` [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly Tom Lendacky
@ 2017-02-20 19:45   ` Borislav Petkov
  2017-02-22 18:34     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 19:45 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:44:11AM -0600, Tom Lendacky wrote:
> The boot data and command line data are present in memory in a decrypted
> state and are copied early in the boot process.  The early page fault
> support will map these areas as encrypted, so before attempting to copy
> them, add decrypted mappings so the data is accessed properly when copied.
> 
> For the initrd, encrypt this data in place. Since the future mapping of the
> initrd area will be mapped as encrypted the data will be accessed properly.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---

...

> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 182a4c7..03f8e74 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -46,13 +46,18 @@ static void __init reset_early_page_tables(void)
>  	write_cr3(__sme_pa_nodebug(early_level4_pgt));
>  }
>  
> +void __init __early_pgtable_flush(void)
> +{
> +	write_cr3(__sme_pa_nodebug(early_level4_pgt));
> +}

Move that to mem_encrypt.c where it is used and make it static. The diff
below, on top of this patch, seems to build fine here.

Also, aren't those mappings global so that you need to toggle CR4.PGE
for that?

PAGE_KERNEL at least has _PAGE_GLOBAL set.
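
IOW, if global mappings are involved, a plain CR3 write won't flush
them and you'd need something along the lines of (sketch):

	/*
	 * CR3 reloads don't flush _PAGE_GLOBAL TLB entries;
	 * __flush_tlb_all() toggles CR4.PGE when global pages are on.
	 */
	__flush_tlb_all();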

> +
>  /* Create a new PMD entry */
> -int __init early_make_pgtable(unsigned long address)
> +int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)

__early_make_pmd() then, since it creates a PMD entry.

>  	unsigned long physaddr = address - __PAGE_OFFSET;
>  	pgdval_t pgd, *pgd_p;
>  	pudval_t pud, *pud_p;
> -	pmdval_t pmd, *pmd_p;
> +	pmdval_t *pmd_p;
>  
>  	/* Invalid address or early pgt is done ?  */
>  	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))

...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ac3565c..ec548e9 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -16,8 +16,12 @@
>  
>  #include <asm/tlbflush.h>
>  #include <asm/fixmap.h>
> +#include <asm/setup.h>
> +#include <asm/bootparam.h>
>  
>  extern pmdval_t early_pmd_flags;
> +int __init __early_make_pgtable(unsigned long, pmdval_t);
> +void __init __early_pgtable_flush(void);

What's with the forward declarations?

Those should be in some header AFAICT.
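
I.e., declare them once in a header both sides include, something like
(sketch only - mem_encrypt.h is just a guess for the right place):

	/* arch/x86/include/asm/mem_encrypt.h - sketch */
	int __init __early_make_pgtable(unsigned long address, pmdval_t pmd);
	void __init __early_pgtable_flush(void);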

>   * Since SME related variables are set early in the boot process they must
> @@ -103,6 +107,76 @@ void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
>  	__sme_early_enc_dec(paddr, size, false);
>  }

...

---
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 03f8e74c7223..c47500d72330 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -46,11 +46,6 @@ static void __init reset_early_page_tables(void)
 	write_cr3(__sme_pa_nodebug(early_level4_pgt));
 }
 
-void __init __early_pgtable_flush(void)
-{
-	write_cr3(__sme_pa_nodebug(early_level4_pgt));
-}
-
 /* Create a new PMD entry */
 int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
 {
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ec548e9a76f1..0af020b36232 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -21,7 +21,7 @@
 
 extern pmdval_t early_pmd_flags;
 int __init __early_make_pgtable(unsigned long, pmdval_t);
-void __init __early_pgtable_flush(void);
+extern pgd_t early_level4_pgt[PTRS_PER_PGD];
 
 /*
  * Since SME related variables are set early in the boot process they must
@@ -34,6 +34,11 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
+static void __init early_pgtable_flush(void)
+{
+	write_cr3(__sme_pa_nodebug(early_level4_pgt));
+}
+
 /*
  * This routine does not change the underlying encryption setting of the
  * page(s) that map this memory. It assumes that eventually the memory is
@@ -158,7 +163,7 @@ void __init sme_unmap_bootdata(char *real_mode_data)
 	 */
 	__sme_map_unmap_bootdata(real_mode_data, false);
 
-	__early_pgtable_flush();
+	early_pgtable_flush();
 }
 
 void __init sme_map_bootdata(char *real_mode_data)
@@ -174,7 +179,7 @@ void __init sme_map_bootdata(char *real_mode_data)
 	 */
 	__sme_map_unmap_bootdata(real_mode_data, true);
 
-	__early_pgtable_flush();
+	early_pgtable_flush();
 }
 
 void __init sme_early_init(void)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address
  2017-02-16 15:44 ` [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address Tom Lendacky
@ 2017-02-20 20:09   ` Borislav Petkov
  2017-02-28 22:34     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-20 20:09 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:44:30AM -0600, Tom Lendacky wrote:
> This patch adds support to return the E820 type associated with an address

s/This patch adds/Add/

> range.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/e820/api.h   |    2 ++
>  arch/x86/include/asm/e820/types.h |    2 ++
>  arch/x86/kernel/e820.c            |   26 +++++++++++++++++++++++---
>  3 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
> index 8e0f8b8..7c1bdc9 100644
> --- a/arch/x86/include/asm/e820/api.h
> +++ b/arch/x86/include/asm/e820/api.h
> @@ -38,6 +38,8 @@
>  extern void e820__reallocate_tables(void);
>  extern void e820__register_nosave_regions(unsigned long limit_pfn);
>  
> +extern enum e820_type e820__get_entry_type(u64 start, u64 end);
> +
>  /*
>   * Returns true iff the specified range [start,end) is completely contained inside
>   * the ISA region.
> diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> index 4adeed0..bf49591 100644
> --- a/arch/x86/include/asm/e820/types.h
> +++ b/arch/x86/include/asm/e820/types.h
> @@ -7,6 +7,8 @@
>   * These are the E820 types known to the kernel:
>   */
>  enum e820_type {
> +	E820_TYPE_INVALID	= 0,
> +

Now this is strange - ACPI spec doesn't explicitly say that range type 0
is invalid. Am I looking at the wrong place?

"Table 15-312 Address Range Types12" in ACPI spec 6.

If 0 is really the invalid entry, then e820_print_type() needs updating
too. And then the invalid-entry-add should be a separate patch.
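
The e820_print_type() addition would be a one-liner along the lines of
(sketch; matching however the other types are printed there):

	case E820_TYPE_INVALID:	pr_cont("invalid");	break;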

>  	E820_TYPE_RAM		= 1,
>  	E820_TYPE_RESERVED	= 2,
>  	E820_TYPE_ACPI		= 3,

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types
  2017-02-16 15:44 ` [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types Tom Lendacky
@ 2017-02-21 12:05   ` Matt Fleming
  2017-02-23 17:27     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Matt Fleming @ 2017-02-21 12:05 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, 16 Feb, at 09:44:57AM, Tom Lendacky wrote:
> Update the efi_mem_type() to return EFI_RESERVED_TYPE instead of a
> hardcoded 0.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/platform/efi/efi.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index a15cf81..6407103 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -1037,7 +1037,7 @@ u32 efi_mem_type(unsigned long phys_addr)
>  	efi_memory_desc_t *md;
>  
>  	if (!efi_enabled(EFI_MEMMAP))
> -		return 0;
> +		return EFI_RESERVED_TYPE;
>  
>  	for_each_efi_memory_desc(md) {
>  		if ((md->phys_addr <= phys_addr) &&
> @@ -1045,7 +1045,7 @@ u32 efi_mem_type(unsigned long phys_addr)
>  				  (md->num_pages << EFI_PAGE_SHIFT))))
>  			return md->type;
>  	}
> -	return 0;
> +	return EFI_RESERVED_TYPE;
>  }

I see what you're getting at here, but arguably the return value in
these cases never should have been zero to begin with (your change
just makes that more obvious).

Returning EFI_RESERVED_TYPE implies an EFI memmap entry exists for
this address, which is misleading because it doesn't in the hunks
you've modified above.

Instead, could you look at returning a negative error value in the
usual way we do in the Linux kernel, and update the function prototype
to match? I don't think any callers actually require the return type
to be u32.
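
Something like this, roughly (only a sketch - pick whatever error
values make sense):

	int efi_mem_type(unsigned long phys_addr)
	{
		efi_memory_desc_t *md;

		if (!efi_enabled(EFI_MEMMAP))
			return -ENOTSUPP;

		for_each_efi_memory_desc(md) {
			if ((md->phys_addr <= phys_addr) &&
			    (phys_addr < (md->phys_addr +
					  (md->num_pages << EFI_PAGE_SHIFT))))
				return md->type;
		}

		return -EINVAL;
	}

with callers then checking for a negative return instead of comparing
against 0.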

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing
  2017-02-20 12:51   ` Borislav Petkov
@ 2017-02-21 14:55     ` Tom Lendacky
  2017-02-21 15:10       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-21 14:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 6:51 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:19AM -0600, Tom Lendacky wrote:
>> This patch adds support to the early boot code to use Secure Memory
>> Encryption (SME).  Support is added to update the early pagetables with
>> the memory encryption mask and to encrypt the kernel in place.
>>
>> The routines to set the encryption mask and perform the encryption are
>> stub routines for now with full function to be added in a later patch.
>
> s/full function/functionality/

Ok.

>
>> A new file, arch/x86/kernel/mem_encrypt_init.c, is introduced to avoid
>> adding #ifdefs within arch/x86/kernel/head_64.S and allow
>> arch/x86/mm/mem_encrypt.c to be removed from the build if SME is not
>> configured. The mem_encrypt_init.c file will contain the necessary #ifdefs
>> to allow head_64.S to successfully build and call the SME routines.
>
> That paragraph is superfluous.

I'll remove this, especially since the files will be combined now.

>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/Makefile           |    2 +
>>  arch/x86/kernel/head_64.S          |   46 ++++++++++++++++++++++++++++++++-
>>  arch/x86/kernel/mem_encrypt_init.c |   50 ++++++++++++++++++++++++++++++++++++
>>  3 files changed, 96 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
>>
>> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
>> index bdcdb3b..33af80a 100644
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -140,4 +140,6 @@ ifeq ($(CONFIG_X86_64),y)
>>
>>  	obj-$(CONFIG_PCI_MMCONFIG)	+= mmconf-fam10h_64.o
>>  	obj-y				+= vsmp_64.o
>> +
>> +	obj-y				+= mem_encrypt_init.o
>>  endif
>> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> index b467b14..4f8201b 100644
>> --- a/arch/x86/kernel/head_64.S
>> +++ b/arch/x86/kernel/head_64.S
>> @@ -91,6 +91,23 @@ startup_64:
>>  	jnz	bad_address
>>
>>  	/*
>> +	 * Enable Secure Memory Encryption (SME), if supported and enabled.
>> +	 * The real_mode_data address is in %rsi and that register can be
>> +	 * clobbered by the called function so be sure to save it.
>> +	 * Save the returned mask in %r12 for later use.
>> +	 */
>> +	push	%rsi
>> +	call	sme_enable
>> +	pop	%rsi
>> +	movq	%rax, %r12
>> +
>> +	/*
>> +	 * Add the memory encryption mask to %rbp to include it in the page
>> +	 * table fixups.
>> +	 */
>> +	addq	%r12, %rbp
>> +
>> +	/*
>>  	 * Fixup the physical addresses in the page table
>>  	 */
>>  	addq	%rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
>> @@ -113,6 +130,7 @@ startup_64:
>>  	shrq	$PGDIR_SHIFT, %rax
>>
>>  	leaq	(PAGE_SIZE + _KERNPG_TABLE)(%rbx), %rdx
>> +	addq	%r12, %rdx
>>  	movq	%rdx, 0(%rbx,%rax,8)
>>  	movq	%rdx, 8(%rbx,%rax,8)
>>
>> @@ -129,6 +147,7 @@ startup_64:
>>  	movq	%rdi, %rax
>>  	shrq	$PMD_SHIFT, %rdi
>>  	addq	$(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
>> +	addq	%r12, %rax
>>  	leaq	(_end - 1)(%rip), %rcx
>>  	shrq	$PMD_SHIFT, %rcx
>>  	subq	%rdi, %rcx
>> @@ -162,11 +181,25 @@ startup_64:
>>  	cmp	%r8, %rdi
>>  	jne	1b
>>
>> -	/* Fixup phys_base */
>> +	/*
>> +	 * Fixup phys_base - remove the memory encryption mask from %rbp
>> +	 * to obtain the true physical address.
>> +	 */
>> +	subq	%r12, %rbp
>>  	addq	%rbp, phys_base(%rip)
>>
>> +	/*
>> +	 * Encrypt the kernel if SME is active.
>> +	 * The real_mode_data address is in %rsi and that register can be
>> +	 * clobbered by the called function so be sure to save it.
>> +	 */
>> +	push	%rsi
>> +	call	sme_encrypt_kernel
>> +	pop	%rsi
>> +
>>  .Lskip_fixup:
>
> So if we land on this label because we can skip the fixup due to %rbp
> being 0, we will skip sme_encrypt_kernel() too.
>
> I think you need to move the .Lskip_fixup label above the
> sme_encrypt_kernel call.

Actually, %rbp will have the encryption bit set in it at the time of the
check so if SME is active we won't take the jump to .Lskip_fixup.

>
>>  	movq	$(early_level4_pgt - __START_KERNEL_map), %rax
>> +	addq	%r12, %rax
>>  	jmp 1f
>>  ENTRY(secondary_startup_64)
>>  	/*
>> @@ -186,7 +219,16 @@ ENTRY(secondary_startup_64)
>>  	/* Sanitize CPU configuration */
>>  	call verify_cpu
>>
>> -	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
>> +	/*
>> +	 * Get the SME encryption mask.
>> +	 * The real_mode_data address is in %rsi and that register can be
>> +	 * clobbered by the called function so be sure to save it.
>
> You can say here that sme_get_me_mask puts the mask in %rax, that's why
> we do ADD below and not MOV. I know, it is very explicit but this is
> boot asm and I'd prefer for it to be absolutely clear.

Ok, I can be explicit on this.

>
>> +	 */
>> +	push	%rsi
>> +	call	sme_get_me_mask
>> +	pop	%rsi
>> +
>> +	addq	$(init_level4_pgt - __START_KERNEL_map), %rax
>>  1:
>
> ...
>
>> +#else	/* !CONFIG_AMD_MEM_ENCRYPT */
>> +
>> +void __init sme_encrypt_kernel(void)
>> +{
>> +}
>> +
>> +unsigned long __init sme_get_me_mask(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +unsigned long __init sme_enable(void)
>> +{
>> +	return 0;
>> +}
>
> Do that:
>
> void __init sme_encrypt_kernel(void)            { }
> unsigned long __init sme_get_me_mask(void)      { return 0; }
> unsigned long __init sme_enable(void)           { return 0; }
>
> to save some lines.

No problem.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-16 15:45 ` [RFC PATCH v4 14/28] Add support to access boot related data in the clear Tom Lendacky
@ 2017-02-21 15:06   ` Borislav Petkov
  2017-02-23 21:34     ` Tom Lendacky
  2017-03-08  6:55   ` Dave Young
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-21 15:06 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:45:09AM -0600, Tom Lendacky wrote:
> Boot data (such as EFI related data) is not encrypted when the system is
> booted and needs to be mapped decrypted.  Add support to apply the proper
> attributes to the EFI page tables and to the early_memremap and memremap
> APIs to identify the type of data being accessed so that the proper
> encryption attribute can be applied.

So this doesn't even begin to explain *why* we need this. The emphasis
being on *why*.

Lemme guess? kexec? And because of efi_reuse_config?

If so, then that whole ad-hoc caching in parse_setup_data() needs to go.
Especially if efi_reuse_config() already sees those addresses so while
we're there, we could save them somewhere or whatnot. But not doing the
whole thing again in parse_setup_data().

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/io.h      |    3 +
>  arch/x86/include/asm/setup.h   |    8 +++
>  arch/x86/kernel/setup.c        |   33 ++++++++++++
>  arch/x86/mm/ioremap.c          |  111 ++++++++++++++++++++++++++++++++++++++++
>  arch/x86/platform/efi/efi_64.c |   16 ++++--
>  kernel/memremap.c              |   11 ++++
>  mm/early_ioremap.c             |   18 +++++-
>  7 files changed, 192 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index 7afb0e2..833f7cc 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -381,4 +381,7 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
>  #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
>  #endif
>  
> +extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size);
> +#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
> +
>  #endif /* _ASM_X86_IO_H */
> diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
> index ac1d5da..99998d9 100644
> --- a/arch/x86/include/asm/setup.h
> +++ b/arch/x86/include/asm/setup.h
> @@ -63,6 +63,14 @@ static inline void x86_ce4100_early_setup(void) { }
>  #include <asm/espfix.h>
>  #include <linux/kernel.h>
>  
> +struct setup_data_attrs {
> +	u64 paddr;
> +	unsigned long size;
> +};
> +
> +extern struct setup_data_attrs setup_data_list[];
> +extern unsigned int setup_data_list_count;
> +
>  /*
>   * This is set up by the setup-routine at boot-time
>   */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bd5b9a7..d2234bf 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -148,6 +148,9 @@ int default_check_phys_apicid_present(int phys_apicid)
>  
>  struct boot_params boot_params;
>  
> +struct setup_data_attrs setup_data_list[32];
> +unsigned int setup_data_list_count;
> +
>  /*
>   * Machine setup..
>   */
> @@ -419,6 +422,32 @@ static void __init reserve_initrd(void)
>  }
>  #endif /* CONFIG_BLK_DEV_INITRD */
>  
> +static void __init update_setup_data_list(u64 pa_data, unsigned long size)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < setup_data_list_count; i++) {
> +		if (setup_data_list[i].paddr != pa_data)
> +			continue;
> +
> +		setup_data_list[i].size = size;
> +		break;
> +	}
> +}
> +
> +static void __init add_to_setup_data_list(u64 pa_data, unsigned long size)
> +{
> +	if (!sme_active())
> +		return;
> +
> +	if (!WARN(setup_data_list_count == ARRAY_SIZE(setup_data_list),
> +		  "exceeded maximum setup data list slots")) {
> +		setup_data_list[setup_data_list_count].paddr = pa_data;
> +		setup_data_list[setup_data_list_count].size = size;
> +		setup_data_list_count++;
> +	}
> +}
> +
>  static void __init parse_setup_data(void)
>  {
>  	struct setup_data *data;
> @@ -428,12 +457,16 @@ static void __init parse_setup_data(void)
>  	while (pa_data) {
>  		u32 data_len, data_type;
>  
> +		add_to_setup_data_list(pa_data, sizeof(*data));
> +
>  		data = early_memremap(pa_data, sizeof(*data));
>  		data_len = data->len + sizeof(struct setup_data);
>  		data_type = data->type;
>  		pa_next = data->next;
>  		early_memunmap(data, sizeof(*data));
>  
> +		update_setup_data_list(pa_data, data_len);
> +
>  		switch (data_type) {
>  		case SETUP_E820_EXT:
>  			e820__memory_setup_extended(pa_data, data_len);
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 2385e70..b0ff6bc 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -13,6 +13,7 @@
>  #include <linux/slab.h>
>  #include <linux/vmalloc.h>
>  #include <linux/mmiotrace.h>
> +#include <linux/efi.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/e820/api.h>
> @@ -21,6 +22,7 @@
>  #include <asm/tlbflush.h>
>  #include <asm/pgalloc.h>
>  #include <asm/pat.h>
> +#include <asm/setup.h>
>  
>  #include "physaddr.h"
>  
> @@ -419,6 +421,115 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>  	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>  }
>  
> +/*
> + * Examine the physical address to determine if it is boot data. Check
> + * it against the boot params structure and EFI tables.
> + */
> +static bool memremap_is_setup_data(resource_size_t phys_addr,
> +				   unsigned long size)
> +{
> +	unsigned int i;
> +	u64 paddr;
> +
> +	for (i = 0; i < setup_data_list_count; i++) {
> +		if (phys_addr < setup_data_list[i].paddr)
> +			continue;
> +
> +		if (phys_addr >= (setup_data_list[i].paddr +
> +				  setup_data_list[i].size))
> +			continue;
> +
> +		/* Address is within setup data range */
> +		return true;
> +	}
> +
> +	paddr = boot_params.efi_info.efi_memmap_hi;
> +	paddr <<= 32;
> +	paddr |= boot_params.efi_info.efi_memmap;
> +	if (phys_addr == paddr)
> +		return true;
> +
> +	paddr = boot_params.efi_info.efi_systab_hi;
> +	paddr <<= 32;
> +	paddr |= boot_params.efi_info.efi_systab;
> +	if (phys_addr == paddr)
> +		return true;
> +
> +	if (efi_table_address_match(phys_addr))
> +		return true;
> +
> +	return false;
> +}
> +
> +/*
> + * This function determines if an address should be mapped encrypted.
> + * Boot setup data, EFI data and E820 areas are checked in making this
> + * determination.
> + */
> +static bool memremap_should_map_encrypted(resource_size_t phys_addr,
> +					  unsigned long size)
> +{
> +	/*
> +	 * SME is not active, return true:
> +	 *   - For early_memremap_pgprot_adjust(), returning true or false
> +	 *     results in the same protection value
> +	 *   - For arch_memremap_do_ram_remap(), returning true will allow
> +	 *     the RAM remap to occur instead of falling back to ioremap()
> +	 */
> +	if (!sme_active())
> +		return true;
> +
> +	/* Check if the address is part of the setup data */
> +	if (memremap_is_setup_data(phys_addr, size))
> +		return false;
> +
> +	/* Check if the address is part of EFI boot/runtime data */
> +	switch (efi_mem_type(phys_addr)) {

arch/x86/built-in.o: In function `memremap_should_map_encrypted':
/home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:487: undefined reference to `efi_mem_type'
make: *** [vmlinux] Error 1

That's a !CONFIG_EFI .config.

> +	case EFI_BOOT_SERVICES_DATA:
> +	case EFI_RUNTIME_SERVICES_DATA:
> +		return false;
> +	default:
> +		break;
> +	}
> +
> +	/* Check if the address is outside kernel usable area */
> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
> +	case E820_TYPE_RESERVED:
> +	case E820_TYPE_ACPI:
> +	case E820_TYPE_NVS:
> +	case E820_TYPE_UNUSABLE:
> +		return false;
> +	default:
> +		break;
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Architecure function to determine if RAM remap is allowed.
> + */
> +bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
> +{
> +	return memremap_should_map_encrypted(phys_addr, size);
> +}
> +
> +/*
> + * Architecure override of __weak function to adjust the protection attributes
> + * used when remapping memory.
> + */
> +pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
> +					     unsigned long size,
> +					     pgprot_t prot)
> +{
> +	if (memremap_should_map_encrypted(phys_addr, size))
> +		prot = pgprot_encrypted(prot);
> +	else
> +		prot = pgprot_decrypted(prot);
> +
> +	return prot;
> +}
> +
>  #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
>  /* Remap memory with encryption */
>  void __init *early_memremap_encrypted(resource_size_t phys_addr,
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 2ee7694..2d8674d 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -243,7 +243,7 @@ void efi_sync_low_kernel_mappings(void)
>  
>  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  {
> -	unsigned long pfn, text;
> +	unsigned long pfn, text, pf;
>  	struct page *page;
>  	unsigned npages;
>  	pgd_t *pgd;
> @@ -251,7 +251,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  	if (efi_enabled(EFI_OLD_MEMMAP))
>  		return 0;
>  
> -	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
> +	/*
> +	 * Since the PGD is encrypted, set the encryption mask so that when
> +	 * this value is loaded into cr3 the PGD will be decrypted during
> +	 * the pagetable walk.
> +	 */
> +	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
> +
>  	pgd = efi_pgd;
>  
>  	/*
> @@ -261,7 +267,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  	 * phys_efi_set_virtual_address_map().
>  	 */
>  	pfn = pa_memmap >> PAGE_SHIFT;
> -	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
> +	pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
> +	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
>  		pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
>  		return 1;
>  	}
> @@ -304,7 +311,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>  	text = __pa(_text);
>  	pfn = text >> PAGE_SHIFT;
>  
> -	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
> +	pf = _PAGE_RW | _PAGE_ENC;
> +	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
>  		pr_err("Failed to map kernel text 1:1\n");
>  		return 1;
>  	}

Those changes should be in a separate patch IMHO.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-02-18 18:12 ` [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Borislav Petkov
@ 2017-02-21 15:09   ` Tom Lendacky
  2017-02-21 17:42   ` Rik van Riel
  1 sibling, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-21 15:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/18/2017 12:12 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:41:59AM -0600, Tom Lendacky wrote:
>>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
>>  create mode 100644 arch/x86/include/asm/mem_encrypt.h
>>  create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
>>  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
>>  create mode 100644 arch/x86/mm/mem_encrypt.c
>
> I don't see anything standing in the way of merging those last two and
> having a single:
>
> arch/x86/kernel/mem_encrypt.c
>
> with all functionality in there with ifdeffery around it so
> that sme_encrypt_kernel() et al. are still visible in the
> !CONFIG_AMD_MEM_ENCRYPT case.

Sounds good. I'll combine those two files.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing
  2017-02-21 14:55     ` Tom Lendacky
@ 2017-02-21 15:10       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-21 15:10 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 21, 2017 at 08:55:30AM -0600, Tom Lendacky wrote:
> Actually, %rbp will have the encryption bit set in it at the time of the
> check so if SME is active we won't take the jump to .Lskip_fixup.

Ha, I didn't think of that! Do you see now what I mean with being
explicit in the asm boot code? :-)

Please note that in the comment above sme_encrypt_kernel().

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-20 15:21   ` Borislav Petkov
@ 2017-02-21 17:18     ` Tom Lendacky
  2017-02-22 12:08       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-21 17:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 9:21 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:32AM -0600, Tom Lendacky wrote:
>> Adding general kernel support for memory encryption includes:
>> - Modify and create some page table macros to include the Secure Memory
>>   Encryption (SME) memory encryption mask
>
> Let's not write it like some technical document: "Secure Memory
> Encryption (SME) mask" is perfectly fine.

Ok.

>
>> - Modify and create some macros for calculating physical and virtual
>>   memory addresses
>> - Provide an SME initialization routine to update the protection map with
>>   the memory encryption mask so that it is used by default
>> - #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path
>
> These bulletpoints talk about the "what" this patch does but they should
> talk about the "why".
>
> For example, it doesn't say why we're using _KERNPG_TABLE_NOENC when
> building the initial pagetable and that would be an interesting piece of
> information.

I'll work on re-wording this to give a better understanding of the
patch changes.

>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/boot/compressed/pagetable.c |    7 +++++
>>  arch/x86/include/asm/fixmap.h        |    7 +++++
>>  arch/x86/include/asm/mem_encrypt.h   |   14 +++++++++++
>>  arch/x86/include/asm/page.h          |    4 ++-
>>  arch/x86/include/asm/pgtable.h       |   26 ++++++++++++++------
>>  arch/x86/include/asm/pgtable_types.h |   45 ++++++++++++++++++++++------------
>>  arch/x86/include/asm/processor.h     |    3 ++
>>  arch/x86/kernel/espfix_64.c          |    2 +-
>>  arch/x86/kernel/head64.c             |   12 ++++++++-
>>  arch/x86/kernel/head_64.S            |   18 +++++++-------
>>  arch/x86/mm/kasan_init_64.c          |    4 ++-
>>  arch/x86/mm/mem_encrypt.c            |   20 +++++++++++++++
>>  arch/x86/mm/pageattr.c               |    3 ++
>>  include/asm-generic/pgtable.h        |    8 ++++++
>>  14 files changed, 133 insertions(+), 40 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
>> index 56589d0..411c443 100644
>> --- a/arch/x86/boot/compressed/pagetable.c
>> +++ b/arch/x86/boot/compressed/pagetable.c
>> @@ -15,6 +15,13 @@
>>  #define __pa(x)  ((unsigned long)(x))
>>  #define __va(x)  ((void *)((unsigned long)(x)))
>>
>> +/*
>> + * The pgtable.h and mm/ident_map.c includes make use of the SME related
>> + * information which is not used in the compressed image support. Un-define
>> + * the SME support to avoid any compile and link errors.
>> + */
>> +#undef CONFIG_AMD_MEM_ENCRYPT
>> +
>>  #include "misc.h"
>>
>>  /* These actually do the work of building the kernel identity maps. */
>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>> index 8554f96..83e91f0 100644
>> --- a/arch/x86/include/asm/fixmap.h
>> +++ b/arch/x86/include/asm/fixmap.h
>> @@ -153,6 +153,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
>>  }
>>  #endif
>>
>> +/*
>> + * Fixmap settings used with memory encryption
>> + *   - FIXMAP_PAGE_NOCACHE is used for MMIO so make sure the memory
>> + *     encryption mask is not part of the page attributes
>
> Make that a regular sentence.

Ok.

>
>> + */
>> +#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
>> +
>>  #include <asm-generic/fixmap.h>
>>
>>  #define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index ccc53b0..547989d 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -15,6 +15,8 @@
>>
>>  #ifndef __ASSEMBLY__
>>
>> +#include <linux/init.h>
>> +
>>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>>  extern unsigned long sme_me_mask;
>> @@ -24,6 +26,11 @@ static inline bool sme_active(void)
>>  	return (sme_me_mask) ? true : false;
>>  }
>>
>> +void __init sme_early_init(void);
>> +
>> +#define __sme_pa(x)		(__pa((x)) | sme_me_mask)
>> +#define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
>
> Right, I know we did talk about those but in looking more into the
> future, you'd have to go educate people to use the __sme_pa* variants.
> Otherwise, we'd have to go and fix up code on AMD SME machines because
> someone used __pa_* variants where someone should have been using the
> __sme_pa_* variants.
>
> IOW, should we simply put sme_me_mask in the actual __pa* macro
> definitions?
>
> Or are we saying that the __sme_pa* versions you have above are
> the special ones and we need them only in a handful of places like
> load_cr3(), for example...? And the __pa_* ones should return the
> physical address without the SME mask because callers don't need it?

It's the latter.  It's really only used for working with values that
will either be written to or read from cr3.  I'll add some comments
around the macros as well as expand on it in the commit message.
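
For illustration, a minimal sketch of the kind of CR3 handling the
macro is meant for (load_cr3() is just an example caller here; the
exact call sites may end up looking different):

    static inline void load_cr3(pgd_t *pgdir)
    {
            /* __sme_pa() ORs in sme_me_mask so the table walk decrypts the PGD */
            write_cr3(__sme_pa(pgdir));
    }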

>
>> +
>>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>>
>>  #ifndef sme_me_mask
>> @@ -35,6 +42,13 @@ static inline bool sme_active(void)
>>  }
>>  #endif
>>
>> +static inline void __init sme_early_init(void)
>> +{
>> +}
>> +
>> +#define __sme_pa		__pa
>> +#define __sme_pa_nodebug	__pa_nodebug
>> +
>>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>>
>>  #endif	/* __ASSEMBLY__ */
>> diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
>> index cf8f619..b1f7bf6 100644
>> --- a/arch/x86/include/asm/page.h
>> +++ b/arch/x86/include/asm/page.h
>> @@ -15,6 +15,8 @@
>>
>>  #ifndef __ASSEMBLY__
>>
>> +#include <asm/mem_encrypt.h>
>> +
>>  struct page;
>>
>>  #include <linux/range.h>
>> @@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
>>  	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
>>
>>  #ifndef __va
>> -#define __va(x)			((void *)((unsigned long)(x)+PAGE_OFFSET))
>> +#define __va(x)			((void *)(((unsigned long)(x) & ~sme_me_mask) + PAGE_OFFSET))
>
> You have a bunch of places where you remove the enc mask:
>
> 	address & ~sme_me_mask
>
> so you could do:
>
> #define __sme_unmask(x)		((unsigned long)(x) & ~sme_me_mask)
>
> and use it everywhere. "unmask" is what I could think of, there should
> be a better, short name for it...
>

Ok, I'll try and come up with something...  maybe __sme_rm or
__sme_clear (__sme_clr).
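
As a rough sketch (assuming the __sme_clr/__sme_set names stick), the
helpers would be one-liners along these lines:

    /* Set or clear the SME encryption mask on a physical address value */
    #define __sme_set(x)    ((x) | sme_me_mask)
    #define __sme_clr(x)    ((x) & ~sme_me_mask)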

>>  #endif
>>
>>  #define __boot_va(x)		__va(x)
>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> index 2d81161..b41caab 100644
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -3,6 +3,7 @@
>
> ...
>
>> @@ -563,8 +575,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>>   * Currently stuck as a macro due to indirect forward reference to
>>   * linux/mmzone.h's __section_mem_map_addr() definition:
>>   */
>> -#define pmd_page(pmd)		\
>> -	pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
>> +#define pmd_page(pmd)	pfn_to_page(pmd_pfn(pmd))
>>
>>  /*
>>   * the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
>> @@ -632,8 +643,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
>>   * Currently stuck as a macro due to indirect forward reference to
>>   * linux/mmzone.h's __section_mem_map_addr() definition:
>>   */
>> -#define pud_page(pud)		\
>> -	pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
>> +#define pud_page(pud)	pfn_to_page(pud_pfn(pud))
>>
>>  /* Find an entry in the second-level page table.. */
>>  static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
>> @@ -673,7 +683,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
>>   * Currently stuck as a macro due to indirect forward reference to
>>   * linux/mmzone.h's __section_mem_map_addr() definition:
>>   */
>> -#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
>> +#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))
>>
>>  /* to find an entry in a page-table-directory. */
>>  static inline unsigned long pud_index(unsigned long address)
>
> This conversion to *_pfn() is an unrelated cleanup. Pls carve it out and
> put it in the front of the patchset as a separate patch.

Will do.

>
> ...
>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index b99d469..d71df97 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -11,6 +11,10 @@
>>   */
>>
>>  #include <linux/linkage.h>
>> +#include <linux/init.h>
>> +#include <linux/mm.h>
>> +
>> +extern pmdval_t early_pmd_flags;
>
> WARNING: externs should be avoided in .c files
> #476: FILE: arch/x86/mm/mem_encrypt.c:17:
> +extern pmdval_t early_pmd_flags;

I'll add early_pmd_flags to the include/asm/pgtable.h file and remove
the extern reference.

Thanks,
Tom

>
>>  /*
>>   * Since SME related variables are set early in the boot process they must
>> @@ -19,3 +23,19 @@
>>   */
>>  unsigned long sme_me_mask __section(.data) = 0;
>>  EXPORT_SYMBOL_GPL(sme_me_mask);
>> +
>> +void __init sme_early_init(void)
>> +{
>> +	unsigned int i;
>> +
>> +	if (!sme_me_mask)
>> +		return;
>> +
>> +	early_pmd_flags |= sme_me_mask;
>> +
>> +	__supported_pte_mask |= sme_me_mask;
>> +
>> +	/* Update the protection map with memory encryption mask */
>> +	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
>> +		protection_map[i] = pgprot_encrypted(protection_map[i]);
>> +}
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-02-18 18:12 ` [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Borislav Petkov
  2017-02-21 15:09   ` Tom Lendacky
@ 2017-02-21 17:42   ` Rik van Riel
  2017-02-21 17:53     ` Borislav Petkov
  1 sibling, 1 reply; 111+ messages in thread
From: Rik van Riel @ 2017-02-21 17:42 UTC (permalink / raw)
  To: Borislav Petkov, Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

[-- Attachment #1: Type: text/plain, Size: 648 bytes --]

On Sat, 2017-02-18 at 19:12 +0100, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:41:59AM -0600, Tom Lendacky wrote:
> > 
> >  create mode 100644 Documentation/x86/amd-memory-encryption.txt
> >  create mode 100644 arch/x86/include/asm/mem_encrypt.h
> >  create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
> >  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
> >  create mode 100644 arch/x86/mm/mem_encrypt.c
> I don't see anything standing in the way of merging those last two
> and
> having a single:
> 
> arch/x86/kernel/mem_encrypt.c

Do we want that in kernel/ or in arch/x86/mm/ ?

-- 
All rights reversed

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-02-21 17:42   ` Rik van Riel
@ 2017-02-21 17:53     ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-21 17:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86,
	linux-kernel, kasan-dev, linux-mm, iommu,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 21, 2017 at 12:42:45PM -0500, Rik van Riel wrote:
> Do we want that in kernel/ or in arch/x86/mm/ ?

If you'd ask me, I don't have a strong preference. It is a pile of
functionality which is part of the SME feature and as such, it is closer
to the CPU. So arch/x86/cpu/sme.c or so.

But then it is mm-related in a way as it is RAM encryption...

Meh, ask me something easier :-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-21 17:18     ` Tom Lendacky
@ 2017-02-22 12:08       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-22 12:08 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 21, 2017 at 11:18:08AM -0600, Tom Lendacky wrote:
> It's the latter.  It's really only used for working with values that
> will either be written to or read from cr3.  I'll add some comments
> around the macros as well as expand on it in the commit message.

Ok, that makes sense. Normally we will have the mask in the lower levels
of the pagetable hierarchy but we need to add it to the CR3 value by
hand. Yap.

> Ok, I'll try and come up with something...  maybe __sme_rm or
> __sme_clear (__sme_clr).

__sme_clr looks nice to me :)

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs
  2017-02-20 15:43   ` Borislav Petkov
@ 2017-02-22 15:42     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-22 15:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 9:43 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:48AM -0600, Tom Lendacky wrote:
>> Add to the early_memremap support to be able to specify encrypted and
>
> early_memremap()
>
> Please append "()" to function names in your commit messages text.
>
>> decrypted mappings with and without write-protection. The use of
>> write-protection is necessary when encrypting data "in place". The
>> write-protect attribute is considered cacheable for loads, but not
>> stores. This implies that the hardware will never give the core a
>> dirty line with this memtype.
>
> By "hardware will never give" you mean that WP writes won't land dirty
> in the cache but will go out to mem and when some other core needs them,
> they will have to come from memory?

I think this best explains it, from Table 7-8 of the APM Vol 2:

"Reads allocate cache lines on a cache miss, but only to the shared
state. All writes update main memory. Cache lines are not allocated
on a write miss. Write hits invalidate the cache line and update
main memory."

We're early enough that only the BSP is running and we don't have
to worry about accesses from other cores.  If this was to be used
outside of early boot processing, then some safeties might have to
be added.

>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/Kconfig                     |    4 +++
>>  arch/x86/include/asm/fixmap.h        |   13 ++++++++++
>>  arch/x86/include/asm/pgtable_types.h |    8 ++++++
>>  arch/x86/mm/ioremap.c                |   44 ++++++++++++++++++++++++++++++++++
>>  include/asm-generic/early_ioremap.h  |    2 ++
>>  mm/early_ioremap.c                   |   10 ++++++++
>>  6 files changed, 81 insertions(+)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index a3b8c71..581eae4 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1417,6 +1417,10 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
>>  	  If set to N, then the encryption of system memory can be
>>  	  activated with the mem_encrypt=on command line option.
>>
>> +config ARCH_USE_MEMREMAP_PROT
>> +	def_bool y
>> +	depends on AMD_MEM_ENCRYPT
>
> Why do we need this?
>
> IOW, all those helpers below will end up being defined unconditionally,
> in practice. Think distro kernels. Then saving the couple of bytes is
> not really worth the overhead.

I added this because some other architectures use a u64 for the
protection value instead of an unsigned long (i386 for one) and it
was causing build errors/warnings on those archs. And trying to bring
in the header to use pgprot_t instead of an unsigned long caused a ton
of build issues. This seemed to be the simplest and least intrusive way
to approach the issue.
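
For illustration only, the guarded helpers boil down to something like
the sketch below; the early_memremap_prot() plumbing and the exact
protection value here are assumptions about how this is wired up, not
the final code:

    #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
    /* Remap early boot memory with an explicit, encryption-aware protection */
    void __init *early_memremap_encrypted(resource_size_t phys_addr,
                                          unsigned long size)
    {
            /* unsigned long works for the x86-64 prot value; other archs use u64 */
            return early_memremap_prot(phys_addr, size,
                                       __PAGE_KERNEL | _PAGE_ENC);
    }
    #endif  /* CONFIG_ARCH_USE_MEMREMAP_PROT */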

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory
  2017-02-20 18:22   ` Borislav Petkov
@ 2017-02-22 15:48     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-22 15:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 12:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:58AM -0600, Tom Lendacky wrote:
>> Add support to be able to either encrypt or decrypt data in place during
>> the early stages of booting the kernel. This does not change the memory
>> encryption attribute - it is used for ensuring that data present in either
>> an encrypted or decrypted memory area is in the proper state (for example
>> the initrd will have been loaded by the boot loader and will not be
>> encrypted, but the memory that it resides in is marked as encrypted).
>>
>> The early_memmap support is enhanced to specify encrypted and decrypted
>> mappings with and without write-protection. The use of write-protection is
>> necessary when encrypting data "in place". The write-protect attribute is
>> considered cacheable for loads, but not stores. This implies that the
>> hardware will never give the core a dirty line with this memtype.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   15 +++++++
>>  arch/x86/mm/mem_encrypt.c          |   79 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 94 insertions(+)
>
> ...
>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index d71df97..ac3565c 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -14,6 +14,9 @@
>>  #include <linux/init.h>
>>  #include <linux/mm.h>
>>
>> +#include <asm/tlbflush.h>
>> +#include <asm/fixmap.h>
>> +
>>  extern pmdval_t early_pmd_flags;
>>
>>  /*
>> @@ -24,6 +27,82 @@
>>  unsigned long sme_me_mask __section(.data) = 0;
>>  EXPORT_SYMBOL_GPL(sme_me_mask);
>>
>> +/* Buffer used for early in-place encryption by BSP, no locking needed */
>> +static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
>> +
>> +/*
>> + * This routine does not change the underlying encryption setting of the
>> + * page(s) that map this memory. It assumes that eventually the memory is
>> + * meant to be accessed as either encrypted or decrypted but the contents
>> + * are currently not in the desired stated.
>
> 				       state.

Will fix.

>
>> + *
>> + * This routine follows the steps outlined in the AMD64 Architecture
>> + * Programmer's Manual Volume 2, Section 7.10.8 Encrypt-in-Place.
>> + */
>> +static void __init __sme_early_enc_dec(resource_size_t paddr,
>> +				       unsigned long size, bool enc)
>> +{
>> +	void *src, *dst;
>> +	size_t len;
>> +
>> +	if (!sme_me_mask)
>> +		return;
>> +
>> +	local_flush_tlb();
>> +	wbinvd();
>> +
>> +	/*
>> +	 * There are limited number of early mapping slots, so map (at most)
>> +	 * one page at time.
>> +	 */
>> +	while (size) {
>> +		len = min_t(size_t, sizeof(sme_early_buffer), size);
>> +
>> +		/*
>> +		 * Create write protected mappings for the current format
>
> 			  write-protected

Ok.

>
>> +		 * of the memory.
>> +		 */
>> +		src = enc ? early_memremap_decrypted_wp(paddr, len) :
>> +			    early_memremap_encrypted_wp(paddr, len);
>> +
>> +		/*
>> +		 * Create mappings for the desired format of the memory.
>> +		 */
>
> That comment can go - you already say that in the previous one.

Ok.

>
>> +		dst = enc ? early_memremap_encrypted(paddr, len) :
>> +			    early_memremap_decrypted(paddr, len);
>
> Btw, looking at this again, it seems to me that if you write it this
> way:
>
>                 if (enc) {
>                         src = early_memremap_decrypted_wp(paddr, len);
>                         dst = early_memremap_encrypted(paddr, len);
>                 } else {
>                         src = early_memremap_encrypted_wp(paddr, len);
>                         dst = early_memremap_decrypted(paddr, len);
>                 }
>
> it might become even more readable. Anyway, just an idea - your decision
> which is better.

I go back and forth on that one, too.  Not sure what I'll do, I guess it
will depend on my mood :).

>
>> +
>> +		/*
>> +		 * If a mapping can't be obtained to perform the operation,
>> +		 * then eventual access of that area will in the desired
>
> s/will //

Yup.

Thanks,
Tom

>
>> +		 * mode will cause a crash.
>> +		 */
>> +		BUG_ON(!src || !dst);
>> +
>> +		/*
>> +		 * Use a temporary buffer, of cache-line multiple size, to
>> +		 * avoid data corruption as documented in the APM.
>> +		 */
>> +		memcpy(sme_early_buffer, src, len);
>> +		memcpy(dst, sme_early_buffer, len);
>> +
>> +		early_memunmap(dst, len);
>> +		early_memunmap(src, len);
>> +
>> +		paddr += len;
>> +		size -= len;
>> +	}
>> +}
>> +
>> +void __init sme_early_encrypt(resource_size_t paddr, unsigned long size)
>> +{
>> +	__sme_early_enc_dec(paddr, size, true);
>> +}
>> +
>> +void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
>> +{
>> +	__sme_early_enc_dec(paddr, size, false);
>> +}
>> +
>>  void __init sme_early_init(void)
>>  {
>>  	unsigned int i;
>>
>>
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-20 18:38   ` Borislav Petkov
@ 2017-02-22 16:43     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-22 16:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 12:38 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:32AM -0600, Tom Lendacky wrote:
>> Adding general kernel support for memory encryption includes:
>> - Modify and create some page table macros to include the Secure Memory
>>   Encryption (SME) memory encryption mask
>> - Modify and create some macros for calculating physical and virtual
>>   memory addresses
>> - Provide an SME initialization routine to update the protection map with
>>   the memory encryption mask so that it is used by default
>> - #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>
> ...
>
>> +#define __sme_pa(x)		(__pa((x)) | sme_me_mask)
>> +#define __sme_pa_nodebug(x)	(__pa_nodebug((x)) | sme_me_mask)
>> +
>>  #else	/* !CONFIG_AMD_MEM_ENCRYPT */
>>
>>  #ifndef sme_me_mask
>> @@ -35,6 +42,13 @@ static inline bool sme_active(void)
>>  }
>>  #endif
>>
>> +static inline void __init sme_early_init(void)
>> +{
>> +}
>> +
>> +#define __sme_pa		__pa
>> +#define __sme_pa_nodebug	__pa_nodebug
>
> One more thing - in the !CONFIG_AMD_MEM_ENCRYPT case, sme_me_mask is 0
> so you don't need to define __sme_pa* again.

Makes sense.  I'll move those macros outside the #ifdef (I'll do the
same for the new __sme_clr() and __sme_set() macros, too).
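
A minimal sketch of the consolidated layout, assuming sme_me_mask is
defined to 0 when CONFIG_AMD_MEM_ENCRYPT is not set:

    #ifdef CONFIG_AMD_MEM_ENCRYPT
    extern unsigned long sme_me_mask;
    #else
    #define sme_me_mask     0UL
    #endif

    /* Defined once; these collapse to plain __pa()/__pa_nodebug() when the mask is 0 */
    #define __sme_pa(x)             (__pa(x) | sme_me_mask)
    #define __sme_pa_nodebug(x)     (__pa_nodebug(x) | sme_me_mask)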

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
  2017-02-20 15:21   ` Borislav Petkov
  2017-02-20 18:38   ` Borislav Petkov
@ 2017-02-22 18:13   ` Dave Hansen
  2017-02-23 23:12     ` Tom Lendacky
  2017-02-22 18:13   ` Dave Hansen
  3 siblings, 1 reply; 111+ messages in thread
From: Dave Hansen @ 2017-02-22 18:13 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/2017 07:43 AM, Tom Lendacky wrote:
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
> -	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
> +	return (pte_val(pte) & ~sme_me_mask & PTE_PFN_MASK) >> PAGE_SHIFT;
>  }
>  
>  static inline unsigned long pmd_pfn(pmd_t pmd)
>  {
> -	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
> +	return (pmd_val(pmd) & ~sme_me_mask & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
>  }

Could you talk a bit about why you chose to do the "~sme_me_mask" bit in
here instead of making it a part of PTE_PFN_MASK / pmd_pfn_mask(pmd)?

It might not matter, but I'd be worried that this ends up breaking
direct users of PTE_PFN_MASK / pmd_pfn_mask(pmd) since they now no
longer mask the PFN out of a PTE.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
                     ` (2 preceding siblings ...)
  2017-02-22 18:13   ` Dave Hansen
@ 2017-02-22 18:13   ` Dave Hansen
  3 siblings, 0 replies; 111+ messages in thread
From: Dave Hansen @ 2017-02-22 18:13 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/2017 07:43 AM, Tom Lendacky wrote:
> )
> @@ -673,7 +683,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
>   * Currently stuck as a macro due to indirect forward reference to
>   * linux/mmzone.h's __section_mem_map_addr() definition:
>   */
> -#define pgd_page(pgd)		pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
> +#define pgd_page(pgd)	pfn_to_page(pgd_pfn(pgd))

FWIW, these seem like good cleanups that can go in separately from the
rest of your series.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly
  2017-02-20 19:45   ` Borislav Petkov
@ 2017-02-22 18:34     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-22 18:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 1:45 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:44:11AM -0600, Tom Lendacky wrote:
>> The boot data and command line data are present in memory in a decrypted
>> state and are copied early in the boot process.  The early page fault
>> support will map these areas as encrypted, so before attempting to copy
>> them, add decrypted mappings so the data is accessed properly when copied.
>>
>> For the initrd, encrypt this data in place. Since the future mapping of the
>> initrd area will be mapped as encrypted the data will be accessed properly.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>
> ...
>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index 182a4c7..03f8e74 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -46,13 +46,18 @@ static void __init reset_early_page_tables(void)
>>  	write_cr3(__sme_pa_nodebug(early_level4_pgt));
>>  }
>>
>> +void __init __early_pgtable_flush(void)
>> +{
>> +	write_cr3(__sme_pa_nodebug(early_level4_pgt));
>> +}
>
> Move that to mem_encrypt.c where it is used and make it static. The diff
> below, ontop of this patch, seems to build fine here.

Ok, I can do that.

>
> Also, aren't those mappings global so that you need to toggle CR4.PGE
> for that?
>
> PAGE_KERNEL at least has _PAGE_GLOBAL set.

The early_pmd_flags value has _PAGE_GLOBAL cleared:

pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);

so I didn't do the CR4.PGE toggle. I could always add it to be safe in
case that is ever changed. It only happens twice, on the map and on the
unmap, so it shouldn't be a big deal.
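
If the toggle were ever needed, a minimal sketch (only an illustration;
the current early mappings are non-global so it isn't required today)
would be:

    unsigned long cr4 = __read_cr4();

    write_cr3(__sme_pa_nodebug(early_level4_pgt));

    /* Toggling CR4.PGE also flushes any global TLB entries */
    if (cr4 & X86_CR4_PGE) {
            __write_cr4(cr4 & ~X86_CR4_PGE);
            __write_cr4(cr4);
    }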

>
>> +
>>  /* Create a new PMD entry */
>> -int __init early_make_pgtable(unsigned long address)
>> +int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
>
> __early_make_pmd() then, since it creates a PMD entry.
>
>>  	unsigned long physaddr = address - __PAGE_OFFSET;
>>  	pgdval_t pgd, *pgd_p;
>>  	pudval_t pud, *pud_p;
>> -	pmdval_t pmd, *pmd_p;
>> +	pmdval_t *pmd_p;
>>
>>  	/* Invalid address or early pgt is done ?  */
>>  	if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
>
> ...
>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index ac3565c..ec548e9 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -16,8 +16,12 @@
>>
>>  #include <asm/tlbflush.h>
>>  #include <asm/fixmap.h>
>> +#include <asm/setup.h>
>> +#include <asm/bootparam.h>
>>
>>  extern pmdval_t early_pmd_flags;
>> +int __init __early_make_pgtable(unsigned long, pmdval_t);
>> +void __init __early_pgtable_flush(void);
>
> What's with the forward declarations?
>
> Those should be in some header AFAICT.

I can add them to a header, probably arch/x86/include/asm/pgtable.h.

Thanks,
Tom

>
>>   * Since SME related variables are set early in the boot process they must
>> @@ -103,6 +107,76 @@ void __init sme_early_decrypt(resource_size_t paddr, unsigned long size)
>>  	__sme_early_enc_dec(paddr, size, false);
>>  }
>
> ...
>
> ---
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index 03f8e74c7223..c47500d72330 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -46,11 +46,6 @@ static void __init reset_early_page_tables(void)
>  	write_cr3(__sme_pa_nodebug(early_level4_pgt));
>  }
>
> -void __init __early_pgtable_flush(void)
> -{
> -	write_cr3(__sme_pa_nodebug(early_level4_pgt));
> -}
> -
>  /* Create a new PMD entry */
>  int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
>  {
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ec548e9a76f1..0af020b36232 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -21,7 +21,7 @@
>
>  extern pmdval_t early_pmd_flags;
>  int __init __early_make_pgtable(unsigned long, pmdval_t);
> -void __init __early_pgtable_flush(void);
> +extern pgd_t early_level4_pgt[PTRS_PER_PGD];
>
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -34,6 +34,11 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>  static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
>
> +static void __init early_pgtable_flush(void)
> +{
> +	write_cr3(__sme_pa_nodebug(early_level4_pgt));
> +}
> +
>  /*
>   * This routine does not change the underlying encryption setting of the
>   * page(s) that map this memory. It assumes that eventually the memory is
> @@ -158,7 +163,7 @@ void __init sme_unmap_bootdata(char *real_mode_data)
>  	 */
>  	__sme_map_unmap_bootdata(real_mode_data, false);
>
> -	__early_pgtable_flush();
> +	early_pgtable_flush();
>  }
>
>  void __init sme_map_bootdata(char *real_mode_data)
> @@ -174,7 +179,7 @@ void __init sme_map_bootdata(char *real_mode_data)
>  	 */
>  	__sme_map_unmap_bootdata(real_mode_data, true);
>
> -	__early_pgtable_flush();
> +	early_pgtable_flush();
>  }
>
>  void __init sme_early_init(void)
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute
  2017-02-16 15:45 ` [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute Tom Lendacky
@ 2017-02-22 18:52   ` Borislav Petkov
  2017-02-28 22:46     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-22 18:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:45:35AM -0600, Tom Lendacky wrote:
> Add support for changing the memory encryption attribute for one or more
> memory pages.

"This will be useful when we, ...., for example."

> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/cacheflush.h |    3 ++
>  arch/x86/mm/pageattr.c            |   66 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 69 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index 872877d..33ae60a 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -12,6 +12,7 @@
>   * Executability : eXeutable, NoteXecutable
>   * Read/Write    : ReadOnly, ReadWrite
>   * Presence      : NotPresent
> + * Encryption    : Encrypted, Decrypted
>   *
>   * Within a category, the attributes are mutually exclusive.
>   *
> @@ -47,6 +48,8 @@
>  int set_memory_rw(unsigned long addr, int numpages);
>  int set_memory_np(unsigned long addr, int numpages);
>  int set_memory_4k(unsigned long addr, int numpages);
> +int set_memory_encrypted(unsigned long addr, int numpages);
> +int set_memory_decrypted(unsigned long addr, int numpages);
>  
>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>  int set_memory_array_wc(unsigned long *addr, int addrinarray);
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 91c5c63..9710f5c 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1742,6 +1742,72 @@ int set_memory_4k(unsigned long addr, int numpages)
>  					__pgprot(0), 1, 0, NULL);
>  }
>  
> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> +	struct cpa_data cpa;
> +	unsigned long start;
> +	int ret;
> +
> +	/* Nothing to do if the _PAGE_ENC attribute is zero */
> +	if (_PAGE_ENC == 0)

Why not:

	if (!sme_active())

?

> +		return 0;
> +
> +	/* Save original start address since it will be modified */

That's obvious - it is a small-enough function to fit on the screen. No
need for the comment.

> +	start = addr;
> +
> +	memset(&cpa, 0, sizeof(cpa));
> +	cpa.vaddr = &addr;
> +	cpa.numpages = numpages;
> +	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
> +	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
> +	cpa.pgd = init_mm.pgd;
> +
> +	/* Should not be working on unaligned addresses */
> +	if (WARN_ONCE(*cpa.vaddr & ~PAGE_MASK,
> +		      "misaligned address: %#lx\n", *cpa.vaddr))

Use addr here so that you don't have to deref. gcc is probably smart
enough but the code should look more readable this way too.

> +		*cpa.vaddr &= PAGE_MASK;

I know, you must use cpa.vaddr here but if you move that alignment check
over the cpa assignment, you can use addr solely.

> +
> +	/* Must avoid aliasing mappings in the highmem code */
> +	kmap_flush_unused();
> +	vm_unmap_aliases();
> +
> +	/*
> +	 * Before changing the encryption attribute, we need to flush caches.
> +	 */
> +	if (static_cpu_has(X86_FEATURE_CLFLUSH))
> +		cpa_flush_range(start, numpages, 1);
> +	else
> +		cpa_flush_all(1);

I guess we don't really need the distinction since a SME CPU most
definitely implies CLFLUSH support but ok, let's be careful.

> +
> +	ret = __change_page_attr_set_clr(&cpa, 1);
> +
> +	/*
> +	 * After changing the encryption attribute, we need to flush TLBs
> +	 * again in case any speculative TLB caching occurred (but no need
> +	 * to flush caches again).  We could just use cpa_flush_all(), but
> +	 * in case TLB flushing gets optimized in the cpa_flush_range()
> +	 * path use the same logic as above.
> +	 */
> +	if (static_cpu_has(X86_FEATURE_CLFLUSH))
> +		cpa_flush_range(start, numpages, 0);
> +	else
> +		cpa_flush_all(0);
> +
> +	return ret;
> +}

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types
  2017-02-21 12:05   ` Matt Fleming
@ 2017-02-23 17:27     ` Tom Lendacky
  2017-02-24  9:57       ` Matt Fleming
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-23 17:27 UTC (permalink / raw)
  To: Matt Fleming
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/21/2017 6:05 AM, Matt Fleming wrote:
> On Thu, 16 Feb, at 09:44:57AM, Tom Lendacky wrote:
>> Update the efi_mem_type() to return EFI_RESERVED_TYPE instead of a
>> hardcoded 0.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/platform/efi/efi.c |    4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
>> index a15cf81..6407103 100644
>> --- a/arch/x86/platform/efi/efi.c
>> +++ b/arch/x86/platform/efi/efi.c
>> @@ -1037,7 +1037,7 @@ u32 efi_mem_type(unsigned long phys_addr)
>>  	efi_memory_desc_t *md;
>>
>>  	if (!efi_enabled(EFI_MEMMAP))
>> -		return 0;
>> +		return EFI_RESERVED_TYPE;
>>
>>  	for_each_efi_memory_desc(md) {
>>  		if ((md->phys_addr <= phys_addr) &&
>> @@ -1045,7 +1045,7 @@ u32 efi_mem_type(unsigned long phys_addr)
>>  				  (md->num_pages << EFI_PAGE_SHIFT))))
>>  			return md->type;
>>  	}
>> -	return 0;
>> +	return EFI_RESERVED_TYPE;
>>  }
>
> I see what you're getting at here, but arguably the return value in
> these cases never should have been zero to begin with (your change
> just makes that more obvious).
>
> Returning EFI_RESERVED_TYPE implies an EFI memmap entry exists for
> this address, which is misleading because it doesn't in the hunks
> you've modified above.
>
> Instead, could you look at returning a negative error value in the
> usual way we do in the Linux kernel, and update the function prototype
> to match? I don't think any callers actually require the return type
> to be u32.

I can do that; I'll change the return type to an int. For the
!efi_enabled case I can return -ENOTSUPP, and when an entry isn't
found I can return -EINVAL.  Sound good?

The ia64 arch is the only other arch that defines the function. It
has just a single return 0 that I'll change to -EINVAL.
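
For illustration, the x86 helper would then read roughly like this
sketch (return type and error values per the above, everything else as
in the current code):

    int efi_mem_type(unsigned long phys_addr)
    {
            efi_memory_desc_t *md;

            if (!efi_enabled(EFI_MEMMAP))
                    return -ENOTSUPP;

            for_each_efi_memory_desc(md) {
                    if ((md->phys_addr <= phys_addr) &&
                        (phys_addr < (md->phys_addr +
                                      (md->num_pages << EFI_PAGE_SHIFT))))
                            return md->type;
            }

            return -EINVAL;
    }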

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-21 15:06   ` Borislav Petkov
@ 2017-02-23 21:34     ` Tom Lendacky
  2017-02-24 10:21       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-23 21:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/21/2017 9:06 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:45:09AM -0600, Tom Lendacky wrote:
>> Boot data (such as EFI related data) is not encrypted when the system is
>> booted and needs to be mapped decrypted.  Add support to apply the proper
>> attributes to the EFI page tables and to the early_memremap and memremap
>> APIs to identify the type of data being accessed so that the proper
>> encryption attribute can be applied.
>
> So this doesn't even begin to explain *why* we need this. The emphasis
> being on *why*.
>
> Lemme guess? kexec? And because of efi_reuse_config?

Hmm... maybe I'm missing something here.  This doesn't have anything to
do with kexec or efi_reuse_config.  This has to do with the fact that
when a system boots the setup data and the EFI data are not encrypted.
Since it's not encrypted we need to be sure that any early_memremap()
and memremap() calls remove the encryption mask from the resulting
pagetable entry that is created so the data can be accessed properly.
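
For illustration, stripping or applying the mask in those paths comes
down to helpers along these lines (a sketch; the actual definitions
live in the SME headers and may be spelled via __sme_set()/__sme_clr()):

    #define pgprot_encrypted(prot)  __pgprot(pgprot_val(prot) | sme_me_mask)
    #define pgprot_decrypted(prot)  __pgprot(pgprot_val(prot) & ~sme_me_mask)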

>
> If so, then that whole ad-hoc caching in parse_setup_data() needs to go.
> Especially if efi_reuse_config() already sees those addresses so while
> we're there, we could save them somewhere or whatnot. But not doing the
> whole thing again in parse_setup_data().
>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/io.h      |    3 +
>>  arch/x86/include/asm/setup.h   |    8 +++
>>  arch/x86/kernel/setup.c        |   33 ++++++++++++
>>  arch/x86/mm/ioremap.c          |  111 ++++++++++++++++++++++++++++++++++++++++
>>  arch/x86/platform/efi/efi_64.c |   16 ++++--
>>  kernel/memremap.c              |   11 ++++
>>  mm/early_ioremap.c             |   18 +++++-
>>  7 files changed, 192 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 7afb0e2..833f7cc 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -381,4 +381,7 @@ extern int __must_check arch_phys_wc_add(unsigned long base,
>>  #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
>>  #endif
>>
>> +extern bool arch_memremap_do_ram_remap(resource_size_t offset, size_t size);
>> +#define arch_memremap_do_ram_remap arch_memremap_do_ram_remap
>> +
>>  #endif /* _ASM_X86_IO_H */
>> diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
>> index ac1d5da..99998d9 100644
>> --- a/arch/x86/include/asm/setup.h
>> +++ b/arch/x86/include/asm/setup.h
>> @@ -63,6 +63,14 @@ static inline void x86_ce4100_early_setup(void) { }
>>  #include <asm/espfix.h>
>>  #include <linux/kernel.h>
>>
>> +struct setup_data_attrs {
>> +	u64 paddr;
>> +	unsigned long size;
>> +};
>> +
>> +extern struct setup_data_attrs setup_data_list[];
>> +extern unsigned int setup_data_list_count;
>> +
>>  /*
>>   * This is set up by the setup-routine at boot-time
>>   */
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index bd5b9a7..d2234bf 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -148,6 +148,9 @@ int default_check_phys_apicid_present(int phys_apicid)
>>
>>  struct boot_params boot_params;
>>
>> +struct setup_data_attrs setup_data_list[32];
>> +unsigned int setup_data_list_count;
>> +
>>  /*
>>   * Machine setup..
>>   */
>> @@ -419,6 +422,32 @@ static void __init reserve_initrd(void)
>>  }
>>  #endif /* CONFIG_BLK_DEV_INITRD */
>>
>> +static void __init update_setup_data_list(u64 pa_data, unsigned long size)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < setup_data_list_count; i++) {
>> +		if (setup_data_list[i].paddr != pa_data)
>> +			continue;
>> +
>> +		setup_data_list[i].size = size;
>> +		break;
>> +	}
>> +}
>> +
>> +static void __init add_to_setup_data_list(u64 pa_data, unsigned long size)
>> +{
>> +	if (!sme_active())
>> +		return;
>> +
>> +	if (!WARN(setup_data_list_count == ARRAY_SIZE(setup_data_list),
>> +		  "exceeded maximum setup data list slots")) {
>> +		setup_data_list[setup_data_list_count].paddr = pa_data;
>> +		setup_data_list[setup_data_list_count].size = size;
>> +		setup_data_list_count++;
>> +	}
>> +}
>> +
>>  static void __init parse_setup_data(void)
>>  {
>>  	struct setup_data *data;
>> @@ -428,12 +457,16 @@ static void __init parse_setup_data(void)
>>  	while (pa_data) {
>>  		u32 data_len, data_type;
>>
>> +		add_to_setup_data_list(pa_data, sizeof(*data));
>> +
>>  		data = early_memremap(pa_data, sizeof(*data));
>>  		data_len = data->len + sizeof(struct setup_data);
>>  		data_type = data->type;
>>  		pa_next = data->next;
>>  		early_memunmap(data, sizeof(*data));
>>
>> +		update_setup_data_list(pa_data, data_len);
>> +
>>  		switch (data_type) {
>>  		case SETUP_E820_EXT:
>>  			e820__memory_setup_extended(pa_data, data_len);
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 2385e70..b0ff6bc 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -13,6 +13,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/mmiotrace.h>
>> +#include <linux/efi.h>
>>
>>  #include <asm/cacheflush.h>
>>  #include <asm/e820/api.h>
>> @@ -21,6 +22,7 @@
>>  #include <asm/tlbflush.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/pat.h>
>> +#include <asm/setup.h>
>>
>>  #include "physaddr.h"
>>
>> @@ -419,6 +421,115 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
>>  	iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
>>  }
>>
>> +/*
>> + * Examine the physical address to determine if it is boot data. Check
>> + * it against the boot params structure and EFI tables.
>> + */
>> +static bool memremap_is_setup_data(resource_size_t phys_addr,
>> +				   unsigned long size)
>> +{
>> +	unsigned int i;
>> +	u64 paddr;
>> +
>> +	for (i = 0; i < setup_data_list_count; i++) {
>> +		if (phys_addr < setup_data_list[i].paddr)
>> +			continue;
>> +
>> +		if (phys_addr >= (setup_data_list[i].paddr +
>> +				  setup_data_list[i].size))
>> +			continue;
>> +
>> +		/* Address is within setup data range */
>> +		return true;
>> +	}
>> +
>> +	paddr = boot_params.efi_info.efi_memmap_hi;
>> +	paddr <<= 32;
>> +	paddr |= boot_params.efi_info.efi_memmap;
>> +	if (phys_addr == paddr)
>> +		return true;
>> +
>> +	paddr = boot_params.efi_info.efi_systab_hi;
>> +	paddr <<= 32;
>> +	paddr |= boot_params.efi_info.efi_systab;
>> +	if (phys_addr == paddr)
>> +		return true;
>> +
>> +	if (efi_table_address_match(phys_addr))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>> +/*
>> + * This function determines if an address should be mapped encrypted.
>> + * Boot setup data, EFI data and E820 areas are checked in making this
>> + * determination.
>> + */
>> +static bool memremap_should_map_encrypted(resource_size_t phys_addr,
>> +					  unsigned long size)
>> +{
>> +	/*
>> +	 * SME is not active, return true:
>> +	 *   - For early_memremap_pgprot_adjust(), returning true or false
>> +	 *     results in the same protection value
>> +	 *   - For arch_memremap_do_ram_remap(), returning true will allow
>> +	 *     the RAM remap to occur instead of falling back to ioremap()
>> +	 */
>> +	if (!sme_active())
>> +		return true;
>> +
>> +	/* Check if the address is part of the setup data */
>> +	if (memremap_is_setup_data(phys_addr, size))
>> +		return false;
>> +
>> +	/* Check if the address is part of EFI boot/runtime data */
>> +	switch (efi_mem_type(phys_addr)) {
>
> arch/x86/built-in.o: In function `memremap_should_map_encrypted':
> /home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:487: undefined reference to `efi_mem_type'
> make: *** [vmlinux] Error 1
>
> That's a !CONFIG_EFI .config.

Missed that, I'll fix it.

>
>> +	case EFI_BOOT_SERVICES_DATA:
>> +	case EFI_RUNTIME_SERVICES_DATA:
>> +		return false;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	/* Check if the address is outside kernel usable area */
>> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
>> +	case E820_TYPE_RESERVED:
>> +	case E820_TYPE_ACPI:
>> +	case E820_TYPE_NVS:
>> +	case E820_TYPE_UNUSABLE:
>> +		return false;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +/*
>> + * Architecure function to determine if RAM remap is allowed.
>> + */
>> +bool arch_memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
>> +{
>> +	return memremap_should_map_encrypted(phys_addr, size);
>> +}
>> +
>> +/*
>> + * Architecure override of __weak function to adjust the protection attributes
>> + * used when remapping memory.
>> + */
>> +pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
>> +					     unsigned long size,
>> +					     pgprot_t prot)
>> +{
>> +	if (memremap_should_map_encrypted(phys_addr, size))
>> +		prot = pgprot_encrypted(prot);
>> +	else
>> +		prot = pgprot_decrypted(prot);
>> +
>> +	return prot;
>> +}
>> +
>>  #ifdef CONFIG_ARCH_USE_MEMREMAP_PROT
>>  /* Remap memory with encryption */
>>  void __init *early_memremap_encrypted(resource_size_t phys_addr,
>> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
>> index 2ee7694..2d8674d 100644
>> --- a/arch/x86/platform/efi/efi_64.c
>> +++ b/arch/x86/platform/efi/efi_64.c
>> @@ -243,7 +243,7 @@ void efi_sync_low_kernel_mappings(void)
>>
>>  int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  {
>> -	unsigned long pfn, text;
>> +	unsigned long pfn, text, pf;
>>  	struct page *page;
>>  	unsigned npages;
>>  	pgd_t *pgd;
>> @@ -251,7 +251,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  	if (efi_enabled(EFI_OLD_MEMMAP))
>>  		return 0;
>>
>> -	efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
>> +	/*
>> +	 * Since the PGD is encrypted, set the encryption mask so that when
>> +	 * this value is loaded into cr3 the PGD will be decrypted during
>> +	 * the pagetable walk.
>> +	 */
>> +	efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
>> +
>>  	pgd = efi_pgd;
>>
>>  	/*
>> @@ -261,7 +267,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  	 * phys_efi_set_virtual_address_map().
>>  	 */
>>  	pfn = pa_memmap >> PAGE_SHIFT;
>> -	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
>> +	pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
>> +	if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
>>  		pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
>>  		return 1;
>>  	}
>> @@ -304,7 +311,8 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
>>  	text = __pa(_text);
>>  	pfn = text >> PAGE_SHIFT;
>>
>> -	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
>> +	pf = _PAGE_RW | _PAGE_ENC;
>> +	if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
>>  		pr_err("Failed to map kernel text 1:1\n");
>>  		return 1;
>>  	}
>
> Those changes should be in a separate patch IMHO.

I can break out the mapping changes from the EFI pagetable changes.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption
  2017-02-22 18:13   ` Dave Hansen
@ 2017-02-23 23:12     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-23 23:12 UTC (permalink / raw)
  To: Dave Hansen, linux-arch, linux-efi, kvm, linux-doc, x86,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/22/2017 12:13 PM, Dave Hansen wrote:
> On 02/16/2017 07:43 AM, Tom Lendacky wrote:
>>  static inline unsigned long pte_pfn(pte_t pte)
>>  {
>> -	return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
>> +	return (pte_val(pte) & ~sme_me_mask & PTE_PFN_MASK) >> PAGE_SHIFT;
>>  }
>>
>>  static inline unsigned long pmd_pfn(pmd_t pmd)
>>  {
>> -	return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
>> +	return (pmd_val(pmd) & ~sme_me_mask & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
>>  }
>
> Could you talk a bit about why you chose to do the "~sme_me_mask" bit in
> here instead of making it a part of PTE_PFN_MASK / pmd_pfn_mask(pmd)?

I think that's a good catch.  Let me look at it, but I believe that it
should be possible to do and avoid what you're worried about below.
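
As a sketch only (whether the series actually goes this route is what
I'll be looking at), folding the mask into the PFN mask itself would
let existing users pick it up automatically:

    #define PTE_PFN_MASK    (((pteval_t)PHYSICAL_PAGE_MASK) & ~(pteval_t)sme_me_mask)

    static inline unsigned long pte_pfn(pte_t pte)
    {
            /* No explicit ~sme_me_mask needed here any more */
            return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
    }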

Thanks,
Tom

>
> It might not matter, but I'd be worried that this ends up breaking
> direct users of PTE_PFN_MASK / pmd_pfn_mask(pmd) since they now no
> longer mask the PFN out of a PTE.
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types
  2017-02-23 17:27     ` Tom Lendacky
@ 2017-02-24  9:57       ` Matt Fleming
  0 siblings, 0 replies; 111+ messages in thread
From: Matt Fleming @ 2017-02-24  9:57 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, 23 Feb, at 11:27:55AM, Tom Lendacky wrote:
> 
> I can do that, I'll change the return type to an int. For the
> !efi_enabled I can return -ENOTSUPP and for when an entry isn't
> found I can return -EINVAL.  Sound good?
 
Sounds good to me!

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-23 21:34     ` Tom Lendacky
@ 2017-02-24 10:21       ` Borislav Petkov
  2017-02-24 15:04         ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-24 10:21 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 23, 2017 at 03:34:30PM -0600, Tom Lendacky wrote:
> Hmm... maybe I'm missing something here.  This doesn't have anything to
> do with kexec or efi_reuse_config.  This has to do with the fact that

I said kexec because kexec uses the setup_data mechanism to pass config
tables to the second kernel, for example.

> when a system boots the setup data and the EFI data are not encrypted.
> Since it's not encrypted we need to be sure that any early_memremap()
> and memremap() calls remove the encryption mask from the resulting
> pagetable entry that is created so the data can be accessed properly.

Anyway, I'd prefer not to do this ad-hoc caching if it can be
helped. You're imposing an arbitrary limit of 32 there which the
setup_data linked list doesn't have. So if you really want to go
inspect those elements, you could iterate over them starting from
boot_params.hdr.setup_data, just like parse_setup_data() does. Most of
the time that list should be non-existent and if it is, it will be short
anyway.

And if we really decide that we need to cache it for later inspection
due to speed considerations, as you do in memremap_is_setup_data(), you
could do that in the default: branch of parse_setup_data() and do it
just once: I don't see why you need to do add_to_setup_data_list() *and*
update_setup_data_list() when you could add both pointer and updated
size once.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-24 10:21       ` Borislav Petkov
@ 2017-02-24 15:04         ` Tom Lendacky
  2017-02-24 15:22           ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-24 15:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/24/2017 4:21 AM, Borislav Petkov wrote:
> On Thu, Feb 23, 2017 at 03:34:30PM -0600, Tom Lendacky wrote:
>> Hmm... maybe I'm missing something here.  This doesn't have anything to
>> do with kexec or efi_reuse_config.  This has to do with the fact that
>
> I said kexec because kexec uses the setup_data mechanism to pass config
> tables to the second kernel, for example.
>
>> when a system boots the setup data and the EFI data are not encrypted.
>> Since it's not encrypted we need to be sure that any early_memremap()
>> and memremap() calls remove the encryption mask from the resulting
>> pagetable entry that is created so the data can be accessed properly.
>
> Anyway, I'd prefer not to do this ad-hoc caching if it can be
> helped. You're imposing an arbitrary limit of 32 there which the
> setup_data linked list doesn't have. So if you really want to go
> inspect those elements, you could iterate over them starting from
> boot_params.hdr.setup_data, just like parse_setup_data() does. Most of
> the time that list should be non-existent and if it does exist, it will
> be short anyway.
>

I looked at doing that but you get into this cyclical situation unless
you specifically map each setup data element as decrypted. This is ok
for early_memremap since we have early_memremap_decrypted() but a new
memremap_decrypted() would have to be added. But I was trying to avoid
having to do multiple mapping calls inside the current mapping call.

I can always look at converting the setup_data_list from an array
into a list to eliminate the 32 entry limit, too.

Let me look at adding the early_memremap_decrypted() type support to
memremap() and see how that looks.

> And if we really decide that we need to cache it for later inspection
> due to speed considerations, as you do in memremap_is_setup_data(), you
> could do that in the default: branch of parse_setup_data() and do it
> just once: I don't see why you need to do add_to_setup_data_list() *and*
> update_setup_data_list() when you could add both pointer and updated
> size once.

I do the add followed by the update because we can't determine the true
size of the setup data until it is first mapped so that the data->len
field can be accessed. In order to map it properly the physical
address range needs to be added to the list before it is mapped. After
it's mapped, the true physical address range can be calculated and
updated.
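
In rough pseudo-code - the add/update helpers are the ones from the
patch, their signatures approximated here:

struct setup_data *data;
u64 paddr, paddr_next;

paddr = boot_params.hdr.setup_data;
while (paddr) {
	/* register the header range so the upcoming map is done decrypted */
	add_to_setup_data_list(paddr, sizeof(struct setup_data));

	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);

	/* data->len can be read now, so record the true extent of the entry */
	update_setup_data_list(paddr, sizeof(*data) + data->len);

	paddr_next = data->next;
	memunmap(data);
	paddr = paddr_next;
}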

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-24 15:04         ` Tom Lendacky
@ 2017-02-24 15:22           ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-02-24 15:22 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Fri, Feb 24, 2017 at 09:04:21AM -0600, Tom Lendacky wrote:
> I looked at doing that but you get into this cyclical situation unless
> you specifically map each setup data element as decrypted. This is ok
> for early_memremap since we have early_memremap_decrypted() but a new
> memremap_decrypted() would have to be added. But I was trying to avoid
> having to do multiple mapping calls inside the current mapping call.
> 
> I can always look at converting the setup_data_list from an array
> into a list to eliminate the 32 entry limit, too.
> 
> Let me look at adding the early_memremap_decrypted() type support to
> memremap() and see how that looks.

Yes, so this sounds better than the cyclic thing you explained
where you have to add and update since early_memremap() calls into
memremap_should_map_encrypted() which touches the list we're updating at
the same time.

So in the case where you absolutely know that those ranges should
be mapped decrypted, we should have special helpers which do that
explicitly and they are called when we access those special regions.
Well, special for SME. I'm thinking that should simplify the handling
but you'll know better once you write it. :)
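
E.g. something like this at a call site - rough sketch, the wrapper name
is made up, the _decrypted helper is the one you already have for
early_memremap:

static u32 __init setup_data_type(u64 paddr)
{
	struct setup_data *data;
	u32 type;

	/* boot/firmware data was written before SME was enabled */
	data = early_memremap_decrypted(paddr, sizeof(*data));
	if (!data)
		return 0;

	type = data->type;
	early_memunmap(data, sizeof(*data));

	return type;
}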

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support
  2017-02-16 15:43 ` [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support Tom Lendacky
  2017-02-17 12:00   ` Borislav Petkov
@ 2017-02-25 15:29   ` Borislav Petkov
  2017-02-28 23:01     ` Tom Lendacky
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-25 15:29 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:43:07AM -0600, Tom Lendacky wrote:
> Add support for Secure Memory Encryption (SME). This initial support
> provides a Kconfig entry to build the SME support into the kernel and
> defines the memory encryption mask that will be used in subsequent
> patches to mark pages as encrypted.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

...

> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -0,0 +1,42 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef __X86_MEM_ENCRYPT_H__
> +#define __X86_MEM_ENCRYPT_H__
> +
> +#ifndef __ASSEMBLY__
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +
> +extern unsigned long sme_me_mask;
> +
> +static inline bool sme_active(void)
> +{
> +	return (sme_me_mask) ? true : false;

	return !!sme_me_mask;

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 18/28] x86: DMA support for memory encryption
  2017-02-16 15:46 ` [RFC PATCH v4 18/28] x86: DMA support for memory encryption Tom Lendacky
@ 2017-02-25 17:10   ` Borislav Petkov
  2017-03-06 17:47     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-25 17:10 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:46:04AM -0600, Tom Lendacky wrote:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create decrypted bounce buffers for use by these devices.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---

Just nitpicks below...

> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index ec548e9..a46bcf4 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -13,11 +13,14 @@
>  #include <linux/linkage.h>
>  #include <linux/init.h>
>  #include <linux/mm.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/swiotlb.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/fixmap.h>
>  #include <asm/setup.h>
>  #include <asm/bootparam.h>
> +#include <asm/cacheflush.h>
>  
>  extern pmdval_t early_pmd_flags;
>  int __init __early_make_pgtable(unsigned long, pmdval_t);
> @@ -192,3 +195,22 @@ void __init sme_early_init(void)
>  	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
>  		protection_map[i] = pgprot_encrypted(protection_map[i]);
>  }
> +
> +/* Architecture __weak replacement functions */
> +void __init mem_encrypt_init(void)
> +{
> +	if (!sme_me_mask)

	    !sme_active()

no?

Unless we're going to be switching SME dynamically at run time?

> +		return;
> +
> +	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
> +	swiotlb_update_mem_attributes();
> +}
> +
> +void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
> +{
> +	WARN(PAGE_ALIGN(size) != size,
> +	     "size is not page aligned (%#lx)\n", size);

"page-aligned" I guess.

> +
> +	/* Make the SWIOTLB buffer area decrypted */
> +	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
> +}
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 4ee479f..15e7160 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -35,6 +35,7 @@ enum swiotlb_force {
>  extern unsigned long swiotlb_nr_tbl(void);
>  unsigned long swiotlb_size_or_default(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> +extern void __init swiotlb_update_mem_attributes(void);
>  
>  /*
>   * Enumeration for sync targets
> diff --git a/init/main.c b/init/main.c
> index 8222caa..ba13f8f 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -466,6 +466,10 @@ void __init __weak thread_stack_cache_init(void)
>  }
>  #endif
>  
> +void __init __weak mem_encrypt_init(void)
> +{
> +}
> +
>  /*
>   * Set up kernel memory allocators
>   */
> @@ -614,6 +618,15 @@ asmlinkage __visible void __init start_kernel(void)
>  	 */
>  	locking_selftest();
>  
> +	/*
> +	 * This needs to be called before any devices perform DMA
> +	 * operations that might use the swiotlb bounce buffers.

					 SWIOTLB

> +	 * This call will mark the bounce buffers as decrypted so
> +	 * that their usage will not cause "plain-text" data to be
> +	 * decrypted when accessed.
> +	 */
> +	mem_encrypt_init();
> +
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	if (initrd_start && !initrd_below_start_ok &&
>  	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index a8d74a7..c463067 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -30,6 +30,7 @@
>  #include <linux/highmem.h>
>  #include <linux/gfp.h>
>  #include <linux/scatterlist.h>
> +#include <linux/mem_encrypt.h>
>  
>  #include <asm/io.h>
>  #include <asm/dma.h>
> @@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
>  	return size ? size : (IO_TLB_DEFAULT_SIZE);
>  }
>  
> +void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
> +{
> +}
> +
> +/* For swiotlb, clear memory encryption mask from dma addresses */
> +static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
> +				      phys_addr_t address)
> +{
> +	return phys_to_dma(hwdev, address) & ~sme_me_mask;
> +}
> +
>  /* Note that this doesn't work with highmem page */
>  static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
>  				      volatile void *address)
> @@ -183,6 +195,31 @@ void swiotlb_print_info(void)
>  	       bytes >> 20, vstart, vend - 1);
>  }
>  
> +/*
> + * Early SWIOTLB allocation may be to early to allow an architecture to

				      too

> + * perform the desired operations.  This function allows the architecture to
> + * call SWIOTLB when the operations are possible.  This function needs to be

s/This function/It/

> + * called before the SWIOTLB memory is used.
> + */
> +void __init swiotlb_update_mem_attributes(void)
> +{
> +	void *vaddr;
> +	unsigned long bytes;
> +
> +	if (no_iotlb_memory || late_alloc)
> +		return;
> +
> +	vaddr = phys_to_virt(io_tlb_start);
> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
> +	swiotlb_set_mem_attributes(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +
> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
> +	bytes = PAGE_ALIGN(io_tlb_overflow);
> +	swiotlb_set_mem_attributes(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +}
> +
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  {
>  	void *v_overflow_buffer;

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-16 15:46 ` [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
  2017-02-17 15:59   ` Konrad Rzeszutek Wilk
@ 2017-02-27 17:52   ` Borislav Petkov
  2017-02-28 23:19     ` Tom Lendacky
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-27 17:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:46:19AM -0600, Tom Lendacky wrote:
> Add warnings to let the user know when bounce buffers are being used for
> DMA when SME is active.  Since the bounce buffers are not in encrypted
> memory, these notifications are to allow the user to determine some
> appropriate action - if necessary.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>  include/linux/dma-mapping.h        |   11 +++++++++++
>  include/linux/mem_encrypt.h        |    6 ++++++
>  lib/swiotlb.c                      |    3 +++
>  4 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 87e816f..5a17f1b 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>  	return (sme_me_mask) ? true : false;
>  }
>  
> +static inline u64 sme_dma_mask(void)
> +{
> +	return ((u64)sme_me_mask << 1) - 1;
> +}
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
>  			      unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> @@ -53,6 +58,12 @@ static inline bool sme_active(void)
>  {
>  	return false;
>  }
> +
> +static inline u64 sme_dma_mask(void)
> +{
> +	return 0ULL;
> +}
> +
>  #endif
>  
>  static inline void __init sme_early_encrypt(resource_size_t paddr,
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 10c5a17..130bef7 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -10,6 +10,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/kmemcheck.h>
>  #include <linux/bug.h>
> +#include <linux/mem_encrypt.h>
>  
>  /**
>   * List of possible attributes associated with a DMA mapping. The semantics
> @@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>  
>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>  		return -EIO;
> +
> +	if (sme_active() && (mask < sme_dma_mask()))
> +		dev_warn(dev,
> +			 "SME is active, device will require DMA bounce buffers\n");
> +

Yes, definitely _once() here.

It could be extended later to be per-device if the need arises.
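
For now I mean something like (sketch):

	if (sme_active() && (mask < sme_dma_mask()))
		dev_warn_once(dev,
			      "SME is active, device will require DMA bounce buffers\n");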

Also, a bit above in this function, we test if (ops->set_dma_mask) so
device drivers which supply even an empty ->set_dma_mask will circumvent
this check.

It probably doesn't matter all that much right now, though, because the
only driver I see defining this method is
ethernet/intel/fm10k/fm10k_pf.c, plus some other arches' functionality
which is unrelated here.

But still...


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs
  2017-02-16 15:46 ` [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs Tom Lendacky
@ 2017-02-27 18:17   ` Borislav Petkov
  2017-02-28 23:28     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-27 18:17 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:46:47AM -0600, Tom Lendacky wrote:
> Add support to check if memory encryption is active in the kernel and that
> it has been enabled on the AP. If memory encryption is active in the kernel
> but has not been enabled on the AP, then set the SYS_CFG MSR bit to enable
> memory encryption on that AP and allow the AP to continue start up.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/realmode.h      |   12 ++++++++++++
>  arch/x86/realmode/init.c             |    4 ++++
>  arch/x86/realmode/rm/trampoline_64.S |   17 +++++++++++++++++
>  3 files changed, 33 insertions(+)
> 
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 230e190..4f7ef53 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -1,6 +1,15 @@
>  #ifndef _ARCH_X86_REALMODE_H
>  #define _ARCH_X86_REALMODE_H
>  
> +/*
> + * Flag bit definitions for use with the flags field of the trampoline header
> + * int the CONFIG_X86_64 variant.

s/int/in/

> + */
> +#define TH_FLAGS_SME_ACTIVE_BIT		0
> +#define TH_FLAGS_SME_ACTIVE		BIT(TH_FLAGS_SME_ACTIVE_BIT)
> +
> +#ifndef __ASSEMBLY__
> +
>  #include <linux/types.h>
>  #include <asm/io.h>
>  
> @@ -38,6 +47,7 @@ struct trampoline_header {
>  	u64 start;
>  	u64 efer;
>  	u32 cr4;
> +	u32 flags;
>  #endif
>  };
>  
> @@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
>  void set_real_mode_mem(phys_addr_t mem, size_t size);
>  void reserve_real_mode(void);
>  
> +#endif /* __ASSEMBLY__ */
> +
>  #endif /* _ARCH_X86_REALMODE_H */
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 21d7506..5010089 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -102,6 +102,10 @@ static void __init setup_real_mode(void)
>  	trampoline_cr4_features = &trampoline_header->cr4;
>  	*trampoline_cr4_features = mmu_cr4_features;
>  
> +	trampoline_header->flags = 0;
> +	if (sme_active())
> +		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
> +
>  	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
>  	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
>  	trampoline_pgd[511] = init_level4_pgt[511].pgd;
> diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
> index dac7b20..a88c3d1 100644
> --- a/arch/x86/realmode/rm/trampoline_64.S
> +++ b/arch/x86/realmode/rm/trampoline_64.S
> @@ -30,6 +30,7 @@
>  #include <asm/msr.h>
>  #include <asm/segment.h>
>  #include <asm/processor-flags.h>
> +#include <asm/realmode.h>
>  #include "realmode.h"
>  
>  	.text
> @@ -92,6 +93,21 @@ ENTRY(startup_32)
>  	movl	%edx, %fs
>  	movl	%edx, %gs
>  
> +	/* Check for memory encryption support */

Let's add some blurb here about this being a safety net in case BIOS
f*cks up. Which wouldn't be that far-fetched... :-)

> +	bt	$TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
> +	jnc	.Ldone
> +	movl	$MSR_K8_SYSCFG, %ecx
> +	rdmsr
> +	bts	$MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
> +	jc	.Ldone

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-16 15:47 ` [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME Tom Lendacky
  2017-02-17 15:57   ` Konrad Rzeszutek Wilk
@ 2017-02-28 10:35   ` Borislav Petkov
  2017-03-01 15:36     ` Tom Lendacky
  1 sibling, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-02-28 10:35 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
> 
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.
> 
> Additionally, when shutting down all of the CPUs we need to be sure to
> disable caches, flush the caches and then halt. This is needed when booting
> from a state where SME was not active into a state where SME is active.
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/cacheflush.h    |    2 ++
>  arch/x86/include/asm/init.h          |    1 +
>  arch/x86/include/asm/mem_encrypt.h   |   10 ++++++++
>  arch/x86/include/asm/pgtable_types.h |    1 +
>  arch/x86/kernel/machine_kexec_64.c   |    3 ++
>  arch/x86/kernel/process.c            |   43 +++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/smp.c                |    4 ++-
>  arch/x86/mm/ident_map.c              |    6 +++--
>  arch/x86/mm/pageattr.c               |    2 ++
>  include/linux/mem_encrypt.h          |   10 ++++++++
>  kernel/kexec_core.c                  |   24 +++++++++++++++++++
>  11 files changed, 100 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
> index 33ae60a..2180cd5 100644
> --- a/arch/x86/include/asm/cacheflush.h
> +++ b/arch/x86/include/asm/cacheflush.h
> @@ -48,8 +48,10 @@
>  int set_memory_rw(unsigned long addr, int numpages);
>  int set_memory_np(unsigned long addr, int numpages);
>  int set_memory_4k(unsigned long addr, int numpages);
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>  int set_memory_encrypted(unsigned long addr, int numpages);
>  int set_memory_decrypted(unsigned long addr, int numpages);
> +#endif
>  
>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>  int set_memory_array_wc(unsigned long *addr, int addrinarray);

Hmm, why is this ifdeffery creeping in now?

Just supply !CONFIG_AMD_MEM_ENCRYPT versions which don't do anything but
return the address.

> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
> index 737da62..b2ec511 100644
> --- a/arch/x86/include/asm/init.h
> +++ b/arch/x86/include/asm/init.h
> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>  	void *context;			 /* context for alloc_pgt_page */
>  	unsigned long pmd_flag;		 /* page flag for PMD entry */
>  	unsigned long offset;		 /* ident mapping offset */
> +	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>  };
>  
>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 5a17f1b..1fd5426 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -64,6 +64,16 @@ static inline u64 sme_dma_mask(void)
>  	return 0ULL;
>  }
>  
> +static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
> +{
> +	return 0;
> +}
> +
> +static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
> +{
> +	return 0;
> +}
> +
>  #endif
>  
>  static inline void __init sme_early_encrypt(resource_size_t paddr,
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index f00e70f..456c5cc 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -213,6 +213,7 @@ enum page_cache_mode {
>  #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
>  #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>  #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
> +#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
>  #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>  #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>  #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 307b1f4..b01648c 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -76,7 +76,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>  	}
>  	pte = pte_offset_kernel(pmd, vaddr);
> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>  	return 0;
>  err:
>  	free_transition_pgtable(image);
> @@ -104,6 +104,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  		.alloc_pgt_page	= alloc_pgt_page,
>  		.context	= image,
>  		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>  	};
>  	unsigned long mstart, mend;
>  	pgd_t *level4p;
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 3ed869c..9b01261 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -279,8 +279,43 @@ bool xen_set_default_idle(void)
>  	return ret;
>  }
>  #endif
> -void stop_this_cpu(void *dummy)
> +
> +static bool is_smt_thread(int cpu)
>  {
> +#ifdef CONFIG_SCHED_SMT
> +	if (cpumask_test_cpu(smp_processor_id(), cpu_smt_mask(cpu)))
> +		return true;
> +#endif

No, no sched stuff in here. Just

	if (cpumask_test_cpu(smp_processor_id(), topology_sibling_cpumask(cpu)))


> +	return false;
> +}
> +
> +void stop_this_cpu(void *data)
> +{
> +	atomic_t *stopping_cpu = data;
> +	bool do_cache_disable = false;
> +	bool do_wbinvd = false;
> +
> +	if (stopping_cpu) {
> +		int stopping_id = atomic_read(stopping_cpu);
> +		struct cpuinfo_x86 *c = &cpu_data(stopping_id);
> +
> +		/*
> +		 * If the processor supports SME then we need to clear
> +		 * out cache information before halting it because we could
> +		 * be performing a kexec. With kexec, going from SME
> +		 * inactive to SME active requires clearing cache entries
> +		 * so that addresses without the encryption bit set don't
> +		 * corrupt the same physical address that has the encryption
> +		 * bit set when caches are flushed. If this is not an SMT
> +		 * thread of the stopping CPU then we disable caching at this
> +		 * point to keep the cache clean.
> +		 */
> +		if (cpu_has(c, X86_FEATURE_SME)) {
> +			do_cache_disable = !is_smt_thread(stopping_id);
> +			do_wbinvd = true;
> +		}
> +	}

Let's simplify this (diff ontop of yours). Notice the sme_active() call
in there - I believe we want to do this only when SME is active - not on
any CPU which merely supports SME.

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9b012612698d..e771d7a42e49 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -296,9 +296,6 @@ void stop_this_cpu(void *data)
 	bool do_wbinvd = false;
 
 	if (stopping_cpu) {
-		int stopping_id = atomic_read(stopping_cpu);
-		struct cpuinfo_x86 *c = &cpu_data(stopping_id);
-
 		/*
 		 * If the processor supports SME then we need to clear
 		 * out cache information before halting it because we could
@@ -310,8 +307,8 @@ void stop_this_cpu(void *data)
 		 * thread of the stopping CPU then we disable caching at this
 		 * point to keep the cache clean.
 		 */
-		if (cpu_has(c, X86_FEATURE_SME)) {
-			do_cache_disable = !is_smt_thread(stopping_id);
+		if (sme_active()) {
+			do_cache_disable = !is_smt_thread(atomic_read(stopping_cpu));
 			do_wbinvd = true;
 		}
 	}

>  	local_irq_disable();
>  	/*
>  	 * Remove this CPU:
> @@ -289,6 +324,12 @@ void stop_this_cpu(void *dummy)
>  	disable_local_APIC();
>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>  
> +	if (do_cache_disable)
> +		write_cr0(read_cr0() | X86_CR0_CD);

Question: what clears CD back again? The CPU online path?

> +
> +	if (do_wbinvd)
> +		wbinvd();
> +

Ok, so this whole shebang is pretty much crippling the machine.
And, AFAICT, you're doing this now from smp_stop_nmi_callback() and
smp_reboot_interrupt() as they both pass a !NULL arg to stop_this_cpu().

And AFAICT those are not all cases where we kexec.

What you need instead, IMO, is __crash_kexec() ->
machine_crash_shutdown() -> native_machine_crash_shutdown() and put all
the SME special handling there.

I *think*.

>  	for (;;)
>  		halt();
>  }

...

> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index 9710f5c..46cc89d 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -1742,6 +1742,7 @@ int set_memory_4k(unsigned long addr, int numpages)
>  					__pgprot(0), 1, 0, NULL);
>  }
>  
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>  {
>  	struct cpa_data cpa;
> @@ -1807,6 +1808,7 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>  	return __set_memory_enc_dec(addr, numpages, false);
>  }
>  EXPORT_SYMBOL(set_memory_decrypted);
> +#endif	/* CONFIG_AMD_MEM_ENCRYPT */

Btw, I don't see those things used in modules to justify the
EXPORT_SYMBOL(). And it should be EXPORT_SYMBOL_GPL() since it is a new
symbol.

So you could put those wrappers in a header and do the ifdeffery there and
__set_memory_enc_dec() you can do like this:

static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
		return 0;

...

}

so that you can save yourself the ifdeffery. The compiler would still
parse the function body so everything else used in there would have to
be defined too, even in the !CONFIG_AMD_MEM_ENCRYPT case.

>  
>  int set_pages_uc(struct page *page, int numpages)
>  {
> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
> index 6829ff1..913cf80 100644
> --- a/include/linux/mem_encrypt.h
> +++ b/include/linux/mem_encrypt.h
> @@ -34,6 +34,16 @@ static inline u64 sme_dma_mask(void)
>  	return 0ULL;
>  }
>  
> +static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
> +{
> +	return 0;
> +}
> +
> +static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
> +{
> +	return 0;
> +}
> +
>  #endif
>  
>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5617cc4..ab62f41 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -38,6 +38,7 @@
>  #include <linux/syscore_ops.h>
>  #include <linux/compiler.h>
>  #include <linux/hugetlb.h>
> +#include <linux/mem_encrypt.h>
>  
>  #include <asm/page.h>
>  #include <asm/sections.h>
> @@ -315,6 +316,18 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>  		count = 1 << order;
>  		for (i = 0; i < count; i++)
>  			SetPageReserved(pages + i);
> +
> +		/*
> +		 * If SME is active we need to be sure that kexec pages are
> +		 * not encrypted because when we boot to the new kernel the
> +		 * pages won't be accessed encrypted (initially).
> +		 */
> +		if (sme_active()) {
> +			void *vaddr = page_address(pages);
> +
> +			set_memory_decrypted((unsigned long)vaddr, count);
> +			memset(vaddr, 0, count * PAGE_SIZE);

Why the memset?

> +		}
>  	}
>  
>  	return pages;
> @@ -326,6 +339,17 @@ static void kimage_free_pages(struct page *page)
>  
>  	order = page_private(page);
>  	count = 1 << order;
> +
> +	/*
> +	 * If SME is active we need to reset the pages back to being an
> +	 * encrypted mapping before freeing them.
> +	 */
> +	if (sme_active()) {
> +		void *vaddr = page_address(page);
> +
> +		set_memory_encrypted((unsigned long)vaddr, count);

        if (sme_active())
                set_memory_encrypted((unsigned long)page_address(page), count);

looks ok to me too.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address
  2017-02-20 20:09   ` Borislav Petkov
@ 2017-02-28 22:34     ` Tom Lendacky
  2017-03-03  9:52       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-28 22:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/20/2017 2:09 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:44:30AM -0600, Tom Lendacky wrote:
>> This patch adds support to return the E820 type associated with an address
>
> s/This patch adds/Add/
>
>> range.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/e820/api.h   |    2 ++
>>  arch/x86/include/asm/e820/types.h |    2 ++
>>  arch/x86/kernel/e820.c            |   26 +++++++++++++++++++++++---
>>  3 files changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
>> index 8e0f8b8..7c1bdc9 100644
>> --- a/arch/x86/include/asm/e820/api.h
>> +++ b/arch/x86/include/asm/e820/api.h
>> @@ -38,6 +38,8 @@
>>  extern void e820__reallocate_tables(void);
>>  extern void e820__register_nosave_regions(unsigned long limit_pfn);
>>
>> +extern enum e820_type e820__get_entry_type(u64 start, u64 end);
>> +
>>  /*
>>   * Returns true iff the specified range [start,end) is completely contained inside
>>   * the ISA region.
>> diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
>> index 4adeed0..bf49591 100644
>> --- a/arch/x86/include/asm/e820/types.h
>> +++ b/arch/x86/include/asm/e820/types.h
>> @@ -7,6 +7,8 @@
>>   * These are the E820 types known to the kernel:
>>   */
>>  enum e820_type {
>> +	E820_TYPE_INVALID	= 0,
>> +
>
> Now this is strange - ACPI spec doesn't explicitly say that range type 0
> is invalid. Am I looking at the wrong place?
>
> "Table 15-312 Address Range Types12" in ACPI spec 6.
>
> If 0 is really the invalid entry, then e820_print_type() needs updating
> too. And then the invalid-entry-add should be a separate patch.

The 0 return (originally) was to indicate that an e820 entry for the
range wasn't found. This series just gave it a name.  So it's not that
the type field held a 0.  Since 0 isn't defined in the ACPI spec I don't
see an issue with creating it and I can add a comment to the effect that
this value is used for the type when an e820 entry isn't found. I could
always rename it to E820_TYPE_NOT_FOUND if that would help.

Or if we want to guard against ACPI adding a type 0 in the future, I
could make the function return an int and then return -EINVAL if an e820
entry isn't found.  This might be the better option.
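
Roughly something like this - sketch only, with the lookup details
simplified:

int e820__get_entry_type(u64 start, u64 end)
{
	struct e820_entry *entry;
	int i;

	for (i = 0; i < e820_table->nr_entries; i++) {
		entry = &e820_table->entries[i];

		if ((start >= entry->addr) &&
		    (end <= (entry->addr + entry->size)))
			return entry->type;
	}

	return -EINVAL;		/* no e820 entry covers the range */
}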

Thanks,
Tom


>
>>  	E820_TYPE_RAM		= 1,
>>  	E820_TYPE_RESERVED	= 2,
>>  	E820_TYPE_ACPI		= 3,
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute
  2017-02-22 18:52   ` Borislav Petkov
@ 2017-02-28 22:46     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-28 22:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/22/2017 12:52 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:45:35AM -0600, Tom Lendacky wrote:
>> Add support for changing the memory encryption attribute for one or more
>> memory pages.
>
> "This will be useful when we, ...., for example."

Yup, will expand on the "why".

>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/cacheflush.h |    3 ++
>>  arch/x86/mm/pageattr.c            |   66 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 69 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
>> index 872877d..33ae60a 100644
>> --- a/arch/x86/include/asm/cacheflush.h
>> +++ b/arch/x86/include/asm/cacheflush.h
>> @@ -12,6 +12,7 @@
>>   * Executability : eXeutable, NoteXecutable
>>   * Read/Write    : ReadOnly, ReadWrite
>>   * Presence      : NotPresent
>> + * Encryption    : Encrypted, Decrypted
>>   *
>>   * Within a category, the attributes are mutually exclusive.
>>   *
>> @@ -47,6 +48,8 @@
>>  int set_memory_rw(unsigned long addr, int numpages);
>>  int set_memory_np(unsigned long addr, int numpages);
>>  int set_memory_4k(unsigned long addr, int numpages);
>> +int set_memory_encrypted(unsigned long addr, int numpages);
>> +int set_memory_decrypted(unsigned long addr, int numpages);
>>
>>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>>  int set_memory_array_wc(unsigned long *addr, int addrinarray);
>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>> index 91c5c63..9710f5c 100644
>> --- a/arch/x86/mm/pageattr.c
>> +++ b/arch/x86/mm/pageattr.c
>> @@ -1742,6 +1742,72 @@ int set_memory_4k(unsigned long addr, int numpages)
>>  					__pgprot(0), 1, 0, NULL);
>>  }
>>
>> +static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>> +{
>> +	struct cpa_data cpa;
>> +	unsigned long start;
>> +	int ret;
>> +
>> +	/* Nothing to do if the _PAGE_ENC attribute is zero */
>> +	if (_PAGE_ENC == 0)
>
> Why not:
>
> 	if (!sme_active())
>
> ?

Yup, it would be more clear.

>
>> +		return 0;
>> +
>> +	/* Save original start address since it will be modified */
>
> That's obvious - it is a small-enough function to fit on the screen. No
> need for the comment.

Ok.

>
>> +	start = addr;
>> +
>> +	memset(&cpa, 0, sizeof(cpa));
>> +	cpa.vaddr = &addr;
>> +	cpa.numpages = numpages;
>> +	cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
>> +	cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
>> +	cpa.pgd = init_mm.pgd;
>> +
>> +	/* Should not be working on unaligned addresses */
>> +	if (WARN_ONCE(*cpa.vaddr & ~PAGE_MASK,
>> +		      "misaligned address: %#lx\n", *cpa.vaddr))
>
> Use addr here so that you don't have to deref. gcc is probably smart
> enough but the code should look more readable this way too.
>

Ok.

>> +		*cpa.vaddr &= PAGE_MASK;
>
> I know, you must use cpa.vaddr here but if you move that alignment check
> over the cpa assignment, you can use addr solely.

Ok.

>
>> +
>> +	/* Must avoid aliasing mappings in the highmem code */
>> +	kmap_flush_unused();
>> +	vm_unmap_aliases();
>> +
>> +	/*
>> +	 * Before changing the encryption attribute, we need to flush caches.
>> +	 */
>> +	if (static_cpu_has(X86_FEATURE_CLFLUSH))
>> +		cpa_flush_range(start, numpages, 1);
>> +	else
>> +		cpa_flush_all(1);
>
> I guess we don't really need the distinction since a SME CPU most
> definitely implies CLFLUSH support but ok, let's be careful.
>
>> +
>> +	ret = __change_page_attr_set_clr(&cpa, 1);
>> +
>> +	/*
>> +	 * After changing the encryption attribute, we need to flush TLBs
>> +	 * again in case any speculative TLB caching occurred (but no need
>> +	 * to flush caches again).  We could just use cpa_flush_all(), but
>> +	 * in case TLB flushing gets optimized in the cpa_flush_range()
>> +	 * path use the same logic as above.
>> +	 */
>> +	if (static_cpu_has(X86_FEATURE_CLFLUSH))
>> +		cpa_flush_range(start, numpages, 0);
>> +	else
>> +		cpa_flush_all(0);
>> +
>> +	return ret;
>> +}
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support
  2017-02-25 15:29   ` Borislav Petkov
@ 2017-02-28 23:01     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-02-28 23:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/25/2017 9:29 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:43:07AM -0600, Tom Lendacky wrote:
>> Add support for Secure Memory Encryption (SME). This initial support
>> provides a Kconfig entry to build the SME support into the kernel and
>> defines the memory encryption mask that will be used in subsequent
>> patches to mark pages as encrypted.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>
> ...
>
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -0,0 +1,42 @@
>> +/*
>> + * AMD Memory Encryption Support
>> + *
>> + * Copyright (C) 2016 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Tom Lendacky <thomas.lendacky@amd.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#ifndef __X86_MEM_ENCRYPT_H__
>> +#define __X86_MEM_ENCRYPT_H__
>> +
>> +#ifndef __ASSEMBLY__
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +
>> +extern unsigned long sme_me_mask;
>> +
>> +static inline bool sme_active(void)
>> +{
>> +	return (sme_me_mask) ? true : false;
>
> 	return !!sme_me_mask;

Done.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-27 17:52   ` Borislav Petkov
@ 2017-02-28 23:19     ` Tom Lendacky
  2017-03-01 11:17       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-28 23:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/27/2017 11:52 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:46:19AM -0600, Tom Lendacky wrote:
>> Add warnings to let the user know when bounce buffers are being used for
>> DMA when SME is active.  Since the bounce buffers are not in encrypted
>> memory, these notifications are to allow the user to determine some
>> appropriate action - if necessary.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/mem_encrypt.h |   11 +++++++++++
>>  include/linux/dma-mapping.h        |   11 +++++++++++
>>  include/linux/mem_encrypt.h        |    6 ++++++
>>  lib/swiotlb.c                      |    3 +++
>>  4 files changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 87e816f..5a17f1b 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -26,6 +26,11 @@ static inline bool sme_active(void)
>>  	return (sme_me_mask) ? true : false;
>>  }
>>
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return ((u64)sme_me_mask << 1) - 1;
>> +}
>> +
>>  void __init sme_early_encrypt(resource_size_t paddr,
>>  			      unsigned long size);
>>  void __init sme_early_decrypt(resource_size_t paddr,
>> @@ -53,6 +58,12 @@ static inline bool sme_active(void)
>>  {
>>  	return false;
>>  }
>> +
>> +static inline u64 sme_dma_mask(void)
>> +{
>> +	return 0ULL;
>> +}
>> +
>>  #endif
>>
>>  static inline void __init sme_early_encrypt(resource_size_t paddr,
>> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
>> index 10c5a17..130bef7 100644
>> --- a/include/linux/dma-mapping.h
>> +++ b/include/linux/dma-mapping.h
>> @@ -10,6 +10,7 @@
>>  #include <linux/scatterlist.h>
>>  #include <linux/kmemcheck.h>
>>  #include <linux/bug.h>
>> +#include <linux/mem_encrypt.h>
>>
>>  /**
>>   * List of possible attributes associated with a DMA mapping. The semantics
>> @@ -557,6 +558,11 @@ static inline int dma_set_mask(struct device *dev, u64 mask)
>>
>>  	if (!dev->dma_mask || !dma_supported(dev, mask))
>>  		return -EIO;
>> +
>> +	if (sme_active() && (mask < sme_dma_mask()))
>> +		dev_warn(dev,
>> +			 "SME is active, device will require DMA bounce buffers\n");
>> +
>
> Yes, definitely _once() here.

Setting the mask is a probe/init type event, so I think not having the
_once() would be better so that all devices that set a mask to something
less than the SME encryption mask would be identified.  This isn't done
for every DMA, etc.

>
> It could be extended later to be per-device if the need arises.
>
> Also, a bit above in this function, we test if (ops->set_dma_mask) so
> device drivers which supply even an empty ->set_dma_mask will circumvent
> this check.
>
> It probably doesn't matter all that much right now, though, because the
> only driver I see defining this method is
> ethernet/intel/fm10k/fm10k_pf.c, plus some other arches' functionality
> which is unrelated here.

Device drivers don't supply set_dma_mask() since that is part of the
dma_map_ops structure. The fm10k_pf.c file function is unrelated to this
(it's part of an internal driver structure). The dma_map_ops structure
is setup by the arch or an iommu.

Thanks,
Tom

>
> But still...
>
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs
  2017-02-27 18:17   ` Borislav Petkov
@ 2017-02-28 23:28     ` Tom Lendacky
  2017-03-01 11:17       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-02-28 23:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/27/2017 12:17 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:46:47AM -0600, Tom Lendacky wrote:
>> Add support to check if memory encryption is active in the kernel and that
>> it has been enabled on the AP. If memory encryption is active in the kernel
>> but has not been enabled on the AP, then set the SYS_CFG MSR bit to enable
>> memory encryption on that AP and allow the AP to continue start up.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/realmode.h      |   12 ++++++++++++
>>  arch/x86/realmode/init.c             |    4 ++++
>>  arch/x86/realmode/rm/trampoline_64.S |   17 +++++++++++++++++
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
>> index 230e190..4f7ef53 100644
>> --- a/arch/x86/include/asm/realmode.h
>> +++ b/arch/x86/include/asm/realmode.h
>> @@ -1,6 +1,15 @@
>>  #ifndef _ARCH_X86_REALMODE_H
>>  #define _ARCH_X86_REALMODE_H
>>
>> +/*
>> + * Flag bit definitions for use with the flags field of the trampoline header
>> + * int the CONFIG_X86_64 variant.
>
> s/int/in/

Fixed.

>
>> + */
>> +#define TH_FLAGS_SME_ACTIVE_BIT		0
>> +#define TH_FLAGS_SME_ACTIVE		BIT(TH_FLAGS_SME_ACTIVE_BIT)
>> +
>> +#ifndef __ASSEMBLY__
>> +
>>  #include <linux/types.h>
>>  #include <asm/io.h>
>>
>> @@ -38,6 +47,7 @@ struct trampoline_header {
>>  	u64 start;
>>  	u64 efer;
>>  	u32 cr4;
>> +	u32 flags;
>>  #endif
>>  };
>>
>> @@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
>>  void set_real_mode_mem(phys_addr_t mem, size_t size);
>>  void reserve_real_mode(void);
>>
>> +#endif /* __ASSEMBLY__ */
>> +
>>  #endif /* _ARCH_X86_REALMODE_H */
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 21d7506..5010089 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -102,6 +102,10 @@ static void __init setup_real_mode(void)
>>  	trampoline_cr4_features = &trampoline_header->cr4;
>>  	*trampoline_cr4_features = mmu_cr4_features;
>>
>> +	trampoline_header->flags = 0;
>> +	if (sme_active())
>> +		trampoline_header->flags |= TH_FLAGS_SME_ACTIVE;
>> +
>>  	trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
>>  	trampoline_pgd[0] = trampoline_pgd_entry.pgd;
>>  	trampoline_pgd[511] = init_level4_pgt[511].pgd;
>> diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
>> index dac7b20..a88c3d1 100644
>> --- a/arch/x86/realmode/rm/trampoline_64.S
>> +++ b/arch/x86/realmode/rm/trampoline_64.S
>> @@ -30,6 +30,7 @@
>>  #include <asm/msr.h>
>>  #include <asm/segment.h>
>>  #include <asm/processor-flags.h>
>> +#include <asm/realmode.h>
>>  #include "realmode.h"
>>
>>  	.text
>> @@ -92,6 +93,21 @@ ENTRY(startup_32)
>>  	movl	%edx, %fs
>>  	movl	%edx, %gs
>>
>> +	/* Check for memory encryption support */
>
> Let's add some blurb here about this being a safety net in case BIOS
> f*cks up. Which wouldn't be that far-fetched... :-)

That's a good idea, I'll expand on that.  I probably won't be that
direct in my comment though :)

Thanks,
Tom

>
>> +	bt	$TH_FLAGS_SME_ACTIVE_BIT, pa_tr_flags
>> +	jnc	.Ldone
>> +	movl	$MSR_K8_SYSCFG, %ecx
>> +	rdmsr
>> +	bts	$MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
>> +	jc	.Ldone
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
                   ` (28 preceding siblings ...)
  2017-02-18 18:12 ` [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Borislav Petkov
@ 2017-03-01  9:17 ` Dave Young
  2017-03-01 17:51   ` Tom Lendacky
  29 siblings, 1 reply; 111+ messages in thread
From: Dave Young @ 2017-03-01  9:17 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, kexec, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Hi Tom,

On 02/16/17 at 09:41am, Tom Lendacky wrote:
> This RFC patch series provides support for AMD's new Secure Memory
> Encryption (SME) feature.
> 
> SME can be used to mark individual pages of memory as encrypted through the
> page tables. A page of memory that is marked encrypted will be automatically
> decrypted when read from DRAM and will be automatically encrypted when
> written to DRAM. Details on SME can found in the links below.
> 
> The SME feature is identified through a CPUID function and enabled through
> the SYSCFG MSR. Once enabled, page table entries will determine how the
> memory is accessed. If a page table entry has the memory encryption mask set,
> then that memory will be accessed as encrypted memory. The memory encryption
> mask (as well as other related information) is determined from settings
> returned through the same CPUID function that identifies the presence of the
> feature.
> 
> The approach that this patch series takes is to encrypt everything possible
> starting early in the boot where the kernel is encrypted. Using the page
> table macros the encryption mask can be incorporated into all page table
> entries and page allocations. By updating the protection map, userspace
> allocations are also marked encrypted. Certain data must be accounted for
> as having been placed in memory before SME was enabled (EFI, initrd, etc.)
> and accessed accordingly.
> 
> This patch series is a pre-cursor to another AMD processor feature called
> Secure Encrypted Virtualization (SEV). The support for SEV will build upon
> the SME support and will be submitted later. Details on SEV can be found
> in the links below.
> 
> The following links provide additional detail:
> 
> AMD Memory Encryption whitepaper:
>    http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf
> 
> AMD64 Architecture Programmer's Manual:
>    http://support.amd.com/TechDocs/24593.pdf
>    SME is section 7.10
>    SEV is section 15.34
> 
> This patch series is based off of the master branch of tip.
>   Commit a27cb9e1b2b4 ("Merge branch 'WIP.sched/core'")
> 
> ---
> 
> Still to do: IOMMU enablement support
> 
> Changes since v3:
> - Broke out some of the patches into smaller individual patches
> - Updated Documentation
> - Added a message to indicate why the IOMMU was disabled
> - Updated CPU feature support for SME by taking into account whether
>   BIOS has enabled SME
> - Eliminated redundant functions
> - Added some warning messages for DMA usage of bounce buffers when SME
>   is active
> - Added support for persistent memory
> - Added support to determine when setup data is being mapped and be sure
>   to map it un-encrypted
> - Added CONFIG support to set the default action of whether to activate
>   SME if it is supported/enabled
> - Added support for (re)booting with kexec

Could you please add kexec list in cc when you updating the patches so
that kexec/kdump people do not miss them?

> 
> Changes since v2:
> - Updated Documentation
> - Make the encryption mask available outside of arch/x86 through a
>   standard include file
> - Conversion of assembler routines to C where possible (not everything
>   could be converted, e.g. the routine that does the actual encryption
>   needs to be copied into a safe location and it is difficult to
>   determine the actual length of the function in order to copy it)
> - Fix SME feature use of scattered CPUID feature
> - Creation of SME specific functions for things like encrypting
>   the setup data, ramdisk, etc.
> - New take on early_memremap / memremap encryption support
> - Additional support for accessing video buffers (fbdev/gpu) as
>   un-encrypted
> - Disable IOMMU for now - need to investigate further in relation to
>   how it needs to be programmed relative to accessing physical memory
> 
> Changes since v1:
> - Added Documentation.
> - Removed AMD vendor check for setting the PAT write protect mode
> - Updated naming of trampoline flag for SME as well as moving of the
>   SME check to before paging is enabled.
> - Change to early_memremap to identify the data being mapped as either
>   boot data or kernel data.  The idea being that boot data will have
>   been placed in memory as un-encrypted data and would need to be accessed
>   as such.
> - Updated debugfs support for the bootparams to access the data properly.
> - Do not set the SYSCFG[MEME] bit, only check it.  The setting of the
>   MemEncryptionModeEn bit results in a reduction of physical address size
>   of the processor.  It is possible that BIOS could have configured
>   resources into a range that will now not be addressable.  To prevent this,
>   rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
>   encryption support in the kernel.
> 
> Tom Lendacky (28):
>       x86: Documentation for AMD Secure Memory Encryption (SME)
>       x86: Set the write-protect cache mode for full PAT support
>       x86: Add the Secure Memory Encryption CPU feature
>       x86: Handle reduction in physical address size with SME
>       x86: Add Secure Memory Encryption (SME) support
>       x86: Add support to enable SME during early boot processing
>       x86: Provide general kernel support for memory encryption
>       x86: Extend the early_memremap support with additional attrs
>       x86: Add support for early encryption/decryption of memory
>       x86: Insure that boot memory areas are mapped properly
>       x86: Add support to determine the E820 type of an address
>       efi: Add an EFI table address match function
>       efi: Update efi_mem_type() to return defined EFI mem types
>       Add support to access boot related data in the clear
>       Add support to access persistent memory in the clear
>       x86: Add support for changing memory encryption attribute
>       x86: Decrypt trampoline area if memory encryption is active
>       x86: DMA support for memory encryption
>       swiotlb: Add warnings for use of bounce buffers with SME
>       iommu/amd: Disable AMD IOMMU if memory encryption is active
>       x86: Check for memory encryption on the APs
>       x86: Do not specify encrypted memory for video mappings
>       x86/kvm: Enable Secure Memory Encryption of nested page tables
>       x86: Access the setup data through debugfs decrypted
>       x86: Access the setup data through sysfs decrypted
>       x86: Allow kexec to be used with SME
>       x86: Add support to encrypt the kernel in-place
>       x86: Add support to make use of Secure Memory Encryption
> 
> 
>  Documentation/admin-guide/kernel-parameters.txt |   11 +
>  Documentation/x86/amd-memory-encryption.txt     |   57 ++++
>  arch/x86/Kconfig                                |   26 ++
>  arch/x86/boot/compressed/pagetable.c            |    7 +
>  arch/x86/include/asm/cacheflush.h               |    5 
>  arch/x86/include/asm/cpufeature.h               |    7 -
>  arch/x86/include/asm/cpufeatures.h              |    5 
>  arch/x86/include/asm/disabled-features.h        |    3 
>  arch/x86/include/asm/dma-mapping.h              |    5 
>  arch/x86/include/asm/e820/api.h                 |    2 
>  arch/x86/include/asm/e820/types.h               |    2 
>  arch/x86/include/asm/fixmap.h                   |   20 +
>  arch/x86/include/asm/init.h                     |    1 
>  arch/x86/include/asm/io.h                       |    3 
>  arch/x86/include/asm/kvm_host.h                 |    3 
>  arch/x86/include/asm/mem_encrypt.h              |  108 ++++++++
>  arch/x86/include/asm/msr-index.h                |    2 
>  arch/x86/include/asm/page.h                     |    4 
>  arch/x86/include/asm/pgtable.h                  |   26 +-
>  arch/x86/include/asm/pgtable_types.h            |   54 +++-
>  arch/x86/include/asm/processor.h                |    3 
>  arch/x86/include/asm/realmode.h                 |   12 +
>  arch/x86/include/asm/required-features.h        |    3 
>  arch/x86/include/asm/setup.h                    |    8 +
>  arch/x86/include/asm/vga.h                      |   13 +
>  arch/x86/kernel/Makefile                        |    3 
>  arch/x86/kernel/cpu/common.c                    |   23 ++
>  arch/x86/kernel/e820.c                          |   26 ++
>  arch/x86/kernel/espfix_64.c                     |    2 
>  arch/x86/kernel/head64.c                        |   46 +++
>  arch/x86/kernel/head_64.S                       |   65 ++++-
>  arch/x86/kernel/kdebugfs.c                      |   30 +-
>  arch/x86/kernel/ksysfs.c                        |   27 +-
>  arch/x86/kernel/machine_kexec_64.c              |    3 
>  arch/x86/kernel/mem_encrypt_boot.S              |  156 ++++++++++++
>  arch/x86/kernel/mem_encrypt_init.c              |  310 +++++++++++++++++++++++
>  arch/x86/kernel/pci-dma.c                       |   11 +
>  arch/x86/kernel/pci-nommu.c                     |    2 
>  arch/x86/kernel/pci-swiotlb.c                   |    8 -
>  arch/x86/kernel/process.c                       |   43 +++
>  arch/x86/kernel/setup.c                         |   43 +++
>  arch/x86/kernel/smp.c                           |    4 
>  arch/x86/kvm/mmu.c                              |    8 -
>  arch/x86/kvm/vmx.c                              |    3 
>  arch/x86/kvm/x86.c                              |    3 
>  arch/x86/mm/Makefile                            |    1 
>  arch/x86/mm/ident_map.c                         |    6 
>  arch/x86/mm/ioremap.c                           |  157 ++++++++++++
>  arch/x86/mm/kasan_init_64.c                     |    4 
>  arch/x86/mm/mem_encrypt.c                       |  218 ++++++++++++++++
>  arch/x86/mm/pageattr.c                          |   71 +++++
>  arch/x86/mm/pat.c                               |    6 
>  arch/x86/platform/efi/efi.c                     |    4 
>  arch/x86/platform/efi/efi_64.c                  |   16 +
>  arch/x86/realmode/init.c                        |   16 +
>  arch/x86/realmode/rm/trampoline_64.S            |   17 +
>  drivers/firmware/efi/efi.c                      |   33 ++
>  drivers/gpu/drm/drm_gem.c                       |    2 
>  drivers/gpu/drm/drm_vm.c                        |    4 
>  drivers/gpu/drm/ttm/ttm_bo_vm.c                 |    7 -
>  drivers/gpu/drm/udl/udl_fb.c                    |    4 
>  drivers/iommu/amd_iommu_init.c                  |    7 +
>  drivers/video/fbdev/core/fbmem.c                |   12 +
>  include/asm-generic/early_ioremap.h             |    2 
>  include/asm-generic/pgtable.h                   |    8 +
>  include/linux/dma-mapping.h                     |   11 +
>  include/linux/efi.h                             |    7 +
>  include/linux/mem_encrypt.h                     |   53 ++++
>  include/linux/swiotlb.h                         |    1 
>  init/main.c                                     |   13 +
>  kernel/kexec_core.c                             |   24 ++
>  kernel/memremap.c                               |   11 +
>  lib/swiotlb.c                                   |   59 ++++
>  mm/early_ioremap.c                              |   28 ++
>  74 files changed, 1880 insertions(+), 128 deletions(-)
>  create mode 100644 Documentation/x86/amd-memory-encryption.txt
>  create mode 100644 arch/x86/include/asm/mem_encrypt.h
>  create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
>  create mode 100644 arch/x86/kernel/mem_encrypt_init.c
>  create mode 100644 arch/x86/mm/mem_encrypt.c
>  create mode 100644 include/linux/mem_encrypt.h
> 
> -- 
> Tom Lendacky

Thanks a lot!
Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-17 16:43     ` Tom Lendacky
@ 2017-03-01  9:25       ` Dave Young
  2017-03-01  9:27         ` Dave Young
  2017-03-06 17:58         ` Tom Lendacky
  0 siblings, 2 replies; 111+ messages in thread
From: Dave Young @ 2017-03-01  9:25 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Konrad Rzeszutek Wilk, linux-arch, linux-efi, kvm, linux-doc,
	x86, linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Hi Tom,

On 02/17/17 at 10:43am, Tom Lendacky wrote:
> On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
> > > Provide support so that kexec can be used to boot a kernel when SME is
> > > enabled.
> > 
> > Is the point of kexec and kdump to ehh, dump memory ? But if the
> > rest of the memory is encrypted you won't get much, will you?
> 
> Kexec can be used to reboot a system without going back through BIOS.
> So you can use kexec without using kdump.
> 
> For kdump, just taking a quick look, the option to enable memory
> encryption can be provided on the crash kernel command line and then

Is there a simple way to get the SME status? Probably add some sysfs
file for this purpose.
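
For illustration only -- this is not from the posted patches, and the
name/location are made up -- the kind of read-only sysfs file being
suggested could look roughly like this:

	/* e.g. in arch/x86/mm/mem_encrypt.c; needs <linux/kobject.h>, <linux/sysfs.h> */
	static ssize_t sme_active_show(struct kobject *kobj,
				       struct kobj_attribute *attr, char *buf)
	{
		return sprintf(buf, "%d\n", sme_active() ? 1 : 0);
	}
	static struct kobj_attribute sme_active_attr = __ATTR_RO(sme_active);

	static int __init sme_sysfs_init(void)
	{
		/* would show up as /sys/kernel/sme_active */
		return sysfs_create_file(kernel_kobj, &sme_active_attr.attr);
	}
	late_initcall(sme_sysfs_init);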

> crash kernel would be able to copy the memory decrypted if the
> pagetable is set up properly. It looks like currently ioremap_cache()
> is used to map the old memory page.  That might be able to be changed
> to a memremap() so that the encryption bit is set in the mapping. That
> will mean that memory that is not marked encrypted (EFI tables, swiotlb
> memory, etc) would not be read correctly.

Could we store info about the ranges which are not encrypted so that
memremap can handle them?

> 
> > 
> > Would it make sense to include some printk to the user if they
> > are setting up kdump that they won't get anything out of it?
> 
> Probably a good idea to add something like that.

It will break kdump functionality; it should be fixed instead of
just adding a printk to warn the user.

Thanks
Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-03-01  9:25       ` Dave Young
@ 2017-03-01  9:27         ` Dave Young
  2017-03-06 17:58         ` Tom Lendacky
  1 sibling, 0 replies; 111+ messages in thread
From: Dave Young @ 2017-03-01  9:27 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Konrad Rzeszutek Wilk, linux-arch, linux-efi, kvm, linux-doc,
	x86, linux-kernel, kasan-dev, linux-mm, iommu, kexec,
	Rik van Riel, Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

Add kexec list..

On 03/01/17 at 05:25pm, Dave Young wrote:
> Hi Tom,
> 
> On 02/17/17 at 10:43am, Tom Lendacky wrote:
> > On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
> > > > Provide support so that kexec can be used to boot a kernel when SME is
> > > > enabled.
> > > 
> > > Is the point of kexec and kdump to ehh, dump memory ? But if the
> > > rest of the memory is encrypted you won't get much, will you?
> > 
> > Kexec can be used to reboot a system without going back through BIOS.
> > So you can use kexec without using kdump.
> > 
> > For kdump, just taking a quick look, the option to enable memory
> > encryption can be provided on the crash kernel command line and then
> 
> Is there a simple way to get the SME status? Probably add some sysfs
> file for this purpose.
> 
> > crash kernel would be able to copy the memory decrypted if the
> > pagetable is set up properly. It looks like currently ioremap_cache()
> > is used to map the old memory page.  That might be able to be changed
> > to a memremap() so that the encryption bit is set in the mapping. That
> > will mean that memory that is not marked encrypted (EFI tables, swiotlb
> > memory, etc) would not be read correctly.
> 
> Could we store info about the ranges which are not encrypted so that
> memremap can handle them?
> 
> > 
> > > 
> > > Would it make sense to include some printk to the user if they
> > > are setting up kdump that they won't get anything out of it?
> > 
> > Probably a good idea to add something like that.
> 
> It will break kdump functionality; it should be fixed instead of
> just adding a printk to warn the user.
> 
> Thanks
> Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-28 23:19     ` Tom Lendacky
@ 2017-03-01 11:17       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-01 11:17 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 28, 2017 at 05:19:51PM -0600, Tom Lendacky wrote:
> Device drivers don't supply set_dma_mask() since that is part of the
> dma_map_ops structure. The fm10k_pf.c file function is unrelated to this
> (it's part of an internal driver structure). The dma_map_ops structure
> is setup by the arch or an iommu.

That was certainly a brainfart, sorry.

Joerg explained to me on IRC how the whole dma_map_ops handling is
supposed to be happening.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs
  2017-02-28 23:28     ` Tom Lendacky
@ 2017-03-01 11:17       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-01 11:17 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 28, 2017 at 05:28:48PM -0600, Tom Lendacky wrote:
> That's a good idea, I'll expand on that.  I probably won't be that
> direct in my comment though :)

You either haven't dealt with firmware long enough or you're a much
better person than me. :-)))

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-02-28 10:35   ` Borislav Petkov
@ 2017-03-01 15:36     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-01 15:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov, kexec

+kexec list

On 2/28/2017 4:35 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
>> Provide support so that kexec can be used to boot a kernel when SME is
>> enabled.
>>
>> Support is needed to allocate pages for kexec without encryption.  This
>> is needed in order to be able to reboot in the kernel in the same manner
>> as originally booted.
>>
>> Additionally, when shutting down all of the CPUs we need to be sure to
>> disable caches, flush the caches and then halt. This is needed when booting
>> from a state where SME was not active into a state where SME is active.
>> Without these steps, it is possible for cache lines to exist for the same
>> physical location but tagged both with and without the encryption bit. This
>> can cause random memory corruption when caches are flushed depending on
>> which cacheline is written last.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/include/asm/cacheflush.h    |    2 ++
>>  arch/x86/include/asm/init.h          |    1 +
>>  arch/x86/include/asm/mem_encrypt.h   |   10 ++++++++
>>  arch/x86/include/asm/pgtable_types.h |    1 +
>>  arch/x86/kernel/machine_kexec_64.c   |    3 ++
>>  arch/x86/kernel/process.c            |   43 +++++++++++++++++++++++++++++++++-
>>  arch/x86/kernel/smp.c                |    4 ++-
>>  arch/x86/mm/ident_map.c              |    6 +++--
>>  arch/x86/mm/pageattr.c               |    2 ++
>>  include/linux/mem_encrypt.h          |   10 ++++++++
>>  kernel/kexec_core.c                  |   24 +++++++++++++++++++
>>  11 files changed, 100 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
>> index 33ae60a..2180cd5 100644
>> --- a/arch/x86/include/asm/cacheflush.h
>> +++ b/arch/x86/include/asm/cacheflush.h
>> @@ -48,8 +48,10 @@
>>  int set_memory_rw(unsigned long addr, int numpages);
>>  int set_memory_np(unsigned long addr, int numpages);
>>  int set_memory_4k(unsigned long addr, int numpages);
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>>  int set_memory_encrypted(unsigned long addr, int numpages);
>>  int set_memory_decrypted(unsigned long addr, int numpages);
>> +#endif
>>
>>  int set_memory_array_uc(unsigned long *addr, int addrinarray);
>>  int set_memory_array_wc(unsigned long *addr, int addrinarray);
>
> Hmm, why is this ifdeffery creeping in now?
>
> Just supply !CONFIG_AMD_MEM_ENCRYPT versions which don't do anything but
> return the address.

This was added because the set_memory_decrypted() call is now called
from kernel/kexec_core.c.  And since all the set_memory() functions
are defined in an arch include I had to swizzle things around. I think
I should probably do something similar to the SWIOTLB support and have
a __weak function to alter the memory area attributes.
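
Just to make that idea concrete, a rough sketch (the hook name and file
placement are purely illustrative, mirroring the __weak pattern used for
the swiotlb support -- not something from the posted patches):

	/* kernel/kexec_core.c: generic no-op default */
	void __weak arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages)
	{
	}

	/* arch/x86 override: only acts when SME is active */
	void arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages)
	{
		if (sme_active())
			set_memory_decrypted((unsigned long)vaddr, pages);
	}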

>
>> diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
>> index 737da62..b2ec511 100644
>> --- a/arch/x86/include/asm/init.h
>> +++ b/arch/x86/include/asm/init.h
>> @@ -6,6 +6,7 @@ struct x86_mapping_info {
>>  	void *context;			 /* context for alloc_pgt_page */
>>  	unsigned long pmd_flag;		 /* page flag for PMD entry */
>>  	unsigned long offset;		 /* ident mapping offset */
>> +	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
>>  };
>>
>>  int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 5a17f1b..1fd5426 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -64,6 +64,16 @@ static inline u64 sme_dma_mask(void)
>>  	return 0ULL;
>>  }
>>
>> +static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
>> +{
>> +	return 0;
>> +}
>> +
>>  #endif
>>
>>  static inline void __init sme_early_encrypt(resource_size_t paddr,
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index f00e70f..456c5cc 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -213,6 +213,7 @@ enum page_cache_mode {
>>  #define PAGE_KERNEL		__pgprot(__PAGE_KERNEL | _PAGE_ENC)
>>  #define PAGE_KERNEL_RO		__pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
>>  #define PAGE_KERNEL_EXEC	__pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
>> +#define PAGE_KERNEL_EXEC_NOENC	__pgprot(__PAGE_KERNEL_EXEC)
>>  #define PAGE_KERNEL_RX		__pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
>>  #define PAGE_KERNEL_NOCACHE	__pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
>>  #define PAGE_KERNEL_LARGE	__pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 307b1f4..b01648c 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -76,7 +76,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>>  		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
>>  	}
>>  	pte = pte_offset_kernel(pmd, vaddr);
>> -	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
>> +	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC_NOENC));
>>  	return 0;
>>  err:
>>  	free_transition_pgtable(image);
>> @@ -104,6 +104,7 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>>  		.alloc_pgt_page	= alloc_pgt_page,
>>  		.context	= image,
>>  		.pmd_flag	= __PAGE_KERNEL_LARGE_EXEC,
>> +		.kernpg_flag	= _KERNPG_TABLE_NOENC,
>>  	};
>>  	unsigned long mstart, mend;
>>  	pgd_t *level4p;
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 3ed869c..9b01261 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -279,8 +279,43 @@ bool xen_set_default_idle(void)
>>  	return ret;
>>  }
>>  #endif
>> -void stop_this_cpu(void *dummy)
>> +
>> +static bool is_smt_thread(int cpu)
>>  {
>> +#ifdef CONFIG_SCHED_SMT
>> +	if (cpumask_test_cpu(smp_processor_id(), cpu_smt_mask(cpu)))
>> +		return true;
>> +#endif
>
> No, no sched stuff in here. Just
>
> 	if (cpumask_test_cpu(smp_processor_id(), topology_sibling_cpumask(cpu)))

Ah, ok, much nicer.

>
>
>> +	return false;
>> +}
>> +
>> +void stop_this_cpu(void *data)
>> +{
>> +	atomic_t *stopping_cpu = data;
>> +	bool do_cache_disable = false;
>> +	bool do_wbinvd = false;
>> +
>> +	if (stopping_cpu) {
>> +		int stopping_id = atomic_read(stopping_cpu);
>> +		struct cpuinfo_x86 *c = &cpu_data(stopping_id);
>> +
>> +		/*
>> +		 * If the processor supports SME then we need to clear
>> +		 * out cache information before halting it because we could
>> +		 * be performing a kexec. With kexec, going from SME
>> +		 * inactive to SME active requires clearing cache entries
>> +		 * so that addresses without the encryption bit set don't
>> +		 * corrupt the same physical address that has the encryption
>> +		 * bit set when caches are flushed. If this is not an SMT
>> +		 * thread of the stopping CPU then we disable caching at this
>> +		 * point to keep the cache clean.
>> +		 */
>> +		if (cpu_has(c, X86_FEATURE_SME)) {
>> +			do_cache_disable = !is_smt_thread(stopping_id);
>> +			do_wbinvd = true;
>> +		}
>> +	}
>
> Let's simplify this (diff ontop of yours). Notice the sme_active() call
> in there - I believe we want to do this only when SME is active - not on
> any CPU which merely supports SME.

No, because we could be rebooting via kexec from a state where SME is
not active into one where it is active.  This is where the cache line
issue can arise.

>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 9b012612698d..e771d7a42e49 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -296,9 +296,6 @@ void stop_this_cpu(void *data)
>  	bool do_wbinvd = false;
>
>  	if (stopping_cpu) {
> -		int stopping_id = atomic_read(stopping_cpu);
> -		struct cpuinfo_x86 *c = &cpu_data(stopping_id);
> -
>  		/*
>  		 * If the processor supports SME then we need to clear
>  		 * out cache information before halting it because we could
> @@ -310,8 +307,8 @@ void stop_this_cpu(void *data)
>  		 * thread of the stopping CPU then we disable caching at this
>  		 * point to keep the cache clean.
>  		 */
> -		if (cpu_has(c, X86_FEATURE_SME)) {
> -			do_cache_disable = !is_smt_thread(stopping_id);
> +		if (sme_active()) {
> +			do_cache_disable = !is_smt_thread(atomic_read(stopping_cpu));
>  			do_wbinvd = true;
>  		}
>  	}
>
>>  	local_irq_disable();
>>  	/*
>>  	 * Remove this CPU:
>> @@ -289,6 +324,12 @@ void stop_this_cpu(void *dummy)
>>  	disable_local_APIC();
>>  	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
>>
>> +	if (do_cache_disable)
>> +		write_cr0(read_cr0() | X86_CR0_CD);
>
> Question: what clears CD back again? The CPU online path?

Yes, when the CPU comes back online the cache is re-enabled.

>
>> +
>> +	if (do_wbinvd)
>> +		wbinvd();
>> +
>
> Ok, so this whole shebang is pretty much crippling the machine.
> And, AFAICT, you're doing this now from smp_stop_nmi_callback() and
> smp_reboot_interrupt() as they both pass a !NULL arg to stop_this_cpu().

I'll take a closer look at the sysfs support to see how moving a cpu
to/from online is affected.

>
> And AFAICT those are not all cases where we kexec.

Yes, kexec can be invoked through a reboot command.
>
> What you need instead, IMO, is __crash_kexec() ->
> machine_crash_shutdown() -> native_machine_crash_shutdown() and put all
> the SME special handling there.
>
> I *think*.

I'll take a closer look at the kexec path to see what can be done. I
might be able to determine whether SME is configured in the kernel and
what the default SME state is, combined with checking the command line
of the kernel being kexeced.

Thanks,
Tom

>
>>  	for (;;)
>>  		halt();
>>  }
>
> ...
>
>> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
>> index 9710f5c..46cc89d 100644
>> --- a/arch/x86/mm/pageattr.c
>> +++ b/arch/x86/mm/pageattr.c
>> @@ -1742,6 +1742,7 @@ int set_memory_4k(unsigned long addr, int numpages)
>>  					__pgprot(0), 1, 0, NULL);
>>  }
>>
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>>  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
>>  {
>>  	struct cpa_data cpa;
>> @@ -1807,6 +1808,7 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>>  	return __set_memory_enc_dec(addr, numpages, false);
>>  }
>>  EXPORT_SYMBOL(set_memory_decrypted);
>> +#endif	/* CONFIG_AMD_MEM_ENCRYPT */
>
> Btw, I don't see those things used in modules to justify the
> EXPORT_SYMBOL(). And it should be EXPORT_SYMBOL_GPL() since it is a new
> symbol.

Ok.

>
> So you could put those wrappers in a header and do the ifdeffery there and
> __set_memory_enc_dec() you can do like this:

I'll look at making it cleaner. It would be simple if all the
set_memory() functions weren't arch specific. The earlier response about
making it an arch callback function might be best.

>
> static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> {
> 	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
> 		return 0;
>
> ...
>
> }
>
> so that you can save yourself the ifdeffery. The compiler would still
> parse the function body so everything else used in there would have to
> be defined too, even in the !CONFIG_AMD_MEM_ENCRYPT case.
>
>>
>>  int set_pages_uc(struct page *page, int numpages)
>>  {
>> diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
>> index 6829ff1..913cf80 100644
>> --- a/include/linux/mem_encrypt.h
>> +++ b/include/linux/mem_encrypt.h
>> @@ -34,6 +34,16 @@ static inline u64 sme_dma_mask(void)
>>  	return 0ULL;
>>  }
>>
>> +static inline int set_memory_encrypted(unsigned long vaddr, int numpages)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline int set_memory_decrypted(unsigned long vaddr, int numpages)
>> +{
>> +	return 0;
>> +}
>> +
>>  #endif
>>
>>  #endif	/* CONFIG_AMD_MEM_ENCRYPT */
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 5617cc4..ab62f41 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -38,6 +38,7 @@
>>  #include <linux/syscore_ops.h>
>>  #include <linux/compiler.h>
>>  #include <linux/hugetlb.h>
>> +#include <linux/mem_encrypt.h>
>>
>>  #include <asm/page.h>
>>  #include <asm/sections.h>
>> @@ -315,6 +316,18 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>>  		count = 1 << order;
>>  		for (i = 0; i < count; i++)
>>  			SetPageReserved(pages + i);
>> +
>> +		/*
>> +		 * If SME is active we need to be sure that kexec pages are
>> +		 * not encrypted because when we boot to the new kernel the
>> +		 * pages won't be accessed encrypted (initially).
>> +		 */
>> +		if (sme_active()) {
>> +			void *vaddr = page_address(pages);
>> +
>> +			set_memory_decrypted((unsigned long)vaddr, count);
>> +			memset(vaddr, 0, count * PAGE_SIZE);
>
> Why the memset?

Since the memory attribute was changed, a page with all zeroes in memory
when it was an encrypted page would now read as random data (since the
data on the page won't be decrypted). So after setting the attribute
the memset just clears it to zero. I guess I can do the memset only if
__GFP_ZERO is present in gfp_mask.
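
Something along those lines against the hunk quoted above (sketch only):

	if (sme_active()) {
		void *vaddr = page_address(pages);

		set_memory_decrypted((unsigned long)vaddr, count);
		/* only re-zero when the caller actually asked for zeroed pages */
		if (gfp_mask & __GFP_ZERO)
			memset(vaddr, 0, count * PAGE_SIZE);
	}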

>
>> +		}
>>  	}
>>
>>  	return pages;
>> @@ -326,6 +339,17 @@ static void kimage_free_pages(struct page *page)
>>
>>  	order = page_private(page);
>>  	count = 1 << order;
>> +
>> +	/*
>> +	 * If SME is active we need to reset the pages back to being an
>> +	 * encrypted mapping before freeing them.
>> +	 */
>> +	if (sme_active()) {
>> +		void *vaddr = page_address(page);
>> +
>> +		set_memory_encrypted((unsigned long)vaddr, count);
>
>         if (sme_active())
>                 set_memory_encrypted((unsigned long)page_address(page), count);
>
> looks ok to me too.

Ok.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place
  2017-02-16 15:48 ` [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place Tom Lendacky
@ 2017-03-01 17:36   ` Borislav Petkov
  2017-03-02 18:30     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-03-01 17:36 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:48:08AM -0600, Tom Lendacky wrote:
> This patch adds the support to encrypt the kernel in-place. This is
> done by creating new page mappings for the kernel - a decrypted
> write-protected mapping and an encrypted mapping. The kernel is encyrpted

s/encyrpted/encrypted/

> by copying the kernel through a temporary buffer.

"... by copying it... "

> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---

...

> +ENTRY(sme_encrypt_execute)
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	/*
> +	 * Entry parameters:
> +	 *   RDI - virtual address for the encrypted kernel mapping
> +	 *   RSI - virtual address for the decrypted kernel mapping
> +	 *   RDX - length of kernel
> +	 *   RCX - address of the encryption workarea

						     , including:

> +	 *     - stack page (PAGE_SIZE)
> +	 *     - encryption routine page (PAGE_SIZE)
> +	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
> +	 *    R8 - address of the pagetables to use for encryption
> +	 */
> +
> +	/* Set up a one page stack in the non-encrypted memory area */
> +	movq	%rcx, %rax
> +	addq	$PAGE_SIZE, %rax
> +	movq	%rsp, %rbp

%rbp is callee-saved and you're overwriting it here. You need to push it
first.

> +	movq	%rax, %rsp
> +	push	%rbp
> +
> +	push	%r12
> +	push	%r13

In general, just do all pushes on function entry and the pops on exit,
like the compiler does.

> +	movq	%rdi, %r10
> +	movq	%rsi, %r11
> +	movq	%rdx, %r12
> +	movq	%rcx, %r13
> +
> +	/* Copy encryption routine into the workarea */
> +	movq	%rax, %rdi
> +	leaq	.Lencrypt_start(%rip), %rsi
> +	movq	$(.Lencrypt_stop - .Lencrypt_start), %rcx
> +	rep	movsb
> +
> +	/* Setup registers for call */
> +	movq	%r10, %rdi
> +	movq	%r11, %rsi
> +	movq	%r8, %rdx
> +	movq	%r12, %rcx
> +	movq	%rax, %r8
> +	addq	$PAGE_SIZE, %r8
> +
> +	/* Call the encryption routine */
> +	call	*%rax
> +
> +	pop	%r13
> +	pop	%r12
> +
> +	pop	%rsp			/* Restore original stack pointer */
> +.Lencrypt_exit:

Please put side comments like this here:

ENTRY(sme_encrypt_execute)

#ifdef CONFIG_AMD_MEM_ENCRYPT
        /*
         * Entry parameters:
         *   RDI - virtual address for the encrypted kernel mapping
         *   RSI - virtual address for the decrypted kernel mapping
         *   RDX - length of kernel
         *   RCX - address of the encryption workarea
         *     - stack page (PAGE_SIZE)
         *     - encryption routine page (PAGE_SIZE)
         *     - intermediate copy buffer (PMD_PAGE_SIZE)
         *    R8 - address of the pagetables to use for encryption
         */

        /* Set up a one page stack in the non-encrypted memory area */
        movq    %rcx, %rax                      # %rax = workarea
        addq    $PAGE_SIZE, %rax                # %rax += 4096
        movq    %rsp, %rbp                      # stash stack ptr
        movq    %rax, %rsp                      # set new stack
        push    %rbp                            # needs to happen before the mov %rsp, %rbp

        push    %r12
        push    %r13

        movq    %rdi, %r10                      # encrypted kernel
        movq    %rsi, %r11                      # decrypted kernel
        movq    %rdx, %r12                      # kernel length
        movq    %rcx, %r13                      # workarea
	...

and so on.

...

> diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
> index 25af15d..07cbb90 100644
> --- a/arch/x86/kernel/mem_encrypt_init.c
> +++ b/arch/x86/kernel/mem_encrypt_init.c
> @@ -16,9 +16,200 @@
>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>  
>  #include <linux/mem_encrypt.h>
> +#include <linux/mm.h>
> +
> +#include <asm/sections.h>
> +
> +extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
> +				void *, pgd_t *);

This belongs into mem_encrypt.h. And I think it already came up: please
use names for those params.

> +
> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
> +#define PMD_FLAGS	__PAGE_KERNEL_LARGE_EXEC
> +
> +static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
> +				      void *vaddr, pmdval_t pmd_val)
> +{

sme_populate() or so sounds better.

> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	pgd += pgd_index((unsigned long)vaddr);
> +	if (pgd_none(*pgd)) {
> +		pud = next_page;
> +		memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
> +		native_set_pgd(pgd,
> +			       native_make_pgd((unsigned long)pud + PGD_FLAGS));

Let it stick out, no need for those "stairs" in the vertical alignment :)

> +		next_page += sizeof(*pud) * PTRS_PER_PUD;
> +	} else {
> +		pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
> +	}
> +
> +	pud += pud_index((unsigned long)vaddr);
> +	if (pud_none(*pud)) {
> +		pmd = next_page;
> +		memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
> +		native_set_pud(pud,
> +			       native_make_pud((unsigned long)pmd + PUD_FLAGS));
> +		next_page += sizeof(*pmd) * PTRS_PER_PMD;
> +	} else {
> +		pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
> +	}
> +
> +	pmd += pmd_index((unsigned long)vaddr);
> +	if (pmd_none(*pmd) || !pmd_large(*pmd))
> +		native_set_pmd(pmd, native_make_pmd(pmd_val));
> +
> +	return next_page;
> +}
> +
> +static unsigned long __init sme_pgtable_calc(unsigned long start,
> +					     unsigned long end)
> +{
> +	unsigned long addr, total;
> +
> +	total = 0;
> +	addr = start;
> +	while (addr < end) {
> +		unsigned long pgd_end;
> +
> +		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
> +		if (pgd_end > end)
> +			pgd_end = end;
> +
> +		total += sizeof(pud_t) * PTRS_PER_PUD * 2;
> +
> +		while (addr < pgd_end) {
> +			unsigned long pud_end;
> +
> +			pud_end = (addr & PUD_MASK) + PUD_SIZE;
> +			if (pud_end > end)
> +				pud_end = end;
> +
> +			total += sizeof(pmd_t) * PTRS_PER_PMD * 2;

That "* 2" is?

> +
> +			addr = pud_end;

So			addr += PUD_SIZE;

?

> +		}
> +
> +		addr = pgd_end;

So		addr += PGD_SIZE;

?

> +	total += sizeof(pgd_t) * PTRS_PER_PGD;
> +
> +	return total;
> +}
>  
>  void __init sme_encrypt_kernel(void)
>  {
> +	pgd_t *pgd;
> +	void *workarea, *next_page, *vaddr;
> +	unsigned long kern_start, kern_end, kern_len;
> +	unsigned long index, paddr, pmd_flags;
> +	unsigned long exec_size, full_size;
> +
> +	/* If SME is not active then no need to prepare */

That comment is obvious.

> +	if (!sme_active())
> +		return;
> +
> +	/* Set the workarea to be after the kernel */
> +	workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
> +
> +	/*
> +	 * Prepare for encrypting the kernel by building new pagetables with
> +	 * the necessary attributes needed to encrypt the kernel in place.
> +	 *
> +	 *   One range of virtual addresses will map the memory occupied
> +	 *   by the kernel as encrypted.
> +	 *
> +	 *   Another range of virtual addresses will map the memory occupied
> +	 *   by the kernel as decrypted and write-protected.
> +	 *
> +	 *     The use of write-protect attribute will prevent any of the
> +	 *     memory from being cached.
> +	 */
> +
> +	/* Physical address gives us the identity mapped virtual address */
> +	kern_start = __pa_symbol(_text);
> +	kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;

So
	kern_end = (unsigned long)workarea - 1;

?

Also, you can make that workarea be unsigned long and cast it to void *
only when needed so that you don't need to cast it in here for the
calculations.

> +	kern_len = kern_end - kern_start + 1;
> +
> +	/*
> +	 * Calculate required number of workarea bytes needed:
> +	 *   executable encryption area size:
> +	 *     stack page (PAGE_SIZE)
> +	 *     encryption routine page (PAGE_SIZE)
> +	 *     intermediate copy buffer (PMD_PAGE_SIZE)
> +	 *   pagetable structures for workarea (in case not currently mapped)
> +	 *   pagetable structures for the encryption of the kernel
> +	 */
> +	exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
> +
> +	full_size = exec_size;
> +	full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
> +		     sizeof(pmd_t) * PTRS_PER_PMD;
> +	full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
> +
> +	next_page = workarea + exec_size;

So next_page is the next free page after the workarea, correct? Because
of all things, *that* certainly needs a comment. It took me a while to
decipher what's going on here and I'm still not 100% clear.

> +	/* Make sure the current pagetables have entries for the workarea */
> +	pgd = (pgd_t *)native_read_cr3();
> +	paddr = (unsigned long)workarea;
> +	while (paddr < (unsigned long)workarea + full_size) {
> +		vaddr = (void *)paddr;
> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
> +					      paddr + PMD_FLAGS);
> +
> +		paddr += PMD_PAGE_SIZE;
> +	}
> +	native_write_cr3(native_read_cr3());

Why not

	native_write_cr3((unsigned long)pgd);

?

Now you can actually acknowledge that the code block in between changed
the hierarchy in pgd and you're reloading it.

> +	/* Calculate a PGD index to be used for the decrypted mapping */
> +	index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
> +	index <<= PGDIR_SHIFT;

So call it decrypt_mapping_pgd or so. index doesn't say anything. Also,
move it right above where it is being used. This function is very hard
to follow as it is.

> +	/* Set and clear the PGD */

This needs more text: we're building a new temporary pagetable which
will have A, B and C mapped into it and blablabla...

> +	pgd = next_page;
> +	memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
> +	next_page += sizeof(*pgd) * PTRS_PER_PGD;
> +
> +	/* Add encrypted (identity) mappings for the kernel */
> +	pmd_flags = PMD_FLAGS | _PAGE_ENC;
> +	paddr = kern_start;
> +	while (paddr < kern_end) {
> +		vaddr = (void *)paddr;
> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
> +					      paddr + pmd_flags);
> +
> +		paddr += PMD_PAGE_SIZE;
> +	}
> +
> +	/* Add decrypted (non-identity) mappings for the kernel */
> +	pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
> +	paddr = kern_start;
> +	while (paddr < kern_end) {
> +		vaddr = (void *)(paddr + index);
> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
> +					      paddr + pmd_flags);
> +
> +		paddr += PMD_PAGE_SIZE;
> +	}
> +
> +	/* Add the workarea to both mappings */
> +	paddr = kern_end + 1;

	paddr = (unsigned long)workarea;

Now this makes sense when I read the comment above it.

> +	while (paddr < (kern_end + exec_size)) {

... which actually wants that exec_size to be called workarea_size. Then
it'll make more sense.

And then the thing above:

	next_page = workarea + exec_size;

would look like:

	next_page = workarea + workarea_size;

which would make even more sense. And since you have stuff called _start
and _end, you can do:

	next_page = workarea_start + workarea_size;

and now it would make the most sense. Eva! :-)

> +		vaddr = (void *)paddr;
> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
> +					      paddr + PMD_FLAGS);
> +
> +		vaddr = (void *)(paddr + index);
> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
> +					      paddr + PMD_FLAGS);
> +
> +		paddr += PMD_PAGE_SIZE;
> +	}
> +
> +	/* Perform the encryption */
> +	sme_encrypt_execute(kern_start, kern_start + index, kern_len,
> +			    workarea, pgd);
> +

Phew, that's one tough patch to review. I'd like to review it again in
your next submission.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD)
  2017-03-01  9:17 ` Dave Young
@ 2017-03-01 17:51   ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-01 17:51 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, kexec, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/1/2017 3:17 AM, Dave Young wrote:
> Hi Tom,

Hi Dave,

>

... SNIP ...

>> - Added support for (re)booting with kexec
>
> Could you please add kexec list in cc when you updating the patches so
> that kexec/kdump people do not miss them?
>

Sorry about that, I'll be sure to add it to the cc list.

Thanks,
Tom

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption
  2017-02-16 15:48 ` [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption Tom Lendacky
@ 2017-03-01 18:40   ` Borislav Petkov
  2017-03-07 16:05     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Borislav Petkov @ 2017-03-01 18:40 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Feb 16, 2017 at 09:48:25AM -0600, Tom Lendacky wrote:
> This patch adds the support to check if SME has been enabled and if
> memory encryption should be activated (checking of command line option
> based on the configuration of the default state).  If memory encryption
> is to be activated, then the encryption mask is set and the kernel is
> encrypted "in place."
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/head_64.S          |    1 +
>  arch/x86/kernel/mem_encrypt_init.c |   71 +++++++++++++++++++++++++++++++++++-
>  arch/x86/mm/mem_encrypt.c          |    2 +
>  3 files changed, 73 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index edd2f14..e6820e7 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -97,6 +97,7 @@ startup_64:
>  	 * Save the returned mask in %r12 for later use.
>  	 */
>  	push	%rsi
> +	movq	%rsi, %rdi
>  	call	sme_enable
>  	pop	%rsi
>  	movq	%rax, %r12
> diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
> index 07cbb90..35c5e3d 100644
> --- a/arch/x86/kernel/mem_encrypt_init.c
> +++ b/arch/x86/kernel/mem_encrypt_init.c
> @@ -19,6 +19,12 @@
>  #include <linux/mm.h>
>  
>  #include <asm/sections.h>
> +#include <asm/processor-flags.h>
> +#include <asm/msr.h>
> +#include <asm/cmdline.h>
> +
> +static char sme_cmdline_arg_on[] __initdata = "mem_encrypt=on";
> +static char sme_cmdline_arg_off[] __initdata = "mem_encrypt=off";
>  
>  extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
>  				void *, pgd_t *);
> @@ -217,8 +223,71 @@ unsigned long __init sme_get_me_mask(void)
>  	return sme_me_mask;
>  }
>  
> -unsigned long __init sme_enable(void)
> +unsigned long __init sme_enable(void *boot_data)

unsigned long __init sme_enable(struct boot_params *bp)

works too.

And then you need to correct the function signature in the
!CONFIG_AMD_MEM_ENCRYPT case, at the end of this file, too:

unsigned long __init sme_enable(struct boot_params *bp)		{ return 0; }

>  {
> +	struct boot_params *bp = boot_data;
> +	unsigned int eax, ebx, ecx, edx;
> +	unsigned long cmdline_ptr;
> +	bool enable_if_found;
> +	void *cmdline_arg;
> +	u64 msr;
> +
> +	/* Check for an AMD processor */
> +	eax = 0;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if ((ebx != 0x68747541) || (edx != 0x69746e65) || (ecx != 0x444d4163))
> +		goto out;
> +
> +	/* Check for the SME support leaf */
> +	eax = 0x80000000;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (eax < 0x8000001f)
> +		goto out;
> +
> +	/*
> +	 * Check for the SME feature:
> +	 *   CPUID Fn8000_001F[EAX] - Bit 0
> +	 *     Secure Memory Encryption support
> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +	 *     Pagetable bit position used to indicate encryption
> +	 */
> +	eax = 0x8000001f;
> +	ecx = 0;
> +	native_cpuid(&eax, &ebx, &ecx, &edx);
> +	if (!(eax & 1))
> +		goto out;
> +
> +	/* Check if SME is enabled */
> +	msr = native_read_msr(MSR_K8_SYSCFG);

This native_read_msr() wankery is adding this check:

	if (msr_tracepoint_active(__tracepoint_read_msr))

and here it is clearly too early for tracepoints. Please use __rdmsr()
which is purely doing the MSR operation. (... and exception handling for
when the RDMSR itself raises an exception but we're very early here too
so the MSR better be there, otherwise we'll blow up).
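
I.e., something like this for the hunk above (__rdmsr() avoids the
tracepoint check entirely):

	/* Check if SME is enabled */
	msr = __rdmsr(MSR_K8_SYSCFG);
	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
		goto out;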

> +	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> +		goto out;
> +
> +	/*
> +	 * Fixups have not been to applied phys_base yet, so we must obtain

		...    not been applied to phys_base yet ...

> +	 * the address to the SME command line option in the following way.
> +	 */
> +	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT)) {
> +		asm ("lea sme_cmdline_arg_off(%%rip), %0"
> +		     : "=r" (cmdline_arg)
> +		     : "p" (sme_cmdline_arg_off));
> +		enable_if_found = false;
> +	} else {
> +		asm ("lea sme_cmdline_arg_on(%%rip), %0"
> +		     : "=r" (cmdline_arg)
> +		     : "p" (sme_cmdline_arg_on));
> +		enable_if_found = true;
> +	}
> +
> +	cmdline_ptr = bp->hdr.cmd_line_ptr | ((u64)bp->ext_cmd_line_ptr << 32);
> +
> +	if (cmdline_find_option_bool((char *)cmdline_ptr, cmdline_arg))
> +		sme_me_mask = enable_if_found ? 1UL << (ebx & 0x3f) : 0;
> +	else
> +		sme_me_mask = enable_if_found ? 0 : 1UL << (ebx & 0x3f);

I have a better idea: you can copy __cmdline_find_option() +
cmdline_find_option() to arch/x86/lib/cmdline.c in a pre-patch. Then,
pass in a buffer and check for "on" and "off". This way you don't
have to misuse the _bool() variant for something which is actually
"option=argument".

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME
  2017-02-17 16:51     ` Tom Lendacky
@ 2017-03-02 17:01       ` Paolo Bonzini
  0 siblings, 0 replies; 111+ messages in thread
From: Paolo Bonzini @ 2017-03-02 17:01 UTC (permalink / raw)
  To: Tom Lendacky, Konrad Rzeszutek Wilk
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Brijesh Singh, Ingo Molnar,
	Alexander Potapenko, Andy Lutomirski, H. Peter Anvin,
	Borislav Petkov, Andrey Ryabinin, Thomas Gleixner, Larry Woodman,
	Dmitry Vyukov



On 17/02/2017 17:51, Tom Lendacky wrote:
> 
> It's meant just to notify the user about the condition. The user could
> then decide to use an alternative device that supports a greater DMA
> range (I can probably change it to a dev_warn_once() so that a device
> is identified).  It would be nice if I could issue this message once per
> device that experienced this.  I didn't see anything that would do
> that, though.

dev_warn_once would print once only, not once per device.  But if you
leave the dev_warn elsewhere, this can be just pr_warn_once.
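
For example (sketch only), the notice in the swiotlb mapping path could
simply be:

	if (sme_active())
		pr_warn_once("SME is active and the system is using DMA bounce buffers\n");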

Paolo

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place
  2017-03-01 17:36   ` Borislav Petkov
@ 2017-03-02 18:30     ` Tom Lendacky
  2017-03-02 18:51       ` Borislav Petkov
  0 siblings, 1 reply; 111+ messages in thread
From: Tom Lendacky @ 2017-03-02 18:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/1/2017 11:36 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:48:08AM -0600, Tom Lendacky wrote:
>> This patch adds the support to encrypt the kernel in-place. This is
>> done by creating new page mappings for the kernel - a decrypted
>> write-protected mapping and an encrypted mapping. The kernel is encyrpted
>
> s/encyrpted/encrypted/
>
>> by copying the kernel through a temporary buffer.
>
> "... by copying it... "

Ok.

>
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>
> ...
>
>> +ENTRY(sme_encrypt_execute)
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +	/*
>> +	 * Entry parameters:
>> +	 *   RDI - virtual address for the encrypted kernel mapping
>> +	 *   RSI - virtual address for the decrypted kernel mapping
>> +	 *   RDX - length of kernel
>> +	 *   RCX - address of the encryption workarea
>
> 						     , including:

Ok.

>
>> +	 *     - stack page (PAGE_SIZE)
>> +	 *     - encryption routine page (PAGE_SIZE)
>> +	 *     - intermediate copy buffer (PMD_PAGE_SIZE)
>> +	 *    R8 - address of the pagetables to use for encryption
>> +	 */
>> +
>> +	/* Set up a one page stack in the non-encrypted memory area */
>> +	movq	%rcx, %rax
>> +	addq	$PAGE_SIZE, %rax
>> +	movq	%rsp, %rbp
>
> %rbp is callee-saved and you're overwriting it here. You need to push it
> first.

Yup, I'll re-work the entry code based on this comment and the one
below.

>
>> +	movq	%rax, %rsp
>> +	push	%rbp
>> +
>> +	push	%r12
>> +	push	%r13
>
> In general, just do all pushes on function entry and the pops on exit,
> like the compiler does.
>
>> +	movq	%rdi, %r10
>> +	movq	%rsi, %r11
>> +	movq	%rdx, %r12
>> +	movq	%rcx, %r13
>> +
>> +	/* Copy encryption routine into the workarea */
>> +	movq	%rax, %rdi
>> +	leaq	.Lencrypt_start(%rip), %rsi
>> +	movq	$(.Lencrypt_stop - .Lencrypt_start), %rcx
>> +	rep	movsb
>> +
>> +	/* Setup registers for call */
>> +	movq	%r10, %rdi
>> +	movq	%r11, %rsi
>> +	movq	%r8, %rdx
>> +	movq	%r12, %rcx
>> +	movq	%rax, %r8
>> +	addq	$PAGE_SIZE, %r8
>> +
>> +	/* Call the encryption routine */
>> +	call	*%rax
>> +
>> +	pop	%r13
>> +	pop	%r12
>> +
>> +	pop	%rsp			/* Restore original stack pointer */
>> +.Lencrypt_exit:
>
> Please put side comments like this here:

Ok, can do.

>
> ENTRY(sme_encrypt_execute)
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
>         /*
>          * Entry parameters:
>          *   RDI - virtual address for the encrypted kernel mapping
>          *   RSI - virtual address for the decrypted kernel mapping
>          *   RDX - length of kernel
>          *   RCX - address of the encryption workarea
>          *     - stack page (PAGE_SIZE)
>          *     - encryption routine page (PAGE_SIZE)
>          *     - intermediate copy buffer (PMD_PAGE_SIZE)
>          *    R8 - address of the pagetables to use for encryption
>          */
>
>         /* Set up a one page stack in the non-encrypted memory area */
>         movq    %rcx, %rax                      # %rax = workarea
>         addq    $PAGE_SIZE, %rax                # %rax += 4096
>         movq    %rsp, %rbp                      # stash stack ptr
>         movq    %rax, %rsp                      # set new stack
>         push    %rbp                            # needs to happen before the mov %rsp, %rbp
>
>         push    %r12
>         push    %r13
>
>         movq    %rdi, %r10                      # encrypted kernel
>         movq    %rsi, %r11                      # decrypted kernel
>         movq    %rdx, %r12                      # kernel length
>         movq    %rcx, %r13                      # workarea
> 	...
>
> and so on.
>
> ...
>
>> diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
>> index 25af15d..07cbb90 100644
>> --- a/arch/x86/kernel/mem_encrypt_init.c
>> +++ b/arch/x86/kernel/mem_encrypt_init.c
>> @@ -16,9 +16,200 @@
>>  #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>>  #include <linux/mem_encrypt.h>
>> +#include <linux/mm.h>
>> +
>> +#include <asm/sections.h>
>> +
>> +extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
>> +				void *, pgd_t *);
>
> This belongs into mem_encrypt.h. And I think it already came up: please
> use names for those params.

Yup, will move it.

>
>> +
>> +#define PGD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PUD_FLAGS	_KERNPG_TABLE_NOENC
>> +#define PMD_FLAGS	__PAGE_KERNEL_LARGE_EXEC
>> +
>> +static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
>> +				      void *vaddr, pmdval_t pmd_val)
>> +{
>
> sme_populate() or so sounds better.

Ok.

>
>> +	pud_t *pud;
>> +	pmd_t *pmd;
>> +
>> +	pgd += pgd_index((unsigned long)vaddr);
>> +	if (pgd_none(*pgd)) {
>> +		pud = next_page;
>> +		memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
>> +		native_set_pgd(pgd,
>> +			       native_make_pgd((unsigned long)pud + PGD_FLAGS));
>
> Let it stick out, no need for those "stairs" in the vertical alignment :)

Ok.

>
>> +		next_page += sizeof(*pud) * PTRS_PER_PUD;
>> +	} else {
>> +		pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
>> +	}
>> +
>> +	pud += pud_index((unsigned long)vaddr);
>> +	if (pud_none(*pud)) {
>> +		pmd = next_page;
>> +		memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
>> +		native_set_pud(pud,
>> +			       native_make_pud((unsigned long)pmd + PUD_FLAGS));
>> +		next_page += sizeof(*pmd) * PTRS_PER_PMD;
>> +	} else {
>> +		pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
>> +	}
>> +
>> +	pmd += pmd_index((unsigned long)vaddr);
>> +	if (pmd_none(*pmd) || !pmd_large(*pmd))
>> +		native_set_pmd(pmd, native_make_pmd(pmd_val));
>> +
>> +	return next_page;
>> +}
>> +
>> +static unsigned long __init sme_pgtable_calc(unsigned long start,
>> +					     unsigned long end)
>> +{
>> +	unsigned long addr, total;
>> +
>> +	total = 0;
>> +	addr = start;
>> +	while (addr < end) {
>> +		unsigned long pgd_end;
>> +
>> +		pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
>> +		if (pgd_end > end)
>> +			pgd_end = end;
>> +
>> +		total += sizeof(pud_t) * PTRS_PER_PUD * 2;
>> +
>> +		while (addr < pgd_end) {
>> +			unsigned long pud_end;
>> +
>> +			pud_end = (addr & PUD_MASK) + PUD_SIZE;
>> +			if (pud_end > end)
>> +				pud_end = end;
>> +
>> +			total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
>
> That "* 2" is?

The "* 2" here and above is that a PUD and a PMD is needed for both
the encrypted and decrypted mappings. I'll add a comment to clarify
that.

>
>> +
>> +			addr = pud_end;
>
> So			addr += PUD_SIZE;
>
> ?

Yes, I believe that is correct.

>
>> +		}
>> +
>> +		addr = pgd_end;
>
> So		addr += PGD_SIZE;
>
> ?

Yup, I can do that here too (but need PGDIR_SIZE).
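
Putting the two together, the reworked helper might end up looking
roughly like this -- just a sketch of the shape from my side, not the
actual next revision:

static unsigned long __init sme_pgtable_calc(unsigned long start,
					     unsigned long end)
{
	unsigned long total = 0, addr;

	/*
	 * PUD and PMD pages are needed twice: once for the encrypted
	 * mapping of the kernel and once for the decrypted mapping,
	 * hence the "* 2" below.
	 */
	addr = start & PGDIR_MASK;
	while (addr < end) {
		total += sizeof(pud_t) * PTRS_PER_PUD * 2;
		addr += PGDIR_SIZE;
	}

	addr = start & PUD_MASK;
	while (addr < end) {
		total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
		addr += PUD_SIZE;
	}

	total += sizeof(pgd_t) * PTRS_PER_PGD;

	return total;
}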

>
>> +	total += sizeof(pgd_t) * PTRS_PER_PGD;
>> +
>> +	return total;
>> +}
>>
>>  void __init sme_encrypt_kernel(void)
>>  {
>> +	pgd_t *pgd;
>> +	void *workarea, *next_page, *vaddr;
>> +	unsigned long kern_start, kern_end, kern_len;
>> +	unsigned long index, paddr, pmd_flags;
>> +	unsigned long exec_size, full_size;
>> +
>> +	/* If SME is not active then no need to prepare */
>
> That comment is obvious.

Ok.

>
>> +	if (!sme_active())
>> +		return;
>> +
>> +	/* Set the workarea to be after the kernel */
>> +	workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
>> +
>> +	/*
>> +	 * Prepare for encrypting the kernel by building new pagetables with
>> +	 * the necessary attributes needed to encrypt the kernel in place.
>> +	 *
>> +	 *   One range of virtual addresses will map the memory occupied
>> +	 *   by the kernel as encrypted.
>> +	 *
>> +	 *   Another range of virtual addresses will map the memory occupied
>> +	 *   by the kernel as decrypted and write-protected.
>> +	 *
>> +	 *     The use of write-protect attribute will prevent any of the
>> +	 *     memory from being cached.
>> +	 */
>> +
>> +	/* Physical address gives us the identity mapped virtual address */
>> +	kern_start = __pa_symbol(_text);
>> +	kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;
>
> So
> 	kern_end = (unsigned long)workarea - 1;
>
> ?
>
> Also, you can make that workarea be unsigned long and cast it to void *
> only when needed so that you don't need to cast it in here for the
> calculations.

Ok, I'll rework this a bit.  I believe I can even get rid of the
"+ 1" and "- 1" stuff, too.

>
>> +	kern_len = kern_end - kern_start + 1;
>> +
>> +	/*
>> +	 * Calculate required number of workarea bytes needed:
>> +	 *   executable encryption area size:
>> +	 *     stack page (PAGE_SIZE)
>> +	 *     encryption routine page (PAGE_SIZE)
>> +	 *     intermediate copy buffer (PMD_PAGE_SIZE)
>> +	 *   pagetable structures for workarea (in case not currently mapped)
>> +	 *   pagetable structures for the encryption of the kernel
>> +	 */
>> +	exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
>> +
>> +	full_size = exec_size;
>> +	full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
>> +		     sizeof(pmd_t) * PTRS_PER_PMD;
>> +	full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
>> +
>> +	next_page = workarea + exec_size;
>
> So next_page is the next free page after the workarea, correct? Because
> of all things, *that* certainly needs a comment. It took me a while to
> decipher what's going on here and I'm still not 100% clear.

So next_page is the first free page within the workarea in which a
pagetable entry (PGD, PUD or PMD) can be created when we are populating
the new mappings or adding the workarea to the current mapping.  Any
new pagetable structures that are created will use this value.
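
To try to picture it (my rough sketch of the layout based on the
description above, not a diagram taken from the patch):

	+------------------------------+ <- workarea (PMD_PAGE_SIZE aligned,
	| stack page       (PAGE_SIZE) |              just above _end)
	| routine page     (PAGE_SIZE) |    exec_size
	| copy buffer  (PMD_PAGE_SIZE) |
	+------------------------------+ <- next_page (first free page for
	| PGD/PUD/PMD pages built by   |    pagetable structures)
	| sme_pgtable_entry()          |
	+------------------------------+ <- workarea + full_size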

>
>> +	/* Make sure the current pagetables have entries for the workarea */
>> +	pgd = (pgd_t *)native_read_cr3();
>> +	paddr = (unsigned long)workarea;
>> +	while (paddr < (unsigned long)workarea + full_size) {
>> +		vaddr = (void *)paddr;
>> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
>> +					      paddr + PMD_FLAGS);
>> +
>> +		paddr += PMD_PAGE_SIZE;
>> +	}
>> +	native_write_cr3(native_read_cr3());
>
> Why not
>
> 	native_write_cr3((unsigned long)pgd);
>
> ?
>
> Now you can actually acknowledge that the code block in between changed
> the hierarchy in pgd and you're reloading it.

Ok, that makes sense.

>
>> +	/* Calculate a PGD index to be used for the decrypted mapping */
>> +	index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
>> +	index <<= PGDIR_SHIFT;
>
> So call it decrypt_mapping_pgd or so. index doesn't say anything. Also,
> move it right above where it is being used. This function is very hard
> to follow as it is.

Ok, I'll work on the comment.  Something along the line of:

/*
  * The encrypted mapping of the kernel will use identity mapped
  * virtual addresses.  A different PGD index/entry must be used to
  * get different pagetable entries for the decrypted mapping.
  * Choose the next PGD index and convert it to a virtual address
  * to be used as the base of the mapping.
  */
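
And for the rename, maybe something like this (the name below is just a
strawman, not final):

	unsigned long decrypted_base;

	decrypted_base = (pgd_index(kern_end + full_size) + 1) &
			 (PTRS_PER_PGD - 1);
	decrypted_base <<= PGDIR_SHIFT;

with the later uses then becoming, e.g.:

	vaddr = (void *)(paddr + decrypted_base);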

>
>> +	/* Set and clear the PGD */
>
> This needs more text: we're building a new temporary pagetable which
> will have A, B and C mapped into it and blablabla...

Will do.

>
>> +	pgd = next_page;
>> +	memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
>> +	next_page += sizeof(*pgd) * PTRS_PER_PGD;
>> +
>> +	/* Add encrypted (identity) mappings for the kernel */
>> +	pmd_flags = PMD_FLAGS | _PAGE_ENC;
>> +	paddr = kern_start;
>> +	while (paddr < kern_end) {
>> +		vaddr = (void *)paddr;
>> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
>> +					      paddr + pmd_flags);
>> +
>> +		paddr += PMD_PAGE_SIZE;
>> +	}
>> +
>> +	/* Add decrypted (non-identity) mappings for the kernel */
>> +	pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
>> +	paddr = kern_start;
>> +	while (paddr < kern_end) {
>> +		vaddr = (void *)(paddr + index);
>> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
>> +					      paddr + pmd_flags);
>> +
>> +		paddr += PMD_PAGE_SIZE;
>> +	}
>> +
>> +	/* Add the workarea to both mappings */
>> +	paddr = kern_end + 1;
>
> 	paddr = (unsigned long)workarea;
>
> Now this makes sense when I read the comment above it.

Yup, it does.

>
>> +	while (paddr < (kern_end + exec_size)) {
>
> ... which actually wants that exec_size to be called workarea_size. Then
> it'll make more sense.

Except the workarea size includes both the encryption execution
size and the pagetable structure size.  I'll work on this to try
and clarify it better.
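
For example, with strawman names only:

	exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;

	/* Pagetable pages for mapping the workarea itself ... */
	pgtable_size  = ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
			sizeof(pmd_t) * PTRS_PER_PMD;
	/* ... plus those for the encrypted/decrypted kernel mappings */
	pgtable_size += sme_pgtable_calc(kern_start, kern_end + exec_size);

	workarea_size = exec_size + pgtable_size;
	next_page     = workarea_start + exec_size;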

>
> And then the thing above:
>
> 	next_page = workarea + exec_size;
>
> would look like:
>
> 	next_page = workarea + workarea_size;
>
> which would make even more sense. And since you have stuff called _start
> and _end, you can do:
>
> 	next_page = workarea_start + workarea_size;
>
> and not it would make most sense. Eva! :-)
>
>> +		vaddr = (void *)paddr;
>> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
>> +					      paddr + PMD_FLAGS);
>> +
>> +		vaddr = (void *)(paddr + index);
>> +		next_page = sme_pgtable_entry(pgd, next_page, vaddr,
>> +					      paddr + PMD_FLAGS);
>> +
>> +		paddr += PMD_PAGE_SIZE;
>> +	}
>> +
>> +	/* Perform the encryption */
>> +	sme_encrypt_execute(kern_start, kern_start + index, kern_len,
>> +			    workarea, pgd);
>> +
>
> Phew, that's one tough patch to review. I'd like to review it again in
> your next submission.

Most definitely.  I appreciate the feedback since I'm very close to
the code and have an understanding of what I'm doing. I'd like to be
sure that everyone can easily understand what is happening.

Thanks,
Tom

>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place
  2017-03-02 18:30     ` Tom Lendacky
@ 2017-03-02 18:51       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-02 18:51 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Thu, Mar 02, 2017 at 12:30:31PM -0600, Tom Lendacky wrote:
> The "* 2" here and above is that a PUD and a PMD is needed for both
> the encrypted and decrypted mappings. I'll add a comment to clarify
> that.

Ah, makes sense. Definitely needs a comment.

> Yup, I can do that here too (but need PGDIR_SIZE).

Right, I did test and wanted to write PGDIR_SIZE but then ... I guess
something distracted me :-)

> So next_page is the first free page within the workarea in which a
> pagetable entry (PGD, PUD or PMD) can be created when we are populating
> the new mappings or adding the workarea to the current mapping.  Any
> new pagetable structures that are created will use this value.

Ok, so I guess this needs an overview comment with maybe some ascii
showing how workarea, exec_size, full_size and all those other things
play together.

> Ok, I'll work on the comment.  Something along the line of:
>
> /*
>  * The encrypted mapping of the kernel will use identity mapped
>  * virtual addresses.  A different PGD index/entry must be used to
>  * get different pagetable entries for the decrypted mapping.
>  * Choose the next PGD index and convert it to a virtual address
>  * to be used as the base of the mapping.

Better.

> Except the workarea size includes both the encryption execution
> size and the pagetable structure size.  I'll work on this to try
> and clarify it better.

That's a useful piece of info, yap, the big picture could use some more
explanation.

> Most definitely.  I appreciate the feedback since I'm very close to
> the code and have an understanding of what I'm doing. I'd like to be
> sure that everyone can easily understand what is happening.

Nice!

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address
  2017-02-28 22:34     ` Tom Lendacky
@ 2017-03-03  9:52       ` Borislav Petkov
  0 siblings, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-03  9:52 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Feb 28, 2017 at 04:34:39PM -0600, Tom Lendacky wrote:
> Or if we want to guard against ACPI adding a type 0 in the future, I
> could make the function return an int and then return -EINVAL if an e820
> entry isn't found.  This might be the better option.

Yap, think so too. I don't trust specs anyway :)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 18/28] x86: DMA support for memory encryption
  2017-02-25 17:10   ` Borislav Petkov
@ 2017-03-06 17:47     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-06 17:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 2/25/2017 11:10 AM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:46:04AM -0600, Tom Lendacky wrote:
>> Since DMA addresses will effectively look like 48-bit addresses when the
>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>> device performing the DMA does not support 48-bits. SWIOTLB will be
>> initialized to create decrypted bounce buffers for use by these devices.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>
> Just nitpicks below...
>
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index ec548e9..a46bcf4 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -13,11 +13,14 @@
>>  #include <linux/linkage.h>
>>  #include <linux/init.h>
>>  #include <linux/mm.h>
>> +#include <linux/dma-mapping.h>
>> +#include <linux/swiotlb.h>
>>
>>  #include <asm/tlbflush.h>
>>  #include <asm/fixmap.h>
>>  #include <asm/setup.h>
>>  #include <asm/bootparam.h>
>> +#include <asm/cacheflush.h>
>>
>>  extern pmdval_t early_pmd_flags;
>>  int __init __early_make_pgtable(unsigned long, pmdval_t);
>> @@ -192,3 +195,22 @@ void __init sme_early_init(void)
>>  	for (i = 0; i < ARRAY_SIZE(protection_map); i++)
>>  		protection_map[i] = pgprot_encrypted(protection_map[i]);
>>  }
>> +
>> +/* Architecture __weak replacement functions */
>> +void __init mem_encrypt_init(void)
>> +{
>> +	if (!sme_me_mask)
>
> 	    !sme_active()
>
> no?

I was probably looking ahead to SEV on this one. Basically if the
sme_me_mask is non-zero we will want to make SWIOTLB decrypted.

>
> Unless we're going to be switching SME dynamically at run time?
>
>> +		return;
>> +
>> +	/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
>> +	swiotlb_update_mem_attributes();
>> +}
>> +
>> +void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>> +{
>> +	WARN(PAGE_ALIGN(size) != size,
>> +	     "size is not page aligned (%#lx)\n", size);
>
> "page-aligned" I guess.

Ok.

>
>> +
>> +	/* Make the SWIOTLB buffer area decrypted */
>> +	set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>> +}
>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>> index 4ee479f..15e7160 100644
>> --- a/include/linux/swiotlb.h
>> +++ b/include/linux/swiotlb.h
>> @@ -35,6 +35,7 @@ enum swiotlb_force {
>>  extern unsigned long swiotlb_nr_tbl(void);
>>  unsigned long swiotlb_size_or_default(void);
>>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
>> +extern void __init swiotlb_update_mem_attributes(void);
>>
>>  /*
>>   * Enumeration for sync targets
>> diff --git a/init/main.c b/init/main.c
>> index 8222caa..ba13f8f 100644
>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -466,6 +466,10 @@ void __init __weak thread_stack_cache_init(void)
>>  }
>>  #endif
>>
>> +void __init __weak mem_encrypt_init(void)
>> +{
>> +}
>> +
>>  /*
>>   * Set up kernel memory allocators
>>   */
>> @@ -614,6 +618,15 @@ asmlinkage __visible void __init start_kernel(void)
>>  	 */
>>  	locking_selftest();
>>
>> +	/*
>> +	 * This needs to be called before any devices perform DMA
>> +	 * operations that might use the swiotlb bounce buffers.
>
> 					 SWIOTLB

Ok.

>
>> +	 * This call will mark the bounce buffers as decrypted so
>> +	 * that their usage will not cause "plain-text" data to be
>> +	 * decrypted when accessed.
>> +	 */
>> +	mem_encrypt_init();
>> +
>>  #ifdef CONFIG_BLK_DEV_INITRD
>>  	if (initrd_start && !initrd_below_start_ok &&
>>  	    page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
>> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
>> index a8d74a7..c463067 100644
>> --- a/lib/swiotlb.c
>> +++ b/lib/swiotlb.c
>> @@ -30,6 +30,7 @@
>>  #include <linux/highmem.h>
>>  #include <linux/gfp.h>
>>  #include <linux/scatterlist.h>
>> +#include <linux/mem_encrypt.h>
>>
>>  #include <asm/io.h>
>>  #include <asm/dma.h>
>> @@ -155,6 +156,17 @@ unsigned long swiotlb_size_or_default(void)
>>  	return size ? size : (IO_TLB_DEFAULT_SIZE);
>>  }
>>
>> +void __weak swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
>> +{
>> +}
>> +
>> +/* For swiotlb, clear memory encryption mask from dma addresses */
>> +static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
>> +				      phys_addr_t address)
>> +{
>> +	return phys_to_dma(hwdev, address) & ~sme_me_mask;
>> +}
>> +
>>  /* Note that this doesn't work with highmem page */
>>  static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
>>  				      volatile void *address)
>> @@ -183,6 +195,31 @@ void swiotlb_print_info(void)
>>  	       bytes >> 20, vstart, vend - 1);
>>  }
>>
>> +/*
>> + * Early SWIOTLB allocation may be to early to allow an architecture to
>
> 				      too

Yup.

>
>> + * perform the desired operations.  This function allows the architecture to
>> + * call SWIOTLB when the operations are possible.  This function needs to be
>
> s/This function/It/

Ok.

Thanks,
Tom

>
>> + * called before the SWIOTLB memory is used.
>> + */
>> +void __init swiotlb_update_mem_attributes(void)
>> +{
>> +	void *vaddr;
>> +	unsigned long bytes;
>> +
>> +	if (no_iotlb_memory || late_alloc)
>> +		return;
>> +
>> +	vaddr = phys_to_virt(io_tlb_start);
>> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
>> +	swiotlb_set_mem_attributes(vaddr, bytes);
>> +	memset(vaddr, 0, bytes);
>> +
>> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
>> +	bytes = PAGE_ALIGN(io_tlb_overflow);
>> +	swiotlb_set_mem_attributes(vaddr, bytes);
>> +	memset(vaddr, 0, bytes);
>> +}
>> +
>>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>>  {
>>  	void *v_overflow_buffer;
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-03-01  9:25       ` Dave Young
  2017-03-01  9:27         ` Dave Young
@ 2017-03-06 17:58         ` Tom Lendacky
  2017-03-06 18:04           ` Tom Lendacky
  2017-03-08  8:12           ` Dave Young
  1 sibling, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-06 17:58 UTC (permalink / raw)
  To: Dave Young
  Cc: Konrad Rzeszutek Wilk, linux-arch, linux-efi, kvm, linux-doc,
	x86, linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/1/2017 3:25 AM, Dave Young wrote:
> Hi Tom,

Hi Dave,

>
> On 02/17/17 at 10:43am, Tom Lendacky wrote:
>> On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
>>>> Provide support so that kexec can be used to boot a kernel when SME is
>>>> enabled.
>>>
>>> Is the point of kexec and kdump to ehh, dump memory ? But if the
>>> rest of the memory is encrypted you won't get much, will you?
>>
>> Kexec can be used to reboot a system without going back through BIOS.
>> So you can use kexec without using kdump.
>>
>> For kdump, just taking a quick look, the option to enable memory
>> encryption can be provided on the crash kernel command line and then
>
> Is there a simple way to get the SME status? Probably add some sysfs
> file for this purpose.

Currently there is not.  I can look at adding something, maybe just the
sme_me_mask value, which if non-zero, would indicate SME is active.
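
Just to illustrate one possible form (the file name and location below
are made up, nothing is decided yet):

	/* e.g. a read-only /sys/kernel/mem_encrypt/active file */
	/* assumes <linux/kobject.h> and <linux/sysfs.h> */
	static ssize_t active_show(struct kobject *kobj,
				   struct kobj_attribute *attr, char *buf)
	{
		return sprintf(buf, "%d\n", sme_active() ? 1 : 0);
	}
	static struct kobj_attribute sme_active_attr = __ATTR_RO(active);

	static int __init sme_sysfs_init(void)
	{
		struct kobject *kobj;

		kobj = kobject_create_and_add("mem_encrypt", kernel_kobj);
		if (!kobj)
			return -ENOMEM;

		return sysfs_create_file(kobj, &sme_active_attr.attr);
	}
	late_initcall(sme_sysfs_init);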

>
>> crash kernel can would be able to copy the memory decrypted if the
>> pagetable is set up properly. It looks like currently ioremap_cache()
>> is used to map the old memory page.  That might be able to be changed
>> to a memremap() so that the encryption bit is set in the mapping. That
>> will mean that memory that is not marked encrypted (EFI tables, swiotlb
>> memory, etc) would not be read correctly.
>
> Manage to store info about those ranges which are not encrypted so that
> memremap can handle them?

I can look into whether something can be done in this area. Any input
you can provide as to what would be the best way/place to store the
range info so kdump can make use of it, would be greatly appreciated.

>
>>
>>>
>>> Would it make sense to include some printk to the user if they
>>> are setting up kdump that they won't get anything out of it?
>>
>> Probably a good idea to add something like that.
>
> It will break kdump functionality, it should be fixed instead of
> just adding printk to warn user..

I do want kdump to work. I'll investigate further what can be done in
this area.

Thanks,
Tom

>
> Thanks
> Dave
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-03-06 17:58         ` Tom Lendacky
@ 2017-03-06 18:04           ` Tom Lendacky
  2017-03-08  8:12           ` Dave Young
  1 sibling, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-06 18:04 UTC (permalink / raw)
  To: Dave Young
  Cc: Konrad Rzeszutek Wilk, linux-arch, linux-efi, kvm, linux-doc,
	x86, linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov, kexec

+kexec-list

On 3/6/2017 11:58 AM, Tom Lendacky wrote:
> On 3/1/2017 3:25 AM, Dave Young wrote:
>> Hi Tom,
>
> Hi Dave,
>
>>
>> On 02/17/17 at 10:43am, Tom Lendacky wrote:
>>> On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
>>>>> Provide support so that kexec can be used to boot a kernel when SME is
>>>>> enabled.
>>>>
>>>> Is the point of kexec and kdump to ehh, dump memory ? But if the
>>>> rest of the memory is encrypted you won't get much, will you?
>>>
>>> Kexec can be used to reboot a system without going back through BIOS.
>>> So you can use kexec without using kdump.
>>>
>>> For kdump, just taking a quick look, the option to enable memory
>>> encryption can be provided on the crash kernel command line and then
>>
>> Is there a simple way to get the SME status? Probably add some sysfs
>> file for this purpose.
>
> Currently there is not.  I can look at adding something, maybe just the
> sme_me_mask value, which if non-zero, would indicate SME is active.
>
>>
>>> crash kernel can would be able to copy the memory decrypted if the
>>> pagetable is set up properly. It looks like currently ioremap_cache()
>>> is used to map the old memory page.  That might be able to be changed
>>> to a memremap() so that the encryption bit is set in the mapping. That
>>> will mean that memory that is not marked encrypted (EFI tables, swiotlb
>>> memory, etc) would not be read correctly.
>>
>> Manage to store info about those ranges which are not encrypted so that
>> memremap can handle them?
>
> I can look into whether something can be done in this area. Any input
> you can provide as to what would be the best way/place to store the
> range info so kdump can make use of it, would be greatly appreciated.
>
>>
>>>
>>>>
>>>> Would it make sense to include some printk to the user if they
>>>> are setting up kdump that they won't get anything out of it?
>>>
>>> Probably a good idea to add something like that.
>>
>> It will break kdump functionality, it should be fixed instead of
>> just adding printk to warn user..
>
> I do want kdump to work. I'll investigate further what can be done in
> this area.
>
> Thanks,
> Tom
>
>>
>> Thanks
>> Dave
>>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption
  2017-03-01 18:40   ` Borislav Petkov
@ 2017-03-07 16:05     ` Tom Lendacky
  2017-03-07 17:42       ` Borislav Petkov
  2017-03-08 15:05       ` Borislav Petkov
  0 siblings, 2 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-07 16:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/1/2017 12:40 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 09:48:25AM -0600, Tom Lendacky wrote:
>> This patch adds the support to check if SME has been enabled and if
>> memory encryption should be activated (checking of command line option
>> based on the configuration of the default state).  If memory encryption
>> is to be activated, then the encryption mask is set and the kernel is
>> encrypted "in place."
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/head_64.S          |    1 +
>>  arch/x86/kernel/mem_encrypt_init.c |   71 +++++++++++++++++++++++++++++++++++-
>>  arch/x86/mm/mem_encrypt.c          |    2 +
>>  3 files changed, 73 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> index edd2f14..e6820e7 100644
>> --- a/arch/x86/kernel/head_64.S
>> +++ b/arch/x86/kernel/head_64.S
>> @@ -97,6 +97,7 @@ startup_64:
>>  	 * Save the returned mask in %r12 for later use.
>>  	 */
>>  	push	%rsi
>> +	movq	%rsi, %rdi
>>  	call	sme_enable
>>  	pop	%rsi
>>  	movq	%rax, %r12
>> diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
>> index 07cbb90..35c5e3d 100644
>> --- a/arch/x86/kernel/mem_encrypt_init.c
>> +++ b/arch/x86/kernel/mem_encrypt_init.c
>> @@ -19,6 +19,12 @@
>>  #include <linux/mm.h>
>>
>>  #include <asm/sections.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/msr.h>
>> +#include <asm/cmdline.h>
>> +
>> +static char sme_cmdline_arg_on[] __initdata = "mem_encrypt=on";
>> +static char sme_cmdline_arg_off[] __initdata = "mem_encrypt=off";
>>
>>  extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
>>  				void *, pgd_t *);
>> @@ -217,8 +223,71 @@ unsigned long __init sme_get_me_mask(void)
>>  	return sme_me_mask;
>>  }
>>
>> -unsigned long __init sme_enable(void)
>> +unsigned long __init sme_enable(void *boot_data)
>
> unsigned long __init sme_enable(struct boot_params *bp)
>
> works too.

Ok, will do.

>
> And then you need to correct the function signature in the
> !CONFIG_AMD_MEM_ENCRYPT case, at the end of this file, too:
>
> unsigned long __init sme_enable(struct boot_params *bp)		{ return 0; }

Yup, missed that.  I'll make it match.

>
>>  {
>> +	struct boot_params *bp = boot_data;
>> +	unsigned int eax, ebx, ecx, edx;
>> +	unsigned long cmdline_ptr;
>> +	bool enable_if_found;
>> +	void *cmdline_arg;
>> +	u64 msr;
>> +
>> +	/* Check for an AMD processor */
>> +	eax = 0;
>> +	ecx = 0;
>> +	native_cpuid(&eax, &ebx, &ecx, &edx);
>> +	if ((ebx != 0x68747541) || (edx != 0x69746e65) || (ecx != 0x444d4163))
>> +		goto out;
>> +
>> +	/* Check for the SME support leaf */
>> +	eax = 0x80000000;
>> +	ecx = 0;
>> +	native_cpuid(&eax, &ebx, &ecx, &edx);
>> +	if (eax < 0x8000001f)
>> +		goto out;
>> +
>> +	/*
>> +	 * Check for the SME feature:
>> +	 *   CPUID Fn8000_001F[EAX] - Bit 0
>> +	 *     Secure Memory Encryption support
>> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
>> +	 *     Pagetable bit position used to indicate encryption
>> +	 */
>> +	eax = 0x8000001f;
>> +	ecx = 0;
>> +	native_cpuid(&eax, &ebx, &ecx, &edx);
>> +	if (!(eax & 1))
>> +		goto out;
>> +
>> +	/* Check if SME is enabled */
>> +	msr = native_read_msr(MSR_K8_SYSCFG);
>
> This native_read_msr() wankery is adding this check:
>
> 	if (msr_tracepoint_active(__tracepoint_read_msr))
>
> and here it is clearly too early for tracepoints. Please use __rdmsr()
> which is purely doing the MSR operation. (... and exception handling for

Ah, good catch.  I'll switch to __rdmsr().

> when the RDMSR itself raises an exception but we're very early here too
> so the MSR better be there, otherwise we'll blow up).

Yes, it will be there if SME support is indicated in the CPUID result.

>
>> +	if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
>> +		goto out;
>> +
>> +	/*
>> +	 * Fixups have not been to applied phys_base yet, so we must obtain
>
> 		...    not been applied to phys_base yet ...

Yup.

>
>> +	 * the address to the SME command line option in the following way.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT)) {
>> +		asm ("lea sme_cmdline_arg_off(%%rip), %0"
>> +		     : "=r" (cmdline_arg)
>> +		     : "p" (sme_cmdline_arg_off));
>> +		enable_if_found = false;
>> +	} else {
>> +		asm ("lea sme_cmdline_arg_on(%%rip), %0"
>> +		     : "=r" (cmdline_arg)
>> +		     : "p" (sme_cmdline_arg_on));
>> +		enable_if_found = true;
>> +	}
>> +
>> +	cmdline_ptr = bp->hdr.cmd_line_ptr | ((u64)bp->ext_cmd_line_ptr << 32);
>> +
>> +	if (cmdline_find_option_bool((char *)cmdline_ptr, cmdline_arg))
>> +		sme_me_mask = enable_if_found ? 1UL << (ebx & 0x3f) : 0;
>> +	else
>> +		sme_me_mask = enable_if_found ? 0 : 1UL << (ebx & 0x3f);
>
> I have a better idea: you can copy __cmdline_find_option() +
> cmdline_find_option() to arch/x86/lib/cmdline.c in a pre-patch. Then,
> pass in a buffer and check for "on" and "off". This way you don't
> have to misuse the _bool() variant for something which is actually
> "option=argument".

I can do that.  Because phys_base hasn't been updated yet, I'll have to
create "on" and "off" constants and get their address in a similar way
to the command line option so that I can do the strncmp properly.

Thanks,
Tom

>
> Thanks.
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption
  2017-03-07 16:05     ` Tom Lendacky
@ 2017-03-07 17:42       ` Borislav Petkov
  2017-03-08 15:05       ` Borislav Petkov
  1 sibling, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-07 17:42 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Mar 07, 2017 at 10:05:00AM -0600, Tom Lendacky wrote:
> I can do that.  Because phys_base hasn't been updated yet, I'll have to
> create "on" and "off" constants and get their address in a similar way
> to the command line option so that I can do the strncmp properly.

Actually, wouldn't it be simpler to inspect the passed in buffer for
containing the chars 'o', 'n' - in that order, or 'o', 'f', 'f' - in
that order too? Because __cmdline_find_option() does copy the option
characters into the buffer.

Then you wouldn't need those "on" and "off" constants...
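
Roughly -- just to illustrate the idea, completely untested, and
ignoring the relocation handling of the option name that you mentioned:

	char buffer[16];
	int ret;

	ret = cmdline_find_option((char *)cmdline_ptr, "mem_encrypt",
				  buffer, sizeof(buffer));
	if (ret == 2 && buffer[0] == 'o' && buffer[1] == 'n')
		sme_me_mask = 1UL << (ebx & 0x3f);
	else if (ret == 3 && buffer[0] == 'o' && buffer[1] == 'f' &&
		 buffer[2] == 'f')
		sme_me_mask = 0;

where cmdline_find_option() is the copied-over variant which returns the
length of the argument it found.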

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-02-16 15:45 ` [RFC PATCH v4 14/28] Add support to access boot related data in the clear Tom Lendacky
  2017-02-21 15:06   ` Borislav Petkov
@ 2017-03-08  6:55   ` Dave Young
  2017-03-17 19:50     ` Tom Lendacky
  1 sibling, 1 reply; 111+ messages in thread
From: Dave Young @ 2017-03-08  6:55 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/17 at 09:45am, Tom Lendacky wrote:
[snip]
> + * This function determines if an address should be mapped encrypted.
> + * Boot setup data, EFI data and E820 areas are checked in making this
> + * determination.
> + */
> +static bool memremap_should_map_encrypted(resource_size_t phys_addr,
> +					  unsigned long size)
> +{
> +	/*
> +	 * SME is not active, return true:
> +	 *   - For early_memremap_pgprot_adjust(), returning true or false
> +	 *     results in the same protection value
> +	 *   - For arch_memremap_do_ram_remap(), returning true will allow
> +	 *     the RAM remap to occur instead of falling back to ioremap()
> +	 */
> +	if (!sme_active())
> +		return true;

From the function name shouldn't above be return false? 

> +
> +	/* Check if the address is part of the setup data */
> +	if (memremap_is_setup_data(phys_addr, size))
> +		return false;
> +
> +	/* Check if the address is part of EFI boot/runtime data */
> +	switch (efi_mem_type(phys_addr)) {
> +	case EFI_BOOT_SERVICES_DATA:
> +	case EFI_RUNTIME_SERVICES_DATA:

Only these two types needed? I'm not sure about this, just bringing up
the question.

> +		return false;
> +	default:
> +		break;
> +	}
> +
> +	/* Check if the address is outside kernel usable area */
> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
> +	case E820_TYPE_RESERVED:
> +	case E820_TYPE_ACPI:
> +	case E820_TYPE_NVS:
> +	case E820_TYPE_UNUSABLE:
> +		return false;
> +	default:
> +		break;
> +	}
> +
> +	return true;
> +}
> +

Thanks
Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted
  2017-02-16 15:47 ` [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted Tom Lendacky
@ 2017-03-08  7:04   ` Dave Young
  2017-03-17 19:54     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Young @ 2017-03-08  7:04 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/17 at 09:47am, Tom Lendacky wrote:
> Use memremap() to map the setup data.  This simplifies the code and will
> make the appropriate decision as to whether a RAM remapping can be done
> or if a fallback to ioremap_cache() is needed (which includes checking
> PageHighMem).
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/kdebugfs.c |   30 +++++++++++-------------------
>  1 file changed, 11 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
> index bdb83e4..c3d354d 100644
> --- a/arch/x86/kernel/kdebugfs.c
> +++ b/arch/x86/kernel/kdebugfs.c
> @@ -48,17 +48,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
>  
>  	pa = node->paddr + sizeof(struct setup_data) + pos;
>  	pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
> -	if (PageHighMem(pg)) {
> -		p = ioremap_cache(pa, count);
> -		if (!p)
> -			return -ENXIO;
> -	} else
> -		p = __va(pa);
> +	p = memremap(pa, count, MEMREMAP_WB);
> +	if (!p)
> +		return -ENXIO;

-ENOMEM looks better for memremap, ditto for other places..

>  
>  	remain = copy_to_user(user_buf, p, count);
>  
> -	if (PageHighMem(pg))
> -		iounmap(p);
> +	memunmap(p);
>  
>  	if (remain)
>  		return -EFAULT;
> @@ -127,15 +123,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
>  		}
>  
>  		pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
> -		if (PageHighMem(pg)) {
> -			data = ioremap_cache(pa_data, sizeof(*data));
> -			if (!data) {
> -				kfree(node);
> -				error = -ENXIO;
> -				goto err_dir;
> -			}
> -		} else
> -			data = __va(pa_data);
> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
> +		if (!data) {
> +			kfree(node);
> +			error = -ENXIO;
> +			goto err_dir;
> +		}
>  
>  		node->paddr = pa_data;
>  		node->type = data->type;
> @@ -143,8 +136,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
>  		error = create_setup_data_node(d, no, node);
>  		pa_data = data->next;
>  
> -		if (PageHighMem(pg))
> -			iounmap(data);
> +		memunmap(data);
>  		if (error)
>  			goto err_dir;
>  		no++;
> 

Thanks
Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted
  2017-02-16 15:47 ` [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted Tom Lendacky
@ 2017-03-08  7:09   ` Dave Young
  2017-03-17 20:09     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Dave Young @ 2017-03-08  7:09 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 02/16/17 at 09:47am, Tom Lendacky wrote:
> Use memremap() to map the setup data.  This will make the appropriate
> decision as to whether a RAM remapping can be done or if a fallback to
> ioremap_cache() is needed (similar to the setup data debugfs support).
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kernel/ksysfs.c |   27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
> index 4afc67f..d653b3e 100644
> --- a/arch/x86/kernel/ksysfs.c
> +++ b/arch/x86/kernel/ksysfs.c
> @@ -16,6 +16,7 @@
>  #include <linux/stat.h>
>  #include <linux/slab.h>
>  #include <linux/mm.h>
> +#include <linux/io.h>
>  
>  #include <asm/io.h>
>  #include <asm/setup.h>
> @@ -79,12 +80,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
>  			*paddr = pa_data;
>  			return 0;
>  		}
> -		data = ioremap_cache(pa_data, sizeof(*data));
> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>  		if (!data)
>  			return -ENOMEM;
>  
>  		pa_data = data->next;
> -		iounmap(data);
> +		memunmap(data);
>  		i++;
>  	}
>  	return -EINVAL;
> @@ -97,17 +98,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
>  	u64 pa_data = boot_params.hdr.setup_data;
>  
>  	while (pa_data) {
> -		data = ioremap_cache(pa_data, sizeof(*data));
> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>  		if (!data)
>  			return -ENOMEM;
>  		if (nr == i) {
>  			*size = data->len;
> -			iounmap(data);
> +			memunmap(data);
>  			return 0;
>  		}
>  
>  		pa_data = data->next;
> -		iounmap(data);
> +		memunmap(data);
>  		i++;
>  	}
>  	return -EINVAL;
> @@ -127,12 +128,12 @@ static ssize_t type_show(struct kobject *kobj,
>  	ret = get_setup_data_paddr(nr, &paddr);
>  	if (ret)
>  		return ret;
> -	data = ioremap_cache(paddr, sizeof(*data));
> +	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
>  	if (!data)
>  		return -ENOMEM;
>  
>  	ret = sprintf(buf, "0x%x\n", data->type);
> -	iounmap(data);
> +	memunmap(data);
>  	return ret;
>  }
>  
> @@ -154,7 +155,7 @@ static ssize_t setup_data_data_read(struct file *fp,
>  	ret = get_setup_data_paddr(nr, &paddr);
>  	if (ret)
>  		return ret;
> -	data = ioremap_cache(paddr, sizeof(*data));
> +	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
>  	if (!data)
>  		return -ENOMEM;
>  
> @@ -170,15 +171,15 @@ static ssize_t setup_data_data_read(struct file *fp,
>  		goto out;
>  
>  	ret = count;
> -	p = ioremap_cache(paddr + sizeof(*data), data->len);
> +	p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
>  	if (!p) {
>  		ret = -ENOMEM;
>  		goto out;
>  	}
>  	memcpy(buf, p + off, count);
> -	iounmap(p);
> +	memunmap(p);
>  out:
> -	iounmap(data);
> +	memunmap(data);
>  	return ret;
>  }
>  
> @@ -250,13 +251,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
>  	*nr = 0;
>  	while (pa_data) {
>  		*nr += 1;
> -		data = ioremap_cache(pa_data, sizeof(*data));
> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>  		if (!data) {
>  			ret = -ENOMEM;
>  			goto out;
>  		}
>  		pa_data = data->next;
> -		iounmap(data);
> +		memunmap(data);
>  	}
>  
>  out:
> 

It would be better if these cleanup patches were sent separately.

Acked-by: Dave Young <dyoung@redhat.com>

Thanks
Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME
  2017-03-06 17:58         ` Tom Lendacky
  2017-03-06 18:04           ` Tom Lendacky
@ 2017-03-08  8:12           ` Dave Young
  1 sibling, 0 replies; 111+ messages in thread
From: Dave Young @ 2017-03-08  8:12 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Konrad Rzeszutek Wilk, linux-arch, linux-efi, kvm, linux-doc,
	x86, linux-kernel, kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Paolo Bonzini, Brijesh Singh,
	Ingo Molnar, Alexander Potapenko, Andy Lutomirski,
	H. Peter Anvin, Borislav Petkov, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 03/06/17 at 11:58am, Tom Lendacky wrote:
> On 3/1/2017 3:25 AM, Dave Young wrote:
> > Hi Tom,
> 
> Hi Dave,
> 
> > 
> > On 02/17/17 at 10:43am, Tom Lendacky wrote:
> > > On 2/17/2017 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > > > On Thu, Feb 16, 2017 at 09:47:55AM -0600, Tom Lendacky wrote:
> > > > > Provide support so that kexec can be used to boot a kernel when SME is
> > > > > enabled.
> > > > 
> > > > Is the point of kexec and kdump to ehh, dump memory ? But if the
> > > > rest of the memory is encrypted you won't get much, will you?
> > > 
> > > Kexec can be used to reboot a system without going back through BIOS.
> > > So you can use kexec without using kdump.
> > > 
> > > For kdump, just taking a quick look, the option to enable memory
> > > encryption can be provided on the crash kernel command line and then
> > 
> > Is there a simple way to get the SME status? Probably add some sysfs
> > file for this purpose.
> 
> Currently there is not.  I can look at adding something, maybe just the
> sme_me_mask value, which if non-zero, would indicate SME is active.
> 
> > 
> > > crash kernel can would be able to copy the memory decrypted if the
> > > pagetable is set up properly. It looks like currently ioremap_cache()
> > > is used to map the old memory page.  That might be able to be changed
> > > to a memremap() so that the encryption bit is set in the mapping. That
> > > will mean that memory that is not marked encrypted (EFI tables, swiotlb
> > > memory, etc) would not be read correctly.
> > 
> > Manage to store info about those ranges which are not encrypted so that
> > memremap can handle them?
> 
> I can look into whether something can be done in this area. Any input
> you can provide as to what would be the best way/place to store the
> range info so kdump can make use of it, would be greatly appreciated.

Previously, to support EFI runtime in kexec, I passed some EFI
information via setup_data; see the userspace kexec-tools commit below:
e1ffc9e9a0769e1f54185003102e9bec428b84e8. It is what Boris mentioned
about the setup_data use case for kexec.

I suppose you have successfully tested kexec reboot, so the EFI tables
you mentioned should be those areas in old memory that get copied via
/proc/vmcore? If it is only the EFI tables and the swiotlb memory, it
may not be worth passing that information across a kexec reboot.

I have no more ideas about this for now.
> 
> > 
> > > 
> > > > 
> > > > Would it make sense to include some printk to the user if they
> > > > are setting up kdump that they won't get anything out of it?
> > > 
> > > Probably a good idea to add something like that.
> > 
> > It will break kdump functionality, it should be fixed instead of
> > just adding printk to warn user..
> 
> I do want kdump to work. I'll investigate further what can be done in
> this area.

Thanks a lot!

Dave

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption
  2017-03-07 16:05     ` Tom Lendacky
  2017-03-07 17:42       ` Borislav Petkov
@ 2017-03-08 15:05       ` Borislav Petkov
  1 sibling, 0 replies; 111+ messages in thread
From: Borislav Petkov @ 2017-03-08 15:05 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Andrey Ryabinin,
	Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On Tue, Mar 07, 2017 at 10:05:00AM -0600, Tom Lendacky wrote:
> > And then you need to correct the function signature in the
> > !CONFIG_AMD_MEM_ENCRYPT case, at the end of this file, too:
> > 
> > unsigned long __init sme_enable(struct boot_params *bp)		{ return 0; }
> 
> Yup, missed that.  I'll make it match.

Or, you can do this:

unsigned long __init sme_enable(void *boot_data)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
        struct boot_params *bp = boot_data;
        unsigned int eax, ebx, ecx, edx;
        unsigned long cmdline_ptr;

	...

out:
#endif /* CONFIG_AMD_MEM_ENCRYPT */
        return sme_me_mask;
}

and never worry for function headers going out of whack.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 14/28] Add support to access boot related data in the clear
  2017-03-08  6:55   ` Dave Young
@ 2017-03-17 19:50     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-17 19:50 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/8/2017 12:55 AM, Dave Young wrote:
> On 02/16/17 at 09:45am, Tom Lendacky wrote:
> [snip]
>> + * This function determines if an address should be mapped encrypted.
>> + * Boot setup data, EFI data and E820 areas are checked in making this
>> + * determination.
>> + */
>> +static bool memremap_should_map_encrypted(resource_size_t phys_addr,
>> +					  unsigned long size)
>> +{
>> +	/*
>> +	 * SME is not active, return true:
>> +	 *   - For early_memremap_pgprot_adjust(), returning true or false
>> +	 *     results in the same protection value
>> +	 *   - For arch_memremap_do_ram_remap(), returning true will allow
>> +	 *     the RAM remap to occur instead of falling back to ioremap()
>> +	 */
>> +	if (!sme_active())
>> +		return true;
>
> From the function name shouldn't above be return false?

I've re-worked this so that the check is in a different location and
doesn't cause confusion.

>
>> +
>> +	/* Check if the address is part of the setup data */
>> +	if (memremap_is_setup_data(phys_addr, size))
>> +		return false;
>> +
>> +	/* Check if the address is part of EFI boot/runtime data */
>> +	switch (efi_mem_type(phys_addr)) {
>> +	case EFI_BOOT_SERVICES_DATA:
>> +	case EFI_RUNTIME_SERVICES_DATA:
>
> Only these two types needed? I'm not sure about this, just bringing up
> the question.

I've re-worked this code so that there is a single EFI routine that
checks boot_params.efi_info.efi_memmap/efi_systab, EFI tables and the
EFI memtype.  As for the EFI memtypes, I believe those are the only
ones required.  Some of the other types will be picked up by the e820
checks (ACPI, NVS, etc.).
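
The rough shape is something like this (a sketch of the idea only, not
the actual code; the check against the EFI configuration tables is
hand-waved here):

	static bool memremap_is_efi_data(resource_size_t phys_addr,
					 unsigned long size)
	{
		u64 paddr;

		if (!efi_enabled(EFI_BOOT))
			return false;

		/* The EFI memory map and system table are boot data */
		paddr = boot_params.efi_info.efi_memmap_hi;
		paddr <<= 32;
		paddr |= boot_params.efi_info.efi_memmap;
		if (phys_addr == paddr)
			return true;

		paddr = boot_params.efi_info.efi_systab_hi;
		paddr <<= 32;
		paddr |= boot_params.efi_info.efi_systab;
		if (phys_addr == paddr)
			return true;

		/* (check against the EFI configuration tables goes here) */

		switch (efi_mem_type(phys_addr)) {
		case EFI_BOOT_SERVICES_DATA:
		case EFI_RUNTIME_SERVICES_DATA:
			return true;
		default:
			break;
		}

		return false;
	}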

Thanks,
Tom

>
>> +		return false;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	/* Check if the address is outside kernel usable area */
>> +	switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) {
>> +	case E820_TYPE_RESERVED:
>> +	case E820_TYPE_ACPI:
>> +	case E820_TYPE_NVS:
>> +	case E820_TYPE_UNUSABLE:
>> +		return false;
>> +	default:
>> +		break;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>
> Thanks
> Dave
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted
  2017-03-08  7:04   ` Dave Young
@ 2017-03-17 19:54     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-17 19:54 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/8/2017 1:04 AM, Dave Young wrote:
> On 02/16/17 at 09:47am, Tom Lendacky wrote:
>> Use memremap() to map the setup data.  This simplifies the code and will
>> make the appropriate decision as to whether a RAM remapping can be done
>> or if a fallback to ioremap_cache() is needed (which includes checking
>> PageHighMem).
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/kdebugfs.c |   30 +++++++++++-------------------
>>  1 file changed, 11 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
>> index bdb83e4..c3d354d 100644
>> --- a/arch/x86/kernel/kdebugfs.c
>> +++ b/arch/x86/kernel/kdebugfs.c
>> @@ -48,17 +48,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
>>
>>  	pa = node->paddr + sizeof(struct setup_data) + pos;
>>  	pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
>> -	if (PageHighMem(pg)) {
>> -		p = ioremap_cache(pa, count);
>> -		if (!p)
>> -			return -ENXIO;
>> -	} else
>> -		p = __va(pa);
>> +	p = memremap(pa, count, MEMREMAP_WB);
>> +	if (!p)
>> +		return -ENXIO;
>
> -ENOMEM looks better for memremap, ditto for other places..

Makes sense, I'll change them.

Thanks,
Tom

>
>>
>>  	remain = copy_to_user(user_buf, p, count);
>>
>> -	if (PageHighMem(pg))
>> -		iounmap(p);
>> +	memunmap(p);
>>
>>  	if (remain)
>>  		return -EFAULT;
>> @@ -127,15 +123,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
>>  		}
>>
>>  		pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
>> -		if (PageHighMem(pg)) {
>> -			data = ioremap_cache(pa_data, sizeof(*data));
>> -			if (!data) {
>> -				kfree(node);
>> -				error = -ENXIO;
>> -				goto err_dir;
>> -			}
>> -		} else
>> -			data = __va(pa_data);
>> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>> +		if (!data) {
>> +			kfree(node);
>> +			error = -ENXIO;
>> +			goto err_dir;
>> +		}
>>
>>  		node->paddr = pa_data;
>>  		node->type = data->type;
>> @@ -143,8 +136,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
>>  		error = create_setup_data_node(d, no, node);
>>  		pa_data = data->next;
>>
>> -		if (PageHighMem(pg))
>> -			iounmap(data);
>> +		memunmap(data);
>>  		if (error)
>>  			goto err_dir;
>>  		no++;
>>
>
> Thanks
> Dave
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted
  2017-03-08  7:09   ` Dave Young
@ 2017-03-17 20:09     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-17 20:09 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu, Rik van Riel,
	Radim Krčmář,
	Toshimitsu Kani, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/8/2017 1:09 AM, Dave Young wrote:
> On 02/16/17 at 09:47am, Tom Lendacky wrote:
>> Use memremap() to map the setup data.  This will make the appropriate
>> decision as to whether a RAM remapping can be done or if a fallback to
>> ioremap_cache() is needed (similar to the setup data debugfs support).
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kernel/ksysfs.c |   27 ++++++++++++++-------------
>>  1 file changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
>> index 4afc67f..d653b3e 100644
>> --- a/arch/x86/kernel/ksysfs.c
>> +++ b/arch/x86/kernel/ksysfs.c
>> @@ -16,6 +16,7 @@
>>  #include <linux/stat.h>
>>  #include <linux/slab.h>
>>  #include <linux/mm.h>
>> +#include <linux/io.h>
>>
>>  #include <asm/io.h>
>>  #include <asm/setup.h>
>> @@ -79,12 +80,12 @@ static int get_setup_data_paddr(int nr, u64 *paddr)
>>  			*paddr = pa_data;
>>  			return 0;
>>  		}
>> -		data = ioremap_cache(pa_data, sizeof(*data));
>> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>>  		if (!data)
>>  			return -ENOMEM;
>>
>>  		pa_data = data->next;
>> -		iounmap(data);
>> +		memunmap(data);
>>  		i++;
>>  	}
>>  	return -EINVAL;
>> @@ -97,17 +98,17 @@ static int __init get_setup_data_size(int nr, size_t *size)
>>  	u64 pa_data = boot_params.hdr.setup_data;
>>
>>  	while (pa_data) {
>> -		data = ioremap_cache(pa_data, sizeof(*data));
>> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>>  		if (!data)
>>  			return -ENOMEM;
>>  		if (nr == i) {
>>  			*size = data->len;
>> -			iounmap(data);
>> +			memunmap(data);
>>  			return 0;
>>  		}
>>
>>  		pa_data = data->next;
>> -		iounmap(data);
>> +		memunmap(data);
>>  		i++;
>>  	}
>>  	return -EINVAL;
>> @@ -127,12 +128,12 @@ static ssize_t type_show(struct kobject *kobj,
>>  	ret = get_setup_data_paddr(nr, &paddr);
>>  	if (ret)
>>  		return ret;
>> -	data = ioremap_cache(paddr, sizeof(*data));
>> +	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
>>  	if (!data)
>>  		return -ENOMEM;
>>
>>  	ret = sprintf(buf, "0x%x\n", data->type);
>> -	iounmap(data);
>> +	memunmap(data);
>>  	return ret;
>>  }
>>
>> @@ -154,7 +155,7 @@ static ssize_t setup_data_data_read(struct file *fp,
>>  	ret = get_setup_data_paddr(nr, &paddr);
>>  	if (ret)
>>  		return ret;
>> -	data = ioremap_cache(paddr, sizeof(*data));
>> +	data = memremap(paddr, sizeof(*data), MEMREMAP_WB);
>>  	if (!data)
>>  		return -ENOMEM;
>>
>> @@ -170,15 +171,15 @@ static ssize_t setup_data_data_read(struct file *fp,
>>  		goto out;
>>
>>  	ret = count;
>> -	p = ioremap_cache(paddr + sizeof(*data), data->len);
>> +	p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
>>  	if (!p) {
>>  		ret = -ENOMEM;
>>  		goto out;
>>  	}
>>  	memcpy(buf, p + off, count);
>> -	iounmap(p);
>> +	memunmap(p);
>>  out:
>> -	iounmap(data);
>> +	memunmap(data);
>>  	return ret;
>>  }
>>
>> @@ -250,13 +251,13 @@ static int __init get_setup_data_total_num(u64 pa_data, int *nr)
>>  	*nr = 0;
>>  	while (pa_data) {
>>  		*nr += 1;
>> -		data = ioremap_cache(pa_data, sizeof(*data));
>> +		data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
>>  		if (!data) {
>>  			ret = -ENOMEM;
>>  			goto out;
>>  		}
>>  		pa_data = data->next;
>> -		iounmap(data);
>> +		memunmap(data);
>>  	}
>>
>>  out:
>>
>
> It would be better if these cleanup patches were sent separately.

Bjorn suggested something along the same lines, so I've combined all the
changes from ioremap to memremap into a single pre-patch in the series.
I could send them separately if needed.

Thanks,
Tom

>
> Acked-by: Dave Young <dyoung@redhat.com>
>
> Thanks
> Dave
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [RFC PATCH v4 15/28] Add support to access persistent memory in the clear
  2017-02-16 15:45 ` [RFC PATCH v4 15/28] Add support to access persistent memory " Tom Lendacky
@ 2017-03-17 22:58   ` Elliott, Robert (Persistent Memory)
  2017-03-23 21:02     ` Tom Lendacky
  0 siblings, 1 reply; 111+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2017-03-17 22:58 UTC (permalink / raw)
  To: Tom Lendacky, linux-arch, linux-efi, kvm, linux-doc, x86,
	linux-kernel, kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Kani, Toshimitsu, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Tom Lendacky
> Sent: Thursday, February 16, 2017 9:45 AM
> Subject: [RFC PATCH v4 15/28] Add support to access persistent memory in
> the clear
> 
> Persistent memory is expected to persist across reboots. The encryption
> key used by SME will change across reboots which will result in corrupted
> persistent memory.  Persistent memory is handed out by block devices
> through memory remapping functions, so be sure not to map this memory as
> encrypted.

The system might be able to save and restore the correct encryption key for a 
region of persistent memory, in which case it does need to be mapped as
encrypted.

This might deserve a new EFI_MEMORY_ENCRYPTED attribute bit so the
system firmware can communicate that information to the OS (in the
UEFI memory map and the ACPI NFIT SPA Range structures).  It wouldn't
likely ever be added to the E820h table - ACPI 6.1 already obsoleted the
Extended Attribute for AddressRangeNonVolatile.
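
As a rough illustration of what honoring such a bit could look like
(hypothetical sketch only -- the attribute name and value below are made
up; only EFI_MEMORY_NV is defined today):

/* Hypothetical attribute, not defined by the UEFI spec at this time */
#define EFI_MEMORY_ENCRYPTED	(1ULL << 19)

static bool pmem_should_map_encrypted(efi_memory_desc_t *md)
{
	/* Non-persistent ranges keep the normal encrypted mapping */
	if (!(md->attribute & EFI_MEMORY_NV))
		return true;

	/* Persistent memory: encrypt only if firmware says it is safe */
	return !!(md->attribute & EFI_MEMORY_ENCRYPTED);
}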

> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/mm/ioremap.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index b0ff6bc..c6cb921 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -498,6 +498,8 @@ static bool
> memremap_should_map_encrypted(resource_size_t phys_addr,
>  	case E820_TYPE_ACPI:
>  	case E820_TYPE_NVS:
>  	case E820_TYPE_UNUSABLE:
> +	case E820_TYPE_PMEM:
> +	case E820_TYPE_PRAM:
>  		return false;
>  	default:
>  		break;

E820_TYPE_RESERVED is also used to report persistent memory in
some systems (patch 16 adds that for other reasons).

You might want to intercept the persistent memory types in the 
efi_mem_type(phys_addr) switch statement earlier in the function
as well.  https://lkml.org/lkml/2017/3/13/357 recently mentioned that
"in qemu hotpluggable memory isn't put into E820," with the latest 
information only in the UEFI memory map.

Persistent memory can be reported there as:
* EfiReservedMemoryType type with the EFI_MEMORY_NV attribute
* EfiPersistentMemory type with the EFI_MEMORY_NV attribute

Even the UEFI memory map is not authoritative, though.  To really
determine what is in these regions requires parsing the ACPI NFIT
SPA Ranges structures.  Parts of the E820 or UEFI regions could be
reported as volatile there and should thus be encrypted.
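
Something along these lines could at least catch the UEFI-reported cases
(a rough sketch, assuming the EFI memory map is available via
for_each_efi_memory_desc(); the function name is only for illustration):

static bool memremap_is_efi_pmem(resource_size_t phys_addr)
{
	efi_memory_desc_t *md;

	for_each_efi_memory_desc(md) {
		u64 size = md->num_pages << EFI_PAGE_SHIFT;

		if (phys_addr < md->phys_addr ||
		    phys_addr >= md->phys_addr + size)
			continue;

		/* EfiPersistentMemory or anything marked non-volatile */
		if (md->type == EFI_PERSISTENT_MEMORY)
			return true;
		if (md->attribute & EFI_MEMORY_NV)
			return true;
	}

	return false;
}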

---
Robert Elliott, HPE Persistent Memory


* Re: [RFC PATCH v4 15/28] Add support to access persistent memory in the clear
  2017-03-17 22:58   ` Elliott, Robert (Persistent Memory)
@ 2017-03-23 21:02     ` Tom Lendacky
  0 siblings, 0 replies; 111+ messages in thread
From: Tom Lendacky @ 2017-03-23 21:02 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory),
	linux-arch, linux-efi, kvm, linux-doc, x86, linux-kernel,
	kasan-dev, linux-mm, iommu
  Cc: Rik van Riel, Radim Krčmář,
	Kani, Toshimitsu, Arnd Bergmann, Jonathan Corbet, Matt Fleming,
	Michael S. Tsirkin, Joerg Roedel, Konrad Rzeszutek Wilk,
	Paolo Bonzini, Brijesh Singh, Ingo Molnar, Alexander Potapenko,
	Andy Lutomirski, H. Peter Anvin, Borislav Petkov,
	Andrey Ryabinin, Thomas Gleixner, Larry Woodman, Dmitry Vyukov

On 3/17/2017 5:58 PM, Elliott, Robert (Persistent Memory) wrote:
>
>
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> owner@vger.kernel.org] On Behalf Of Tom Lendacky
>> Sent: Thursday, February 16, 2017 9:45 AM
>> Subject: [RFC PATCH v4 15/28] Add support to access persistent memory in
>> the clear
>>
>> Persistent memory is expected to persist across reboots. The encryption
>> key used by SME will change across reboots which will result in corrupted
>> persistent memory.  Persistent memory is handed out by block devices
>> through memory remapping functions, so be sure not to map this memory as
>> encrypted.
>
> The system might be able to save and restore the correct encryption key for a
> region of persistent memory, in which case it does need to be mapped as
> encrypted.

If the OS could get some indication that BIOS/UEFI has saved and
restored the encryption key, then it could be mapped encrypted.

>
> This might deserve a new EFI_MEMORY_ENCRYPTED attribute bit so the
> system firmware can communicate that information to the OS (in the
> UEFI memory map and the ACPI NFIT SPA Range structures).  It wouldn't
> likely ever be added to the E820h table - ACPI 6.1 already obsoleted the
> Extended Attribute for AddressRangeNonVolatile.

An attribute bit in some form would be a nice way to inform the OS that
the persistent memory can be mapped encrypted.

>
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/mm/ioremap.c |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index b0ff6bc..c6cb921 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -498,6 +498,8 @@ static bool
>> memremap_should_map_encrypted(resource_size_t phys_addr,
>>  	case E820_TYPE_ACPI:
>>  	case E820_TYPE_NVS:
>>  	case E820_TYPE_UNUSABLE:
>> +	case E820_TYPE_PMEM:
>> +	case E820_TYPE_PRAM:
>>  		return false;
>>  	default:
>>  		break;
>
> E820_TYPE_RESERVED is also used to report persistent memory in
> some systems (patch 16 adds that for other reasons).
>
> You might want to intercept the persistent memory types in the
> efi_mem_type(phys_addr) switch statement earlier in the function
> as well.  https://lkml.org/lkml/2017/3/13/357 recently mentioned that
> "in qemu hotpluggable memory isn't put into E820," with the latest
> information only in the UEFI memory map.
>
> Persistent memory can be reported there as:
> * EfiReservedMemoryType type with the EFI_MEMORY_NV attribute
> * EfiPersistentMemory type with the EFI_MEMORY_NV attribute
>
> Even the UEFI memory map is not authoritative, though.  To really
> determine what is in these regions requires parsing the ACPI NFIT
> SPA Ranges structures.  Parts of the E820 or UEFI regions could be
> reported as volatile there and should thus be encrypted.

Thanks for the details. I'll take a closer look and update the checks
appropriately.

Thanks,
Tom

>
> ---
> Robert Elliott, HPE Persistent Memory
>
>
>


end of thread

Thread overview: 111+ messages
2017-02-16 15:41 [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Tom Lendacky
2017-02-16 15:42 ` [RFC PATCH v4 01/28] x86: Documentation for AMD Secure Memory Encryption (SME) Tom Lendacky
2017-02-16 17:56   ` Borislav Petkov
2017-02-16 19:48     ` Tom Lendacky
2017-02-16 15:42 ` [RFC PATCH v4 02/28] x86: Set the write-protect cache mode for full PAT support Tom Lendacky
2017-02-17 11:07   ` Borislav Petkov
2017-02-17 15:56     ` Tom Lendacky
2017-02-16 15:42 ` [RFC PATCH v4 03/28] x86: Add the Secure Memory Encryption CPU feature Tom Lendacky
2017-02-16 18:13   ` Borislav Petkov
2017-02-16 19:42     ` Tom Lendacky
2017-02-16 20:06       ` Borislav Petkov
2017-02-16 15:42 ` [RFC PATCH v4 04/28] x86: Handle reduction in physical address size with SME Tom Lendacky
2017-02-17 11:04   ` Borislav Petkov
2017-02-16 15:43 ` [RFC PATCH v4 05/28] x86: Add Secure Memory Encryption (SME) support Tom Lendacky
2017-02-17 12:00   ` Borislav Petkov
2017-02-25 15:29   ` Borislav Petkov
2017-02-28 23:01     ` Tom Lendacky
2017-02-16 15:43 ` [RFC PATCH v4 06/28] x86: Add support to enable SME during early boot processing Tom Lendacky
2017-02-20 12:51   ` Borislav Petkov
2017-02-21 14:55     ` Tom Lendacky
2017-02-21 15:10       ` Borislav Petkov
2017-02-16 15:43 ` [RFC PATCH v4 07/28] x86: Provide general kernel support for memory encryption Tom Lendacky
2017-02-20 15:21   ` Borislav Petkov
2017-02-21 17:18     ` Tom Lendacky
2017-02-22 12:08       ` Borislav Petkov
2017-02-20 18:38   ` Borislav Petkov
2017-02-22 16:43     ` Tom Lendacky
2017-02-22 18:13   ` Dave Hansen
2017-02-23 23:12     ` Tom Lendacky
2017-02-22 18:13   ` Dave Hansen
2017-02-16 15:43 ` [RFC PATCH v4 08/28] x86: Extend the early_memremap support with additional attrs Tom Lendacky
2017-02-20 15:43   ` Borislav Petkov
2017-02-22 15:42     ` Tom Lendacky
2017-02-16 15:43 ` [RFC PATCH v4 09/28] x86: Add support for early encryption/decryption of memory Tom Lendacky
2017-02-20 18:22   ` Borislav Petkov
2017-02-22 15:48     ` Tom Lendacky
2017-02-16 15:44 ` [RFC PATCH v4 10/28] x86: Insure that boot memory areas are mapped properly Tom Lendacky
2017-02-20 19:45   ` Borislav Petkov
2017-02-22 18:34     ` Tom Lendacky
2017-02-16 15:44 ` [RFC PATCH v4 11/28] x86: Add support to determine the E820 type of an address Tom Lendacky
2017-02-20 20:09   ` Borislav Petkov
2017-02-28 22:34     ` Tom Lendacky
2017-03-03  9:52       ` Borislav Petkov
2017-02-16 15:44 ` [RFC PATCH v4 12/28] efi: Add an EFI table address match function Tom Lendacky
2017-02-16 15:44 ` [RFC PATCH v4 13/28] efi: Update efi_mem_type() to return defined EFI mem types Tom Lendacky
2017-02-21 12:05   ` Matt Fleming
2017-02-23 17:27     ` Tom Lendacky
2017-02-24  9:57       ` Matt Fleming
2017-02-16 15:45 ` [RFC PATCH v4 14/28] Add support to access boot related data in the clear Tom Lendacky
2017-02-21 15:06   ` Borislav Petkov
2017-02-23 21:34     ` Tom Lendacky
2017-02-24 10:21       ` Borislav Petkov
2017-02-24 15:04         ` Tom Lendacky
2017-02-24 15:22           ` Borislav Petkov
2017-03-08  6:55   ` Dave Young
2017-03-17 19:50     ` Tom Lendacky
2017-02-16 15:45 ` [RFC PATCH v4 15/28] Add support to access persistent memory " Tom Lendacky
2017-03-17 22:58   ` Elliott, Robert (Persistent Memory)
2017-03-23 21:02     ` Tom Lendacky
2017-02-16 15:45 ` [RFC PATCH v4 16/28] x86: Add support for changing memory encryption attribute Tom Lendacky
2017-02-22 18:52   ` Borislav Petkov
2017-02-28 22:46     ` Tom Lendacky
2017-02-16 15:45 ` [RFC PATCH v4 17/28] x86: Decrypt trampoline area if memory encryption is active Tom Lendacky
2017-02-16 15:46 ` [RFC PATCH v4 18/28] x86: DMA support for memory encryption Tom Lendacky
2017-02-25 17:10   ` Borislav Petkov
2017-03-06 17:47     ` Tom Lendacky
2017-02-16 15:46 ` [RFC PATCH v4 19/28] swiotlb: Add warnings for use of bounce buffers with SME Tom Lendacky
2017-02-17 15:59   ` Konrad Rzeszutek Wilk
2017-02-17 16:51     ` Tom Lendacky
2017-03-02 17:01       ` Paolo Bonzini
2017-02-27 17:52   ` Borislav Petkov
2017-02-28 23:19     ` Tom Lendacky
2017-03-01 11:17       ` Borislav Petkov
2017-02-16 15:46 ` [RFC PATCH v4 20/28] iommu/amd: Disable AMD IOMMU if memory encryption is active Tom Lendacky
2017-02-16 15:46 ` [RFC PATCH v4 21/28] x86: Check for memory encryption on the APs Tom Lendacky
2017-02-27 18:17   ` Borislav Petkov
2017-02-28 23:28     ` Tom Lendacky
2017-03-01 11:17       ` Borislav Petkov
2017-02-16 15:47 ` [RFC PATCH v4 22/28] x86: Do not specify encrypted memory for video mappings Tom Lendacky
2017-02-16 15:47 ` [RFC PATCH v4 23/28] x86/kvm: Enable Secure Memory Encryption of nested page tables Tom Lendacky
2017-02-16 15:47 ` [RFC PATCH v4 24/28] x86: Access the setup data through debugfs decrypted Tom Lendacky
2017-03-08  7:04   ` Dave Young
2017-03-17 19:54     ` Tom Lendacky
2017-02-16 15:47 ` [RFC PATCH v4 25/28] x86: Access the setup data through sysfs decrypted Tom Lendacky
2017-03-08  7:09   ` Dave Young
2017-03-17 20:09     ` Tom Lendacky
2017-02-16 15:47 ` [RFC PATCH v4 26/28] x86: Allow kexec to be used with SME Tom Lendacky
2017-02-17 15:57   ` Konrad Rzeszutek Wilk
2017-02-17 16:43     ` Tom Lendacky
2017-03-01  9:25       ` Dave Young
2017-03-01  9:27         ` Dave Young
2017-03-06 17:58         ` Tom Lendacky
2017-03-06 18:04           ` Tom Lendacky
2017-03-08  8:12           ` Dave Young
2017-02-28 10:35   ` Borislav Petkov
2017-03-01 15:36     ` Tom Lendacky
2017-02-16 15:48 ` [RFC PATCH v4 27/28] x86: Add support to encrypt the kernel in-place Tom Lendacky
2017-03-01 17:36   ` Borislav Petkov
2017-03-02 18:30     ` Tom Lendacky
2017-03-02 18:51       ` Borislav Petkov
2017-02-16 15:48 ` [RFC PATCH v4 28/28] x86: Add support to make use of Secure Memory Encryption Tom Lendacky
2017-03-01 18:40   ` Borislav Petkov
2017-03-07 16:05     ` Tom Lendacky
2017-03-07 17:42       ` Borislav Petkov
2017-03-08 15:05       ` Borislav Petkov
2017-02-18 18:12 ` [RFC PATCH v4 00/28] x86: Secure Memory Encryption (AMD) Borislav Petkov
2017-02-21 15:09   ` Tom Lendacky
2017-02-21 17:42   ` Rik van Riel
2017-02-21 17:53     ` Borislav Petkov
2017-03-01  9:17 ` Dave Young
2017-03-01 17:51   ` Tom Lendacky
